Hello and welcome to this Python and data science tutorial series. My name is Henry Mbugua, and I will be taking you through the various aspects and nuances of data science in this series. The first thing we are going to cover is what data science is.
What is Data Science?
Data science is all about finding insight from the data available in a specific problem domain, data created by the digitization of everything. Good use cases of data science include:
- Hotels – a hotel can use data science to predict how many customers it is likely to get over the weekend, then plan its food inventory accordingly and avoid losses.
- Insurance – a company can use data science to detect fraudulent activity, suspicious links, and subtle behavior patterns using multiple techniques. Insurance companies usually use statistical models for fraud detection, and these models rely on previous cases of fraudulent activity.
The insight we get from data science is used to solve business problems.
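To make the hotel use case concrete, here is a minimal sketch of one very simple way to estimate next weekend's guests: averaging the most recent observations. The guest counts, the `forecast_next` helper, and the window size are all hypothetical, chosen only for illustration. A real forecast would use more data and a proper statistical model.

```python
from statistics import mean

# Hypothetical guest counts for the last six Saturdays (made-up numbers).
past_saturday_guests = [120, 135, 128, 142, 130, 137]


def forecast_next(counts, window=4):
    """Return the average of the most recent `window` observations."""
    recent = counts[-window:]
    return mean(recent)


# Estimate next Saturday's guests from the four most recent Saturdays.
expected_guests = forecast_next(past_saturday_guests)
print(f"Expected guests next Saturday: {expected_guests:.0f}")
```

A hotel could use an estimate like this to decide how much food to order, which is the kind of business problem the insight is meant to solve.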
What is Python?
While doing data science, we need a programming language, and in this series we will be using Python. Another programming language, R, is also widely used for data science. Why do we want to use Python? Here are a few reasons:
- Python has rich tools for mathematics and statistics.
- Python is becoming the programming language of choice in data science.
- Python is open source, unlike SAS.
- Packages are widely available, so you don’t have to reinvent the wheel.
Getting Started with Python
The first thing we need to do is install Python on our machines. There are many ways to install Python, depending on the operating system you are using. In this tutorial series, we will use Anaconda. Anaconda Distribution is an open source environment manager for Python/R data science and machine learning on Linux, Windows, and macOS. To install Anaconda, click on this: Anaconda Download. You will see the following screen:
Select your operating system. I recommend downloading the Python 3.7 version package, which is the latest at the time of writing this tutorial. Great, now that we have installed our Python of choice, let’s look at the libraries available.
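Before moving on, one way to confirm that the installation worked is to ask Python to report its own version. This is a minimal sketch using only the standard library. The `describe_python` helper is a hypothetical name introduced here, and the exact version printed will depend on what you installed.

```python
import platform
import sys


def describe_python():
    """Return a short description of the running interpreter."""
    v = sys.version_info
    return f"Python {v.major}.{v.minor}.{v.micro} on {platform.system()}"


print(describe_python())
# This series assumes Python 3, so check the major version.
print("Running Python 3:", sys.version_info.major == 3)
```

You can save this in a file and run it, or paste it into an interactive Python session.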
Python Libraries for Data Analysis
Python is a very easy language to learn, and there are some basic things you can do, like adding, multiplying, and printing a message such as “Hello world”, without the use of any library. But if you want to perform data analysis, you will need to import some specific libraries. Here are some of them:
- Pandas – a library that provides high-performance, easy-to-use data structures and data analysis tools. Learn more about Pandas Library.
- NumPy – a mathematical library that can be used as an efficient multi-dimensional container for generic data. Learn more about Numpy Library.
- SciPy – provides an ecosystem of open source software for mathematics, engineering, and science. It’s built on top of NumPy. Learn more about Scipy Library.
- Matplotlib – a plotting library used for visualization. Learn more about Matplotlib Library.
- Scikit-learn – a library used for machine learning tasks. Learn more about Scikit Library.
Additional libraries you might need include Beautiful Soup for web scraping and TensorFlow for AI-related work.
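The difference between built-in operations and imported libraries can be sketched as follows. The first part needs no import at all. The second part checks which of the libraries mentioned in this lesson are installed in your environment, without actually importing them. The `LIBRARIES` mapping and the `installed_libraries` helper are hypothetical names introduced for this sketch. Note that the import name can differ from the project name.

```python
from importlib.util import find_spec

# Basic operations work without importing any library.
total = 2 + 3
product = 4 * 5
print("Hello world")
print(total, product)

# Display name -> import name. Scikit-learn is imported as "sklearn"
# and Beautiful Soup as "bs4".
LIBRARIES = {
    "Pandas": "pandas",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Matplotlib": "matplotlib",
    "Scikit-learn": "sklearn",
    "Beautiful Soup": "bs4",
    "TensorFlow": "tensorflow",
}


def installed_libraries(names):
    """Map each display name to True if its module can be found."""
    return {label: find_spec(module) is not None
            for label, module in names.items()}


for label, available in installed_libraries(LIBRARIES).items():
    print(f"{label}: {'installed' if available else 'missing'}")
```

Which libraries show up as installed depends on your environment, so any that are reported missing can be added before the next lesson.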
With that, we conclude this lesson. In our next lesson, we will have a practical example of how to apply Python to data analysis. See you in lesson 2.