Python for Data Science

Python for Data Science is a port of R for Data Science into Python. We are keeping Garrett Grolemund and Hadley Wickham’s writing and examples as much as possible while demonstrating Python instead of R. We have focused on pandas and Altair in our Python code snippets.

This book will teach you how to do data science with Python: You’ll learn how to get your data into Python, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with Python. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.

Installing and Importing Packages

We want to install the following three packages: numpy, pandas, and scikit-learn.

We can install the packages for this course using one of the two methods below.

Using your terminal

# default way
pip install numpy pandas scikit-learn

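If you are using a Mac that has both Python 2 and Python 3 installed, use pip3 so the packages are installed for Python 3.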

# Mac method with Python 2 and 3 installed
pip3 install numpy pandas scikit-learn

Using your interactive Python (Jupyter server)

import sys
!{sys.executable} -m pip install numpy pandas scikit-learn
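
Checking that the packages are installed

After running one of the install commands above, it can help to confirm that Python can actually find the three packages. The snippet below is a minimal sketch of one way to do that, not a step from the original book: the helper name check_installed and the PACKAGES mapping are illustrative assumptions. It uses the standard library's importlib.util.find_spec, which looks a module up without importing it. Note that scikit-learn is installed under the name scikit-learn but imported as sklearn.

```python
import importlib.util

# Hypothetical mapping (for illustration): pip install name -> import name.
# scikit-learn is the one package here whose two names differ.
PACKAGES = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def check_installed(packages=PACKAGES):
    """Return {pip name: bool}, True when the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module_name) is not None
        for pip_name, module_name in packages.items()
    }


status = check_installed()
for pip_name, found in status.items():
    print(f"{pip_name}: {'installed' if found else 'missing'}")
```

If a package is reported as missing in Jupyter but was installed from the terminal, the notebook is probably running a different Python interpreter. In that case the sys.executable method above installs into the interpreter the notebook is actually using.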