
7. Python Data Analysis

Ty Shaikh edited this page Aug 17, 2018 · 12 revisions

Learn Python modules that make data analysis and visualization easy.

Table of Contents

1. NumPy

2. matplotlib

3. pandas

4. Assignments

5. Projects


1. NumPy

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
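As a rough illustration of the array object and a few of the routines mentioned above, here is a minimal sketch. The array values and variable names are made up for this example and do not come from the tutorial.

```python
import numpy as np

# A small 2-D array (2 rows, 3 columns); the values are illustrative only.
a = np.array([[1, 2, 3], [4, 5, 6]])

# Mathematical operations apply element-wise, with no explicit loop.
doubled = a * 2

# Basic statistics: the mean of each column (axis=0 runs down the rows).
col_means = a.mean(axis=0)

# Logical selection: a boolean mask keeps only the even entries.
evens = a[a % 2 == 0]

# Shape manipulation: the same six values viewed as 3 rows by 2 columns.
reshaped = a.reshape(3, 2)

print(a.shape)      # (2, 3)
print(doubled)
print(col_means)
print(evens)
print(reshaped.shape)
```

Each operation works on the whole array at once, which is the main habit to pick up before moving on to pandas.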

NumPy Tutorial

Read through the first 2 sections of the NumPy User Guide


2. matplotlib

matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms. matplotlib can be used in Python scripts, the Python and IPython shells, web application servers, and six graphical user interface toolkits.

matplotlib tries to make easy things easy and hard things possible. You can generate plots, histograms, power spectra, bar charts, error charts, scatter plots, etc., with just a few lines of code.
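To give a sense of what "a few lines of code" looks like, the sketch below draws a line plot and a histogram side by side. The data is synthetic, the output file name `example_plots.png` is a placeholder chosen for this example, and the `Agg` backend is selected only on the assumption that the script runs without a display (omit that line in a Jupyter notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: one sine period and 1000 normally distributed samples.
x = np.linspace(0, 2 * np.pi, 100)
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)

# One figure with two side-by-side axes.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Left: a line plot with axis labels and a legend.
ax_line.plot(x, np.sin(x), label="sin(x)")
ax_line.set_xlabel("x")
ax_line.set_ylabel("sin(x)")
ax_line.legend()

# Right: a histogram of the random samples in 30 bins.
ax_hist.hist(samples, bins=30)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")

fig.tight_layout()
fig.savefig("example_plots.png")  # placeholder file name
```

In an interactive session, `plt.show()` can replace the `savefig` call.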

matplotlib Tutorials

Get Started - Nicolas P. Rougier matplotlib tutorial

If you enjoy charting, explore Seaborn and its official tutorial. It is an advanced charting library built on top of matplotlib.


3. pandas

pandas is an open source library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

We will use pandas extensively throughout the rest of the Foundations and main curriculum. Its use in data science is comparable to that of Microsoft Excel in the business world.
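To make the spreadsheet comparison concrete, here is a small sketch of filtering, grouping, and sorting a table. The column names and values are invented for illustration and are not taken from any of the course datasets.

```python
import pandas as pd

# A tiny table, similar to a worksheet with three columns (made-up values).
df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "group": ["x", "x", "y", "x"],
    "price": [40, 50, 35, 45],
})

# Filter rows, like applying a spreadsheet filter on the price column.
expensive = df[df["price"] > 38]

# Aggregate by group, like a pivot table with an average.
mean_by_group = df.groupby("group")["price"].mean()

# Sort by a column and keep the top two rows.
top_two = df.sort_values("price", ascending=False).head(2)

print(expensive)
print(mean_by_group)
print(top_two)
```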

pandas Tutorials

Get Started - 10 Minutes to pandas

Optional - pandas Cookbook


4. Assignments

  1. Complete the tutorials for NumPy, matplotlib and pandas.
  2. For more background, go through this set of Jupyter Notebooks.

5. Projects

Complete the notebooks in order of difficulty.

Easy

  1. College Majors - Notebook and Data
  2. Police Killings - Notebook and Data (encoded in ISO-8859-1)

Medium

  1. Bangalore Weather - Notebook and Data (tab-separated file)

Hard

  1. Thanksgiving Dinner - Notebook and Data (encoded in Latin-1)
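
The notes next to the data files above matter when loading them: `pandas.read_csv` assumes UTF-8 and comma separators by default, so the encoding or delimiter may need to be passed explicitly. The sketch below uses small in-memory stand-ins rather than the real project files, so the contents and column names are hypothetical. With a real file, pass its path as the first argument instead. Note that ISO-8859-1 and Latin-1 are two names for the same encoding.

```python
import io
import pandas as pd

# Stand-in for a Latin-1 (ISO-8859-1) encoded CSV file; contents are made up.
latin1_bytes = "name,city\nRenée,Orléans\n".encode("latin-1")

# Without encoding="latin-1" these bytes would fail to decode as UTF-8.
df_latin1 = pd.read_csv(io.BytesIO(latin1_bytes), encoding="latin-1")

# Stand-in for a tab-separated file; contents are made up.
tsv_text = "day\ttemp_c\n1\t24.5\n2\t26.0\n"

# sep="\t" tells pandas that columns are separated by tabs, not commas.
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(df_latin1)
print(df_tsv)
```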