
An introduction to data science using Python and Pandas with Jupyter notebooks


cuttlefishh/python-for-data-analysis


SIOC 209: Python for Data Analysis

This course focuses on analyzing data of all types using the Python programming language. No programming experience is necessary.

We start with an introduction (or refresher) to the command line. We then cover the fundamentals of Python and its data types, followed by the data analysis packages NumPy and Pandas, the plotting packages Matplotlib and Seaborn, and finally statistics and interactive visualization.

Jupyter (IPython) notebooks are used throughout. Conda is used for package management and virtual environments. All notebooks are in Python 3 unless otherwise noted.

Instructor

  • Luke Thompson, Ph.D.
  • Lecturer at SIO; Research Associate at NOAA
  • Office hours by appointment
  • Email: luke@ucsd.edu

Meetings

  • Quarter: Winter 2017-18
  • Meeting time: Tu/Th 9:00-10:20
  • First and last day of class: January 9-March 15 (20 lessons)
  • Location and door code: ask instructor

Online Content

Textbooks

  • Learn Python 3 the Hard Way by Zed Shaw (Addison-Wesley) -- Step-by-step introduction to Python with no prior knowledge assumed; includes appendix Command Line Crash Course.
  • Learning Python 3rd Edition by Mark Lutz (O'Reilly) -- Optional; more traditional introduction to Python as a computer language.
  • Python for Data Analysis 2nd Edition by Wes McKinney (O'Reilly) -- Manual focused on Pandas, the popular Python package for data analysis, by its creator. GitHub page: https://github.com/wesm/pydata-book.

Note: O'Reilly Media titles are free to UCSD affiliates with Safari Books Online.

Additional Materials

Command Line Resources

Python Resources

IPython Resources from Cyrille Rossant

Data Analysis Resources

Course Philosophy

  1. Just like anything else, you learn Python by doing. With a few exceptions, you're not going to break your computer by trying new commands. So just try it and see what happens. Print output of commands. Print values of variables. Kick the thing until it works.
  2. When you don't know how to do something, google it. You'll be amazed by the solutions you'll find to do thing x if you google "python thing x".
  3. Learn keyboard shortcuts, as many as you can. Tab-complete in the shell and IPython/Jupyter!
  4. Remember Zed's sage wisdom:
    • Practice every day.
    • Don't over-do it. Slow and steady wins the race.
    • It's alright to be totally lost at first.
    • When you get stuck, get more information.
    • Try to solve it yourself first.
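Item 1 in practice: a minimal sketch of exploring an unfamiliar value by printing it, its type, and what it can do, then deliberately triggering an error so you can read the message. The string `"cuttlefish"` is just an illustrative value; swap in anything you are curious about.

```python
# Start with a value and ask Python about it.
word = "cuttlefish"

print(word)          # the value itself
print(type(word))    # <class 'str'>
print(len(word))     # 10
print(word.upper())  # CUTTLEFISH

# dir() lists what you can do with an object; skip the "_private" names.
methods = [name for name in dir(word) if not name.startswith("_")]
print(methods)

# Kick it: try something that might fail and read the error message.
try:
    int(word)
except ValueError as err:
    message = str(err)
    print("That didn't work:", message)
```

Nothing here can harm your computer; the worst case is an error message, and the message usually tells you what to google next.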

Assignments

Weekly assignments

Weekly take-home assignments will follow the course schedule, reinforcing skills with exercises to analyze and visualize scientific data. Assignments will be given out on Thursdays and will be due the following Thursday, using TritonEd.

Final Project

You will choose a data set, either your own or one provided in one of the texts, and write a Python program (or a set of Python programs, or a mixture of .ipynb notebooks and .py/.sh scripts) to carry out a revealing data analysis. Have a look at Shaw Ex43-52 and McKinney Ch10-12 for more ideas.

Requirements:

  • Submit your project as either: a Jupyter notebook (or collection of notebooks), a Python script (or collection of scripts), or a combination of the two.
  • Use at least three (3) application-specific packages, such as:
    • Data analysis and plotting: pandas, matplotlib, seaborn
    • Statistics and modeling: statsmodels, scikit-learn
    • Bioinformatics: scikit-bio, biopython
    • Climate science: cdms, iris
  • Use at least three (3) user-defined functions.
  • Optional: Create user-defined modules and classes for use in your code.
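A minimal sketch of how a final-project script might be organized to meet these requirements, assuming NumPy, pandas, and Matplotlib as the three packages (seaborn, statsmodels, or another package listed above could be swapped in). The function names, column names, and synthetic data are hypothetical placeholders; in a real project, `load_data` would typically read your own file, for example with `pd.read_csv`.

```python
"""Hypothetical final-project skeleton: load -> summarize -> plot."""
import matplotlib
matplotlib.use("Agg")  # draw to image files; no screen needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def load_data(n=100, seed=0):
    """User-defined function 1: placeholder that generates synthetic data.

    Replace the body with something like pd.read_csv("your_file.csv").
    """
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "site": rng.choice(["A", "B", "C"], size=n),
        "temperature": rng.normal(loc=15.0, scale=3.0, size=n),
    })


def summarize(df):
    """User-defined function 2: mean, std, and count of temperature per site."""
    return df.groupby("site")["temperature"].agg(["mean", "std", "count"])


def plot_summary(summary, out_path="summary.png"):
    """User-defined function 3: bar chart of site means with std error bars."""
    fig, ax = plt.subplots()
    ax.bar(summary.index, summary["mean"], yerr=summary["std"], capsize=4)
    ax.set_xlabel("Site")
    ax.set_ylabel("Mean temperature")
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


def main():
    df = load_data()
    summary = summarize(df)
    print(summary)
    plot_summary(summary)


if __name__ == "__main__":
    main()
```

Keeping each step in its own function makes the script easy to test piece by piece in a notebook, and the `if __name__ == "__main__":` guard lets the same file be imported as a module (see Lesson 19) without running the analysis.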

Note: There are no midterm or final exams.

Schedule Overview

Schedule is subject to change.

We will start with an introduction to the command line in Week 1, so that everyone is familiar with basic Unix commands.

Weeks 2-4 will be an introduction to programming using Python. The main text will be Shaw's Learn Python 3 the Hard Way. For those with experience in a programming language other than Python, Lutz's Learning Python will provide a more thorough introduction to programming Python.

Also in Weeks 2-4, we will learn to use IPython and Jupyter notebooks (formerly called IPython notebooks), which offer a much richer Python experience than the plain Python interpreter at the Unix command line.

In Weeks 5-10, we'll work through McKinney's Python for Data Analysis, which is all about analyzing data, doing statistics, and making pretty plots (you may find that Python can emulate much of the functionality of R and MATLAB).

Detailed Schedule

  • Course material is available as .md or .ipynb files by clicking on the lesson number below.
  • In addition to doing the readings, please follow along by writing code (this is integral to the Shaw readings), and do any Study Drills (Shaw) and Chapter Quizzes (Lutz).
| Lesson | Title | Readings | Topics | Assignment |
|---|---|---|---|---|
| 1 | Overview | -- | Introductions and overview of course | Pre-course survey; Acquire texts |
| 2 | Command Line Part I | Shaw: Introduction, Exercise 0, Appendix A | Command line crash course; Text editors | Assignment 1: Basic Shell Commands |
| 3 | Command Line Part II | -- | Advanced commands in the bash shell | -- |
| 4 | Conda, IPython, and Jupyter Notebooks | Install: Miniconda 3 | Conda tutorial (Conda environments, Python packages, and pip); Python and IPython in the command line; Jupyter notebook tutorial and Python crash course | Assignment 2: Bash, Conda, IPython, and Jupyter |
| 5 | Python Basics, Strings, Printing | Shaw: Ex1-10; Lutz: Ch1-7 | Python scripts, error messages, printing strings and variables, strings and string operations, numbers and mathematical expressions, getting help with commands and IPython | -- |
| 6 | Taking Input, Reading and Writing Files, Functions | Shaw: Ex11-26; Lutz: Ch9, 14-17 | Taking input, reading files, writing files, functions | Assignment 3: Python Fundamentals I |
| 7 | Logic, Loops, Lists, Dictionaries, and Tuples | Shaw: Ex27-39; Lutz: Ch8-13 | Logic and loops, lists and list comprehension, tuples, dictionaries, other types | -- |
| 8 | Python Review and IPython | McKinney: Ch1-3 | Review of Python commands; IPython review: enhanced interactive Python shells with support for data visualization, distributed and parallel computation, and a browser-based notebook with support for code, text, mathematical expressions, inline plots, and other rich media | Assignment 4: Python Fundamentals II |
| 9 | Regular Expressions | Grep tutorials: Drew's Grep Tutorial, Linux Grep Tutorial; Python Regular Expressions Tutorial | Regular expression syntax; command-line tools: grep, sed, awk, perl -e; Python examples: built-in and re module | -- |
| 10 | NumPy, Pandas, and Matplotlib Crash Course | -- | NumPy, Pandas, and Matplotlib overview | Assignment 5: Regular Expressions |
| 11 | Pandas Basics | McKinney: Ch4-5 | Intro to NumPy and Pandas: ndarray, Series, DataFrame, index, columns, dtypes, info, describe, read_csv, head, tail, loc, iloc, ix, to_datetime | -- |
| 12 | Pandas Advanced | McKinney: Ch6-8 | Data analysis with Pandas: concat, append, merge, join, set_option, stack, unstack, transpose, dot-notation, values, apply, lambda, sort_index, sort_values, to_csv, read_csv, isnull | Assignment 6: Pandas Fundamentals |
| 13 | Plotting with Matplotlib | McKinney: Ch9; J.R. Johansson: Matplotlib 2D and 3D plotting in Python | Matplotlib tutorial from J.R. Johansson | -- |
| 14 | Plotting with Seaborn | Seaborn Tutorial | Seaborn tutorial from Michael Waskom | Assignment 7: Matplotlib and Seaborn |
| 15 | Pandas Time Series | McKinney: Ch11 | Time series data in Pandas | -- |
| 16 | Pandas Group Operations | McKinney: Ch10 | groupby, melt, pivot, inplace=True, reindex | Assignment 8: Pandas Advanced |
| 17 | Statistics Packages | -- | Statistics capabilities of Pandas, NumPy, SciPy, and scikit-bio | -- |
| 18 | Interactive Visualization with Bokeh | Bokeh User Guide | Quickstart guide to making interactive HTML and notebook plots with Bokeh | Assignment 9: Statistics and Interactive Visualization |
| 19 | Modules and Classes | Lutz: TBA | Packaging your code so you and others can use it again | -- |
| 20 | Git and GitHub | -- | Sharing your code in a public GitHub repository | Final Project |
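As a preview of Lesson 9, here is a small sketch contrasting built-in string methods with the `re` module: filtering lines plays the role of `grep`, and `re.sub` plays the role of `sed`'s `s/old/new/g`. The sample lines and field names are invented for illustration.

```python
import re

lines = [
    "sample_01  depth=10m  temp=14.2C",
    "sample_02  depth=25m  temp=11.8C",
    "note: calibration run",
]

# Built-in string methods are enough for fixed patterns (like a simple grep).
samples = [line for line in lines if line.startswith("sample")]

# The re module handles variable patterns; groups capture the numbers.
pattern = re.compile(r"depth=(\d+)m\s+temp=([\d.]+)C")
records = []
for line in samples:
    match = pattern.search(line)
    if match:
        depth, temp = match.groups()
        records.append((int(depth), float(temp)))

print(records)  # [(10, 14.2), (25, 11.8)]

# re.sub works like sed's s/old/new/g: collapse runs of whitespace to commas.
csv_line = re.sub(r"\s+", ",", lines[0])
print(csv_line)  # sample_01,depth=10m,temp=14.2C
```

Compiling the pattern once with `re.compile` and reusing it in the loop keeps the regular expression readable and avoids repeating it.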
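A compact sketch touching several functions named in Lessons 11, 12, and 16 (`read_csv`, `to_datetime`, `head`, `loc`, `iloc`, `groupby`, `pivot`, `melt`). The inline CSV and its columns are made up, and `io.StringIO` stands in for a file path. It uses `loc` and `iloc` only: `ix` was deprecated in pandas 0.20 and removed in pandas 1.0, so it will not work in current versions.

```python
import io

import pandas as pd

# A tiny inline CSV stands in for a real data file.
csv_text = """date,site,temp
2018-01-09,A,14.2
2018-01-09,B,11.8
2018-01-11,A,14.9
2018-01-11,B,12.3
"""

df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"])  # strings -> datetime64 values

print(df.head())
print(df.dtypes)

# Label-based vs. position-based selection.
site_a = df.loc[df["site"] == "A", ["date", "temp"]]  # rows where site is A
first_row = df.iloc[0]                                 # first row by position

# Group operation: mean temperature per site.
mean_by_site = df.groupby("site")["temp"].mean()

# Reshape long -> wide: one row per date, one column per site.
wide = df.pivot(index="date", columns="site", values="temp")

# And back from wide -> long with melt.
long = wide.reset_index().melt(id_vars="date", var_name="site", value_name="temp")
```

`pivot` and `melt` are inverses here: the long table has one row per (date, site) observation, while the wide table is often easier to plot or compare side by side.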
