pandas is a Python library for doing data analysis. It's really fast and lets you do exploratory work incredibly quickly.
The goal of this cookbook is to give you some concrete examples for getting started with pandas. The pandas docs are really comprehensive, but people often tell me they have trouble getting started, so these examples use real-world data, with all the bugs and weirdness that that entails.
I'm working with 3 datasets right now:
- 311 calls in New York
- How many people were on Montréal's bike paths in 2012
- Montréal's hourly weather for 2012
Let me know if you have suggestions or find any bugs.
The cookbook comes with batteries (data) included, so you can try out all the examples right away.
- Chapter 1: Reading from a CSV
- Chapter 2: Selecting data & finding the most common complaint type
- Chapter 3: Which borough has the most noise complaints? (or, more selecting data)
- Chapter 4: Find out on which weekday people bike the most with groupby and aggregate
- Chapter 5: Combining dataframes and scraping Canadian weather data
- Chapter 6: String operations! Which month was the snowiest?
- Chapter 7: Cleaning up messy data
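To give a taste of what Chapters 1 to 4 cover, here's a minimal sketch. It isn't the cookbook's own code: the CSV text, column names (`Created Date`, `Complaint Type`, `Borough`) and bike counts are made up for illustration, and it assumes a reasonably recent pandas (the `.dt` accessor isn't available in very old versions like 0.12).

```python
import io

import pandas as pd

# A tiny, made-up stand-in for the 311 data (hypothetical columns and rows).
CSV_TEXT = """Created Date,Complaint Type,Borough
2013-10-31 02:08:41,Noise - Street/Sidewalk,QUEENS
2013-10-31 02:01:04,Illegal Parking,QUEENS
2013-10-31 02:00:24,Noise - Street/Sidewalk,MANHATTAN
2013-10-31 01:56:23,Noise - Street/Sidewalk,MANHATTAN
2013-10-31 01:53:44,Rodent,BROOKLYN
"""

# Chapter 1 idea: read CSV data into a DataFrame, parsing the date column as datetimes.
complaints = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Created Date"])

# Chapter 2 idea: select one column and find its most frequent value.
most_common = complaints["Complaint Type"].value_counts().idxmax()

# Chapter 3 idea: filter rows with a boolean mask, then count complaints per borough.
is_noise = complaints["Complaint Type"] == "Noise - Street/Sidewalk"
noise_by_borough = complaints[is_noise]["Borough"].value_counts()

# Chapter 4 idea: hypothetical daily bike counts for two weeks starting Monday 2012-01-02.
bikes = pd.DataFrame({
    "date": pd.date_range("2012-01-02", periods=14, freq="D"),
    "riders": [10, 20, 30, 40, 50, 5, 5, 10, 20, 30, 40, 50, 5, 5],
})
bikes["weekday"] = bikes["date"].dt.weekday  # 0 = Monday, 6 = Sunday
riders_by_weekday = bikes.groupby("weekday")["riders"].sum()
busiest_weekday = riders_by_weekday.idxmax()

print(most_common)
print(noise_by_borough)
print(riders_by_weekday)
```

With this made-up data, the most common complaint type is `Noise - Street/Sidewalk`, and the busiest weekday is 4 (Friday). The real chapters work through the same ideas on the full datasets, where the messiness shows up.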
You'll need an up-to-date version of IPython Notebook (>= 1.0) and pandas (>= 0.12) for this to work properly.
You can get these using pip:
pip install ipython pandas numpy
Alternatively, I use and recommend Anaconda, which will give you everything you need. It's free and open source.
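If you want to confirm the install worked, a small sketch like this prints the installed pandas version and compares it against the 0.12 minimum. It only checks pandas (not IPython), and the `major_minor` helper is hypothetical, written just for this example.

```python
import re

import pandas as pd

# Minimum pandas version mentioned above, as (major, minor).
MIN_PANDAS = (0, 12)


def major_minor(version: str) -> tuple:
    """Hypothetical helper: extract (major, minor) from a string like '0.12.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


installed = major_minor(pd.__version__)
# Tuples compare element by element, so (0, 13) >= (0, 12) and (2, 0) >= (0, 12).
print("pandas", pd.__version__, "OK" if installed >= MIN_PANDAS else "too old")
```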
Once you have pandas and IPython, you can get going!
git clone https://github.com/jvns/pandas-cookbook.git
cd pandas-cookbook/cookbook
ipython notebook --pylab inline
A tab should open in your browser at http://localhost:8888.
Happy pandas!
- Joining dataframes
- Using stack/unstack
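For the two topics listed just above (joining dataframes and using stack/unstack), here's a minimal sketch with made-up data. It uses pandas's `DataFrame.merge` for the join and `Series.unstack` / `DataFrame.stack` for reshaping; the column names and numbers are hypothetical.

```python
import pandas as pd

# Joining: hypothetical daily bike counts and temperatures that share a "date" key.
bikes = pd.DataFrame({
    "date": ["2012-01-01", "2012-01-02", "2012-01-03"],
    "riders": [30, 80, 130],
})
weather = pd.DataFrame({
    "date": ["2012-01-01", "2012-01-02", "2012-01-03"],
    "temp_c": [-1.5, -4.0, -12.0],
})
# An inner join keeps only the dates present in both frames.
merged = bikes.merge(weather, on="date", how="inner")

# Reshaping: made-up complaint counts in "long" form, one row per (borough, complaint).
long_counts = pd.DataFrame({
    "borough": ["QUEENS", "QUEENS", "MANHATTAN", "MANHATTAN"],
    "complaint": ["Noise", "Parking", "Noise", "Parking"],
    "count": [5, 2, 9, 1],
})
indexed = long_counts.set_index(["borough", "complaint"])["count"]

# unstack() moves the inner index level ("complaint") into columns: a wide table.
wide = indexed.unstack()
# stack() does the reverse, turning those columns back into an index level.
back = wide.stack()

print(merged)
print(wide)
```

`merge` is handy when two tables share a key column; `unstack`/`stack` are handy when the same numbers need to switch between one-row-per-observation and a grid you can read at a glance.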