title: Python for Social Scientists
author:
  name: Renee Chu
  twitter: "@reneighbor"
output: slides.html
controls: true
--
--
People learning to code who have completed a Python workshop or an online class
--
- Python
- Matplotlib
- NumPy
- A text editor
--
--
- Econ major, liberal arts college
- No coding at school
- First job in sales and support
- Learned to code through workshops (RailsBridge and PyLadies) and online (Stanford Engineering Everywhere, Learn Python the Hard Way)
--
Pick a project you're passionate about.
--
- Interesting
- Widely available
- Easier to set up than a web app
--
Should development resources be spent on family planning or fighting disease?
<iframe width="560" height="315" src="http://www.tubechop.com/watch/2507136" frameborder="0" allowfullscreen></iframe>
--
- Childhood mortality across years
- Births per woman across years
--
- Thousands of indicators
- 214 countries
- 1960-2012
- http://data.worldbank.org/indicator
--
--
Who has more mobile phone subscriptions per 100 people:
- Finland
- United States
--
Source: World Bank, http://data.worldbank.org/indicator/IT.CEL.SETS.P2
--
Who has more mobile phone subscriptions per 100 people:
- Finland
- United States
- El Salvador
--
--
- US: $59.00/mo
- Finland: $40.10/mo
- India: $12.90/mo
Unbundled total-package (voice + SMS + data) plans available to individual consumers. No El Salvador data available. Source: Open Technology Initiative, http://newamerica.net/publications/policy/an_international_comparison_of_cell_phone_plans_and_prices
--
Tigo El Salvador "Basic" postpaid mobile plan, per minute:
- Tigo to Tigo: $0.08
- Other networks: $0.13
- Landlines: $0.13
- To USA/Canada: $0.09
Source: http://www.tigo.com.sv/planes-pospago
--
- Import CSV data into Python
- Find a Matplotlib example
- Feed our CSV data into Matplotlib
--
--
In your folder for personal projects:
git clone https://github.com/reneighbor/python-for-social-scientists.git
--
python-for-social-scientists/
- data/
    - fertility.csv
    - childhood_deaths.csv
- read_data.py
- chart_csv.py
--
- Open the folder in your text editor
- Open "read_data.py"
- Type:
print("Hello world")
--
cd Projects/personal-projects/python-for-social-scientists
python read_data.py
You should see "Hello world" printed back at you
--
[Erase the print statement]
import csv
- A library is a collection of functions and helpers written by other people; csv comes with Python
--
import csv
csvfile = open('data/childhood_deaths.csv', 'rU')
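The slides use Python 2. In Python 2, opening a CSV with 'rU' (universal newlines) is the usual approach. If you are on Python 3, note that the 'U' mode was removed in Python 3.11, and the csv docs recommend `newline=''` instead. Here is a minimal sketch, assuming Python 3. The helper `read_rows()` and the demo file are hypothetical:

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every row of a CSV file as a list of dicts (Python 3)."""
    # newline='' lets the csv module handle '\n', '\r\n' and '\r' line endings
    # itself, the job 'rU' did in Python 2. The `with` block closes the file
    # automatically once we are done reading.
    with open(path, newline='') as csvfile:
        return list(csv.DictReader(csvfile))


# Demo: write a tiny CSV with old Mac-style '\r' line endings, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
with open(demo_path, 'w', newline='') as f:
    f.write('Country Name,2007\rFinland,114.924474\r')

print(read_rows(demo_path))  # [{'Country Name': 'Finland', '2007': '114.924474'}]
```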
--
import csv
csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)
- DictReader reads each row of the CSV file as a dictionary, keyed by the column headers
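You can see what DictReader does without the real file. This minimal sketch, assuming Python 3, feeds it a two-line stand-in CSV through `io.StringIO`. The Finland values are the ones in the example row on the next slide; the real file has many more columns and rows.

```python
import csv
import io

# Stand-in for a World Bank CSV: a header line plus one data row.
csv_text = (
    "Country Name,Country Code,2007,2008,2009\n"
    "Finland,FIN,114.924474,128.4719884,144.1530224\n"
)

reader = csv.DictReader(io.StringIO(csv_text))

# The first line of the file becomes the keys of every row.
print(reader.fieldnames)  # ['Country Name', 'Country Code', '2007', '2008', '2009']

for row in reader:
    # Each row is a dict: look values up by column name, not by position.
    print(row['Country Name'], row['2008'])  # Finland 128.4719884
```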
--
{
'Country Name': 'Finland',
'Country Code': 'FIN',
'2007': '114.924474',
'2008': '128.4719884',
'2009': '144.1530224'
}
--
{
'2007': '114.924474',
'2009': '144.1530224',
'Country Name': 'Finland',
'2008': '128.4719884',
'Country Code': 'FIN',
}
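The keys can come out shuffled because Python 2 dicts keep no particular key order (Python 3.7 and later keep insertion order). That doesn't matter here, because you always read a value by its key. A quick check:

```python
# The same row, written with its keys in two different orders.
row_a = {'Country Name': 'Finland', 'Country Code': 'FIN',
         '2007': '114.924474', '2008': '128.4719884', '2009': '144.1530224'}
row_b = {'2007': '114.924474', '2009': '144.1530224',
         'Country Name': 'Finland', '2008': '128.4719884', 'Country Code': 'FIN'}

# Dicts compare by their key/value pairs, so key order makes no difference.
print(row_a == row_b)         # True

# Reading a value is always a lookup by key, never by position.
print(row_b['Country Name'])  # Finland
print(row_b['2008'])          # 128.4719884
```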
--
import csv
csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)
for row in reader:
    print(row)
--
Run it! In your terminal:
cd Projects/personal-projects/python-for-social-scientists
python read_data.py
--
import csv
csvfile = open('data/childhood_deaths.csv', 'rU')
reader = csv.DictReader(csvfile)
for row in reader:
    if row['Country Name'] == "Finland":
        print(row)
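The loop above keeps reading after it finds Finland. This minimal sketch, assuming Python 3, wraps the check in a hypothetical helper `find_country()` that returns as soon as it finds a match. The CSV text is a stand-in: the Finland values come from the example row earlier, and the "Placeholder" row is made up so there is something to skip.

```python
import csv
import io


def find_country(reader, country_name):
    """Return the first row whose 'Country Name' matches, or None.

    Returning on the first match stops the loop instead of walking
    through every remaining row.
    """
    for row in reader:
        if row['Country Name'] == country_name:
            return row
    return None


csv_text = (
    "Country Name,Country Code,2007,2008,2009\n"
    "Placeholder,XXX,1,2,3\n"
    "Finland,FIN,114.924474,128.4719884,144.1530224\n"
)

finland = find_country(csv.DictReader(io.StringIO(csv_text)), "Finland")
print(finland['2009'])  # 144.1530224
```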
--
--
--
- Not sorted
- Data in strings
- Every row
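"Data in strings" is a real problem because sorting and arithmetic treat text differently from numbers. A quick demonstration with made-up values:

```python
# Values from the CSV arrive as strings (these three are made up).
as_text = ['9.5', '114.9', '10.2']

# Strings compare character by character, so '114.9' sorts before '9.5' ('1' < '9').
print(sorted(as_text))                    # ['10.2', '114.9', '9.5']

# Converting to numbers first gives the order you actually want.
print(sorted(float(v) for v in as_text))  # [9.5, 10.2, 114.9]

# Arithmetic needs numbers too: '+' on strings joins them instead of adding.
print('9.5' + '10.2')                     # 9.510.2
```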
--
--
http://matplotlib.org/gallery.html
--
- The first half draws the rects (the bars); the second half adds the labels
- The bar values are in the lists menMeans and womenMeans
--
- Create a new file "mens_womens.py" inside python-for-social-scientists and paste in the bar chart example
- In your terminal, run "python mens_womens.py"
- Do you see the chart?
- Edit the values of "menMeans" and "womenMeans". Does the chart change?
--
- Look at basic_chart.py
--
python basic_chart.py
--
How do we get the other countries' data?
- In basic_chart.py add 2 series, el_salvador_data and usa_data
- Uncomment the commented-out lines
- Run it again
--
- Open chart_csv.py
- What needs to be done in order to extract data?
--
What needs to be done in order to extract data?
- Turn data from strings into numbers (floats, since values like '114.924474' have decimals)
- Strip out country names
- Sort by year
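Here is one way to do all three steps to a single row. It is a minimal sketch, assuming Python 3. `extract_series()` is a hypothetical name, and the extractData() in chart_csv.py may work differently. It also assumes missing years show up as empty cells.

```python
def extract_series(row):
    """Turn one CSV row (a dict of strings) into (year, value) pairs sorted by year.

    - Skips non-year columns such as 'Country Name' and 'Country Code'.
    - Skips empty cells (assumed to mean "no data for that year").
    - Converts years to int and values to float.
    """
    series = []
    for key, value in row.items():
        if key and key.isdigit() and value and value.strip():
            series.append((int(key), float(value)))
    series.sort()  # tuples sort by their first item, the year, first
    return series


# The shuffled Finland row from earlier, plus a hypothetical empty 2010 cell.
row = {'2007': '114.924474', '2009': '144.1530224', 'Country Name': 'Finland',
       '2008': '128.4719884', 'Country Code': 'FIN', '2010': ''}
years, values = zip(*extract_series(row))
print(years)   # (2007, 2008, 2009)
print(values)  # (114.924474, 128.4719884, 144.1530224)
```

`zip(*...)` splits the pairs into one tuple of years and one of values, which is the shape a bar chart wants.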
--
-- Break
--
--
- How would you edit chart_csv to graph multiple countries?
--
- run extractData() on 2 other countries
- comment out the rectangle-drawing for those countries
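Watch out when you reuse one reader for several countries. A csv reader is an iterator, so a second loop over it gets nothing. If extractData() loops over a shared reader (an assumption; check chart_csv.py), you can reopen the file for each country or collect every country you need in one pass. Below is a minimal sketch of the one-pass approach, assuming Python 3. `rows_for_countries()` is hypothetical, and so is the "Placeholder" row.

```python
import csv
import io

# Stand-in CSV: the Finland value is from the example row earlier; the
# "Placeholder" row is made up so there is a second country to find.
csv_text = (
    "Country Name,Country Code,2007\n"
    "Finland,FIN,114.924474\n"
    "Placeholder,XXX,1\n"
)

reader = csv.DictReader(io.StringIO(csv_text))
first_pass = [row['Country Name'] for row in reader]
second_pass = [row['Country Name'] for row in reader]  # reader is already used up
print(first_pass)   # ['Finland', 'Placeholder']
print(second_pass)  # []


def rows_for_countries(reader, country_names):
    """Collect the rows for several countries in a single pass over the reader."""
    wanted = set(country_names)
    found = {}
    for row in reader:
        if row['Country Name'] in wanted:
            found[row['Country Name']] = row
    return found


# A fresh reader, read once, gives us every country we asked for.
found = rows_for_countries(csv.DictReader(io.StringIO(csv_text)),
                           ['Finland', 'Placeholder'])
print(sorted(found))  # ['Finland', 'Placeholder']
```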
--
What do we need to do to compare mortality against fertility?
--
What do we need to do to compare mortality against fertility?
- Take in 2 CSV readers
- Take in only one country
- Rename the "rects" so they stand for indicators instead of countries
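This is a minimal sketch of the data side of that comparison, assuming Python 3. It takes two readers and one country, then keeps only the years where both indicators have a value. `compare_indicators()` and `year_values()` are hypothetical names, not the functions in compare_indicators_starter.py. The numbers are made up. The result is three lists you could pass to a Matplotlib bar chart.

```python
import csv
import io


def year_values(reader, country_name):
    """Return {year: value} for one country, skipping non-year and empty cells."""
    for row in reader:
        if row['Country Name'] == country_name:
            return {int(key): float(value) for key, value in row.items()
                    if key and key.isdigit() and value and value.strip()}
    return {}  # country not found


def compare_indicators(reader_a, reader_b, country_name):
    """Line up two indicators for one country, keeping only years both have."""
    a = year_values(reader_a, country_name)
    b = year_values(reader_b, country_name)
    shared_years = sorted(set(a) & set(b))
    return shared_years, [a[y] for y in shared_years], [b[y] for y in shared_years]


# Made-up numbers for a made-up country; NOT real World Bank data.
mortality_csv = "Country Name,Country Code,2010,2011,2012\nPlaceholder,XXX,50,48,\n"
fertility_csv = "Country Name,Country Code,2010,2011,2012\nPlaceholder,XXX,3.1,3.0,2.9\n"

years, mortality, fertility = compare_indicators(
    csv.DictReader(io.StringIO(mortality_csv)),
    csv.DictReader(io.StringIO(fertility_csv)),
    "Placeholder",
)
print(years)      # [2010, 2011]
print(mortality)  # [50.0, 48.0]
print(fertility)  # [3.1, 3.0]
```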
--
Open python-for-social-scientists/compare_indicators_starter.py
- Rearranged to accommodate two comparison-drawing functions and a main()
--
--
correlation != causation
--
- Imported a CSV and read each row into a dict
- Went to the Matplotlib gallery and found a bar chart to borrow
- Drew a series comparing countries from one CSV
- Drew a series comparing indicators from two CSVs
--
- Pick a data set that interests you.
- Write code to visualize it.
- Teach us something!