Commit 8467995

Create Data-wrangling.ipynb
1 parent 86bbb44 commit 8467995

1 file changed

+181
-0
lines changed

@@ -0,0 +1,181 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Goal\n",
8+
"My goal is to visualize various aspects of the `COVID-19` pandemic. In this notebook I describe how the data is acquired and processed."
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"metadata": {},
14+
"source": [
15+
"# Data sources"
16+
]
17+
},
18+
{
19+
"cell_type": "markdown",
20+
"metadata": {},
21+
"source": [
22+
"| Link | Source |\n",
23+
"|------|--------|\n",
24+
"| https://github.com/CSSEGISandData/COVID-19 | JHU CSSE |\n",
25+
"| [GDP per capita PPP](https://data.worldbank.org/indicator/NY.GDP.PCAP.PP.CD) | The World Bank |\n",
26+
"| [Population](https://data.worldbank.org/indicator/SP.POP.TOTL) | The World Bank |\n",
27+
"| [Urban population](https://data.worldbank.org/indicator/SP.URB.TOTL.IN.ZS) | The World Bank |\n",
28+
"| [Population living in slums](https://data.worldbank.org/indicator/EN.POP.SLUM.UR.ZS) | The World Bank |\n",
29+
"| [Rural population](https://data.worldbank.org/indicator/SP.RUR.TOTL.ZS) | The World Bank |\n",
30+
"| [Life expectancy at birth](https://data.worldbank.org/indicator/SP.DYN.LE00.IN) | The World Bank |\n",
31+
"| [Current healthcare expenditure](https://data.worldbank.org/indicator/SH.XPD.CHEX.GD.ZS) | The World Bank |\n",
32+
"| https://datahub.io/JohnSnowLabs/country-and-continent-codes-list | Datahub |"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"The process of obtaining the data has been automated. See the `src/data` directory."
40+
]
41+
},
42+
{
43+
"cell_type": "markdown",
44+
"metadata": {},
45+
"source": [
46+
"# Data wrangling"
47+
]
48+
},
49+
{
50+
"cell_type": "markdown",
51+
"metadata": {},
52+
"source": [
53+
"## COVID-19"
54+
]
55+
},
56+
{
57+
"cell_type": "markdown",
58+
"metadata": {},
59+
"source": [
60+
"### Original data"
61+
]
62+
},
63+
{
64+
"cell_type": "markdown",
65+
"metadata": {},
66+
"source": [
67+
"This dataset is downloaded from the JHU CSSE repository on GitHub.\n",
68+
"The data about `COVID-19` cases is in `.csv` files where each region has a separate row. We group the data by country and store each country in its own column. Cases recorded on ships are removed from the data.\n",
69+
"\n",
70+
"See the script `src/features/make_cases.py` for details."
71+
]
72+
},
73+
{
74+
"cell_type": "markdown",
75+
"metadata": {},
76+
"source": [
77+
"### Derived data"
78+
]
79+
},
80+
{
81+
"cell_type": "markdown",
82+
"metadata": {},
83+
"source": [
84+
"From the original data about `COVID-19` cases we calculate the following:\n",
85+
"\n",
86+
"* `mortality rate = dead / confirmed`\n",
87+
"* `active cases = confirmed - recovered - dead`\n",
88+
"\n",
89+
"We also extract a list of countries and apply the differencing operator to `confirmed` to extract the `daily change in cases` for each country."
90+
]
91+
},
92+
{
93+
"cell_type": "markdown",
94+
"metadata": {},
95+
"source": [
96+
"## World Bank data"
97+
]
98+
},
99+
{
100+
"cell_type": "markdown",
101+
"metadata": {},
102+
"source": [
103+
"The data from the World Bank is downloaded using the `wbdata` library. It includes `Life expectancy` and `GDP per capita`, to name a few indicators. For each country we extract the last known value of each indicator.\n",
104+
"\n",
105+
"See the script `src/features/make_world_bank.py` for details."
106+
]
107+
},
108+
{
109+
"cell_type": "markdown",
110+
"metadata": {},
111+
"source": [
112+
"## Continents"
113+
]
114+
},
115+
{
116+
"cell_type": "markdown",
117+
"metadata": {},
118+
"source": [
119+
"To analyse the data by continent, we download a list of countries with their continents and a list of countries with their respective three-letter codes.\n",
120+
"\n",
121+
"See the script `src/features/make_continent.py` for details."
122+
]
123+
},
124+
{
125+
"cell_type": "markdown",
126+
"metadata": {},
127+
"source": [
128+
"# Summary"
129+
]
130+
},
131+
{
132+
"cell_type": "markdown",
133+
"metadata": {},
134+
"source": [
135+
"After preparing, cleaning and joining the downloaded datasets, we store the newly created `.csv` files in the `data/processed` directory for further use. Here is a table with a brief description of the contents of each file."
136+
]
137+
},
138+
{
139+
"cell_type": "markdown",
140+
"metadata": {},
141+
"source": [
142+
"| name | description |\n",
143+
"|------|-------------|\n",
144+
"| active_cases.csv | Calculation: `confirmed` - `recovered` - `dead`, derived from JHU CSSE. |\n",
145+
"| confirmed_cases.csv | Time series of confirmed cases from JHU CSSE. |\n",
146+
"| confirmed_cases_daily_change.csv | Daily change in confirmed cases, derived from JHU CSSE. |\n",
147+
"| confirmed_cases_since_t0.csv | Reindexed time series of confirmed cases. |\n",
148+
"| continents.csv | Countries mapped to continents. |\n",
149+
"| coordinates.csv | Country coordinates. |\n",
150+
"| country_stats.csv | Newest available case data by country. |\n",
151+
"| country_to_continent.csv | A mapping of countries to continents. |\n",
152+
"| dead_cases.csv | Time series of fatalities from JHU CSSE. |\n",
153+
"| mortality_rate.csv | Calculation: `dead` / `confirmed`, derived from JHU CSSE. |\n",
154+
"| recovered_cases.csv | Time series of recovered cases from JHU CSSE. |\n",
155+
"| world_bank.csv | Socioeconomic indicators from the World Bank merged with `COVID-19` data. |\n",
156+
"| world_bank_codes.csv | Three-letter country codes from the World Bank. |"
157+
]
158+
}
159+
],
160+
"metadata": {
161+
"kernelspec": {
162+
"display_name": "Python 3",
163+
"language": "python",
164+
"name": "python3"
165+
},
166+
"language_info": {
167+
"codemirror_mode": {
168+
"name": "ipython",
169+
"version": 3
170+
},
171+
"file_extension": ".py",
172+
"mimetype": "text/x-python",
173+
"name": "python",
174+
"nbconvert_exporter": "python",
175+
"pygments_lexer": "ipython3",
176+
"version": "3.8.2"
177+
}
178+
},
179+
"nbformat": 4,
180+
"nbformat_minor": 2
181+
}
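
The wrangling steps described in the notebook's markdown cells (summing regional rows per country, then deriving mortality rate, active cases and daily change) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `src/features/make_cases.py`, which this commit does not show: the row layout, the sample numbers and every function name below are hypothetical.

```python
# Hypothetical input: one row per region with cumulative counts per date,
# mirroring the "each region has a separate row" layout the notebook describes.
sample_rows = [
    {"country": "A", "region": "North", "counts": [1, 3, 6]},
    {"country": "A", "region": "South", "counts": [0, 1, 2]},
    {"country": "B", "region": "", "counts": [2, 2, 5]},
]


def group_by_country(rows):
    """Sum the regional time series of each country element-wise."""
    totals = {}
    for row in rows:
        series = totals.setdefault(row["country"], [0] * len(row["counts"]))
        for i, value in enumerate(row["counts"]):
            series[i] += value
    return totals


def mortality_rate(dead, confirmed):
    """mortality rate = dead / confirmed; None where nothing is confirmed yet."""
    return [d / c if c else None for d, c in zip(dead, confirmed)]


def active_cases(confirmed, recovered, dead):
    """active cases = confirmed - recovered - dead."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, dead)]


def daily_change(series):
    """First difference of a cumulative series (one element shorter)."""
    return [today - yesterday for yesterday, today in zip(series, series[1:])]


confirmed = group_by_country(sample_rows)
print(confirmed["A"])                # cumulative confirmed cases for country A
print(daily_change(confirmed["A"]))  # day-over-day change for country A
```

Two choices in this sketch are assumptions rather than documented behaviour: the first day has no predecessor, so `daily_change` returns a series one element shorter than its input, and a day with zero confirmed cases yields `None` instead of raising a division error.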
