This homework assignment tests basic operations on a US COVID-19 dataset.
The dataset we will be working with is the NYTimes Open COVID-19 case and death data by US County.
You can download the CSV file here: https://github.com/nytimes/covid-19-data/blob/master/us-counties.csv
The NYTimes provides up-to-date open data on COVID-19 case and death counts through their GitHub repository at: https://github.com/nytimes/covid-19-data
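As a starting point, here is a minimal sketch of reading the file with pandas. It assumes the CSV has the columns `date`, `county`, `state`, `fips`, `cases`, and `deaths`; the helper name `load_counties` and the sample rows are made up for illustration, so check the column names against your downloaded copy.

```python
import io

import pandas as pd

# Made-up rows in the assumed column layout of us-counties.csv.
SAMPLE_CSV = """date,county,state,fips,cases,deaths
2020-03-01,Example,Somewhere,09999,1,0
2020-03-02,Example,Somewhere,09999,3,0
2020-03-03,Example,Somewhere,09999,6,1
"""


def load_counties(source):
    """Read county-level rows from a file path or file-like object.

    Hypothetical helper: parses `date` as a datetime and keeps `fips`
    as text so codes with a leading zero are not turned into integers.
    """
    return pd.read_csv(source, parse_dates=["date"], dtype={"fips": str})


# For the real data, pass the path to your downloaded copy instead.
counties = load_counties(io.StringIO(SAMPLE_CSV))
print(counties.dtypes)
```

Parsing the dates up front makes later sorting and plotting by day simpler.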
You are to create a Jupyter Notebook that calculates the following items for the data:
- A DataFrame that contains the data for the top 10 Counties by total deaths over the time period.
- A DataFrame that contains the data for the top 5 States by total deaths over the time period.
- Calculate the incidence for the top 10 Counties (the number of new cases per day, i.e. the cases on a day minus the cases on the previous day).
- Calculate the peak incidence for each of the top 10 Counties (the day with the highest number of new cases).
- Plot an incidence curve for the County with the highest peak.
- Plot a vertical bar graph with County as the category and the peak incidence as the measure.
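The incidence and peak incidence definitions above can be illustrated on a toy table. This is only a sketch under stated assumptions: the `cases` column is taken to be a running (cumulative) total per county, and the counties, state, and numbers below are invented for the example rather than taken from the NYTimes data.

```python
import pandas as pd

# Made-up cumulative case counts for two hypothetical counties.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
        "county": ["A"] * 3 + ["B"] * 3,
        "state": ["X"] * 6,
        "cases": [1, 4, 9, 0, 2, 3],
    }
)

# Sort so that each county's rows are in date order before differencing.
toy = toy.sort_values(["state", "county", "date"])

# Incidence: cases on a day minus cases on the previous day, per county.
# The first day of each county has no previous day, so it is left as NaN.
toy["incidence"] = toy.groupby(["state", "county"])["cases"].diff()

# Peak incidence: the row with the largest incidence within each county.
peak_rows = toy.groupby(["state", "county"])["incidence"].idxmax()
peak = toy.loc[peak_rows, ["state", "county", "date", "incidence"]]
print(peak)
```

Grouping by both `state` and `county` is a deliberate choice here, since the same county name can appear in more than one state.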
The Notebook is started for you:
- Fork this repository
- Create a virtual env
  python -m venv venv
- Install the modules needed
  source ./venv/bin/activate
  pip install -r requirements.txt
- Open Jupyter Notebook
  jupyter notebook
- From the menu, open the blank notebook and begin the assignment
- When completed, submit a pull request with the reviewer as shots47s.