Commit 4443f45

Port R docs vignettes to epidatpy
1 parent 896be6a commit 4443f45

File tree

6 files changed

+281
-3
lines changed

Makefile

Lines changed: 2 additions & 2 deletions
@@ -24,11 +24,11 @@ format:
2424
test:
2525
env/bin/pytest .
2626

27-
docs:
27+
doc:
2828
env/bin/sphinx-build -b html docs docs/_build
2929
env/bin/python -m webbrowser -t "docs/_build/index.html"
3030

31-
clean_docs:
31+
clean_doc:
3232
rm -rf docs/_build
3333

3434
clean_build:
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
1+
Getting started with epidatpy
2+
=============================
3+
4+
The epidatpy package provides access to all the endpoints of the `Delphi Epidata
5+
API <https://cmu-delphi.github.io/delphi-epidata/>`_, and can be used to make
6+
requests for specific signals on specific dates and in select geographic
7+
regions.
8+
9+
Setup
10+
-----
11+
12+
**Installation**
13+
14+
You can install the stable version of this package from PyPI::
15+
16+
    $ pip install epidatpy
17+
18+
Or, if you want the development version, install from GitHub::
19+
20+
    $ pip install -e "git+https://github.com/cmu-delphi/epidatpy.git#egg=epidatpy"
21+
22+
**API Keys**
23+
24+
The Delphi API requires a (free) API key for full functionality. While most
25+
endpoints are available without one, there are
26+
`limits on API usage for anonymous users <https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html>`_,
27+
including a rate limit.
28+
29+
To generate your key,
30+
`register for a pseudo-anonymous account <https://api.delphi.cmu.edu/epidata/admin/registration_form>`_.
31+
32+
*Note* that private endpoints (i.e. those prefixed with ``pvt_``) require a
33+
separate key that needs to be passed as an argument. These endpoints require
34+
specific data use agreements to access.
35+
36+
Basic Usage
37+
-----------
38+
39+
Fetching data from the Delphi Epidata API is simple. Suppose we are
40+
interested in the ``covidcast``
41+
`endpoint <https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html>`_,
42+
which provides access to a
43+
`wide range of data <https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html>`_
44+
on COVID-19. Reviewing the endpoint documentation, we see that we
45+
`need to specify <https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries>`_
46+
a data source name, a signal name, a geographic level, a time resolution, and
47+
the location and times of interest.
48+
49+
The ``pub_covidcast`` function lets us access the ``covidcast`` endpoint:
50+
51+
>>> from epidatpy import EpiDataContext, EpiRange
52+
>>> epidata = EpiDataContext(use_cache=True, cache_max_age_days=1)
53+
>>> # Obtain the most up-to-date version of the smoothed covid-like illness (CLI)
54+
>>> # signal from the COVID-19 Trends and Impact survey for the US
55+
>>> apicall = epidata.pub_covidcast(
56+
... data_source = "fb-survey",
57+
... signals = "smoothed_cli",
58+
... geo_type = "nation",
59+
... time_type = "day",
60+
... geo_values = "us",
61+
... time_values = EpiRange(20210405, 20210410))
62+
EpiDataCall(endpoint=covidcast/, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'geo_type': 'nation', 'time_type': 'day', 'geo_values': 'us', 'time_values': '20210405-20210410'})
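For intuition, the parameters printed above could be serialized into a plain HTTP query roughly as follows. This is only a sketch: the base URL and the ``build_query_url`` helper are assumptions made for illustration, not part of epidatpy, and the client may construct its requests differently.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration; epidatpy may be configured with a different one.
BASE_URL = "https://api.delphi.cmu.edu/epidata/"


def build_query_url(endpoint: str, params: dict) -> str:
    """Join an endpoint and its parameters into one query URL (hypothetical helper)."""
    return f"{BASE_URL}{endpoint}?{urlencode(params)}"


# Parameters copied from the EpiDataCall printed above.
params = {
    "data_source": "fb-survey",
    "signals": "smoothed_cli",
    "geo_type": "nation",
    "time_type": "day",
    "geo_values": "us",
    "time_values": "20210405-20210410",
}

url = build_query_url("covidcast/", params)
print(url)
```

Seeing the request as a flat set of key/value pairs makes it easier to compare a call against the endpoint documentation linked above.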
63+
64+
``pub_covidcast`` returns an ``EpiDataCall``, which can be further converted into different output formats - such as a Pandas DataFrame:
65+
66+
>>> data = apicall.df()
67+
>>> data.head()
68+
source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size
69+
0 fb-survey smoothed_cli nation us day 2021-04-05 2021-04-10 5 0.675832 0.014826 244046 <NA> 0 0 0
70+
1 fb-survey smoothed_cli nation us day 2021-04-06 2021-04-11 5 0.690687 0.014998 242979 <NA> 0 0 0
71+
2 fb-survey smoothed_cli nation us day 2021-04-07 2021-04-12 5 0.690664 0.015023 242153 <NA> 0 0 0
72+
3 fb-survey smoothed_cli nation us day 2021-04-08 2021-04-13 5 0.706503 0.015236 241380 <NA> 0 0 0
73+
4 fb-survey smoothed_cli nation us day 2021-04-09 2021-04-14 5 0.724306 0.015466 240256 <NA> 0 0 0
74+
75+
Each row represents one observation in the US on one
76+
day. The geographical abbreviation is given in the ``geo_value`` column, the date in
77+
the ``time_value`` column. Here ``value`` is the requested signal -- in this
78+
case, the smoothed estimate of the percentage of people with COVID-like
79+
illness, based on the symptom surveys, and ``stderr`` is its standard error.
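As a rough illustration of how ``value`` and ``stderr`` can be used together, the sketch below computes an approximate 95% interval for each of the five rows shown above. The normal approximation (the 1.96 multiplier) and the ``confidence_interval`` helper are assumptions added here for illustration; they are not part of epidatpy or of the survey methodology.

```python
# (time_value, value, stderr) copied from the first five rows printed above.
rows = [
    ("2021-04-05", 0.675832, 0.014826),
    ("2021-04-06", 0.690687, 0.014998),
    ("2021-04-07", 0.690664, 0.015023),
    ("2021-04-08", 0.706503, 0.015236),
    ("2021-04-09", 0.724306, 0.015466),
]

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (an assumption)


def confidence_interval(value, stderr, z=Z_95):
    """Return (lower, upper) bounds of value +/- z * stderr."""
    return value - z * stderr, value + z * stderr


# Map each date to its approximate interval.
intervals = {day: confidence_interval(value, stderr) for day, value, stderr in rows}

for day, (lower, upper) in intervals.items():
    print(f"{day}: [{lower:.3f}, {upper:.3f}]")
```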
80+
81+
The Epidata API makes signals available at different geographic levels,
82+
depending on the endpoint. To request signals for all states instead of the
83+
entire US, we use the ``geo_type`` argument paired with ``*`` for the
84+
``geo_values`` argument. (Only some endpoints allow for the use of ``*`` to
85+
access data at all locations. Check the help for a given endpoint to see if
86+
it supports ``*``.)
87+
88+
>>> apicall = epidata.pub_covidcast(
89+
... data_source = "fb-survey",
90+
... signals = "smoothed_cli",
91+
... geo_type = "state",
92+
... time_type = "day",
93+
... geo_values = "*",
94+
... time_values = EpiRange(20210405, 20210410))
95+
EpiDataCall(endpoint=covidcast/, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'geo_type': 'state', 'time_type': 'day', 'geo_values': '*', 'time_values': '20210405-20210410'})
96+
>>> apicall.df().head()
97+
source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size
98+
0 fb-survey smoothed_cli state ak day 2021-04-05 2021-04-10 5 0.736883 0.275805 720.0 <NA> 0 0 0
99+
1 fb-survey smoothed_cli state al day 2021-04-05 2021-04-10 5 0.796627 0.137734 3332.1117 <NA> 0 0 0
100+
2 fb-survey smoothed_cli state ar day 2021-04-05 2021-04-10 5 0.561916 0.131108 2354.9911 <NA> 0 0 0
101+
3 fb-survey smoothed_cli state az day 2021-04-05 2021-04-10 5 0.62283 0.105354 4742.2778 <NA> 0 0 0
102+
4 fb-survey smoothed_cli state ca day 2021-04-05 2021-04-10 5 0.444169 0.040576 21382.3806 <NA> 0 0 0
103+
104+
We can fetch a subset of states by listing out the desired locations:
105+
106+
>>> apicall = epidata.pub_covidcast(
107+
... data_source = "fb-survey",
108+
... signals = "smoothed_cli",
109+
... geo_type = "state",
110+
... time_type = "day",
111+
... geo_values = "pa,ca,fl",
112+
... time_values = EpiRange(20210405, 20210410))
113+
EpiDataCall(endpoint=covidcast/, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'geo_type': 'state', 'time_type': 'day', 'geo_values': 'pa,ca,fl', 'time_values': '20210405-20210410'})
114+
>>> apicall.df().head()
115+
source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size
116+
0 fb-survey smoothed_cli state ca day 2021-04-05 2021-04-10 5 0.444169 0.040576 21382.3806 <NA> 0 0 0
117+
1 fb-survey smoothed_cli state fl day 2021-04-05 2021-04-10 5 0.690415 0.058204 16099.0005 <NA> 0 0 0
118+
2 fb-survey smoothed_cli state pa day 2021-04-05 2021-04-10 5 0.715758 0.072999 10894.0057 <NA> 0 0 0
119+
3 fb-survey smoothed_cli state ca day 2021-04-06 2021-04-11 5 0.45604 0.04127 21176.3902 <NA> 0 0 0
120+
4 fb-survey smoothed_cli state fl day 2021-04-06 2021-04-11 5 0.730692 0.059907 15975.0007 <NA> 0 0 0
121+
122+
We can also request data for a single location at a time, via the ``geo_values`` argument.
123+
124+
>>> apicall = epidata.pub_covidcast(
125+
... data_source = "fb-survey",
126+
... signals = "smoothed_cli",
127+
... geo_type = "state",
128+
... time_type = "day",
129+
... geo_values = "pa",
130+
... time_values = EpiRange(20210405, 20210410))
131+
EpiDataCall(endpoint=covidcast/, params={'data_source': 'fb-survey', 'signals': 'smoothed_cli', 'geo_type': 'state', 'time_type': 'day', 'geo_values': 'pa', 'time_values': '20210405-20210410'})
132+
>>> apicall.df().head()
133+
source signal geo_type geo_value time_type time_value issue lag value stderr sample_size direction missing_value missing_stderr missing_sample_size
134+
0 fb-survey smoothed_cli state pa day 2021-04-05 2021-04-10 5 0.715758 0.072999 10894.0057 <NA> 0 0 0
135+
1 fb-survey smoothed_cli state pa day 2021-04-06 2021-04-11 5 0.69321 0.070869 10862.0055 <NA> 0 0 0
136+
2 fb-survey smoothed_cli state pa day 2021-04-07 2021-04-12 5 0.685934 0.070654 10790.0054 <NA> 0 0 0
137+
3 fb-survey smoothed_cli state pa day 2021-04-08 2021-04-13 5 0.681511 0.071394 10731.0044 <NA> 0 0 0
138+
4 fb-survey smoothed_cli state pa day 2021-04-09 2021-04-14 5 0.709416 0.072162 10590.0049 <NA> 0 0 0
139+
140+
Getting versioned data
141+
----------------------
142+
143+
The Epidata API stores a historical record of all data, including corrections
144+
and updates, which is particularly useful for accurately backtesting
145+
forecasting models. To fetch versioned data, we can use the ``as_of``
146+
argument:
147+
148+
>>> apicall = epidata.pub_covidcast(
149+
... data_source = "fb-survey",
150+
... signals = "smoothed_cli",
151+
... geo_type = "state",
152+
... time_type = "day",
153+
... geo_values = "pa,ca,fl",
154+
... time_values = EpiRange(20210405, 20210410),
155+
... as_of = "2021-06-01")
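To make the idea of a version concrete, the sketch below mimics what an ``as_of`` query conceptually returns: for each date, the most recent issue published on or before the requested date. The records and the ``snapshot_as_of`` helper are hypothetical and purely illustrative; the API performs this selection server-side and its exact behavior may differ.

```python
from datetime import date

# Hypothetical (time_value, issue, value) records: each observation can be
# re-issued later with a corrected value. These numbers are invented.
records = [
    (date(2021, 4, 5), date(2021, 4, 10), 0.70),
    (date(2021, 4, 5), date(2021, 5, 20), 0.72),
    (date(2021, 4, 5), date(2021, 6, 15), 0.71),
    (date(2021, 4, 6), date(2021, 4, 11), 0.69),
    (date(2021, 4, 6), date(2021, 6, 2), 0.68),
]


def snapshot_as_of(records, as_of):
    """For each time_value, keep the latest record issued on or before as_of."""
    latest = {}
    for time_value, issue, value in records:
        if issue > as_of:
            continue  # this version did not exist yet on the as_of date
        current = latest.get(time_value)
        if current is None or issue > current[0]:
            latest[time_value] = (issue, value)
    return latest


snapshot = snapshot_as_of(records, as_of=date(2021, 6, 1))
for time_value, (issue, value) in sorted(snapshot.items()):
    print(time_value, issue, value)
```

Issues dated after the ``as_of`` date are ignored, which is what makes this kind of query suitable for backtesting: a model sees only the data that was available at the time.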
156+
157+
Plotting
158+
--------
159+
160+
Because the output data is a standard Pandas DataFrame, we can plot it with its
161+
built-in ``plot`` method (which requires matplotlib) or with any other Python plotting library:
162+
163+
>>> data.plot(x="time_value", y="value", title="Smoothed CLI from Facebook Survey", xlabel="Date", ylabel="CLI")
164+
165+
.. image:: images/Figure_1.png
166+
:width: 800
167+
:alt: Smoothed CLI from Facebook Survey
168+
169+
Finding locations of interest
170+
-----------------------------
171+
172+
Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations.
173+
174+
For endpoints that report US data, see the
175+
`geographic coding documentation <https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html>`_
176+
for available geographic levels.
177+
178+
International data
179+
------------------
180+
181+
International data is available via
182+
183+
- ``pub_dengue_nowcast`` (North and South America)
184+
- ``pub_ecdc_ili`` (Europe)
185+
- ``pub_kcdc_ili`` (Korea)
186+
- ``pub_nidss_dengue`` (Taiwan)
187+
- ``pub_nidss_flu`` (Taiwan)
188+
- ``pub_paho_dengue`` (North and South America)
189+
- ``pvt_dengue_sensors`` (North and South America)
190+
191+
Finding data sources and signals of interest
192+
--------------------------------------------
193+
194+
Above we used data from `Delphi’s symptom surveys <https://delphi.cmu.edu/covid19/ctis/>`_,
195+
but the Epidata API includes numerous data streams: medical claims data, cases
196+
and deaths, mobility, and many others. This can make it a challenge to find
197+
the data stream that you are most interested in.
198+
199+
The Epidata documentation lists all the data sources and signals available
200+
through the API for `COVID-19 <https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html>`_
201+
and for `other diseases <https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters>`_.

docs/images/Figure_1.png

23.7 KB

docs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -74,3 +74,5 @@ Contents
7474

7575
epidatpy
7676

77+
getting_started_with_epidatpy
78+

docs_smoke_test.py

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,75 @@
1+
from epidatpy import CovidcastEpidata, EpiDataContext, EpiRange
2+
import pandas as pd
3+
4+
epidata = EpiDataContext(use_cache=True, cache_max_age_days=1)
5+
apicall = epidata.pub_covidcast(
6+
data_source = "fb-survey",
7+
signals = "smoothed_cli",
8+
geo_type = "nation",
9+
time_type = "day",
10+
geo_values = "us",
11+
time_values = EpiRange(20210405, 20210410))
12+
print(apicall)
13+
14+
pd.set_option('display.max_columns', None)
15+
pd.set_option('display.max_rows', None)
16+
pd.set_option('display.width', 1000)
17+
18+
data = apicall.df()
19+
print(data.head())
20+
21+
apicall2 = epidata.pub_covidcast(
22+
data_source = "fb-survey",
23+
signals = "smoothed_cli",
24+
geo_type = "state",
25+
time_type = "day",
26+
geo_values = "*",
27+
time_values = EpiRange(20210405, 20210410))
28+
print(apicall2)
29+
30+
data2 = apicall2.df()
31+
print(data2.head())
32+
33+
apicall3 = epidata.pub_covidcast(
34+
data_source = "fb-survey",
35+
signals = "smoothed_cli",
36+
geo_type = "state",
37+
time_type = "day",
38+
geo_values = "pa,ca,fl",
39+
time_values = EpiRange(20210405, 20210410))
40+
print(apicall3)
41+
42+
data3 = apicall3.df()
43+
print(data3.head())
44+
45+
apicall4 = epidata.pub_covidcast(
46+
data_source = "fb-survey",
47+
signals = "smoothed_cli",
48+
geo_type = "state",
49+
time_type = "day",
50+
geo_values = "pa",
51+
time_values = EpiRange(20210405, 20210410))
52+
print(apicall4)
53+
54+
data4 = apicall4.df()
55+
print(data4.head())
56+
57+
apicall5 = epidata.pub_covidcast(
58+
data_source = "fb-survey",
59+
signals = "smoothed_cli",
60+
geo_type = "state",
61+
time_type = "day",
62+
geo_values = "pa",
63+
time_values = EpiRange(20210405, 20210410),
64+
as_of = "2021-06-01")
65+
print(apicall5)
66+
67+
data5 = apicall5.df()
68+
print(data5.head())
69+
70+
# requires matplotlib
71+
import matplotlib.pyplot as plt
72+
73+
data.plot(x="time_value", y="value", title="Smoothed CLI from Facebook Survey", xlabel="Date", ylabel="CLI")
74+
plt.subplots_adjust(bottom=.2)
75+
plt.show()
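The five calls above differ only in ``geo_type``, ``geo_values``, and ``as_of``, so one possible way to cut the repetition is to merge per-case overrides into a shared set of arguments. This is a sketch only: ``COMMON_ARGS``, ``CASES``, and ``build_call_args`` are hypothetical names, and the ``time_values`` string stands in for ``EpiRange(20210405, 20210410)`` so the block runs without epidatpy or network access.

```python
# Arguments shared by every call in the smoke test.
COMMON_ARGS = {
    "data_source": "fb-survey",
    "signals": "smoothed_cli",
    "time_type": "day",
    "time_values": "20210405-20210410",  # placeholder for EpiRange(20210405, 20210410)
}

# Per-case overrides, in the same order as apicall .. apicall5 above.
CASES = [
    {"geo_type": "nation", "geo_values": "us"},
    {"geo_type": "state", "geo_values": "*"},
    {"geo_type": "state", "geo_values": "pa,ca,fl"},
    {"geo_type": "state", "geo_values": "pa"},
    {"geo_type": "state", "geo_values": "pa", "as_of": "2021-06-01"},
]


def build_call_args(case):
    """Merge one case's overrides into a copy of the shared arguments."""
    return {**COMMON_ARGS, **case}


all_call_args = [build_call_args(case) for case in CASES]
for call_args in all_call_args:
    # In the smoke test this would become: epidata.pub_covidcast(**call_args)
    print(call_args)
```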

epidatpy/_covidcast.py

Lines changed: 1 addition & 1 deletion
@@ -72,7 +72,7 @@ def define_covidcast_fields() -> List[EpidataFieldInfo]:
7272
EpidataFieldInfo("lag", EpidataFieldType.int),
7373
EpidataFieldInfo("value", EpidataFieldType.float),
7474
EpidataFieldInfo("stderr", EpidataFieldType.float),
75-
EpidataFieldInfo("sample_size", EpidataFieldType.int),
75+
EpidataFieldInfo("sample_size", EpidataFieldType.text),
7676
EpidataFieldInfo("direction", EpidataFieldType.float),
7777
EpidataFieldInfo("missing_value", EpidataFieldType.int),
7878
EpidataFieldInfo("missing_stderr", EpidataFieldType.int),
