Merge pull request #1508 from cmu-delphi/release/delphi-epidata-4.1.25
Release Delphi Epidata 4.1.25
minhkhul authored Jul 29, 2024
2 parents aa3dedb + 704e898 commit f431baf
Showing 39 changed files with 210 additions and 340 deletions.
2 changes: 1 addition & 1 deletion .bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 4.1.24
current_version = 4.1.25
commit = False
tag = False
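The `.bumpversion.cfg` files bumped in this release are INI-style, so the version they record can be read back with Python's standard `configparser`. A minimal sketch, assuming only the four keys shown in the hunk; the helper `read_current_version` is hypothetical and not part of the repository:

```python
import configparser


# Hypothetical helper: read `current_version` from bumpversion-style INI text.
def read_current_version(cfg_text: str) -> str:
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # configparser returns raw strings; no version parsing happens here.
    return parser["bumpversion"]["current_version"]


# Config text copied from the new side of the hunk above.
CFG = """\
[bumpversion]
current_version = 4.1.25
commit = False
tag = False
"""

print(read_current_version(CFG))  # prints 4.1.25
```

Values come back as plain strings, so any ordering comparison between versions would need a separate parser.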

2 changes: 2 additions & 0 deletions .git-blame-ignore-revs
@@ -20,3 +20,5 @@ b9ceb400d9248c8271e8342275664ac5524e335d
07ed83e5768f717ab0f9a62a9209e4e2cffa058d
# style(black): format wiki acquisition
923852eafa86b8f8b182d499489249ba8f815843
# lint: trailing whitespace changes
81179c5f144b8f25421e799e823e18cde43c84f9
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -1,4 +1,4 @@
closes|addresses <!--list issues closed or partially-addressed by this PR -->
addresses issue(s) #ISSUE <!--list issue(s) associated with this PR -->

### Summary:

5 changes: 4 additions & 1 deletion .github/workflows/update_gdocs_data.yml
@@ -21,7 +21,10 @@ jobs:
restore-keys: |
${{ runner.os }}-pipd-
- name: Install Dependencies
run: pip install -r requirements.dev.txt
run: |
pip -V
python -m pip install pip==22.0.2
pip install -r requirements.dev.txt
- name: Update Docs
run: inv update-gdoc
- name: Create pull request into dev
2 changes: 1 addition & 1 deletion dev/local/setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = Delphi Development
version = 4.1.24
version = 4.1.25

[options]
packages =
37 changes: 21 additions & 16 deletions docs/api/covidcast-signals/covid-act-now.md
@@ -15,11 +15,13 @@ grand_parent: COVIDcast Main Endpoint
* **Time type:** day (see [date format docs](../covidcast_times.md))
* **License:** [CC BY-NC](../covidcast_licensing.md#creative-commons-attribution-noncommercial)

The COVID Act Now (CAN) data source provides COVID-19 testing statistics, such as positivity rates and total tests performed.
The county-level positivity rates and test totals are pulled directly from CAN.
While CAN provides this data potentially from multiple sources, we only use data sourced from the
The [COVID Act Now (CAN)](https://covidactnow.org/) data source provides COVID-19 testing statistics, such as positivity rates and total tests performed.
The county-level positivity rates and test totals are pulled directly from CAN using [their API](https://covidactnow.org/data-api).
While CAN provides this data potentially from multiple sources, we only use data that CAN sources from the
[CDC's COVID-19 Integrated County View](https://covid.cdc.gov/covid-data-tracker/#county-view).

Delphi's mirror of the CAN data was deactivated in December 2021 (last issue 2021-12-10) in favor of the [DSEW CPR data](./dsew-cpr.md), which reports the same information under the `covid_naat_pct_positive_7dav` signal.


| Signal | Description |
|--------------------------------|----------------------------------------------------------------|
@@ -34,9 +36,9 @@ While CAN provides this data potentially from multiple sources, we only use data

## Estimation

The quantities received from CAN / CDC are the county-level positivity rate and total tests,
which are based on the counts of PCR specimens tested.
In particular, they are also already smoothed with a 7-day-average.
We receive county-level positivity rate and total tests from CAN, originating from the CDC.
These quantities are based on the counts of PCR specimens tested.
They are also already smoothed with a 7-day-average.

For a fixed location $$i$$ and time $$t$$, let $$Y_{it}$$ denote the number of PCR specimens
tested that have a positive result. Let $$N_{it}$$ denote the total number of PCR specimens tested.
@@ -79,38 +81,41 @@ $$

### Smoothing

No additional smoothing is done to avoid double-smoothing, since the data pulled from CAN / CDC
No additional smoothing is done to avoid double-smoothing, since the CAN data
is already smoothed with a 7-day-average.

## Limitations

Estimates for geographical levels beyond counties may be inaccurate due to how aggregations
are done on smoothed values instead of the raw values. Ideally we would aggregate raw values
Estimates for geographical levels beyond counties may be inaccurate because our aggregations
are performed on smoothed values instead of the raw values.
Ideally we would aggregate raw values
then smooth, but the raw values are not accessible in this case.

The positivity rate here should not be interpreted as the population positivity rate as
The reported test positivity rate should not be interpreted as the population positivity rate as
the tests performed are typically not randomly sampled, especially for early data
with lower testing volumes.

A few counties, most notably in California, are also not covered by this data source.

Entries with zero total tests performed are also suppressed, even if it was actually the case that
Entries with zero total tests performed are suppressed, even if it was actually the case that
no tests were performed for the day.

## Lag and Backfill

The lag for these signals varies depending on the reporting patterns of individual counties.
Most counties have their latest data report with a lag of 2 days, while others can take 9 days
or more in the case of California counties.
or more, as is the case with California counties.

These signals are also backfilled as backlogged test results could get assigned to older 7-day timeframes.
Most recent test positivity rates do not change substantially with backfill (having a median delta of close to 0).
However, most recent total tests performed is expected to increase in later data revisions (having a median increase of 7%).
Revisions are sometimes made to the data. For example, backlogged test results can get assigned to past dates.
The majority of recent test positivity rates do not change substantially with backfill (having a median delta of close to 0).
However, the majority of recent total tests performed is expected to increase in later data revisions (having a median increase of 7%).
Values more than 5 days in the past are expected to remain fairly static (with total tests performed
having a median increase of 1% or less), as most major revisions have already occurred.

## Source and Licensing

County-level testing data is scraped by CAN from the
County-level testing data is scraped by [CAN](https://covidactnow.org/) from the
[CDC's COVID-19 Integrated County View](https://covid.cdc.gov/covid-data-tracker/#county-view),
and made available through [CAN's API](https://covidactnow.org/tools).

The data is made available under a [CC BY-NC](../covidcast_licensing.md#creative-commons-attribution-noncommercial) license.
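The Estimation section defines $$Y_{it}$$ (positive PCR specimens) and $$N_{it}$$ (total PCR specimens tested), and the Limitations section notes that entries with zero total tests are suppressed. A minimal Python sketch of that computation follows, assuming the positivity rate is the plain ratio $$Y_{it} / N_{it}$$; the exact formula (and any standard-error term) sits in the collapsed part of the diff and is not reproduced here. The function `positivity_rates`, its input layout, and the counts are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical input: (location, day) -> (Y, N), where Y is the number of positive
# PCR specimens and N the total specimens tested. Both are assumed to be already
# 7-day averaged upstream, so no further smoothing is applied here.
Counts = Dict[Tuple[str, str], Tuple[float, float]]


def positivity_rates(counts: Counts) -> Dict[Tuple[str, str], float]:
    """Return Y/N for each key, dropping entries with zero total tests."""
    rates = {}
    for key, (positives, total) in counts.items():
        if total == 0:
            # Suppressed, even if no tests were genuinely performed that day.
            continue
        rates[key] = positives / total
    return rates


# Made-up counts for illustration only.
example = {
    ("42003", "2021-11-01"): (50.0, 400.0),
    ("42003", "2021-11-02"): (0.0, 0.0),
}
print(positivity_rates(example))  # {('42003', '2021-11-01'): 0.125}
```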
2 changes: 1 addition & 1 deletion docs/api/covidcast-signals/hhs.md
@@ -1,6 +1,6 @@
---
title: Department of Health & Human Services
parent: Data Sources and Signals
parent: Inactive Signals
grand_parent: COVIDcast Main Endpoint
---

6 changes: 3 additions & 3 deletions docs/epidata_development.md
@@ -49,7 +49,7 @@ $ [sudo] make test pdb=1
$ [sudo] make test test=repos/delphi/delphi-epidata/integrations/acquisition
```

You can read the commands executed by the Makefile [here](../dev/local/Makefile).
You can read the commands executed by the Makefile [here](https://github.com/cmu-delphi/delphi-epidata/blob/dev/dev/local/Makefile).

## Rapid Iteration and Bind Mounts

@@ -87,8 +87,8 @@ You can test your changes manually by:

What follows is a worked demonstration based on the `fluview` endpoint. Before
starting, make sure that you have the `delphi_database_epidata`,
`delphi_web_epidata`, and `delphi_redis` containers running; if you don't, see
the Makefile instructions above.
`delphi_web_epidata`, and `delphi_redis` containers running (with `docker ps`);
if you don't, see the Makefile instructions above.

First, let's insert some fake data into the `fluview` table:

6 changes: 5 additions & 1 deletion docs/symptom-survey/publications.md
@@ -26,6 +26,10 @@ Pandemic"](https://www.pnas.org/topic/548) in *PNAS*:

Research publications using the survey data include:

- C.K. Ettman, E. Badillo-Goicoechea, E.A. Stuart (2024). [Financial
strain, schooling modality and mental health of US adults living
with children during the COVID-19 pandemic](https://doi.org/10.1136/jech-2023-221672).
*Journal of Epidemiology & Community Health*.
- K. Sasse, R. Mahabir, O. Gkountouna, A. Crooks, A. Croitoru (2024).
[Understanding the determinants of vaccine hesitancy in the United
States: A comparison of social surveys and social media](https://doi.org/10.1371/journal.pone.0301488).
@@ -41,7 +45,7 @@ Research publications using the survey data include:
- Z. Yang, R. Krishnan, and B. Li (2024). [The interplay between individual
mobility, health risk, and economic choice: A holistic model for COVID-19
policy intervention](https://doi.org/10.1287/ijds.2023.0013). *INFORMS
Journal on Data Science*.
Journal on Data Science* 3 (1), 6-27.
- A. Srivastava, J. M. Ramirez, S. Díaz-Aranda, J. Aguilar, A. F. Anta, A. Ortega,
and R. E. Lillo (2024). [Nowcasting temporal trends using indirect surveys](https://doi.org/10.1609/aaai.v38i20.30242).
In *Proceedings of the 38th AAAI Conference on Artificial Intelligence* 38,
22 changes: 21 additions & 1 deletion integrations/client/test_delphi_epidata.py
@@ -2,8 +2,8 @@

# standard library
import time
import json
from json import JSONDecodeError
from requests.models import Response
from unittest.mock import MagicMock, patch

# first party
@@ -306,6 +306,26 @@ def test_sandbox(self, get, post):
Epidata.debug = False
Epidata.sandbox = False

@patch('requests.get')
def test_version_check(self, get):
"""Test that the _version_check() function correctly logs a version discrepancy."""
class MockJson:
def __init__(self, content, status_code):
self.content = content
self.status_code = status_code
def raise_for_status(self): pass
def json(self): return json.loads(self.content)
get.reset_mock()
get.return_value = MockJson(b'{"info": {"version": "0.0.1"}}', 200)

Epidata._version_check()

captured = self.capsys.readouterr()
output = captured.err.splitlines()
self.assertEqual(len(output), 1)
self.assertIn("Client version not up to date", output[0])
self.assertIn("\'latest_version\': \'0.0.1\'", output[0])

def test_geo_value(self):
"""test different variants of geo types: single, *, multi."""

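The new `test_version_check` reads standard error through `self.capsys`, which is a pytest fixture rather than a `unittest.TestCase` attribute; how the test class obtains it is outside the lines shown. As a standard-library alternative, `contextlib.redirect_stderr` can capture what a logger writes to `sys.stderr`, since `Epidata.log` looks up `sys.stderr` at call time. A minimal sketch, using a stand-in `log` that mimics the `str(kwargs)` format visible in `delphi_epidata.py`; the `"event"` key is an assumption, and none of this is the client's own code.

```python
import io
import sys
import time
import unittest
from contextlib import redirect_stderr


# Stand-in for Epidata.log: writes the kwargs dict via str() to stderr.
# Storing `evt` under "event" is an assumption, not taken from the diff.
def log(evt, **kwargs):
    kwargs["event"] = evt
    kwargs["timestamp"] = time.strftime("%Y-%m-%d %H:%M:%S %z")
    return sys.stderr.write(str(kwargs) + "\n")


class CaptureStderrTest(unittest.TestCase):
    def test_log_goes_to_stderr(self):
        buf = io.StringIO()
        # sys.stderr is replaced by `buf` only inside the with-block.
        with redirect_stderr(buf):
            log("Client version not up to date", latest_version="0.0.1")
        lines = buf.getvalue().splitlines()
        self.assertEqual(len(lines), 1)
        self.assertIn("Client version not up to date", lines[0])
        # str(dict) renders string keys and values with single quotes.
        self.assertIn("'latest_version': '0.0.1'", lines[0])
```

The class runs unchanged under `python -m unittest` or pytest, with no fixture wiring needed.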
2 changes: 1 addition & 1 deletion src/acquisition/covid_hosp/common/database.py
@@ -11,7 +11,7 @@

# first party
import delphi.operations.secrets as secrets
from delphi.epidata.common.logger import get_structured_logger
from delphi_utils import get_structured_logger

Columndef = namedtuple("Columndef", "csv_name sql_name dtype")

3 changes: 1 addition & 2 deletions src/acquisition/covidcast/csv_importer.py
@@ -13,10 +13,9 @@
import pandas as pd

# first party
from delphi_utils import Nans
from delphi_utils import get_structured_logger, Nans
from delphi.utils.epiweek import delta_epiweeks
from delphi.epidata.common.covidcast_row import CovidcastRow
from delphi.epidata.common.logger import get_structured_logger

DataFrameRow = NamedTuple('DFRow', [
('geo_id', str),
2 changes: 1 addition & 1 deletion src/acquisition/covidcast/csv_to_database.py
@@ -11,7 +11,7 @@
from delphi.epidata.acquisition.covidcast.csv_importer import CsvImporter, PathDetails
from delphi.epidata.acquisition.covidcast.database import Database, DBLoadStateException
from delphi.epidata.acquisition.covidcast.file_archiver import FileArchiver
from delphi.epidata.common.logger import get_structured_logger
from delphi_utils import get_structured_logger


def get_argument_parser():
20 changes: 10 additions & 10 deletions src/acquisition/covidcast/database.py
@@ -14,7 +14,7 @@

# first party
import delphi.operations.secrets as secrets
from delphi.epidata.common.logger import get_structured_logger
from delphi_utils import get_structured_logger
from delphi.epidata.common.covidcast_row import CovidcastRow


@@ -117,28 +117,28 @@ def insert_or_update_batch(self, cc_rows: List[CovidcastRow], batch_size=2**20,
get_structured_logger("insert_or_update_batch").fatal(err_msg)
raise DBLoadStateException(err_msg)

# NOTE: `value_update_timestamp` is hardcoded to "NOW" (which is appropriate) and
# NOTE: `value_update_timestamp` is hardcoded to "NOW" (which is appropriate) and
# `is_latest_issue` is hardcoded to 1 (which is temporary and addressed later in this method)
insert_into_loader_sql = f'''
INSERT INTO `{self.load_table}`
(`source`, `signal`, `time_type`, `geo_type`, `time_value`, `geo_value`,
`value_updated_timestamp`, `value`, `stderr`, `sample_size`, `issue`, `lag`,
`value_updated_timestamp`, `value`, `stderr`, `sample_size`, `issue`, `lag`,
`is_latest_issue`, `missing_value`, `missing_stderr`, `missing_sample_size`)
VALUES
(%s, %s, %s, %s, %s, %s,
UNIX_TIMESTAMP(NOW()), %s, %s, %s, %s, %s,
(%s, %s, %s, %s, %s, %s,
UNIX_TIMESTAMP(NOW()), %s, %s, %s, %s, %s,
1, %s, %s, %s)
'''

# all load table entries are already marked "is_latest_issue".
# if an entry in the load table is NOT in the latest table, it is clearly now the latest value for that key (so we do nothing (thanks to INNER join)).
# if an entry *IS* in both load and latest tables, but latest table issue is newer, unmark is_latest_issue in load.
fix_is_latest_issue_sql = f'''
UPDATE
`{self.load_table}` JOIN `{self.latest_view}`
USING (`source`, `signal`, `geo_type`, `geo_value`, `time_type`, `time_value`)
SET `{self.load_table}`.`is_latest_issue`=0
WHERE `{self.load_table}`.`issue` < `{self.latest_view}`.`issue`
UPDATE
`{self.load_table}` JOIN `{self.latest_view}`
USING (`source`, `signal`, `geo_type`, `geo_value`, `time_type`, `time_value`)
SET `{self.load_table}`.`is_latest_issue`=0
WHERE `{self.load_table}`.`issue` < `{self.latest_view}`.`issue`
'''

# TODO: consider handling cc_rows as a generator instead of a list
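The comments above `fix_is_latest_issue_sql` describe a two-step rule: every loaded row starts with `is_latest_issue = 1`, and a row is unmarked only if its key also appears in the latest view with a strictly newer `issue`; keys missing from the latest view are untouched because the join is an INNER join. The sketch below replays that rule over plain Python data to make it concrete. It is a simplification, assuming the key is the six `USING (...)` columns and that the latest view fits in an in-memory dict; the function `fix_is_latest_issue` is hypothetical and is not how the database code executes.

```python
from typing import Dict, List, Tuple

# The six columns from the JOIN ... USING (...) clause.
Key = Tuple[str, str, str, str, str, int]
KEY_COLUMNS = ("source", "signal", "geo_type", "geo_value", "time_type", "time_value")


def fix_is_latest_issue(load_rows: List[dict], latest: Dict[Key, int]) -> List[dict]:
    """Mimic the INSERT (mark all rows latest) followed by the UPDATE ... JOIN."""
    for row in load_rows:
        # Step 1: the INSERT hardcodes is_latest_issue = 1.
        row["is_latest_issue"] = 1
    for row in load_rows:
        key = tuple(row[col] for col in KEY_COLUMNS)
        # Step 2: only keys present in `latest` can match (INNER join), and only
        # a strictly newer issue in the latest view unmarks the loaded row.
        if key in latest and row["issue"] < latest[key]:
            row["is_latest_issue"] = 0
    return load_rows


base = dict(source="src", signal="sig", geo_type="county",
            geo_value="42003", time_type="day", time_value=20240701)
rows = [dict(base, issue=20240705), dict(base, geo_value="42005", issue=20240705)]
latest = {("src", "sig", "county", "42003", "day", 20240701): 20240710}
print([r["is_latest_issue"] for r in fix_is_latest_issue(rows, latest)])  # [0, 1]
```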
2 changes: 1 addition & 1 deletion src/acquisition/covidcast/file_archiver.py
@@ -6,7 +6,7 @@
import shutil

# first party
from delphi.epidata.common.logger import get_structured_logger
from delphi_utils import get_structured_logger

class FileArchiver:
"""Archives files by moving and compressing."""
2 changes: 1 addition & 1 deletion src/client/delphi_epidata.R
@@ -15,7 +15,7 @@ Epidata <- (function() {
# API base url
BASE_URL <- getOption('epidata.url', default = 'https://api.delphi.cmu.edu/epidata/')

client_version <- '4.1.24'
client_version <- '4.1.25'

auth <- getOption("epidata.auth", default = NA)

2 changes: 1 addition & 1 deletion src/client/delphi_epidata.js
@@ -22,7 +22,7 @@
}
})(this, function (exports, fetchImpl, jQuery) {
const BASE_URL = "https://api.delphi.cmu.edu/epidata/";
const client_version = "4.1.24";
const client_version = "4.1.25";

// Helper function to cast values and/or ranges to strings
function _listitem(value) {
30 changes: 27 additions & 3 deletions src/client/delphi_epidata.py
@@ -18,7 +18,7 @@

from aiohttp import ClientSession, TCPConnector, BasicAuth

__version__ = "4.1.24"
__version__ = "4.1.25"

_HEADERS = {"user-agent": "delphi_epidata/" + __version__ + " (Python)"}

@@ -43,8 +43,6 @@ class Epidata:
BASE_URL = "https://api.delphi.cmu.edu/epidata"
auth = None

client_version = __version__

debug = False # if True, prints extra logging statements
sandbox = False # if True, will not execute any queries

@@ -54,6 +52,25 @@ def log(evt, **kwargs):
kwargs['timestamp'] = time.strftime("%Y-%m-%d %H:%M:%S %z")
return sys.stderr.write(str(kwargs) + "\n")

# Check that this client's version matches the most recent available. This
# is intended to run just once per program execution, on initial module load.
# See the bottom of this file for the ultimate call to this method.
@staticmethod
def _version_check():
try:
request = requests.get('https://pypi.org/pypi/delphi-epidata/json', timeout=5)
latest_version = request.json()['info']['version']
except Exception as e:
Epidata.log("Error getting latest client version", exception=str(e))
return

if latest_version != __version__:
Epidata.log(
"Client version not up to date",
client_version=__version__,
latest_version=latest_version
)

# Helper function to cast values and/or ranges to strings
@staticmethod
def _listitem(value):
@@ -692,3 +709,10 @@ async def async_make_calls(param_combos):
future = asyncio.ensure_future(async_make_calls(param_list))
responses = loop.run_until_complete(future)
return responses



# This should only run once per program execution, on initial module load,
# as a result of how Python's module system works:
# https://docs.python.org/3/reference/import.html#the-module-cache
Epidata._version_check()
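`_version_check()` asks PyPI's JSON API for the newest published `delphi-epidata` release and logs a warning when it differs from `__version__`; any failure while fetching or parsing is logged and otherwise ignored, so an import never fails because PyPI is unreachable. The sketch below reproduces that control flow with the HTTP call passed in as a parameter, so it can be exercised offline. The names `version_check`, `fetch_json`, and `fake_fetch` are hypothetical, and this is not the client's code.

```python
import sys
from typing import Callable, Optional

PYPI_URL = "https://pypi.org/pypi/delphi-epidata/json"


def version_check(current: str, fetch_json: Callable[[str], dict]) -> Optional[str]:
    """Return (and print) a warning if the published version differs, else None."""
    try:
        latest = fetch_json(PYPI_URL)["info"]["version"]
    except Exception as exc:  # network errors, bad JSON, missing keys
        print(f"Error getting latest client version: {exc}", file=sys.stderr)
        return None
    if latest != current:
        # Plain inequality, as in the diff: a local build ahead of PyPI also warns.
        msg = f"Client version not up to date: {current} vs {latest}"
        print(msg, file=sys.stderr)
        return msg
    return None


def fake_fetch(url: str) -> dict:
    # Offline stand-in for the real HTTP request and JSON decode.
    return {"info": {"version": "4.1.25"}}


print(version_check("4.1.24", fake_fetch))
```

Against the live service, `fetch_json` could be `lambda url: requests.get(url, timeout=5).json()`, mirroring the call in the diff. Injecting the fetcher keeps the comparison logic testable without patching `requests.get`.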
2 changes: 1 addition & 1 deletion src/client/packaging/npm/package.json
@@ -2,7 +2,7 @@
"name": "delphi_epidata",
"description": "Delphi Epidata API Client",
"authors": "Delphi Group",
"version": "4.1.24",
"version": "4.1.25",
"license": "MIT",
"homepage": "https://github.com/cmu-delphi/delphi-epidata",
"bugs": {
2 changes: 1 addition & 1 deletion src/client/packaging/pypi/.bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 4.1.24
current_version = 4.1.25
commit = False
tag = False

9 changes: 9 additions & 0 deletions src/client/packaging/pypi/CHANGELOG.md
@@ -3,6 +3,15 @@
All notable future changes to the `delphi_epidata` python client will be documented in this file.
The format is based on [Keep a Changelog](http://keepachangelog.com/).

## [4.1.25] - 2024-07-29

### Includes
- https://github.com/cmu-delphi/delphi-epidata/pull/1456
- https://github.com/cmu-delphi/delphi-epidata/pull/1497

### Changed
- Added a one-time check which logs a warning when the newest client version does not match the client version in use.

## [4.1.24] - 2024-07-09

### Includes