clean up early google symptom data #1616

Open
@aysim319

Description

The Google symptoms signals now all go back to 2017, but every smoothed signal has a non-contiguous first entry on Aug 15, 2017 (issue Aug 20, 2017). The raw signal doesn't have enough data to have calculated this value. The Aug 20 issue is also earlier than the earliest issue date seen in the raw data, although the raw and smoothed values for Aug 15 match.

The reason is that the smoother used by the Google symptoms indicator skips the smoothing function when there isn't enough data, but still passes the data through instead of dropping it:

https://github.com/cmu-delphi/covidcast-indicators/blob/454ac565d0a0f2b5cf557e4efb2278c278c528a9/_delphi_utils_python/delphi_utils/smooth.py#L194-L196
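To illustrate the behavior described above, here is a toy sketch. It is not the `delphi_utils` code: the function names, the 7-day window, and the trailing moving average are assumptions made only for this example. It shows why a pass-through on short input yields "smoothed" values identical to the raw ones, and what dropping would look like instead.

```python
from typing import List

WINDOW = 7  # hypothetical window length, chosen only for illustration


def _trailing_mean(values: List[float], window: int) -> List[float]:
    """Mean of each full trailing window; one output per full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def smooth_pass_through(values: List[float], window: int = WINDOW) -> List[float]:
    """Toy smoother: returns the input unchanged when it is shorter than the window."""
    if len(values) < window:
        # Not enough data: smoothing is skipped, but the raw values still go out.
        return list(values)
    return _trailing_mean(values, window)


def smooth_or_drop(values: List[float], window: int = WINDOW) -> List[float]:
    """Toy alternative: emits nothing when there is not enough data to smooth."""
    if len(values) < window:
        return []
    return _trailing_mean(values, window)
```

With only a few days of input, `smooth_pass_through` returns the raw values as if they were smoothed, which is consistent with the raw and smoothed values matching on the first entry.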

Dropping the smoothed data from August 17-20 (inclusive) would solve this problem. The epidata documentation already states that the smoothed data have an earliest date of 08/21/2017.
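A minimal sketch of the proposed cleanup, assuming the smoothed rows are available as dictionaries with a `time_value` date. The row layout, the sample values, and the function name are hypothetical; only the 08/21/2017 cutoff comes from this issue. It drops every smoothed row dated before the cutoff:

```python
from datetime import date
from typing import Dict, List

# Earliest smoothed date according to the epidata documentation (per this issue).
FIRST_VALID_SMOOTHED_DATE = date(2017, 8, 21)


def drop_early_smoothed(
    rows: List[Dict], cutoff: date = FIRST_VALID_SMOOTHED_DATE
) -> List[Dict]:
    """Keep only the rows whose time_value is on or after the cutoff date."""
    return [row for row in rows if row["time_value"] >= cutoff]


# Hypothetical sample rows; the values are made up for illustration.
sample = [
    {"time_value": date(2017, 8, 17), "value": 0.5},
    {"time_value": date(2017, 8, 20), "value": 0.6},
    {"time_value": date(2017, 8, 21), "value": 0.7},
    {"time_value": date(2017, 8, 22), "value": 0.8},
]
kept = drop_early_smoothed(sample)
```

Filtering on a single cutoff rather than a fixed list of dates also removes any other stray smoothed entry earlier than the documented start.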
