Commit d345b57

Merge pull request #815 from cmu-delphi/doc_google_symptoms_omicron
documentation for google symptoms new signals
2 parents 4296704 + 0d15c83 commit d345b57

1 file changed

+54
-33
lines changed

docs/api/covidcast-signals/google-symptoms.md

Lines changed: 54 additions & 33 deletions
@@ -9,8 +9,8 @@ grand_parent: COVIDcast Epidata API
99

1010
* **Source name:** `google-symptoms`
1111
* **Earliest issue available:** November 30, 2020
12-
* **Number of data revisions since May 19, 2020:** 0
13-
* **Date of last change:** Never
12+
* **Number of data revisions since May 19, 2020:** 1
13+
* **Date of last change:** January 20, 2022
1414
* **Available for:** county, MSA, HRR, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
1515
* **Time type:** day (see [date format docs](../covidcast_times.md))
1616
* **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms)
@@ -19,23 +19,45 @@ grand_parent: COVIDcast Epidata API
1919

2020
This data source is based on the [COVID-19 Search Trends symptoms
2121
dataset](http://goo.gle/covid19symptomdataset). Using
22-
this search data, we estimate the volume of searches mapped to symptoms related
23-
to COVID-19 such as _anosmia_ (lack of smell) and _ageusia_(lack of taste). The
24-
resulting daily dataset for each region shows the relative frequency of searches
25-
for each symptom. The signals are measured in arbitrary units that are
26-
normalized for overall search users in the region and scaled by the maximum value of the normalized
27-
popularity within a geographic region across a specific time range. **Thus,
28-
values are NOT comparable across geographic regions**. Larger numbers represent
29-
increased releative popularity of symptom-related searches.
22+
this search data, we estimate the volume of searches mapped to symptom sets related
23+
to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.
24+
25+
#### Symptom sets
26+
27+
* _s01_: Cough, Phlegm, Sputum, Upper respiratory tract infection
28+
* _s02_: Nasal congestion, Post nasal drip, Rhinorrhea, Sinusitis, Rhinitis, Common cold
29+
* _s03_: Fever, Hyperthermia, Chills, Shivering, Low grade fever
30+
* _s05_: Shortness of breath, Wheeze, Croup, Pneumonia, Asthma, Crackles, Acute bronchitis, Bronchitis
31+
* _s06_: Anosmia, Dysgeusia, Ageusia
32+
* _s08_: Laryngitis, Sore throat, Throat irritation
33+
* _scontrol_: Type 2 diabetes, Urinary tract infection, Hair loss, Candidiasis, Weight gain
34+
35+
The symptoms were combined in sets that showed positive correlation with cases, especially after Omicron was declared a variant of concern by the WHO. Note that symptoms in _scontrol_ are not COVID-19 related, and this symptom set can be used as a negative control.
36+
37+
Until January 20, 2022, we had separate signals for symptoms Anosmia, Ageusia, and their sum.
3038

3139
| Signal | Description |
3240
| --- | --- |
33-
| `anosmia_raw_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 |
34-
| `anosmia_smoothed_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 |
35-
| `ageusia_raw_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 |
36-
| `ageusia_smoothed_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 |
37-
| `sum_anosmia_ageusia_raw_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users <br/> **Earliest date available:** 2020-02-13 |
38-
| `sum_anosmia_ageusia_smoothed_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average <br/> **Earliest date available:** 2020-02-20 |
41+
| `s01_raw_search` | The average of Google search volume for related searches of symptom set _s01_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
42+
| `s01_smoothed_search` | The average of Google search volume for related searches of symptom set _s01_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
43+
| `s02_raw_search` | The average of Google search volume for related searches of symptom set _s02_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
44+
| `s02_smoothed_search` | The average of Google search volume for related searches of symptom set _s02_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
45+
| `s03_raw_search` | The average of Google search volume for related searches of symptom set _s03_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
46+
| `s03_smoothed_search` | The average of Google search volume for related searches of symptom set _s03_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
47+
| `s05_raw_search` | The average of Google search volume for related searches of symptom set _s05_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
48+
| `s05_smoothed_search` | The average of Google search volume for related searches of symptom set _s05_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
49+
| `s06_raw_search` | The average of Google search volume for related searches of symptom set _s06_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
50+
| `s06_smoothed_search` | The average of Google search volume for related searches of symptom set _s06_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
51+
| `s08_raw_search` | The average of Google search volume for related searches of symptom set _s08_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
52+
| `s08_smoothed_search` | The average of Google search volume for related searches of symptom set _s08_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-01-07 |
53+
| `scontrol_raw_search` | The average of Google search volume for related searches of symptom set _scontrol_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-01-01 |
54+
| `scontrol_smoothed_search` | The average of Google search volume for related searches of symptom set _scontrol_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
55+
| `anosmia_raw_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-13 |
56+
| `anosmia_smoothed_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-20 |
57+
| `ageusia_raw_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-13 |
58+
| `ageusia_smoothed_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-20 |
59+
| `sum_anosmia_ageusia_raw_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-13 |
60+
| `sum_anosmia_ageusia_smoothed_search` | The sum of Google search volume for anosmia and ageusia related searches, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-20 |
3961

4062

4163
## Table of Contents
@@ -45,22 +67,22 @@ increased releative popularity of symptom-related searches.
4567
{:toc}
4668

4769
## Estimation
48-
The `sum_anosmia_ageusia_raw_search` signals are simply the raw sum of the
49-
values of `anosmia_raw_search` and `ageusia_raw_search`, but not the union of
50-
anosmia and ageusia related searches. This is because the data volume is
51-
calculated based on search queries. A single search query can be mapped to more
52-
than one symptom. Currently, Google does not provide _intersection/union_
70+
Each signal is the average of the
71+
values of search trends for each symptom in the symptom set. For example, `s06_raw_search` is the average of the search trend values of anosmia, ageusia, and dysgeusia. Note that this is different from the union of
72+
anosmia, ageusia, and dysgeusia related searches divided by 3, because the data volume for each symptom is calculated based on search queries. A single search query can be mapped to more than one symptom. Currently, Google does not provide _intersection/union_
5373
data. Users should be careful when considering such signals.
5474

75+
For each symptom set: when search trends for all symptoms are missing, the signal is reported as missing. When search trends are available for at least one of the symptoms, we fill the missing trends for other symptoms with 0 and compute the average. We use this approach because the missing observations in the Google Symptoms search trends dataset do not occur randomly; they represent low popularity and are censored for quality and/or privacy reasons. The same approach is used for smoothed signals. A 7 day moving average is used, and missing raw signals are filled with 0 as long as there is at least one day available within the 7 day window.
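
The missing-value rule described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline's actual code: the function names `symptom_set_average` and `smooth_7day` are hypothetical, missing values are represented as `None`, and the window is assumed to be trailing (the current day plus the six days before it), with no smoothed value reported until a full window of dates exists.

```python
from typing import List, Optional, Sequence


def symptom_set_average(trends: Sequence[Optional[float]]) -> Optional[float]:
    """Average the per-symptom trends of one set for one region and day.

    Returns None when every symptom is missing. Otherwise missing
    symptoms are filled with 0 before averaging.
    """
    if all(value is None for value in trends):
        return None
    filled = [0.0 if value is None else value for value in trends]
    return sum(filled) / len(filled)


def smooth_7day(raw: Sequence[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Moving average of a daily raw signal.

    Assumption: the window is trailing, and days before the first full
    window are reported as None. Within a window, missing days are
    filled with 0 as long as at least one day is available.
    """
    smoothed: List[Optional[float]] = []
    for day in range(len(raw)):
        if day + 1 < window:
            smoothed.append(None)  # not enough dates for a full window yet
            continue
        values = raw[day - window + 1 : day + 1]
        if all(value is None for value in values):
            smoothed.append(None)  # nothing observed in the whole window
        else:
            total = sum(0.0 if value is None else value for value in values)
            smoothed.append(total / window)
    return smoothed
```

For a three-symptom set such as _s06_, `symptom_set_average([3.0, None, 0.0])` fills the missing trend with 0 and divides by 3, so a censored symptom pulls the average down rather than being skipped.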
76+
77+
78+
5579
## Geographical Aggregation
56-
The state-level and county-level `raw_search` signals for specific symptoms such
57-
as _anosmia_ and _ageusia_ are taken directly from the [COVID-19 Search Trends
80+
The state-level and county-level `raw_search` signals for each symptoms set are the average of its individual symptoms search trends, taken directly from the [COVID-19 Search Trends
5881
symptoms
59-
dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset)
60-
without changes.
82+
dataset](https://github.com/google-research/open-covid-19-data/tree/master/data/exports/search_trends_symptoms_dataset).
6183

6284
We aggregate county and state data to other geographic levels using
63-
population-weighted averaging.
85+
population-weighted averaging.
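
As a rough sketch of population-weighted averaging, the snippet below combines lower-level values into one aggregated value. The function name, the dictionary inputs, and the choice to skip regions with a missing value are assumptions made for illustration; the text above does not say how missing regions are handled during aggregation.

```python
from typing import Dict, Optional


def population_weighted_average(
    values: Dict[str, Optional[float]],
    populations: Dict[str, int],
) -> Optional[float]:
    """Aggregate regional values, weighting each region by its population.

    Assumption: regions whose value is None are left out of both the
    weighted sum and the total population.
    """
    weighted_sum = 0.0
    total_population = 0
    for region, value in values.items():
        if value is None:
            continue
        weighted_sum += value * populations[region]
        total_population += populations[region]
    if total_population == 0:
        return None  # no region contributed a value
    return weighted_sum / total_population
```

With values `{"A": 10.0, "B": 20.0}` and populations `{"A": 100, "B": 300}`, the result is (10 × 100 + 20 × 300) / 400 = 17.5, closer to the value of the more populous region.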
6486

6587
| Source level | Aggregated level |
6688
| ------------ | ---------------- |
@@ -80,9 +102,9 @@ Each update will usually extend the coverage to within three days of the day of
80102
As a result the delay can range from 3 to 10 days or even more. We check for
81103
updates every day and provide the most up-to-date data.
82104

83-
## Limitations
105+
## Limitations
84106
When daily volume in a region does not meet quality or privacy thresholds, set
85-
by Google, no daily value is reported. Weekly data may be available from Google
107+
by Google, no daily value is reported. Weekly data may be available from Google
86108
in these cases, but we do not yet support importation using weekly data.
87109

88110
Google uses differential privacy, which adds artificial noise to the raw
@@ -91,15 +113,14 @@ quality of results.
91113

92114
Google normalizes and scales time series values to determine the relative
93115
popularity of symptoms in searches within each geographical region individually.
94-
This means that the resulting values of symptom popularity are **NOT**
95-
comparable across geographic regions.
116+
This means that the resulting values of symptom set popularity are **NOT**
117+
comparable across geographic regions, while the values of different symptom sets are comparable within the same location.
96118

97-
More details about the limitations of this dataset are available in [Google's Search
119+
More details about the limitations of this dataset are available in [Google's Search
98120
Trends symptoms dataset documentation](https://storage.googleapis.com/gcp-public-data-symptom-search/COVID-19%20Search%20Trends%20symptoms%20dataset%20documentation%20.pdf).
99121

100122
## Source and Licensing
101123
This dataset is based on Google's [COVID-19 Search Trends symptoms dataset](http://goo.gle/covid19symptomdataset), which is licensed under Google's [Terms of Service](https://policies.google.com/terms).
102124

103-
To learn more about the source data, how it is generated and its limitations,
125+
To learn more about the source data, how it is generated and its limitations,
104126
read [Google's Search Trends symptoms dataset documentation](https://storage.googleapis.com/gcp-public-data-symptom-search/COVID-19%20Search%20Trends%20symptoms%20dataset%20documentation%20.pdf).
105-
