Commit

Document google-symptoms conjunctivitis signals (#1614)
* conjuctivitis signal and updated start dates

* cleanup and mention H5N1

* control dates too

* earliest issue; wording about correlations
nmdefries authored Feb 28, 2025
1 parent 2e6cd7e commit db9c6a9
Showing 1 changed file with 38 additions and 24 deletions.
62 changes: 38 additions & 24 deletions docs/api/covidcast-signals/google-symptoms.md
@@ -9,19 +9,27 @@ nav_order: 1
{: .no_toc}

* **Source name:** `google-symptoms`
* **Earliest issue available:** November 30, 2020
* **Earliest issue available:** August 20, 2017
* **Number of data revisions since May 19, 2020:** 1
* **Date of last change:** January 20, 2022
* **Date of last change:** February 28, 2025
* **Available for:** county, MSA, HRR, state, HHS, nation (see [geography coding docs](../covidcast_geography.md))
* **Time type:** day (see [date format docs](../covidcast_times.md))
* **License:** To download or use the data, you must agree to the Google [Terms of Service](https://policies.google.com/terms)

## Overview

This data source is based on the [COVID-19 Search Trends symptoms
dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=en-GB). Using
this search data, we estimate the volume of searches mapped to symptom sets related
to COVID-19. The resulting daily dataset for each region shows the average relative frequency of searches for each symptom set. The signals are measured in arbitrary units that are normalized for overall search users in the region and scaled by the maximum value of the normalized popularity within a geographic region across a specific time range. **Values are comparable across signals in the same location but NOT across geographic regions**. For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`. However, we cannot compare `s01_smoothed_search` between states. Larger numbers represent increased relative popularity of symptom-related searches.
dataset](https://console.cloud.google.com/marketplace/product/bigquery-public-datasets/covid19-search-trends?hl=en-GB).
We use this data to estimate the volume of web searches related
to COVID-19 and H5N1 highly pathogenic avian influenza (HPAI).

The resulting daily dataset for each location shows the average relative frequency of searches for sets of specific symptoms.
The signals are measured in arbitrary units that are normalized for overall search users in the location and scaled by the maximum value of the normalized popularity within a location across a specific time range.
Larger numbers represent increased relative popularity of symptom-related searches.

**Values are comparable across signals in the same location, but NOT between locations or between geographic region types**.
For example, within a state, we can compare `s01_smoothed_search` and `s02_smoothed_search`.
However, we cannot compare `s01_smoothed_search` between states, or between a state and a county.
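
As a rough illustration of the within-location comparison described above, the sketch below builds two query URLs for the same state, one per signal. It assumes the Delphi Epidata COVIDcast endpoint at `https://api.delphi.cmu.edu/epidata/covidcast/` and its usual query parameters; the helper name `build_query_url`, the state `pa`, and the date range are hypothetical choices made only for this example.

```python
from urllib.parse import urlencode

# Assumed base URL of the Delphi Epidata COVIDcast endpoint.
BASE_URL = "https://api.delphi.cmu.edu/epidata/covidcast/"


def build_query_url(signal: str, geo_type: str, geo_value: str, time_values: str) -> str:
    """Build a query URL for one google-symptoms signal (hypothetical helper)."""
    params = {
        "data_source": "google-symptoms",
        "signal": signal,
        "time_type": "day",
        "geo_type": geo_type,
        "geo_value": geo_value,
        "time_values": time_values,  # YYYYMMDD-YYYYMMDD range
    }
    return f"{BASE_URL}?{urlencode(params)}"


# Two signals for the SAME state: their values can be compared to each other.
url_s01 = build_query_url("s01_smoothed_search", "state", "pa", "20250101-20250131")
url_s02 = build_query_url("s02_smoothed_search", "state", "pa", "20250101-20250131")
print(url_s01)
print(url_s02)
```

Both URLs keep `geo_type` and `geo_value` fixed and vary only the signal, which is the one comparison the scaling supports.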

Between May 13 2024 and August 6 2024, [signal values were much lower](#limitations) compared to previous time periods due to a data outage.

@@ -33,28 +41,35 @@ Between May 13 2024 and August 6 2024, [signal values were much lower](#limitati
* _s04_: Shortness of breath, Wheeze, Croup, Pneumonia, Asthma, Crackles, Acute bronchitis, Bronchitis
* _s05_: Anosmia, Dysgeusia, Ageusia
* _s06_: Laryngitis, Sore throat, Throat irritation
* _s07_: Conjunctivitis, Red eye, Epiphora, Eye pain, Rheum
* _scontrol_: Type 2 diabetes, Urinary tract infection, Hair loss, Candidiasis, Weight gain

The symptoms were combined in sets that showed positive correlation with cases, especially after Omicron was declared a variant of concern by the WHO. Note that symptoms in _scontrol_ are not COVID-19 related, and this symptom set can be used as a negative control.
Symptom sets _s01_-_s06_ are designed to track a variety of COVID-19 symptoms.
They are positively correlated with COVID-19 cases, especially in the period when the Omicron variant was dominant.
Symptom set _s07_ is designed to track novel eye-related symptoms of H5N1.
Note that symptoms in _scontrol_ are not COVID-19 or H5N1 related.
This symptom set can be used as a negative control.

Until January 20, 2022, we had separate signals for symptoms Anosmia, Ageusia, and their sum.

| Signal | Description |
| --- | --- |
| `s01_raw_search` | The average of Google search volume for related searches of symptom set _s01_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s01_smoothed_search` | The average of Google search volume for related searches of symptom set _s01_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s02_raw_search` | The average of Google search volume for related searches of symptom set _s02_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s02_smoothed_search` | The average of Google search volume for related searches of symptom set _s02_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s03_raw_search` | The average of Google search volume for related searches of symptom set _s03_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s03_smoothed_search` | The average of Google search volume for related searches of symptom set _s03_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s04_raw_search` | The average of Google search volume for related searches of symptom set _s04_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s04_smoothed_search` | The average of Google search volume for related searches of symptom set _s04_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s05_raw_search` | The average of Google search volume for related searches of symptom set _s05_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s05_smoothed_search` | The average of Google search volume for related searches of symptom set _s05_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s06_raw_search` | The average of Google search volume for related searches of symptom set _s06_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `s06_smoothed_search` | The average of Google search volume for related searches of symptom set _s06_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `scontrol_raw_search` | The average of Google search volume for related searches of symptom set _scontrol_, in an arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2020-02-14 |
| `scontrol_smoothed_search` | The average of Google search volume for related searches of symptom set _scontrol_, in an arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2020-02-20 |
| `s01_raw_search` | The average of Google search volume for related searches of symptom set _s01_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s01_smoothed_search` | The average of Google search volume for related searches of symptom set _s01_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s02_raw_search` | The average of Google search volume for related searches of symptom set _s02_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s02_smoothed_search` | The average of Google search volume for related searches of symptom set _s02_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s03_raw_search` | The average of Google search volume for related searches of symptom set _s03_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s03_smoothed_search` | The average of Google search volume for related searches of symptom set _s03_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s04_raw_search` | The average of Google search volume for related searches of symptom set _s04_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s04_smoothed_search` | The average of Google search volume for related searches of symptom set _s04_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s05_raw_search` | The average of Google search volume for related searches of symptom set _s05_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s05_smoothed_search` | The average of Google search volume for related searches of symptom set _s05_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s06_raw_search` | The average of Google search volume for related searches of symptom set _s06_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s06_smoothed_search` | The average of Google search volume for related searches of symptom set _s06_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `s07_raw_search` | The average of Google search volume for related searches of symptom set _s07_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `s07_smoothed_search` | The average of Google search volume for related searches of symptom set _s07_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `scontrol_raw_search` | The average of Google search volume for related searches of symptom set _scontrol_, in arbitrary units that are normalized for overall search users. <br/> **Earliest date available:** 2017-08-15 |
| `scontrol_smoothed_search` | The average of Google search volume for related searches of symptom set _scontrol_, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. <br/> **Earliest date available:** 2017-08-21 |
| `anosmia_raw_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-13 |
| `anosmia_smoothed_search` | Google search volume for anosmia-related searches, in arbitrary units that are normalized for overall search users, smoothed by 7-day average. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-20 |
| `ageusia_raw_search` | Google search volume for ageusia-related searches, in arbitrary units that are normalized for overall search users. _This signal is no longer updated as of 20 January, 2022._ <br/> **Earliest date available:** 2020-02-13 |
@@ -78,7 +93,6 @@ Each signal is the average of the
For each symptom set: when search trends for all symptoms are missing, the signal is reported as missing. When search trends are available for at least one of the symptoms, we fill the missing trends for other symptoms with 0 and compute the average. We use this approach because the missing observations in the Google Symptoms search trends dataset do not occur randomly; they represent low popularity and are censored for quality and/or privacy reasons. The same approach is used for smoothed signals. A 7 day moving average is used, and missing raw signals are filled with 0 as long as there is at least one day available within the 7 day window.
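
The missing-value rules above can be sketched in a few lines of Python. This is a minimal illustration rather than the indicator's actual code: the function names are hypothetical, missing values are represented as `None`, and the 7-day window is assumed to be a trailing window that only produces a value once seven days are in range.

```python
from typing import List, Optional, Sequence


def symptom_set_average(trends: Sequence[Optional[float]]) -> Optional[float]:
    """Average one day's search trends across the symptoms in a set.

    Returns None when every symptom is missing; otherwise missing
    symptoms are filled with 0 before averaging.
    """
    if all(value is None for value in trends):
        return None
    filled = [0.0 if value is None else value for value in trends]
    return sum(filled) / len(filled)


def smooth_7day(raw: Sequence[Optional[float]], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average over `window` days (assumed alignment).

    A window with at least one available day has its missing days
    filled with 0; a window with no available days stays missing.
    """
    smoothed: List[Optional[float]] = []
    for i in range(len(raw)):
        if i < window - 1:
            smoothed.append(None)  # assumption: no value until a full window exists
            continue
        chunk = raw[i - window + 1 : i + 1]
        if all(value is None for value in chunk):
            smoothed.append(None)
        else:
            smoothed.append(sum(0.0 if v is None else v for v in chunk) / window)
    return smoothed


# One symptom missing out of three: it counts as 0 in the average.
print(symptom_set_average([2.0, None, 4.0]))
# A single available day in a 7-day window is averaged with six zeros.
print(smooth_7day([7.0] + [None] * 6))
```

Filling with 0 rather than dropping missing entries reflects the point made above: censored values indicate low popularity, so ignoring them would bias the average upward.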



## Geographical Aggregation
The state-level and county-level `raw_search` signals for each symptom set are the average of its individual symptoms' search trends, taken directly from the [COVID-19 Search Trends
symptoms
@@ -115,16 +129,16 @@ The data was unfortunately not recoverable and the dip can not be repaired, but

When daily volume in a region does not meet quality or privacy thresholds, set
by Google, no daily value is reported. Weekly data may be available from Google
in these cases, but we do not yet support importation using weekly data.
in these cases, but we do not yet support weekly data.

Google uses differential privacy, which adds artificial noise to the raw
datasets to avoid identifying any individual persons without affecting the
quality of results.

Google normalizes and scales time series values to determine the relative
popularity of symptoms in searches within each geographical region individually.
This means that the resulting values of symptom set popularity are **NOT**
comparable across geographic regions, while the values of different symptom sets are comparable within the same location.
This means that Delphi's computed symptom set popularity values are **NOT**
comparable _between_ geographic regions or region types, but are comparable within the same location.
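
A small sketch shows why per-location scaling rules out comparisons between locations. It assumes each location's normalized series is divided by its own maximum and multiplied by a constant; the function name, the constant `100.0`, and the numbers are hypothetical and are not taken from Google's documentation.

```python
from typing import List, Sequence


def scale_within_location(normalized: Sequence[float], scale: float = 100.0) -> List[float]:
    """Scale a location's series by its own peak (illustrative assumption)."""
    peak = max(normalized)
    if peak <= 0:
        raise ValueError("series needs at least one positive value")
    return [value / peak * scale for value in normalized]


# Location B has ten times the normalized search popularity of location A...
location_a = [1.0, 2.0, 4.0]
location_b = [10.0, 20.0, 40.0]

# ...but after per-location scaling the two series are indistinguishable.
scaled_a = scale_within_location(location_a)
scaled_b = scale_within_location(location_b)
print(scaled_a)
print(scaled_b)
```

Because the absolute level is divided out separately for each location, equal scaled values in two locations say nothing about which one has more symptom-related searches.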

Standard errors and sample sizes are not available for this data source.
