Update Google Docs Meta Data #1612

github-actions · 2025-02-27T00:36:56Z

Updating Google Docs Meta Data

Change summary:

in db_sources.csv:
- new entry for nssp
in db_signals.csv:
- 2 new google-symptoms signals for conjunctivitis
- 8 new nhsn signals for rsv and for counts of reporting hospitals
- ~19 various edits, including:
  - "Causes"-->"Cause" in nchs-mortality names and signal sets
  - some Safegraph signal set changes

sonarqubecloud · 2025-02-27T00:37:22Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

melange396 · 2025-02-27T02:38:41Z

newest comparison code:

import csv
import requests


# pull down the existing and proposed/pending versions of the signal description csv file

dev_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/dev/src/server/endpoints/covidcast_utils/db_signals.csv"
dev = []
with requests.get(dev_file, stream=True) as req:
    # fail loudly on a bad URL instead of parsing an HTTP error page as csv
    req.raise_for_status()
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        dev.append(row)

new_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/bot/update-docs/src/server/endpoints/covidcast_utils/db_signals.csv"
new = []
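with requests.get(new_file, stream=True) as req:
    # the bot/update-docs branch is deleted after merge, so a 404 is likely here later on
    req.raise_for_status()
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        new.append(row)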


# column name lists
dev_cols = set(dev[0])
new_cols = set(new[0])
both_cols = list(dev_cols.intersection(new_cols))

# get the right column number for each version of the file, based on the column name
dev_col_lookup = {c: i for i,c in enumerate(dev[0])}
new_col_lookup = {c: i for i,c in enumerate(new[0])}

# get the right row number for each version of the file, based on `(source,signal)`

dev_row_lookup = {}
for i, row in enumerate(dev):
    src = row[dev_col_lookup["Source Subdivision"]]
    sig = row[dev_col_lookup["Signal"]]
    if (src, sig) in dev_row_lookup:
        print("!!! src:sig duplicate in dev file! --", src, ":", sig)
    dev_row_lookup[(src, sig)] = i
dev_signals = set(dev_row_lookup.keys())

new_row_lookup = {}
for i, row in enumerate(new):
    src = row[new_col_lookup["Source Subdivision"]]
    sig = row[new_col_lookup["Signal"]]
    if (src, sig) in new_row_lookup:
        print("!!! src:sig duplicate in new file! --", src, ":", sig)
    new_row_lookup[(src, sig)] = i
new_signals = set(new_row_lookup.keys())

# print summary info
if dev[0] != new[0]:
    print("column ordering changed!")
print("added columns:", sorted(list(new_cols-dev_cols)))
print("removed columns:", sorted(list(dev_cols-new_cols)))
print("# rows in dev file:", len(dev))
print("# rows in new file:", len(new))
print("row count difference:", len(new)-len(dev))
print("added signals:", sorted(list(new_signals-dev_signals)))
print("removed signals:", sorted(list(dev_signals-new_signals)))
print("\n")

# TODO: detect row reorderings

# add column names to this set as needed to ignore differences found in them (to simplify output for easier analysis)
columns_to_ignore = {"XXXXXX ignore me XXXXXX"}
both_cols = [col for col in both_cols if col not in columns_to_ignore]

# show individual changes
changes_count = 0
for i in range(len(dev)):
    src = dev[i][dev_col_lookup["Source Subdivision"]]
    sig = dev[i][dev_col_lookup["Signal"]]
    if (src, sig) not in new_row_lookup:
        # this is a removed signal so no summary is displayed
        continue
    dev_ln_num = i
    new_ln_num = new_row_lookup[(src, sig)]
    # prepare properly ordered list of values from both
    dev_line = [dev[dev_ln_num][dev_col_lookup[col]] for col in both_cols]
    new_line = [new[new_ln_num][new_col_lookup[col]] for col in both_cols]
    if dev_line != new_line:
        changes_count += 1
        print("\nMISMATCH!!  [", src, ":", sig, "]  dev row:", dev_ln_num+1, "/ new row:", new_ln_num+1)
        print("\n".join(["".join([
                "  ", col, ":\n    ", dev[dev_ln_num][dev_col_lookup[col]], "\n    -->\n    ", new[new_ln_num][new_col_lookup[col]]])
                for col in both_cols if dev[dev_ln_num][dev_col_lookup[col]]!=new[new_ln_num][new_col_lookup[col]]
            ]))

print("\n")
print("lines with changes:", changes_count)

# TODO: use f-string formatting in print() statements
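
One possible way to approach the "detect row reorderings" TODO above, as a rough sketch only. It assumes the `(source, signal)` keys are unique within each file (the script already warns when they are not). The function name `find_reordered_keys` is hypothetical and not part of the script. The idea is that pure additions and removals leave the shared keys in the same relative order, so any positional disagreement among the shared keys means rows were moved.

```python
def find_reordered_keys(old_keys, new_keys):
    """Return (old_key, new_key) pairs at positions where the shared keys disagree.

    old_keys / new_keys: lists of (source, signal) tuples in file order.
    Assumes keys are unique within each list.
    """
    old_set = set(old_keys)
    new_set = set(new_keys)
    # keep only the keys present in both files, preserving each file's order
    old_common = [k for k in old_keys if k in new_set]
    new_common = [k for k in new_keys if k in old_set]
    # an empty result means the shared rows appear in the same relative order
    return [(o, n) for o, n in zip(old_common, new_common) if o != n]


# toy data, not taken from the real csv files
old = [("a", "x"), ("a", "y"), ("b", "z")]
inserted_only = [("a", "x"), ("n", "new"), ("a", "y"), ("b", "z")]
moved = [("a", "x"), ("b", "z"), ("a", "y")]

print("insert only:", find_reordered_keys(old, inserted_only))
print("moved rows:", find_reordered_keys(old, moved))
```

In the script, the two key lists could be built from `dev` and `new` (skipping the header row) in the same way the row lookups are built.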

melange396 · 2025-02-27T02:41:27Z

output from code above:

added columns: []
removed columns: []
# rows in dev file: 477
# rows in new file: 487
row count difference: 10
added signals: [('google-symptoms', 's07_raw_search'), ('google-symptoms', 's07_smoothed_search'), ('nhsn', 'confirmed_admissions_rsv_ew'), ('nhsn', 'confirmed_admissions_rsv_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew_prelim')]
removed signals: []



MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_num ]  dev row: 411 / new row: 413
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths
  Name:
    All Causes Deaths (Weekly new)
    -->
    All Cause Deaths (Weekly new)

MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_prop ]  dev row: 412 / new row: 414
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths

MISMATCH!!  [ safegraph-daily : completely_home_prop ]  dev row: 442 / new row: 444
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : completely_home_prop_7dav ]  dev row: 443 / new row: 445
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop ]  dev row: 444 / new row: 446
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop_7dav ]  dev row: 445 / new row: 447
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time ]  dev row: 446 / new row: 448
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time_7dav ]  dev row: 447 / new row: 449
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop ]  dev row: 448 / new row: 450
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop_7dav ]  dev row: 449 / new row: 451
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_num ]  dev row: 450 / new row: 452
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_prop ]  dev row: 451 / new row: 453
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_num ]  dev row: 452 / new row: 454
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_prop ]  dev row: 453 / new row: 455
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew ]  dev row: 474 / new row: 476
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew_prelim ]  dev row: 475 / new row: 478
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew ]  dev row: 476 / new row: 480
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). 
    -->
    flu hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew_prelim ]  dev row: 477 / new row: 482
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday).
    -->
    flu hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized


lines with changes: 18
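
As a quick cross-check of the output above, the "added signals" list can be tallied by source. This is a minimal sketch: the list is copied by hand from the output, and `count_by_source` is a hypothetical helper, not part of the comparison script.

```python
from collections import Counter

# copied from the "added signals" line in the output above
added_signals = [
    ("google-symptoms", "s07_raw_search"),
    ("google-symptoms", "s07_smoothed_search"),
    ("nhsn", "confirmed_admissions_rsv_ew"),
    ("nhsn", "confirmed_admissions_rsv_ew_prelim"),
    ("nhsn", "hosprep_confirmed_admissions_covid_ew"),
    ("nhsn", "hosprep_confirmed_admissions_covid_ew_prelim"),
    ("nhsn", "hosprep_confirmed_admissions_flu_ew"),
    ("nhsn", "hosprep_confirmed_admissions_flu_ew_prelim"),
    ("nhsn", "hosprep_confirmed_admissions_rsv_ew"),
    ("nhsn", "hosprep_confirmed_admissions_rsv_ew_prelim"),
]


def count_by_source(signals):
    """Count how many (source, signal) pairs belong to each source."""
    return Counter(src for src, _sig in signals)


counts = count_by_source(added_signals)
for src, n in sorted(counts.items()):
    print(f"{src}: {n}")
print("total:", sum(counts.values()))
```

The tally gives 2 google-symptoms signals and 8 nhsn signals, which adds up to the reported row count difference of 10.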

carlynvandyke

Looks good, sorry it took so long to review!

chore: update docs

5f1798a

github-actions bot added the chore label Feb 27, 2025

github-actions bot assigned melange396 Feb 27, 2025

github-actions bot requested a review from melange396 February 27, 2025 00:36

melange396 mentioned this pull request Feb 27, 2025

Update NHSN new signals spreadsheet and API documentation cmu-delphi/covidcast-indicators#2125

Closed

melange396 mentioned this pull request Feb 27, 2025

Automate parts of metadata csv update comparison #1564

Open

melange396 requested a review from carlynvandyke February 27, 2025 02:51

melange396 mentioned this pull request Feb 27, 2025

Add google-symptoms conjunctivitis to signals spreadsheet and API documentation #1613

Closed

carlynvandyke approved these changes Feb 28, 2025

melange396 merged commit 2e6cd7e into dev Feb 28, 2025
7 checks passed

melange396 deleted the bot/update-docs branch February 28, 2025 21:08

melange396 mentioned this pull request Mar 3, 2025

Release Delphi Epidata 4.1.32 #1617

Merged

melange396 mentioned this pull request Mar 14, 2025

Update Google Docs Meta Data #1622

Merged

melange396 mentioned this pull request Apr 1, 2025

Update Google Docs Meta Data #1633

Merged

melange396 mentioned this pull request May 30, 2025

Update Google Docs Meta Data #1643

Merged
