Update Google Docs Meta Data #1612

Merged
merged 1 commit into from
Feb 28, 2025

github-actions[bot]
Contributor

@github-actions github-actions bot commented Feb 27, 2025

Updating Google Docs Meta Data

Change summary:

  • in db_sources.csv:
    • new entry for nssp
  • in db_signals.csv:
    • 2 new google-symptoms signals for conjunctivitis
    • 8 new nhsn signals for rsv and for counts of reporting hospitals
    • ~19 various edits, including:
      • "Causes"-->"Cause" in nchs-mortality names and signal sets
      • some Safegraph signal set changes

@melange396
Collaborator

melange396 commented Feb 27, 2025

newest comparison code:

import csv
import requests


# pull down the existing and proposed/pending versions of the signal description csv file

dev_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/dev/src/server/endpoints/covidcast_utils/db_signals.csv"
dev = []
with requests.get(dev_file, stream=True) as req:
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        dev.append(row)

new_file = "https://raw.githubusercontent.com/cmu-delphi/delphi-epidata/refs/heads/bot/update-docs/src/server/endpoints/covidcast_utils/db_signals.csv"
new = []
with requests.get(new_file, stream=True) as req:
    for row in csv.reader(req.iter_lines(decode_unicode=True)):
        new.append(row)


# column name lists
dev_cols = set(dev[0])
new_cols = set(new[0])
both_cols = list(dev_cols.intersection(new_cols))

# get the right column number for each version of the file, based on the column name
dev_col_lookup = {c: i for i,c in enumerate(dev[0])}
new_col_lookup = {c: i for i,c in enumerate(new[0])}

# get the right row number for each version of the file, based on `(source,signal)`

dev_row_lookup = {}
for i, row in enumerate(dev):
    src = row[dev_col_lookup["Source Subdivision"]]
    sig = row[dev_col_lookup["Signal"]]
    if (src, sig) in dev_row_lookup:
        print("!!! src:sig duplicate in dev file! --", src, ":", sig)
    dev_row_lookup[(src, sig)] = i
dev_signals = set(dev_row_lookup.keys())

new_row_lookup = {}
for i, row in enumerate(new):
    src = row[new_col_lookup["Source Subdivision"]]
    sig = row[new_col_lookup["Signal"]]
    if (src, sig) in new_row_lookup:
        print("!!! src:sig duplicate in new file! --", src, ":", sig)
    new_row_lookup[(src, sig)] = i
new_signals = set(new_row_lookup.keys())

# print summary info
if dev[0] != new[0]:
    print("column ordering changed!")
print("added columns:", sorted(list(new_cols-dev_cols)))
print("removed columns:", sorted(list(dev_cols-new_cols)))
print("# rows in dev file:", len(dev))
print("# rows in new file:", len(new))
print("row count difference:", len(new)-len(dev))
print("added signals:", sorted(list(new_signals-dev_signals)))
print("removed signals:", sorted(list(dev_signals-new_signals)))
print("\n")

# TODO: detect row reorderings

# add column names to this set as needed to ignore differences found in them (to simplify output for easier analysis)
columns_to_ignore = {"XXXXXX ignore me XXXXXX"}
both_cols = [col for col in both_cols if col not in columns_to_ignore]

# show individual changes
changes_count = 0
for i in range(len(dev)):
    src = dev[i][dev_col_lookup["Source Subdivision"]]
    sig = dev[i][dev_col_lookup["Signal"]]
    if (src, sig) not in new_row_lookup:
        # this is a removed signal so no summary is displayed
        continue
    dev_ln_num = i
    new_ln_num = new_row_lookup[(src, sig)]
    # prepare properly ordered list of values from both
    dev_line = [dev[dev_ln_num][dev_col_lookup[col]] for col in both_cols]
    new_line = [new[new_ln_num][new_col_lookup[col]] for col in both_cols]
    if dev_line != new_line:
        changes_count += 1
        print("\nMISMATCH!!  [", src, ":", sig, "]  dev row:", dev_ln_num+1, "/ new row:", new_ln_num+1)
        print("\n".join(["".join([
                "  ", col, ":\n    ", dev[dev_ln_num][dev_col_lookup[col]], "\n    -->\n    ", new[new_ln_num][new_col_lookup[col]]])
                for col in both_cols if dev[dev_ln_num][dev_col_lookup[col]]!=new[new_ln_num][new_col_lookup[col]]
            ]))

print("\n")
print("lines with changes:", changes_count)

# TODO: use f-string formatting in print() statements
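
One possible way to approach the "detect row reorderings" TODO, as a minimal sketch rather than part of the script above: restrict both files' `(source, signal)` key sequences to the keys they share, then compare them position by position. The function name `detect_reordering` and the sample keys are hypothetical, and the sketch assumes keys are unique within each file (the script already warns when they are not).

```python
def detect_reordering(dev_keys, new_keys):
    """Return (dev_key, new_key) pairs at positions where the relative
    order of the keys shared by both files differs.

    Keys present in only one file are ignored, so pure additions or
    removals are not reported as reorderings.
    """
    shared = set(dev_keys) & set(new_keys)
    dev_order = [k for k in dev_keys if k in shared]
    new_order = [k for k in new_keys if k in shared]
    return [(d, n) for d, n in zip(dev_order, new_order) if d != n]


# hypothetical keys, in file order
dev_keys = [("a", "x"), ("a", "y"), ("b", "z")]
inserted_only = [("a", "x"), ("c", "w"), ("a", "y"), ("b", "z")]
moved = [("a", "x"), ("c", "w"), ("b", "z"), ("a", "y")]

print(detect_reordering(dev_keys, inserted_only))  # nothing moved, only an insertion
print(detect_reordering(dev_keys, moved))          # ("a", "y") and ("b", "z") swapped places
```

With the script's own variables, the key lists could presumably be built from `dev[1:]` and `new[1:]` using the "Source Subdivision" and "Signal" columns, skipping the header row.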

@melange396
Collaborator

output from code above:

added columns: []
removed columns: []
# rows in dev file: 477
# rows in new file: 487
row count difference: 10
added signals: [('google-symptoms', 's07_raw_search'), ('google-symptoms', 's07_smoothed_search'), ('nhsn', 'confirmed_admissions_rsv_ew'), ('nhsn', 'confirmed_admissions_rsv_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew'), ('nhsn', 'hosprep_confirmed_admissions_covid_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew'), ('nhsn', 'hosprep_confirmed_admissions_flu_ew_prelim'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew'), ('nhsn', 'hosprep_confirmed_admissions_rsv_ew_prelim')]
removed signals: []



MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_num ]  dev row: 411 / new row: 413
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths
  Name:
    All Causes Deaths (Weekly new)
    -->
    All Cause Deaths (Weekly new)

MISMATCH!!  [ nchs-mortality : deaths_allcause_incidence_prop ]  dev row: 412 / new row: 414
  Signal Set:
    NCHS All Causes Deaths
    -->
    NCHS All Cause Deaths

MISMATCH!!  [ safegraph-daily : completely_home_prop ]  dev row: 442 / new row: 444
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : completely_home_prop_7dav ]  dev row: 443 / new row: 445
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop ]  dev row: 444 / new row: 446
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : full_time_work_prop_7dav ]  dev row: 445 / new row: 447
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time ]  dev row: 446 / new row: 448
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : median_home_dwell_time_7dav ]  dev row: 447 / new row: 449
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop ]  dev row: 448 / new row: 450
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-daily : part_time_work_prop_7dav ]  dev row: 449 / new row: 451
  Signal Set:
    Safegraph Daily Mobility Data
    -->
    Safegraph Home Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_num ]  dev row: 450 / new row: 452
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : bars_visit_prop ]  dev row: 451 / new row: 453
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_num ]  dev row: 452 / new row: 454
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ safegraph-weekly : restaurants_visit_prop ]  dev row: 453 / new row: 455
  Signal Set:
    Safegraph Weekly Mobility Data
    -->
    Safegraph POI Mobility

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew ]  dev row: 474 / new row: 476
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_covid_ew_prelim ]  dev row: 475 / new row: 478
  Short Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday).
    -->
    COVID-19 hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed COVID-19 over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed COVID-19 over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew ]  dev row: 476 / new row: 480
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). 
    -->
    flu hospital admissions per week (final)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Friday or Saturday of the following week.
  Member Short Name:
    final
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized

MISMATCH!!  [ nhsn : confirmed_admissions_flu_ew_prelim ]  dev row: 477 / new row: 482
  Short Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday).
    -->
    flu hospital admissions per week (preliminary)
  Description:
    Total number of patients hospitalized with confirmed influenza over the entire week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
    -->
    Total number of patients hospitalized with confirmed influenza over the epi-week (Sunday-Saturday). Only includes hospitalizations whose report was received before the Wednesday of the following week.
  Member Short Name:
    prelim
    -->
    "
  Format:
    
    -->
    count
  Source Name:
    National Healthcare Safety Network Respiratory Hospitalizations
    -->
    National Healthcare Safety Network
  Severity Pyramid Rungs:
    
    -->
    hospitalized


lines with changes: 18
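
Much of the output above is the same Signal Set change repeated across rows. A small sketch of how such repeats could be condensed, assuming each field-level change has been collected as a `(column, old_value, new_value)` tuple; the helper name `tally_changes` is hypothetical, and the sample list only mirrors the Signal Set changes shown above (8 safegraph-daily, 4 safegraph-weekly, 2 nchs-mortality).

```python
from collections import Counter


def tally_changes(changes):
    """Count identical (column, old_value, new_value) changes, most frequent first."""
    return Counter(changes).most_common()


# Signal Set changes taken from the output above
changes = (
    [("Signal Set", "Safegraph Daily Mobility Data", "Safegraph Home Mobility")] * 8
    + [("Signal Set", "Safegraph Weekly Mobility Data", "Safegraph POI Mobility")] * 4
    + [("Signal Set", "NCHS All Causes Deaths", "NCHS All Cause Deaths")] * 2
)

for (col, old, new), count in tally_changes(changes):
    print(f"{count:>2}x  {col}: {old!r} --> {new!r}")
```

This prints one line per distinct change with its count, which may be easier to scan than one block per signal when a rename touches many rows.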

@carlynvandyke carlynvandyke left a comment

Looks good, sorry it took so long to review!

@melange396 melange396 merged commit 2e6cd7e into dev Feb 28, 2025
7 checks passed
@melange396 melange396 deleted the bot/update-docs branch February 28, 2025 21:08