Description
Summary
The omicron changes for Quidel in v0.3.2 removed a large number of regions that should no longer be reported. To prioritize correct display in the dashboard, we used the following procedure:
- Run Quidel v0.3.1 for the full calendar of reference dates. Initialize the archivediffer cache with this output.
- Run Quidel v0.3.2 for the full calendar of reference dates. Given the cached files and the new output, archivediffer should have marked the removed regions as deleted in the files delivered to /common/covidcast/receiving.
Archivediffer seems to have noticed the deletions but not marked them in the output.
We need to:
- figure out why the deletions weren't marked correctly
- find a way to generate deletion annotations for everything removed by quidel between v0.3.1 and v0.3.2.
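One possible way to generate the annotations is to diff each v0.3.1 file against its v0.3.2 counterpart and emit an annotated row for every geo_id that disappeared. The sketch below is only an illustration, not archivediffer's actual logic. It assumes both files carry the `geo_id,val,se,sample_size` header, that the annotation columns are named `missing_val`, `missing_se`, and `missing_sample_size`, and that the "deleted" code is 4; all three should be checked against delphi_utils before use. The function names are hypothetical.

```python
import csv
import io

# Assumed annotation code for "deleted"; confirm against delphi_utils before use.
DELETED_CODE = 4
VALUE_COLS = ["val", "se", "sample_size"]
# Assumed names of the nan annotation columns.
MISSING_COLS = ["missing_val", "missing_se", "missing_sample_size"]


def read_geo_ids(csv_text):
    """Return the set of geo_id values in one indicator CSV (header required)."""
    return {row["geo_id"] for row in csv.DictReader(io.StringIO(csv_text))}


def deletion_rows(old_csv_text, new_csv_text):
    """Build annotated rows for geo_ids present in the old file but not the new one."""
    removed = sorted(read_geo_ids(old_csv_text) - read_geo_ids(new_csv_text))
    rows = []
    for geo_id in removed:
        row = {"geo_id": geo_id}
        row.update({col: "NA" for col in VALUE_COLS})
        row.update({col: DELETED_CODE for col in MISSING_COLS})
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize annotated rows with the value columns followed by the annotation columns."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["geo_id"] + VALUE_COLS + MISSING_COLS, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Running this per reference date over the v0.3.1 full-calendar output and the current cache would give one deletion file per date; rows that still exist in v0.3.2 are deliberately left out.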
Details
Here’s a sample: Washington County, KY (FIPS:21229)
The pre-omicron code published a value for Washington County on 612 days. The following environment was configured with the pre-omicron code (v0.3.1) to output 1000 days of data, which reaches back to the beginning of the calendar for Quidel:
covidcast-indicators/quidel_covidtest$ git status -uno
HEAD detached at covidcast-indicators/v0.3.1
covidcast-indicators/quidel_covidtest$ grep ^21229 receiving/*_county_covid_ag_smoothed_pct_positive.csv |head
receiving/20200526_county_covid_ag_smoothed_pct_positive.csv:21229,4.913667717688675,3.0568697758622676,50.0
receiving/20200527_county_covid_ag_smoothed_pct_positive.csv:21229,4.6693051871295275,2.9837161653964275,50.0
receiving/20200528_county_covid_ag_smoothed_pct_positive.csv:21229,4.875077252904564,3.0454600541418,50.0
receiving/20200529_county_covid_ag_smoothed_pct_positive.csv:21229,5.258998129413856,3.1567158618292117,50.0
receiving/20200530_county_covid_ag_smoothed_pct_positive.csv:21229,5.0472663892846406,3.09597073928053,50.00000000000001
receiving/20200531_county_covid_ag_smoothed_pct_positive.csv:21229,6.006233899782876,3.3602039947580162,50.00000000000001
receiving/20200601_county_covid_ag_smoothed_pct_positive.csv:21229,5.751155128909549,3.292536188333159,50.0
receiving/20200602_county_covid_ag_smoothed_pct_positive.csv:21229,5.001733541585632,3.0827131418201743,50.0
receiving/20200603_county_covid_ag_smoothed_pct_positive.csv:21229,5.025413601051074,3.0896167343004803,50.0
receiving/20200604_county_covid_ag_smoothed_pct_positive.csv:21229,5.139949494826776,3.1227419639583016,50.0
covidcast-indicators/quidel_covidtest$ grep ^21229 receiving/*_county_covid_ag_smoothed_pct_positive.csv |wc -l
612
The archivediffer cache in production shows no days of data for Washington County:
[indicators@delphi-master-prod-01 quidel_covidtest]$ grep ^21229 archivediffer_cache/*_county_covid_ag_smoothed_pct_positive.csv |wc -l
0
The indicator logs for yesterday’s first v0.3.2 run included many lines like this:
Diff has deleted indices in ./receiving/20220114_county_covid_ag_smoothed_pct_positive.csv that have been coded as nans.
However, while the files in /common/covidcast/archive/successful have 611 entries for Washington County (one fewer than the 612 in the v0.3.1 output, which is odd but set aside for now), they contain none of the nan annotation columns that would have marked these NA values as deleted:
[indicators@delphi-master-prod-01 quidel_covidtest]$ zgrep ^21229 /common/covidcast/archive/successful/quidel/*county_covid_ag_smoothed_pct_positive.csv.gz |head
/common/covidcast/archive/successful/quidel/20200526_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200527_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200528_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200529_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200530_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200531_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200601_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200602_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200603_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200604_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
[indicators@delphi-master-prod-01 quidel_covidtest]$ zgrep ^21229 /common/covidcast/archive/successful/quidel/*county_covid_ag_smoothed_pct_positive.csv.gz |wc -l
611
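To track down the one-day gap between 612 and 611, the reference dates in the two grep listings can be compared. This is a minimal sketch that assumes the listings are saved as lines of the form `path:row` and that every filename starts with a `YYYYMMDD_` date; the helper names are hypothetical.

```python
import re

# Leading YYYYMMDD in names like 20200526_county_covid_ag_smoothed_pct_positive.csv
DATE_RE = re.compile(r"(\d{8})_")


def dates_from_grep(lines):
    """Collect reference dates from grep/zgrep output lines of the form path:row."""
    dates = set()
    for line in lines:
        path = line.split(":", 1)[0]
        # Match against the basename only, so directory names cannot interfere.
        match = DATE_RE.match(path.rsplit("/", 1)[-1])
        if match:
            dates.add(match.group(1))
    return dates


def missing_dates(old_lines, new_lines):
    """Return dates present in the old listing but absent from the new one."""
    return sorted(dates_from_grep(old_lines) - dates_from_grep(new_lines))
```

Feeding it the full v0.3.1 grep output and the full zgrep output from the archive should name the date that is missing for 21229.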
[indicators@delphi-master-prod-01 quidel_covidtest]$ zcat /common/covidcast/archive/successful/quidel/20200526_county_covid_ag_smoothed_pct_positive.csv.gz |head
geo_id,val,se,sample_size
01117,NA,NA,NA
12063,NA,NA,NA
12101,NA,NA,NA
12103,NA,NA,NA
13077,NA,NA,NA
13097,NA,NA,NA
13113,NA,NA,NA
13171,NA,NA,NA
13223,NA,NA,NA
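The header check above can be repeated across the whole archive with a small script. The sketch below assumes the annotation columns are named `missing_val`, `missing_se`, and `missing_sample_size`; the function names are hypothetical.

```python
import csv
import gzip

# Assumed names of the nan annotation columns.
MISSING_COLS = {"missing_val", "missing_se", "missing_sample_size"}


def absent_annotation_columns(header):
    """Return the annotation columns that a CSV header does not contain."""
    return sorted(MISSING_COLS - set(header))


def read_header(path):
    """Read the first row of a plain or gzip-compressed CSV file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        return next(csv.reader(handle))
```

For the file shown above, `absent_annotation_columns(read_header(path))` would list all three annotation columns, since the header is only `geo_id,val,se,sample_size`.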
I've put file archives of everything above online:
- v0.3.1 quidel full-calendar output
- current quidel archivediffer cache contents (presumably all v0.3.2)
- current quidel files successfully processed by epidata acquisition (presumably all v0.3.2, since modification times are all Feb 8 afternoon)