
Repair initial issue for Quidel:Omicron edition #1520

Closed
@krivard

Summary

The omicron changes for Quidel in v0.3.2 mean that a large number of previously reported regions should no longer be reported. To prioritize correct display in the dashboard, we used the following procedure:

  • Run Quidel v0.3.1 for the full calendar of reference dates. Initialize the archivediffer cache with this output.
  • Run Quidel v0.3.2 for the full calendar of reference dates. Given the cached files and the new output, archivediffer should have marked the removed regions as deleted in the files delivered to /common/covidcast/receiving.

Archivediffer seems to have noticed the deletions but not marked them in the output.

We need to:

  • figure out why the deletions weren't marked correctly
  • find a way to generate deletion annotations for everything removed by quidel between v0.3.1 and v0.3.2.
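
One possible approach to the second item, as a rough sketch only: compare the v0.3.1 and v0.3.2 output directories file by file and write a deletion row for every geo_id that appears in the old file but not in the new one. The function name `write_deletions`, the directory arguments, the annotation column names (`missing_val`, `missing_se`, `missing_sample_size`), and the code `2` for "deleted" are all assumptions made for illustration; they should be checked against what archivediffer and the ingestion pipeline actually expect.

```python
import csv
from pathlib import Path

# Assumptions (not confirmed by this issue): the annotation column names and
# the code used for "deleted" are placeholders for whatever the ingestion
# pipeline actually expects.
VALUE_COLS = ["val", "se", "sample_size"]
MISSING_COLS = ["missing_val", "missing_se", "missing_sample_size"]
DELETED_CODE = "2"


def read_geo_ids(path):
    """Return the set of geo_id values found in one export CSV."""
    with open(path, newline="") as f:
        return {row["geo_id"] for row in csv.DictReader(f)}


def write_deletions(old_dir, new_dir, out_dir):
    """Write deletion rows for geo_ids present in old_dir but not in new_dir.

    One output file is written per input file that lost at least one region.
    Returns a dict mapping file name to the number of deletion rows written.
    """
    old_dir, new_dir, out_dir = Path(old_dir), Path(new_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for old_path in sorted(old_dir.glob("*.csv")):
        new_path = new_dir / old_path.name
        # A file missing from the new output means every old region was removed.
        new_ids = read_geo_ids(new_path) if new_path.exists() else set()
        removed = sorted(read_geo_ids(old_path) - new_ids)
        if not removed:
            continue
        with open(out_dir / old_path.name, "w", newline="") as f:
            writer = csv.writer(f, lineterminator="\n")
            writer.writerow(["geo_id"] + VALUE_COLS + MISSING_COLS)
            for geo_id in removed:
                writer.writerow(
                    [geo_id]
                    + ["NA"] * len(VALUE_COLS)
                    + [DELETED_CODE] * len(MISSING_COLS)
                )
        counts[old_path.name] = len(removed)
    return counts
```

Called as `write_deletions("v0.3.1/receiving", "v0.3.2/receiving", "deletions")` (hypothetical paths), this produces only the deletion rows, so the result would still need to be merged with or delivered alongside the regular v0.3.2 output.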

Details

Here’s a sample: Washington County, KY (FIPS:21229)

The pre-omicron code published a value for Washington County on 612 days. The following environment was configured using pre-omicron code (v0.3.1) to output 1000 days of data, which carries us back to the beginning of the calendar for Quidel:

covidcast-indicators/quidel_covidtest$ git status -uno
HEAD detached at covidcast-indicators/v0.3.1
covidcast-indicators/quidel_covidtest$ grep ^21229 receiving/*_county_covid_ag_smoothed_pct_positive.csv |head
receiving/20200526_county_covid_ag_smoothed_pct_positive.csv:21229,4.913667717688675,3.0568697758622676,50.0
receiving/20200527_county_covid_ag_smoothed_pct_positive.csv:21229,4.6693051871295275,2.9837161653964275,50.0
receiving/20200528_county_covid_ag_smoothed_pct_positive.csv:21229,4.875077252904564,3.0454600541418,50.0
receiving/20200529_county_covid_ag_smoothed_pct_positive.csv:21229,5.258998129413856,3.1567158618292117,50.0
receiving/20200530_county_covid_ag_smoothed_pct_positive.csv:21229,5.0472663892846406,3.09597073928053,50.00000000000001
receiving/20200531_county_covid_ag_smoothed_pct_positive.csv:21229,6.006233899782876,3.3602039947580162,50.00000000000001
receiving/20200601_county_covid_ag_smoothed_pct_positive.csv:21229,5.751155128909549,3.292536188333159,50.0
receiving/20200602_county_covid_ag_smoothed_pct_positive.csv:21229,5.001733541585632,3.0827131418201743,50.0
receiving/20200603_county_covid_ag_smoothed_pct_positive.csv:21229,5.025413601051074,3.0896167343004803,50.0
receiving/20200604_county_covid_ag_smoothed_pct_positive.csv:21229,5.139949494826776,3.1227419639583016,50.0
covidcast-indicators/quidel_covidtest$ grep ^21229 receiving/*_county_covid_ag_smoothed_pct_positive.csv |wc -l
612

The archivediffer cache in production shows no days of data for Washington County:

[indicators@delphi-master-prod-01 quidel_covidtest]$ grep ^21229 archivediffer_cache/*_county_covid_ag_smoothed_pct_positive.csv |wc -l
0

The indicator logs for yesterday’s first v0.3.2 run included many lines like this:

Diff has deleted indices in ./receiving/20220114_county_covid_ag_smoothed_pct_positive.csv that have been coded as nans.

However, while the files in /common/covidcast/archive/successful have 611 entries for Washington County (one fewer than the 612 expected, which is odd but not the focus here), they contain no NaN annotation columns that would have marked these NA values as deleted:

[indicators@delphi-master-prod-01 quidel_covidtest]$ zgrep ^21229 /common/covidcast/archive/successful/quidel/*county_covid_ag_smoothed_pct_positive.csv.gz |head
/common/covidcast/archive/successful/quidel/20200526_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200527_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200528_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200529_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200530_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200531_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200601_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200602_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200603_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
/common/covidcast/archive/successful/quidel/20200604_county_covid_ag_smoothed_pct_positive.csv.gz:21229,NA,NA,NA
[indicators@delphi-master-prod-01 quidel_covidtest]$ zgrep ^21229 /common/covidcast/archive/successful/quidel/*county_covid_ag_smoothed_pct_positive.csv.gz |wc -l
611
[indicators@delphi-master-prod-01 quidel_covidtest]$ zcat /common/covidcast/archive/successful/quidel/20200526_county_covid_ag_smoothed_pct_positive.csv.gz |head
geo_id,val,se,sample_size
01117,NA,NA,NA
12063,NA,NA,NA
12101,NA,NA,NA
12103,NA,NA,NA
13077,NA,NA,NA
13097,NA,NA,NA
13113,NA,NA,NA
13171,NA,NA,NA
13223,NA,NA,NA
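
To run the same check across many archived files rather than eyeballing `zcat` output, something like the following might help. It is a minimal sketch: the function name `check_annotations` is hypothetical, and the `missing_` prefix for annotation columns is an assumption about the expected schema, not something confirmed in this issue.

```python
import csv
import gzip


def check_annotations(path, annotation_prefix="missing_"):
    """Report whether a CSV (optionally gzipped) has annotation columns.

    Returns (has_annotations, na_rows), where na_rows is the number of rows
    whose `val` field is the literal string "NA".
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        # Assumption: annotation columns, if present, share this prefix.
        has_annotations = any(c.startswith(annotation_prefix) for c in header)
        na_rows = sum(1 for row in reader if row["val"] == "NA")
    return has_annotations, na_rows
```

A file for which this returns `(False, n)` with `n > 0` matches the symptom shown above: NA values are present but nothing marks them as deleted.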

I've put file archives of everything above online:
