Missing per-county tested data #356
Comments
Actually, looking at
I see that for Pennsylvania counties the newest date with data is 2020-06-07.
But there are many counties where Corona Data Scraper had (this creates a file with the names of every county in the US with
and comparing them to a similar file created from data fetched from https://coronadatascraper.com/timeseries.csv.zip today:
It looks like there are 851 counties that lost |
Here are 4 examples:
Looking at |
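For anyone who wants to repeat the comparison described above, here is a minimal sketch, not the script used in the original comment. It assumes both timeseries files have already been parsed into arrays of row objects (for example with a CSV library). The `tested` column is named in the issue; the `name` and `level` fields and both function names are assumptions made for illustration.

```javascript
// Collect the names of counties that have at least one non-empty "tested" value.
// Assumption: each row is an object with "name", "level" and "tested" fields.
function countiesWithTested(rows) {
  const names = new Set();
  for (const row of rows) {
    const value = row.tested;
    const hasTested =
      value !== undefined && value !== null && String(value).trim() !== "";
    if (row.level === "county" && hasTested) {
      names.add(row.name);
    }
  }
  return names;
}

// Counties that had "tested" values in the older file but have none in the newer one.
function countiesThatLostTested(oldRows, newRows) {
  const before = countiesWithTested(oldRows);
  const after = countiesWithTested(newRows);
  return [...before].filter((name) => !after.has(name)).sort();
}
```

Passing the rows from an older copy of the file as `oldRows` and the rows fetched today as `newRows` returns a sorted list of county names that can be counted or spot-checked.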
Hi @TomGoBravo, thx for the great notes. It looks like this is a result of a few things:
Some of the ported sources are currently failing in live: ref https://api.covidatlas.com/status?format=html.
I'll try fixing PA first, and see where that takes us.
Re "I see that for Pennsylvania counties the newest date with data is from 2020-06-07" - checking code comments and issues - we had an issue for that, https://github.com/covidatlas/coronadatascraper/issues/1055. PA changed their reporting to now use PDFs. Code in src/shared/sources/us/pa/index.js has a comment:
I'll switch from PA to KS first (one of the Brown County items you listed) to see if I can get that working.
Blarg, running into issues with getting KS to work. Similar to PA, KS switched to reporting stuff via PDFs, and for some reason the PDF code is not working -- have hacked around and can't grok it just yet. Will raise another issue for it.
I've also found a similar issue - lots of county-level data in the central US appears to be missing.
Hi all, I believe I've found the reason for the missing data, though I'm not sure what caused it. Our reports are built up by location, stored in the Locations table. I checked the production table, and we don't have brown-county-texas-us (locationID iso1:us#iso2:us-tx#fips:48049), but we do have brown-county-illinois-us (iso1:us#iso2:us-il#fips:17009). I'm not sure why that's the case -- the location data should be populated when data is scraped. We do have data for the brown-county-texas-us location:
I'll look into a manual load of location data ... I don't know why we're loading locations during data scrape anyway, as we already have all of the location data. ps - I haven't bothered looking into the other missing counties -- thanks for the list above -- but it seems highly likely this is the problem.
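To check the other counties on the list the same way, something like the following sketch could help. It splits a locationID of the form quoted above into its parts and lists IDs that appear in scraped data but not in the Locations table. Both function names are hypothetical, and how the two ID lists are fetched is left out because the thread does not describe it.

```javascript
// Split a locationID such as "iso1:us#iso2:us-tx#fips:48049" into its parts.
function parseLocationID(locationID) {
  const parts = {};
  for (const segment of locationID.split("#")) {
    const idx = segment.indexOf(":");
    if (idx === -1) continue; // skip segments that are not "key:value"
    parts[segment.slice(0, idx)] = segment.slice(idx + 1);
  }
  return parts;
}

// IDs seen in scraped data that have no entry among the known location IDs.
// Assumption: both arguments are plain arrays of locationID strings.
function missingLocationIDs(scrapedIDs, knownIDs) {
  const known = new Set(knownIDs);
  return [...new Set(scrapedIDs)].filter((id) => !known.has(id));
}
```

With the two IDs mentioned above, `missingLocationIDs` would report the Texas ID when only the Illinois one is in the known list.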
The "locations" lambda (which updates locations) appears to have been timing out. For most sources it's ok, but for something like jhu-usa, which updates thousands of locations, it fails. Local logging:
and it stops. I see errors in the lambda log, and am assuming it's that. I bumped up the timeout for the lambda. Updating all locations takes about 1.5 mins for jhu-usa locally. Simplifying the code slightly now.
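As an aside, one general way to keep a large update like this predictable is to write locations in fixed-size batches. The sketch below is only an illustration of that idea and is not what the fix in this thread does (the fix was a longer timeout plus simplified code). `writeBatch` is a hypothetical async function supplied by the caller, and the default batch size of 25 is an arbitrary assumption.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Write locations one batch at a time and return how many were written.
// `writeBatch` is a hypothetical caller-supplied async function.
async function updateInBatches(locations, writeBatch, batchSize = 25) {
  let written = 0;
  for (const batch of chunk(locations, batchSize)) {
    await writeBatch(batch);
    written += batch.length;
  }
  return written;
}
```

Batches run sequentially here, so total time still grows with the number of locations; the timeout has to cover the largest source either way.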
I believe this will be addressed by #367. I'll launch that to production soon (< 15 mins). We'll need to wait for a jhu-usa scrape to update all of the locations.
Launched to prod ... let's see how things shake out.
Also assigning @TomGoBravo and @martynwong; if you see the data has filled in before I do, please close the issue. Cheers! jz
Hurrah! The data is working for me. Thanks!
Additional context
CovidActNow has been regularly fetching this file for months and making a copy at https://github.com/covid-projections/covid-data-public/commits/master/data/cases-cds/timeseries.csv
With the change to Project Li I noticed that many counties that used to have values in the `tested` column have no data now. The problem seems to be particularly bad in Pennsylvania.