1972 replace covidcast #2056

Open
wants to merge 58 commits into
base: main
4afe0f0
in progress for replacing covidcast
aysim319 Jul 9, 2024
67a312c
moving wrapper in seperate module
aysim319 Jul 25, 2024
e4f2679
working on test
aysim319 Jul 29, 2024
c941982
post process for metadata
aysim319 Jul 29, 2024
a5628ac
lint and cleanup
aysim319 Jul 30, 2024
6e22db8
fixing for test
aysim319 Jul 30, 2024
a2a149f
implimentating suggested changes
aysim319 Aug 1, 2024
9269621
removing wrapper and directly converting in other places and moving t…
aysim319 Aug 5, 2024
8e67b6c
modifing test
aysim319 Aug 5, 2024
bf21d33
more suggestion
aysim319 Aug 6, 2024
d76cd40
sircomplainslot needs more filtering
aysim319 Aug 7, 2024
fd50d9d
adding credential for google symptoms
aysim319 Aug 8, 2024
76f1519
mocking api call in google symptoms
aysim319 Aug 8, 2024
157c6c6
organizing validations
aysim319 Aug 8, 2024
23384e7
extended date range and throws error when comes empty
aysim319 Aug 9, 2024
e5c3b46
delphi_utils/validator/datafetcher.py
aysim319 Aug 9, 2024
33936a4
test+refactor: tweak covidcast port tests and
dshemetov Aug 9, 2024
01a7f66
fix: don't parse datetimes on unused columns
dshemetov Aug 9, 2024
6eeef4e
lint: remove unused types
dshemetov Aug 9, 2024
1f59e06
fix: dont import covidcast in sir_complainsalot
dshemetov Aug 9, 2024
7f60275
fix: remove covidcast from indicator setup.py dependencies
dshemetov Aug 9, 2024
55150bc
fix: remove duplicate ported_signal
dshemetov Aug 10, 2024
329d340
fix: revert _parse_datetimes
dshemetov Aug 12, 2024
9d91be7
adding conditional to fail if api fails
aysim319 Aug 22, 2024
5ac98ab
change
aysim319 Sep 13, 2024
b92695a
Merge branch 'main' into 1972-replace-covidcast
aysim319 Sep 13, 2024
15ce75f
merge change that didn't make it for some reason
aysim319 Sep 13, 2024
6ee3e9e
lint
aysim319 Sep 13, 2024
f25605d
lint again
aysim319 Sep 13, 2024
2654946
fixing logic
aysim319 Sep 13, 2024
c63f095
remove covidcast from pyproject.toml
aysim319 Sep 13, 2024
79bf550
fix tests
aysim319 Sep 13, 2024
4c44d3a
need to update requirements
aysim319 Sep 13, 2024
8a308c4
fix test
aysim319 Sep 13, 2024
b4039c5
lint and fix package
aysim319 Sep 13, 2024
4916465
fix test
aysim319 Sep 13, 2024
670bf04
lint
aysim319 Sep 13, 2024
2f94d15
lint
aysim319 Sep 13, 2024
57fc591
suggested changes
aysim319 Sep 18, 2024
913c72f
fixed test
aysim319 Sep 18, 2024
a9cebed
handle check more gracefully
aysim319 Sep 18, 2024
e30aaca
export date util
aysim319 Sep 18, 2024
733c85e
wrap around try except for sircal
aysim319 Sep 19, 2024
f61462a
merge conflict
aysim319 Sep 19, 2024
06fafd0
lint
aysim319 Sep 19, 2024
d3bc895
remove former testing script
aysim319 Sep 19, 2024
68e2850
lint and fixing missing params
aysim319 Sep 19, 2024
9ff0979
lint
aysim319 Sep 19, 2024
f7fcefc
fixed test
aysim319 Sep 19, 2024
957af29
lock pandas version
aysim319 Sep 20, 2024
9585196
lint
aysim319 Sep 20, 2024
cf4f06d
lint
aysim319 Sep 20, 2024
b5929af
more fix test
aysim319 Sep 20, 2024
ee64984
lint again
aysim319 Sep 20, 2024
fa9143a
lint
aysim319 Sep 20, 2024
2de9d3b
Merge branch 'main' into 1972-replace-covidcast
aysim319 Nov 5, 2024
a1aad7a
changed based on suggestion
aysim319 Nov 11, 2024
38f25bb
made consistent with actual response
aysim319 Nov 13, 2024
1 change: 1 addition & 0 deletions _delphi_utils_python/delphi_utils/__init__.py
@@ -4,6 +4,7 @@
 from __future__ import absolute_import

 from .archive import ArchiveDiffer, GitArchiveDiffer, S3ArchiveDiffer
+from .date_utils import convert_apitime_column_to_datetimes, date_to_api_string
 from .export import create_backup_csv, create_export_csv
 from .geomap import GeoMapper
 from .logger import get_structured_logger
38 changes: 38 additions & 0 deletions _delphi_utils_python/delphi_utils/date_utils.py
@@ -0,0 +1,38 @@
+"""Utility for converting dates to a format accepted by epidata api."""
+
+from datetime import datetime
+
+import pandas as pd
+from epiweeks import Week
+
+
+def date_to_api_string(d: datetime, time_type: str = "day") -> str:
+    """Convert a date object to a YYYYMMDD or YYYYWW string expected by the API."""
+    if time_type == "day":
+        return d.strftime("%Y%m%d")
+    if time_type == "week":
+        return Week.fromdate(d).cdcformat()
+    raise ValueError(f"Unknown time_type: {time_type}")
+
+
+def convert_apitime_column_to_datetimes(
+    df: pd.DataFrame, col: str, date_format: str = "%Y%m%d"
+) -> pd.Series:
+    """Convert a DataFrame date or epiweek column into datetimes.
+
+    Dates are assumed to be in the YYYYMMDD format by default.
+    Weeks are assumed to be in the CDC epiweek format YYYYWW
+    and are converted to the date of the first day of the week.
+    """
+    df[col] = df[col].astype("str")
+
+    def parse_row(row):
+        if row["time_type"] == "day":
+            return pd.to_datetime(row[col], format=date_format)
+        if row["time_type"] == "week":
+            return pd.to_datetime(
+                Week(int(row[col][:4]), int(row[col][-2:])).startdate()
+            )
+        return row[col]
+
+    return df.apply(parse_row, axis=1)
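
For readers unfamiliar with the `week` branch above, the following is a minimal stdlib-only sketch of the date-to-epiweek mapping and its inverse. It assumes the CDC/MMWR convention (weeks start on Sunday, and week 1 is the week containing January 4). The helper names `mmwr_week_start`, `date_to_epiweek_string`, and `epiweek_string_to_start_date` are hypothetical, and this is not the `epiweeks` implementation the diff relies on.

```python
from datetime import date, timedelta


def mmwr_week_start(year: int) -> date:
    """Return the Sunday that starts week 1 of `year` under the assumed MMWR rule."""
    jan4 = date(year, 1, 4)
    # date.weekday() is Monday=0 .. Sunday=6, so (weekday + 1) % 7 is days since Sunday.
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)


def date_to_epiweek_string(d: date) -> str:
    """Format a date as YYYYWW, mirroring the 'week' branch of date_to_api_string."""
    year = d.year
    if d < mmwr_week_start(year):
        year -= 1  # early January days can belong to the previous epi-year
    elif d >= mmwr_week_start(year + 1):
        year += 1  # late December days can belong to the next epi-year
    week = (d - mmwr_week_start(year)).days // 7 + 1
    return f"{year}{week:02d}"


def epiweek_string_to_start_date(s: str) -> date:
    """Return the first day (Sunday) of the epiweek encoded as YYYYWW."""
    year, week = int(s[:4]), int(s[-2:])
    return mmwr_week_start(year) + timedelta(weeks=week - 1)


print(date(2020, 1, 1).strftime("%Y%m%d"))      # day format: 20200101
print(date_to_epiweek_string(date(2020, 1, 1)))  # week format: 202001
print(epiweek_string_to_start_date("202001"))    # 2019-12-29
```

The year adjustment matters at year boundaries: under this convention 2021-01-02 still belongs to epiweek 53 of 2020.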
60 changes: 46 additions & 14 deletions _delphi_utils_python/delphi_utils/validator/datafetcher.py
@@ -3,13 +3,17 @@

 import re
 import threading
+import warnings
 from os import listdir
+
+# pylint: disable=W0707
 from os.path import isfile, join
-import warnings
-import requests
-import pandas as pd
+
 import numpy as np
-import covidcast
+import pandas as pd
+import requests
+from delphi_epidata import Epidata
+
 from .errors import APIDataFetchError, ValidationFailure

 FILENAME_REGEX = re.compile(
@@ -115,7 +119,13 @@ def get_geo_signal_combos(data_source, api_key):
     meta_response.raise_for_status()
     source_signal_mappings = {i['source']:i['db_source'] for i in
         meta_response.json()}
-    meta = covidcast.metadata()
+
+    response = Epidata.covidcast_meta()
+
+    meta = pd.DataFrame.from_dict(Epidata.check(response))
+    # note: this will fail for signals with weekly data, but currently not supported for validation
+    meta = meta[meta["time_type"] == "day"]
+
     source_meta = meta[meta['data_source'] == data_source]
     # Need to convert np.records to tuples so they are hashable and can be used in sets and dicts.
     geo_signal_combos = list(map(tuple,
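
The filtering in this hunk can be illustrated without pandas or a network call. The sketch below assumes metadata arrives as one record per (source, signal, geo type), and that the resulting combos are `(geo_type, signal)` pairs. The record list, the `weekly_example` signal, and the function name `daily_geo_signal_combos` are hypothetical and only loosely modeled on the test data in this PR.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical metadata records; field names follow the columns used in the diff.
SAMPLE_META: List[Dict[str, str]] = [
    {"data_source": "chng", "signal": "smoothed_outpatient_cli",
     "geo_type": "state", "time_type": "day"},
    {"data_source": "chng", "signal": "smoothed_outpatient_covid",
     "geo_type": "county", "time_type": "day"},
    # Made-up weekly signal, included only to show that it is filtered out.
    {"data_source": "chng", "signal": "weekly_example",
     "geo_type": "state", "time_type": "week"},
    {"data_source": "covid-act-now", "signal": "pcr_specimen_total_tests",
     "geo_type": "msa", "time_type": "day"},
]


def daily_geo_signal_combos(
    meta: List[Dict[str, str]], data_source: str
) -> Set[Tuple[str, str]]:
    """Keep daily signals for one source and return hashable (geo_type, signal) pairs."""
    return {
        (row["geo_type"], row["signal"])
        for row in meta
        if row["time_type"] == "day" and row["data_source"] == data_source
    }


print(sorted(daily_geo_signal_combos(SAMPLE_META, "chng")))
```

Tuples are used because they are hashable, which matches the intent of the comment in the hunk about converting records before using them in sets and dicts.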
@@ -158,18 +168,40 @@ def fetch_api_reference(data_source, start_date, end_date, geo_type, signal_type

     Formatting is changed to match that of source data CSVs.
     """
-    with warnings.catch_warnings():
-        warnings.simplefilter("ignore")
-        api_df = covidcast.signal(
-            data_source, signal_type, start_date, end_date, geo_type)
+    if start_date > end_date:
+        raise ValueError(
+            "end_date must be on or after start_date, but "
+            + f"start_date = '{start_date}', end_date = '{end_date}'"
+        )
+    response = Epidata.covidcast(
+        data_source,
+        signal_type,
+        time_type="day",
+        geo_type=geo_type,
+        time_values=Epidata.range(
+            start_date.strftime("%Y%m%d"), end_date.strftime("%Y%m%d")
+        ),
+        geo_value="*",
+    )

-    error_context = f"when fetching reference data from {start_date} to {end_date} " +\
-        f"for data source: {data_source}, signal type: {signal_type}, geo type: {geo_type}"
+    try:
+        epidata_dict = Epidata.check(response)
+    except Exception as e:
+        raise APIDataFetchError(str(e))

-    if api_df is None:
+    if len(response["epidata"]) == 0:
+        error_context = f"when fetching reference data from {start_date} to {end_date} " + \
+                        f"for data source: {data_source}, signal type: {signal_type}, geo type: {geo_type}"
         raise APIDataFetchError("Error: no API data was returned " + error_context)
-    if not isinstance(api_df, pd.DataFrame):
-        raise APIDataFetchError("Error: API return value was not a dataframe " + error_context)

+    api_df = pd.DataFrame.from_dict(epidata_dict)
+    # note: this will fail for signals with weekly data, but currently not supported for validation
+    api_df["issue"] = pd.to_datetime(api_df["issue"], format="%Y%m%d")
+    api_df["time_value"] = pd.to_datetime(api_df["time_value"], format="%Y%m%d")
+    api_df.drop("direction", axis=1, inplace=True)
+    api_df["data_source"] = data_source
+    api_df["signal"] = signal_type
+
+
     column_names = ["geo_id", "val",
                     "se", "sample_size", "time_value"]
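
The post-processing that `fetch_api_reference` now performs on the raw response can be sketched with the standard library alone. The example assumes the response is a dict with `result`, `message`, and `epidata` keys and that `result == 1` signals success; only the `epidata` key appears in the diff, so the other two are assumptions. `FetchError` and `rows_from_response` are hypothetical stand-ins, not names from the PR.

```python
from datetime import datetime
from typing import Any, Dict, List


class FetchError(Exception):
    """Hypothetical stand-in for APIDataFetchError."""


def rows_from_response(
    response: Dict[str, Any], data_source: str, signal: str
) -> List[Dict[str, Any]]:
    """Validate a response dict and normalize its records, as the new code does."""
    # Assumed shape: {"result": int, "message": str, "epidata": [records]}.
    if response.get("result") != 1:
        raise FetchError(f"API call failed: {response.get('message')}")
    records = response.get("epidata", [])
    if len(records) == 0:
        raise FetchError("Error: no API data was returned")

    rows = []
    for record in records:
        # Drop the unused "direction" field, as the diff does with DataFrame.drop.
        row = {key: value for key, value in record.items() if key != "direction"}
        # Daily values only: YYYYMMDD integers become datetimes.
        for col in ("issue", "time_value"):
            row[col] = datetime.strptime(str(row[col]), "%Y%m%d")
        row["data_source"] = data_source
        row["signal"] = signal
        rows.append(row)
    return rows


# Record copied from the test data added in this PR; the wrapper keys are assumed.
sample = {
    "result": 1,
    "message": "success",
    "epidata": [
        {"geo_value": "1044", "stderr": None, "value": 3, "issue": 20200101,
         "lag": 7, "sample_size": None, "time_value": 20200101, "direction": None}
    ],
}
print(rows_from_response(sample, "chng", "smoothed_outpatient_cli")[0]["time_value"])
```

As in the diff, the `%Y%m%d` parse would fail for weekly signals, whose time values use the YYYYWW epiweek format.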
14 changes: 7 additions & 7 deletions _delphi_utils_python/delphi_utils/validator/dynamic.py
@@ -1,14 +1,16 @@
 """Dynamic file checks."""
+
+import re
 from dataclasses import dataclass
 from datetime import date, timedelta
 from typing import Dict, Set
-import re
-import pandas as pd
+
 import numpy as np
-import covidcast
-from .errors import ValidationFailure
+import pandas as pd
+
 from .datafetcher import get_geo_signal_combos, threaded_api_calls
-from .utils import relative_difference_by_min, TimeWindow, lag_converter
+from .errors import ValidationFailure
+from .utils import TimeWindow, lag_converter, relative_difference_by_min


 class DynamicValidator:
@@ -78,8 +80,6 @@ def validate(self, all_frames, report):
         # Get 14 days prior to the earliest list date
         outlier_lookbehind = timedelta(days=14)

-        # Authenticate API
-        covidcast.use_api_key(self.params.api_key)

         # Get all expected combinations of geo_type and signal.
         geo_signal_combos = get_geo_signal_combos(self.params.data_source,
8 changes: 5 additions & 3 deletions _delphi_utils_python/delphi_utils/validator/run.py
@@ -5,8 +5,10 @@
 when the module is run with `python -m delphi_utils.validator`.
 """
 import argparse as ap
-import covidcast
-from .. import read_params, get_structured_logger
+
+from delphi_epidata import Epidata
+
+from .. import get_structured_logger, read_params
 from .validate import Validator


@@ -18,7 +20,7 @@ def run_module():
     args = parser.parse_args()
     params = read_params()
     assert "validation" in params
-    covidcast.use_api_key(params["validation"]["common"]["api_credentials"])
+    Epidata.auth = ("epidata", params["validation"]["common"]["api_credentials"])
     dry_run_param = params["validation"]["common"].get("dry_run", False)
     params["validation"]["common"]["dry_run"] = args.dry_run or dry_run_param
     validator = Validator(params)
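
The `("epidata", key)` tuple assigned to `Epidata.auth` has the shape of an HTTP Basic credential pair. Assuming the client forwards it to its HTTP layer as such (an assumption about client internals not shown in this diff), the sketch below shows what header a pair like that would produce. The function name `basic_auth_header` and the key value are hypothetical.

```python
import base64
from typing import Tuple


def basic_auth_header(auth: Tuple[str, str]) -> str:
    """Build the value of an HTTP 'Authorization: Basic ...' header from (user, password)."""
    user, password = auth
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# "my-api-key" is a placeholder, not a real credential.
print(basic_auth_header(("epidata", "my-api-key")))
```

Because the setting lives on the class rather than on a call, it only needs to be assigned once in `run_module`, which is consistent with the removal of the per-validator `covidcast.use_api_key` call in `dynamic.py`.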
4 changes: 2 additions & 2 deletions _delphi_utils_python/pyproject.toml
@@ -17,13 +17,13 @@ classifiers = [
 ]
 dependencies = [
     "boto3",
-    "covidcast",
     "cvxpy",
+    "delphi-epidata",
     "epiweeks",
     "gitpython",
     "importlib_resources>=1.3",
     "numpy",
-    "pandas>=1.1.0",
+    "pandas==1.5.3",
     "requests",
     "slackclient",
     "scs<3.2.6", # TODO: remove this ; it is a cvxpy dependency, and the excluded version appears to break our jenkins build. see: https://github.com/cvxgrp/scs/issues/283
28 changes: 28 additions & 0 deletions _delphi_utils_python/tests/test_data/sample_epidata_metadata.json
@@ -0,0 +1,28 @@
+{"data_source": ["chng", "chng", "chng",
+"covid-act-now",
+"covid-act-now",
+"covid-act-now",
+"chng"],
+"signal": ["smoothed_outpatient_cli",
+"smoothed_outpatient_covid",
+"smoothed_outpatient_covid",
+"pcr_specimen_positivity_rate",
+"pcr_specimen_positivity_rate",
+"pcr_specimen_total_tests",
+"inactive"],
+"geo_type": ["state", "state", "county",
+"hrr", "msa", "msa",
+"state"],
+"min_time": ["20200101", "20200101", "20200101",
+"20200101", "20200101", "20200101",
+"20200101"],
+"max_time": ["20240101", "20240101", "20240101",
+"20240101", "20240101", "20240101",
+"20240101"],
+"last_update": [1711963480, 1711963480, 1711963480,
+1711963480, 1711963480, 1711963480,
+1711963480],
+"time_type": ["day", "day", "day",
+"day", "day", "day",
+"day"]
+}

@@ -0,0 +1,9 @@
+[{"geo_value": "1044",
+"stderr": null,
+"value": 3,
+"issue": 20200101,
+"lag": 7,
+"sample_size": null,
+"time_value": 20200101,
+"direction": null
+}]

@@ -0,0 +1,9 @@
+[{"geo_value": "0888",
+"stderr": 2,
+"value": 14,
+"issue": 20200101,
+"lag": 1,
+"sample_size": 100,
+"time_value": 20200101,
+"direction": null
+}]