Commit 035a24a

add startBeforeEnd documentation
1 parent 6ae2569 commit 035a24a

2 files changed, +27 -12 lines changed

vignettes/checks/plausibleAfterBirth.Rmd (+1 -1)

@@ -20,7 +20,7 @@ output:
 The number and percent of records with a date value in the **cdmFieldName** field of the **cdmTableName** table that occurs prior to birth.

 ## Definition
-This check verifies that events happen after birth. This check is only run on fields where the **PLAUSIBLE_AFTER_BIRTH** parameter is set to **Yes**. The birthdate is taken from the `person` table, either the `birth_datetime` or composed from `year_of_birth`, `month_of_birth`, `day_of_birth` (taking 1st month/1st day if missing).
+This check verifies that events happen after birth. The birthdate is taken from the `person` table, either from `birth_datetime` or composed from `year_of_birth`, `month_of_birth`, and `day_of_birth` (using the 1st month/1st day if missing).

 - *Numerator*: The number of records with a non-null date value that happen prior to birth
 - *Denominator*: The total number of records in the table with a non-null date value
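
The birthdate rule in the definition above can be sketched in Python. This is an illustration only: the function names `resolve_birth_date` and `occurs_before_birth` are hypothetical, and giving `birth_datetime` precedence over the year/month/day fields is an assumption, since the vignette does not state which source wins when both are present.

```python
from datetime import date, datetime
from typing import Optional


def resolve_birth_date(
    birth_datetime: Optional[datetime],
    year_of_birth: int,
    month_of_birth: Optional[int] = None,
    day_of_birth: Optional[int] = None,
) -> date:
    """Return the birth date used as the lower bound for event dates."""
    # Assumption: birth_datetime is preferred whenever it is populated.
    if birth_datetime is not None:
        return birth_datetime.date()
    # A missing month or day falls back to 1, as the vignette describes.
    return date(year_of_birth, month_of_birth or 1, day_of_birth or 1)


def occurs_before_birth(event_date: date, birth_date: date) -> bool:
    """True when the event counts towards the numerator of the check."""
    return event_date < birth_date
```

With only `year_of_birth` known, the birth date resolves to 1 January of that year, so an event later in the birth year is not flagged even though the exact birth date is unknown.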

vignettes/checks/plausibleStartBeforeEnd.Rmd (+26 -11)

@@ -14,33 +14,48 @@ output:
 **Context**: Verification\
 **Category**: Plausibility\
 **Subcategory**: Temporal\
-**Severity**:
+**Severity**: CDM convention ⚠\


 ## Description
-The number and percent of records with a value in the @cdmFieldName field of the @cdmTableName that occurs after the date in the @plausibleStartBeforeEndFieldName.
+The number and percent of records with a value in the **cdmFieldName** field of the **cdmTableName** that occurs after the date in the **plausibleStartBeforeEndFieldName**.


 ## Definition
+Most tables have a field for the start date and a field for the end date of an event. This check verifies that the start date is not after the end date; the start date may be before or equal to the end date. It is applied to the start date field and takes the end date field as a parameter. Both date and datetime fields are checked.

-- *Numerator*:
-- *Denominator*:
-- *Related CDM Convention(s)*:
-- *CDM Fields/Tables*:
-- *Default Threshold Value*:
+- *Numerator*: The number of records where the date in **cdmFieldName** is after the date in **plausibleStartBeforeEndFieldName**.
+- *Denominator*: The total number of records with a non-null start and non-null end date value
+- *Related CDM Convention(s)*: -Not linked to a convention-
+- *CDM Fields/Tables*: This check runs on all date and datetime fields that have a start and end date in the same table. It also runs on the `cdm_source` table, checking that `source_release_date` is not after `cdm_release_date`.
+- *Default Threshold Value*: 1% (except for vocabulary and cdm_source tables, where it is 0%)


 ## User Guidance
-
+If the start date is after the end date, it is likely that the data is incorrect or the dates are unreliable.

 ### Violated rows query
 ```sql
-
+SELECT
+  '@cdmTableName.@cdmFieldName' AS violating_field,
+  cdmTable.*
+FROM @schema.@cdmTableName cdmTable
+WHERE cdmTable.@cdmFieldName IS NOT NULL
+  AND cdmTable.@plausibleStartBeforeEndFieldName IS NOT NULL
+  AND cdmTable.@cdmFieldName > cdmTable.@plausibleStartBeforeEndFieldName
 ```
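
To see how the violated rows query relates to the numerator, denominator, and threshold, here is a minimal sketch that runs it against an in-memory SQLite table. It is not part of the commit: the `@`-parameters are substituted by hand with `visit_occurrence`, `visit_start_date`, and `visit_end_date`, the four rows are made-up sample data, and treating the check as failed when the percentage exceeds the 1% default threshold is an assumption about how the threshold is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visit_occurrence ("
    "visit_occurrence_id INTEGER, visit_start_date TEXT, visit_end_date TEXT)"
)
# Hypothetical sample rows; ISO-8601 strings compare correctly as text.
conn.executemany(
    "INSERT INTO visit_occurrence VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "2020-01-05"),  # start before end: fine
        (2, "2020-02-10", "2020-02-10"),  # start equals end: fine
        (3, "2020-03-15", "2020-03-01"),  # start after end: violation
        (4, "2020-04-01", None),          # null end date: not in denominator
    ],
)

# The query from the vignette with the parameters filled in by hand.
violated = conn.execute(
    """
    SELECT 'visit_occurrence.visit_start_date' AS violating_field, cdmTable.*
    FROM visit_occurrence cdmTable
    WHERE cdmTable.visit_start_date IS NOT NULL
      AND cdmTable.visit_end_date IS NOT NULL
      AND cdmTable.visit_start_date > cdmTable.visit_end_date
    """
).fetchall()

# Denominator: records with a non-null start and non-null end date.
denominator = conn.execute(
    "SELECT COUNT(*) FROM visit_occurrence "
    "WHERE visit_start_date IS NOT NULL AND visit_end_date IS NOT NULL"
).fetchone()[0]

pct_violated = 100.0 * len(violated) / denominator if denominator else 0.0
threshold_pct = 1.0  # default threshold quoted in the vignette
check_fails = pct_violated > threshold_pct  # assumed comparison
```

In this sample one of the three records with both dates present violates the check, so the percentage is far above the default threshold.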

-
 ### ETL Developers
+The main reason for this check to fail is often that the source data is incorrect. If the end date is derived from other data, the calculation might not take into account some edge cases.

+Any violating records should either be removed or corrected. In most cases this can be done by adjusting the end date:
+- With a few exceptions, the end date is not mandatory and can be left empty.
+- If the end date is mandatory (visit_occurrence and drug_exposure), the end date can be set to the start date of the event. Note tha
+- If this check fails for the observation_period, it might signify a bigger underlying issue. Please investigate all records for this person in the CDM and source.
+- If neither the start nor the end date can be trusted, please remove the record from the CDM.

-### Data Users
+Make sure to clearly document the choices in your ETL specification.
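
The end date adjustments listed above could be encoded in an ETL step roughly as follows. This is a hedged sketch, not the project's implementation: `correct_end_date` and `MANDATORY_END_DATE_TABLES` are hypothetical names, and raising an error for `observation_period` is just one way to force the manual investigation the guidance asks for.

```python
from datetime import date
from typing import Optional

# Tables the guidance names as having a mandatory end date.
MANDATORY_END_DATE_TABLES = {"visit_occurrence", "drug_exposure"}


def correct_end_date(table: str, start: date, end: Optional[date]) -> Optional[date]:
    """Return an end date that no longer violates start <= end."""
    if end is None or start <= end:
        return end  # nothing to correct
    if table == "observation_period":
        # The guidance asks for an investigation rather than an automatic fix.
        raise ValueError("observation_period start is after end; review this person")
    if table in MANDATORY_END_DATE_TABLES:
        return start  # mandatory end date: fall back to the start date
    return None  # optional end date: leave it empty
```

Records where neither date can be trusted cannot be detected by a rule like this and still need to be removed by hand.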

+### Data Users
+A start date after the end date gives negative event durations, which might break analyses.
+Take particular note if this check fails for the `observation_period` table. This means that there are persons with negative observation time. If these persons are included in a cohort, it could skew, for example, survival analyses.
