Add some stats to the standard update pipeline reports comparing changes between two versions of the PAINT release (i.e. the IBD file and the set of IBA GAFs). Ideally, the parameters should just be two dates corresponding to before and after releases (e.g. 2020-01-31 and 2020-03-26).
We already have two reports yet to be committed to this repo:
A simple SQL query to count IBDs created between the two parameter dates, split out by curator. Example result of comparing 2020-01-31 vs 2020-03-26:
| name | count |
| --- | --- |
| Pascale Gaudet | 97 |
| Huaiyu Mi | 978 |
| Marc Feuermann | 2153 |
| Michael Kesling | 884 |
| Total | 4112 |
A Python script that works only with the contents of our monthly releases posted on our FTP server. It compares sets of IBDs from the IBD.gaf files and cross-references them to IBAs through the PANTHER:PTN in the IBA's with/from column.
Further description of the stats the Python script calculates:
Added IBDs - Given two IBD/IBA sets, "before" and "after", find the IBDs in "after" that aren't in "before".
Obsoleted IBDs - Conversely, find the IBDs in "before" that aren't in "after".
Added IBAs - In the "after" set of IBA GAFs, count all IBAs that reference the PTN and term of an IBD in Added IBDs.
Obsoleted IBAs - In the "before" set of IBA GAFs, count all IBAs that reference the PTN and term of an IBD in Obsoleted IBDs.
Net IBA change = Added IBAs - Obsoleted IBAs
When running the script on "before" release 2020-01-31 and "after" release 2020-03-26 I get these numbers:
Added IBDs: 4062
Obsoleted IBDs: 1224
Added IBAs: 319,250
Obsoleted IBAs: 71,491
Net IBA change: 247,759
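The five stats defined above reduce to two set differences and two filtered counts. The following is only a minimal sketch of that logic, not the script described in this issue: it assumes each IBD and each IBA has already been reduced to a `(ptn, term)` pair, and the function name `release_diff` and the toy IDs are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumption: an annotation is keyed by (PTN id, GO term id).
Key = Tuple[str, str]


def release_diff(
    before_ibds: Set[Key],
    after_ibds: Set[Key],
    before_ibas: Iterable[Key],
    after_ibas: Iterable[Key],
) -> Dict[str, int]:
    """Compute the update stats for a "before" and an "after" release.

    The IBD arguments are sets of (ptn, term) keys. The IBA arguments
    hold one (ptn, term) key per IBA line, where ptn is the PANTHER:PTN
    taken from the IBA's with/from column.
    """
    # IBDs present only in "after" / only in "before".
    added_ibds = after_ibds - before_ibds
    obsoleted_ibds = before_ibds - after_ibds

    # Count IBA lines whose (ptn, term) points at an added/obsoleted IBD.
    added_ibas = sum(1 for key in after_ibas if key in added_ibds)
    obsoleted_ibas = sum(1 for key in before_ibas if key in obsoleted_ibds)

    return {
        "added_ibds": len(added_ibds),
        "obsoleted_ibds": len(obsoleted_ibds),
        "added_ibas": added_ibas,
        "obsoleted_ibas": obsoleted_ibas,
        "net_iba_change": added_ibas - obsoleted_ibas,
    }


if __name__ == "__main__":
    # Toy data with made-up IDs, for illustration only.
    stats = release_diff(
        before_ibds={("PTN1", "GO:1"), ("PTN2", "GO:2")},
        after_ibds={("PTN2", "GO:2"), ("PTN3", "GO:3")},
        before_ibas=[("PTN1", "GO:1"), ("PTN1", "GO:1"), ("PTN2", "GO:2")],
        after_ibas=[("PTN3", "GO:3"), ("PTN2", "GO:2")],
    )
    for name, value in stats.items():
        print(f"{name}: {value}")
```

Keying on the `(ptn, term)` pair follows the definition above, where an IBA is counted only if it references both the PTN and the term of a changed IBD. Extracting these pairs from the GAF files is left out here.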
A third report displaying the % change by individual IBA GAF (e.g. paint_mgi, paint_human) as well as overall % change in IBA count will be added.
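One possible shape for that third report, given per-GAF IBA counts for each release, is sketched below. It assumes the counts have already been collected into dictionaries keyed by GAF name; the function names and the example counts are hypothetical, not taken from a real release.

```python
from typing import Dict, Optional, Tuple


def percent_change(before: int, after: int) -> Optional[float]:
    """Return the % change from before to after, or None if before is 0."""
    if before == 0:
        return None  # undefined: the GAF is new in the "after" release
    return 100.0 * (after - before) / before


def iba_change_report(
    before_counts: Dict[str, int],
    after_counts: Dict[str, int],
) -> Dict[str, Tuple[int, int, Optional[float]]]:
    """Map each GAF name, plus "overall", to (before, after, % change)."""
    report = {}
    # Include GAFs that appear in only one of the two releases.
    for name in sorted(set(before_counts) | set(after_counts)):
        before = before_counts.get(name, 0)
        after = after_counts.get(name, 0)
        report[name] = (before, after, percent_change(before, after))

    total_before = sum(before_counts.values())
    total_after = sum(after_counts.values())
    report["overall"] = (
        total_before,
        total_after,
        percent_change(total_before, total_after),
    )
    return report


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    report = iba_change_report(
        {"paint_mgi": 100, "paint_human": 200},
        {"paint_mgi": 110, "paint_human": 150},
    )
    for name, (before, after, pct) in report.items():
        pct_text = "n/a" if pct is None else f"{pct:+.1f}%"
        print(f"{name}\t{before}\t{after}\t{pct_text}")
```

A GAF with no IBAs in the "before" release gets no percentage rather than a division error, so newly added files still show up in the report.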
These reports will help us quickly QA each update and identify potential data issues before they get out to the GO release data.
@pgaudet Right, it'll go in that folder, prefixed with the run date, e.g. 2020-03-26-[report_name]. Since these are sort of global update stats we can probably just call the new report 2020-03-26_update_stats? What do you think?