Address qc-duplicate-exact-synonym-no-abbrev failures in mondo #751
- Filters cases where there are multiple Mondo terms with the same exactSynonym, and instead puts them into a special curation TSV.
- Add: reports/review-qc-duplicate-exact-synonym-no-abbrev.tsv
- Add/Update: Make goals & Python scripting to filter files & create that TSV.
c05e5b3 to 8c2c20d
- Update: Handle Mondo -unconfirmed cases too: Cases where the synonym sync finds no trace of the synonym in the source, but since we have not yet
- Update: Removed -confirmed deduping. This wasn't the correct way to handle checking against synonyms that already exist in Mondo, and -confirmed does not introduce duplication anyway. It doesn't add synonyms; it only adds evidence for existing ones.
- Remove some temporary code
- Update: Handling cases now where a new exactSynonym coming in through the sync is equivalent to a label that exists on another Mondo term.
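The label check described in this commit could look roughly like the sketch below. It assumes hypothetical inputs: a mapping of Mondo IDs to labels and a list of (Mondo ID, synonym) pairs coming in through the sync. The function name, the data shapes, and the case-insensitive comparison are illustrative assumptions, not the code in this PR.

```python
def find_label_collisions(labels_by_term, incoming_synonyms):
    """Return incoming exact synonyms that equal the label of a different term."""
    # Index labels case-insensitively (an assumption): label -> set of term IDs.
    terms_by_label = {}
    for term_id, label in labels_by_term.items():
        terms_by_label.setdefault(label.strip().lower(), set()).add(term_id)

    collisions = []
    for term_id, synonym in incoming_synonyms:
        # A term may carry a synonym equal to its own label; only other terms count.
        others = terms_by_label.get(synonym.strip().lower(), set()) - {term_id}
        if others:
            collisions.append((term_id, synonym, sorted(others)))
    return collisions
```

Each returned tuple names the term receiving the synonym, the synonym itself, and the other term(s) whose label it matches, which is the information a curator would need in a review sheet.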
- codestyle update
- Added a make goal prereq
- Bug fix: Not all abbreviations were being filtered out
- Bug fix: Rows were showing up that were not supposed to, as a result of insufficient abbreviation information; only some rows for the given synonym were being filtered out, which resulted in some cases where there was only 1 row for the synonym showing in the sheet, or rows where the only cases that were showing up were -confirmed or -unconfirmed, which didn't make sense.
- Removed some debug lines / comments
- Bug fix: Filtering obsoletes from the check. It was being overzealous. It is allowable to have a conflict with an obsolete term. This was also manifesting in an issue where the only entries for a synonym in review.tsv were -confirmed and -unconfirmed.
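One way to express the obsolete-term rule is to drop obsolete terms before counting how many terms share a synonym. This is a sketch under assumptions: `rows` is a hypothetical list of (Mondo ID, exact synonym) pairs and `obsolete_ids` a hypothetical set of obsolete Mondo IDs; the PR's script may structure this differently.

```python
from collections import defaultdict

def conflicting_synonyms(rows, obsolete_ids):
    """Map each exact synonym to the non-obsolete terms sharing it (2+ terms only)."""
    terms_by_synonym = defaultdict(set)
    for mondo_id, synonym in rows:
        # A clash with an obsolete term is allowed, so obsolete terms never count.
        if mondo_id in obsolete_ids:
            continue
        terms_by_synonym[synonym].add(mondo_id)
    return {s: sorted(ids) for s, ids in terms_by_synonym.items() if len(ids) > 1}
```

Filtering before grouping means a synonym shared by one active term and one obsolete term never reaches the review sheet at all, rather than showing up as a lone row.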
- Removed a todo
While I am not entirely convinced an additional script was needed vs. incorporating this as a requirement for the synonym sync script, I am approving since the data generated in monarch-initiative/mondo#8584 looks generally as expected and the mondo-edit.obo file passes all QC steps. Nice job getting these items all sorted!
Thanks! If there's an idea of how to easily refactor this, let me know. It needs to come after the normal synonym sync, because the synonym sync, like the other pipeline, runs on each source individually. But in order to resolve this issue, we need to look at all sources in aggregate, hence the separate script.
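To illustrate the point about aggregation, here is a toy sketch with made-up data: two hypothetical sources each propose the same exact synonym for a different Mondo term. The source names, IDs, and function name are assumptions for illustration only.

```python
from collections import defaultdict

def duplicated_synonyms(rows):
    """Return synonyms attached to more than one term in the given rows."""
    terms_by_synonym = defaultdict(set)
    for mondo_id, synonym in rows:
        terms_by_synonym[synonym].add(mondo_id)
    return {s: sorted(ids) for s, ids in terms_by_synonym.items() if len(ids) > 1}

# Hypothetical per-source sync outputs: (Mondo ID, exact synonym) pairs.
rows_by_source = {
    "source_a": [("MONDO:0000001", "example synonym")],
    "source_b": [("MONDO:0000002", "example synonym")],
}

# Checking each source on its own finds nothing to flag.
per_source = {name: duplicated_synonyms(rows) for name, rows in rows_by_source.items()}

# Checking all sources together exposes the clash.
all_rows = [row for rows in rows_by_source.values() for row in rows]
aggregate = duplicated_synonyms(all_rows)
```

Here `per_source` is empty for both sources while `aggregate` flags the shared synonym, which is why a check of this kind has to run after every per-source sync has finished.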
Partially addresses issues in:
Overview
Filters cases where there are multiple Mondo terms with the same exactSynonym, and instead puts them into a special curation TSV.
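As a rough sketch of the filtering step, assuming every input row is an exact synonym and that rows are dicts with hypothetical `mondo_id`, `synonym`, and `source` columns (the real column names and script in this PR may differ):

```python
import csv
import io
from collections import defaultdict

def split_for_review(rows):
    """Split rows into (kept, review); review holds synonyms shared by 2+ terms."""
    terms_by_synonym = defaultdict(set)
    for row in rows:
        terms_by_synonym[row["synonym"]].add(row["mondo_id"])
    # Decide per synonym, not per row, so a flagged synonym moves all of its rows.
    kept = [r for r in rows if len(terms_by_synonym[r["synonym"]]) == 1]
    review = [r for r in rows if len(terms_by_synonym[r["synonym"]]) > 1]
    return kept, review

def to_tsv(rows, fieldnames=("mondo_id", "synonym", "source")):
    """Serialize rows as TSV text (column names are assumptions)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The `kept` rows would continue through the normal sync outputs, while the `review` rows would be written to the curation TSV for manual resolution.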
Pre-merge checklist
Documentation
Was the documentation added/updated under docs/?
QC
Was the full pipeline run before submitting this PR using sh run.sh make build-mondo-ingest on this branch (after docker pull obolibrary/odkfull:dev), and no errors occurred?
Mini build: qc-duplicate-exact-synonym-no-abbrev failures in mondo - mini build #752
Build: qc-duplicate-exact-synonym-no-abbrev failures in mondo - build #755
New Packages
Were any new Python packages added?
Were any other non-Python packages added?
PR Review and Conversations Resolved
Has the PR been sufficiently reviewed by at least 1 team member of the Mondo Technical team and all threads resolved?
Additional info
Google sheet
Mondo PR using this branch: qc-duplicate-exact-synonym-no-abbrev (mondo#8584)