ENH: check RGI mapping accuracy and manually curate incorrect hits #89
All RGI hits are checked for mapping accuracy using `check_mapping_accuracy.py`. The script compares the drug categorization of the ARO assigned to each ARG with the drug categorization assigned to that ARG by its original database. A list of ARGs with mismatched mappings is included in `manual_curation/`. Mismatched hits are manually curated to the correct ARGs, and the manual curation files are updated.

Some mismatched hits are marked `correct` or `incorrect`. `correct` hits have a drug category mismatch, but their ARO mapping was determined to be correct. `incorrect` hits have a drug category mismatch, and no manually curated ARO could be found for them. Neither `correct` nor `incorrect` hits should be included in the final manual curation.

For megares, a file called `megares_meta_biocide_and_virulence_genes.tsv` is also created, listing metal, biocide, and virulence genes. These should not be mapped to any ARO terms and can be manually curated to no ARO. For the other databases, metal, biocide, and virulence genes are added directly to the manual curation files. Groot is derived from resfinder and argannot.

A list of all mismatched hits, with their manual curation, can be found in `db_harmonisation/all_mismatched_hits.tsv`.
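The drug-category comparison described above can be sketched roughly as follows. This is only an illustration and not the contents of `check_mapping_accuracy.py`: the field names (`gene`, `aro_drug_class`, `db_drug_class`, `curated_aro`), the `;` separator, the idea of storing the `correct`/`incorrect` flag in the curated ARO field, and the example rows are all assumptions made for the sketch.

```python
from typing import Dict, List, Set


def normalise(categories: str) -> Set[str]:
    """Split a ';'-separated category string into a lower-cased set."""
    return {c.strip().lower() for c in categories.split(";") if c.strip()}


def find_mismatches(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return hits whose ARO drug classes share nothing with the source database's."""
    mismatched = []
    for hit in hits:
        aro_classes = normalise(hit["aro_drug_class"])  # hypothetical field name
        db_classes = normalise(hit["db_drug_class"])    # hypothetical field name
        if aro_classes.isdisjoint(db_classes):
            mismatched.append(hit)
    return mismatched


def final_curation(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop rows flagged 'correct' or 'incorrect'; keep rows with a curated ARO."""
    return [r for r in rows if r["curated_aro"] not in {"correct", "incorrect"}]


# Made-up example rows; gene names and categories are placeholders.
hits = [
    {"gene": "geneA", "aro_drug_class": "tetracycline antibiotic",
     "db_drug_class": "Tetracycline antibiotic"},
    {"gene": "geneB", "aro_drug_class": "aminoglycoside antibiotic",
     "db_drug_class": "macrolide antibiotic"},
]
mismatched = find_mismatches(hits)
print([h["gene"] for h in mismatched])  # ['geneB']
```

Treating any shared category as a match is a deliberately lenient choice for this sketch; a stricter check could require the two sets to be equal.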
See the line comments, but there are also no uses of the description field
The description field for some genes has information on their origins and links to papers, e.g. https://academic.oup.com/femsle/article-abstract/203/1/49/478714?redirectedFrom=fulltext for the I do realize that the description field really isn't useful for some other manual curation files like the one for If I understand your comment, then you'd like me to remove the
No, quite the contrary. Expand its use
Got it, I'll add more details throughout.
Also, besides these comments, I can't see any other line comments. I didn't receive anything by email either. Am I missing something?