ENH: check RGI mapping accuracy and manually curate incorrect hits #89

Open
wants to merge 1 commit into
base: main

Vedanth-Ramji
Member

All RGI hits are checked for mapping accuracy using `check_mapping_accuracy.py`. The script compares the drug categorization of the ARO assigned to each ARG with the drug categorization assigned to that ARG by its original database. A list of ARGs with mismatched mappings is included in `manual_curation/`. Mismatched hits are manually curated to correct ARGs, and the manual curation files are updated.

Some mismatched hits are marked `correct` or `incorrect`. `correct` hits have a drug category mismatch but an ARO mapping that was determined to be correct. `incorrect` hits have a drug category mismatch, and no manually curated ARO could be found for them. Neither `correct` nor `incorrect` hits should be included in the final manual curation.

For megares, a file called `megares_meta_biocide_and_virulence_genes.tsv` is also created with metal, biocide, and virulence genes. These should not be mapped to any ARO terms and can be manually curated to no ARO. For the other databases, metal, biocide, and virulence genes are added directly to the manual curation files. Groot is derived from resfinder and argannot.

A list of all mismatched hits, with their manual curation, can be found in `db_harmonisation/all_mismatched_hits.tsv`.
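
For readers who want the gist of the check, here is a minimal sketch of the idea. It is not the code in `check_mapping_accuracy.py`. The column names (`aro_drug_class`, `db_drug_class`, `status`), the `;` separator, and the toy gene names are all assumptions made for illustration. It flags a hit when the ARO-derived and database-derived drug classes share nothing, then drops hits marked `correct` or `incorrect` before the final curation.

```python
# Hypothetical column names; the real files in this PR may use different ones.
ARO_CLASS = "aro_drug_class"   # drug class implied by the assigned ARO
DB_CLASS = "db_drug_class"     # drug class given by the source database
STATUS = "status"              # manual-curation outcome, if any


def split_classes(value):
    """Turn 'Tetracycline; Beta-lactam' into {'tetracycline', 'beta-lactam'}."""
    return {part.strip().lower() for part in value.split(";") if part.strip()}


def find_mismatched_hits(rows):
    """Return the rows whose two drug-class sets have nothing in common.

    A row with an empty class on either side is also reported, since an
    empty set is disjoint from every set.
    """
    return [
        row for row in rows
        if split_classes(row[ARO_CLASS]).isdisjoint(split_classes(row[DB_CLASS]))
    ]


def rows_for_final_curation(mismatched_rows):
    """Drop rows marked 'correct' or 'incorrect'; keep the remaining ones."""
    return [
        row for row in mismatched_rows
        if row.get(STATUS, "") not in {"correct", "incorrect"}
    ]


# Toy input with made-up gene names and classes.
hits = [
    {"gene": "geneA", ARO_CLASS: "tetracycline", DB_CLASS: "Tetracycline", STATUS: ""},
    {"gene": "geneB", ARO_CLASS: "aminoglycoside", DB_CLASS: "beta-lactam", STATUS: "correct"},
    {"gene": "geneC", ARO_CLASS: "macrolide", DB_CLASS: "glycopeptide; phenicol", STATUS: ""},
]

mismatched = find_mismatched_hits(hits)
to_curate = rows_for_final_curation(mismatched)
print([row["gene"] for row in mismatched])
print([row["gene"] for row in to_curate])
```

Comparing sets rather than raw strings keeps the sketch tolerant of case differences and of genes that carry several drug classes. Whether partial overlap should count as a match is a design choice this sketch makes on its own.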
@luispedro
Member

See the line comments. Also, there are no uses of the description field.

@Vedanth-Ramji
Member Author

The description field for some genes has information on their origins and links to papers, e.g. https://academic.oup.com/femsle/article-abstract/203/1/49/478714?redirectedFrom=fulltext for the fmtc gene in deeparg_curation. It's also really important in resfinder_curation where I've documented exactly which genes (and their AROs) are present in gene clusters.

I do realize that the description field isn't really useful for some other manual curation files, like the one for resfinderfg. However, if mappings change and more manual curation is required, I might have to add more descriptions for manually curated genes.

If I understand your comment, then you'd like me to remove the Description column altogether, right?

@luispedro
Member

> If I understand your comment, then you'd like me to remove the Description column altogether, right?

No, quite the contrary. Expand its use

@Vedanth-Ramji
Member Author

Got it, I'll add more details throughout.

@Vedanth-Ramji
Member Author

Also, besides these comments, I can't see any other line comments. I didn't receive anything by email either. Am I missing something?
