ENH: check RGI mapping accuracy and manually curate incorrect hits
All RGI hits are checked for mapping accuracy using `check_mapping_accuracy.py`, which compares the drug categorization of the ARO assigned to each ARG with the drug categorization that the ARG's original database assigns to it. A list of ARGs with mismatched mappings is included in `manual_curation/`. Mismatched hits are manually curated to the correct ARO where one can be found, and the manual curation files are updated.

Some mismatched hits are marked `correct` or `incorrect`. A `correct` hit has a drug category mismatch, but its ARO mapping was determined to be correct. An `incorrect` hit has a drug category mismatch, and no manually curated ARO could be found for it. Neither `correct` nor `incorrect` hits should be included in the final manual curation.

For megares, a file called `megares_meta_biocide_and_virulence_genes.tsv` is also created with metal, biocide, and virulence genes. These should not be mapped to any ARO terms and can be manually curated to no ARO. For the other databases, metal, biocide, and virulence genes are added directly to the manual curation files. Groot is derived from resfinder and argannot.

A list of all mismatched hits, with their manual curation, can be found in `db_harmonisation/all_mismatched_hits.tsv`.
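The check described above can be sketched roughly as follows. This is not the contents of `check_mapping_accuracy.py`; it is a minimal illustration under stated assumptions. The `Hit` class, the function names, the two lookup dictionaries, and the `status` field are all hypothetical, and the identifiers in the usage example are toy placeholders rather than real ARO accessions or gene IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One RGI hit: an ARG from a source database and the ARO it was mapped to."""
    original_id: str  # hypothetical field name
    aro: str          # hypothetical field name


def normalise(category):
    """Lower-case and trim a drug category so trivial spelling differences compare equal."""
    return category.strip().lower()


def find_mismatches(hits, aro_categories, source_categories):
    """Return hits whose ARO drug categories share nothing with the source database's.

    Assumption: each lookup maps a key to an iterable of drug category strings.
    Hits with no category on either side are skipped, since there is nothing to compare.
    """
    mismatched = []
    for hit in hits:
        aro_cats = {normalise(c) for c in aro_categories.get(hit.aro, ())}
        src_cats = {normalise(c) for c in source_categories.get(hit.original_id, ())}
        if aro_cats and src_cats and aro_cats.isdisjoint(src_cats):
            mismatched.append(hit)
    return mismatched


def final_curation(rows):
    """Drop rows marked `correct` or `incorrect` before writing the final curation.

    Assumption: each row is a dict with a hypothetical "status" key holding the marker.
    """
    return [row for row in rows if row.get("status") not in {"correct", "incorrect"}]


if __name__ == "__main__":
    # Toy placeholder data, not taken from CARD or any source database.
    hits = [Hit("gene_a", "ARO:1"), Hit("gene_b", "ARO:2")]
    aro_categories = {"ARO:1": ["Tetracycline"], "ARO:2": ["macrolide"]}
    source_categories = {"gene_a": ["tetracycline "], "gene_b": ["aminoglycoside"]}
    for hit in find_mismatches(hits, aro_categories, source_categories):
        print(hit.original_id, hit.aro)
```

Treating a hit as mismatched only when the two category sets are disjoint is one possible design choice; a stricter check could require the sets to be equal.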
1 parent 934e208, commit f59f39b
Showing 23 changed files with 19,423 additions and 1,122 deletions.
@@ -1,4 +1,14 @@
-Original ID ARO Gene Name in CARD Description
-(Phe)cpt_strepv:U09991:AAB36569:1412-1948:537 3000249 chloramphenicol phosphotransferase Parent ARO mapping
-(Tet)tetH:EF460464:6286-7839:1554 3000175 tet(H) Loose RGI mapping. Mapped incorrectly to ARO:3004797.
-(AGly)aadC:V01282:225-701:477 3000225 ANT(6) Parent ARO mapping
+Original ID ARO Gene Name in CARD Description
+(Phe)cpt_strepv:U09991:AAB36569:1412-1948:537 3000249 chloramphenicol phosphotransferase Parent ARO mapping
+(AGly)aadC:V01282:225-701:477 3000225 ANT(6) Parent ARO mapping
+(Gly)vanA-G:AY271782:157-606:450 3000010
+(MLS)lmr(A):X59926:318-1763:1446 3003028
+(MLS)mph(D):NC_017312:2291580-2292413:834 3000333
+(Rif)rif:EF541029:530-2170:1641 3004040
+(Tet)tet(34):AB061440:306-770:465 3002870
+(Tet)tetH:EF460464:6286-7839:1554 3000175
+(Tet)tetU:U01917:413-730:318 3004650
+(TetracenomycinC)tcmA:NG_048121:101-1717:1617 3003554
+(Phe)cpt:NG_051909:101-631:531 3000249
+(Fcyn)FomC:AB016934:10868-11656:789 3004246
+(Flq)crpP:NG_062203:WP_033179079:101-298:198 3004467
@@ -1,4 +1,9 @@
-Original ID ARO Gene Name in CARD Description
-UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629588.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
-UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629153.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
-UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KJ695568.1|Agricultural soil|CYC 3003970 D-Ala-D-Ala ligase
+Original ID ARO Gene Name in CARD Description
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629588.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629153.1|pediatric_fecal_sample|CYC 3003970 D-Ala-D-Ala ligase
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KJ695568.1|Agricultural soil|CYC 3003970 D-Ala-D-Ala ligase
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KX125757.1|human_gut|CYC 3003970
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KX125843.1|human_gut|CYC 3003970
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF627869.1|pediatric_fecal_sample|CYC 3003970
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF629229.1|pediatric_fecal_sample|CYC 3003970
+UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase|KF630034.1|pediatric_fecal_sample|CYC 3003970
@@ -1,4 +1,5 @@
-Original ID ARO Gene Name in CARD Description
-gb|AAG57600.1|ARO:3000318|mphB 3000318 mphB
-AM180355.1.gene2260.p01 3000250 ErmC
-gb|AUW34359.1|ARO:3004445|RSA-2 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
+Original ID ARO Gene Name in CARD Description
+gb|AAG57600.1|ARO:3000318|mphB 3000318 mphB
+AM180355.1.gene2260.p01 3000250 ErmC
+gb|AUW34359.1|ARO:3004445|RSA-2 3005440 RSA2 beta-lactamase ARO number of RSA2 had been changed.
+gb|AAB08924.1|ARO:3004650|tetU 3004650