Commit 8ebfb35

Vedanth-Ramji authored and luispedro committed
BUG: account for genes being mapped directly to drug classes without intermediate drugs in confers_resistance_to().
Whenever a gene was mapped directly to a drug class (i.e. an immediate child of `antibiotic molecule`), `confers_resistance_to()` stored the mapping in a temporary list (`backup_drugs`) and used it only if the ARG was not mapped to any other drug. This strategy fails when an ARG is mapped both to a drug class and to a drug that falls under a different drug class: the ARG is then mapped only to the drug, and the drug class mapping is lost.

`confers_resistance_to()` no longer uses `backup_drugs`. Instead, it maps the ARG to every possible drug and drug class, then iterates over them to check whether any of the mapped drugs are child nodes of the mapped drug classes. Mapped drug classes that are parents of mapped drugs are removed (they will be restored by `drugs_to_drug_classes()`).
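The failure case described in the message can be reproduced on a toy hierarchy. The sketch below is illustrative only: `PARENT`, `GENE_TARGETS`, `old_strategy`, `gene_x`, and the `class_*`/`drug_*` labels are hypothetical stand-ins for the ARO lookups and are not taken from argnorm.

```python
# Toy stand-in for the ontology: each node maps to its parent.
# Every name below is hypothetical; argnorm works on ARO terms instead.
ROOT = "antibiotic molecule"
PARENT = {
    "class_A": ROOT,       # drug class (immediate child of the root)
    "class_B": ROOT,       # a second drug class
    "drug_b1": "class_B",  # drug that belongs to class_B, not class_A
}

# gene_x is mapped directly to class_A and to a drug under class_B.
GENE_TARGETS = {"gene_x": ["class_A", "drug_b1"]}


def is_drug_class(node):
    """A drug class is an immediate child of the root node."""
    return PARENT[node] == ROOT


def old_strategy(gene):
    """Pre-fix behaviour: drug classes are used only if no drug is found."""
    target, backup_drugs = set(), []
    for node in GENE_TARGETS[gene]:
        if is_drug_class(node):
            backup_drugs.append(node)
        else:
            target.add(node)
    if not target:
        target.update(backup_drugs)
    return sorted(target)


print(old_strategy("gene_x"))  # ['drug_b1']
```

Because `drug_b1` is found, the backup list is never consulted and the direct mapping to `class_A` is dropped, which is the information loss this commit addresses.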
1 parent 2b064e2 commit 8ebfb35

13 files changed: 371 additions, 358 deletions

argnorm/drug_categorization.py

Lines changed: 27 additions & 16 deletions
@@ -30,7 +30,7 @@ def _get_drug_classes(super_classes_list: List[str]) -> List[str]:
 
     return output
 
-def confers_resistance_to(aro_num: str) -> List[str]:
+def _get_drugs(aro_num: str) -> List[str]:
     '''
     Description: Returns a list of the drugs/antibiotics to which a gene confers resistance to.
 
@@ -41,34 +41,45 @@ def confers_resistance_to(aro_num: str) -> List[str]:
         target (list[str]):
             A list with ARO number of the drugs/antibiotics to which the input gene confers resistance to.
     '''
-    # some gene superclasses can map to drugs which are immediate children of 'antibiotic molecule'
-    # only use these if no other drugs can be found, as this information will be present in drugs_to_drug_classes
 
-    backup_drugs = []
     target = set()
 
     for superclass in ARO[aro_num].superclasses():
         for drug in ARO[superclass.id].relationships.get(confers_resistance_to_drug_class_rel, []):
-            if list(ARO[drug.id].superclasses())[1:] == antibiotic_molecule_node:
-                backup_drugs.append(drug.id)
-            else:
-                target.add(drug.id)
+            target.add(drug.id)
 
         for drug in ARO[superclass.id].relationships.get(confers_resistance_to_antibiotic_rel, []):
-            if list(ARO[drug.id].superclasses())[1:] == antibiotic_molecule_node:
-                backup_drugs.append(drug.id)
-            else:
-                target.add(drug.id)
+            target.add(drug.id)
 
         for rel in [regulates_rel, participates_in_rel, part_of_rel]:
             for term in superclass.relationships.get(rel, []):
-                target.update(confers_resistance_to(term.id))
-
-    if not target:
-        target.update(backup_drugs)
+                target.update(_get_drugs(term.id))
 
     return sorted(target)
 
+def confers_resistance_to(aro_num: str) -> List[str]:
+    # some gene superclasses can map to drugs which are immediate children of 'antibiotic molecule'
+    # only use these if no other drugs can be found, as this information will be present in drugs_to_drug_classes
+
+    drugs = set(_get_drugs(aro_num))
+
+    drug_classes = set()
+    for drug in drugs:
+        if list(ARO[drug].superclasses())[1:] == antibiotic_molecule_node:
+            drug_classes.add(drug)
+
+    drugs = drugs - drug_classes
+    redundant_drug_classes = set()
+    for drug in drugs:
+        for drug_class in drug_classes:
+            if ARO[drug_class] in list(ARO[drug].superclasses())[1:]:
+                redundant_drug_classes.add(drug_class)
+
+    drug_classes = drug_classes - redundant_drug_classes
+    drugs.update(drug_classes)
+
+    return sorted(drugs)
+
 def drugs_to_drug_classes(drugs_list: List[str]) -> List[str]:
     '''
     Description: Returns a list of categories of drug classes, e.g. cephem and penam are categorized as beta_lactam antibiotics.
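
The pruning step that this diff adds to `confers_resistance_to()` can be sketched on a toy hierarchy. This is a minimal illustration under stated assumptions, not the argnorm implementation: `PARENT`, `ancestors`, `prune`, and the `class_*`/`drug_*` labels are hypothetical, and a plain dictionary replaces the ARO ontology object.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy: node -> parent.
ROOT = "antibiotic molecule"
PARENT: Dict[str, str] = {
    "class_A": ROOT,
    "class_B": ROOT,
    "drug_a1": "class_A",
    "drug_b1": "class_B",
}


def ancestors(node: str) -> List[str]:
    """Ancestors of `node`, nearest first, excluding the node itself."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out


def prune(mapped: Set[str]) -> List[str]:
    """Drop drug classes that are ancestors of an already-mapped drug."""
    # A drug class is a node whose only ancestor is the root.
    classes = {n for n in mapped if ancestors(n) == [ROOT]}
    drugs = mapped - classes
    # A class is redundant when some mapped drug sits underneath it.
    redundant = {c for d in drugs for c in classes if c in ancestors(d)}
    return sorted(drugs | (classes - redundant))


print(prune({"class_A", "drug_b1"}))  # ['class_A', 'drug_b1']
print(prune({"class_A", "drug_a1"}))  # ['drug_a1']
print(prune({"class_A"}))             # ['class_A']
```

The first call is the case the old code lost: the class is kept because no mapped drug sits beneath it. In the second call the class is removed as redundant, since it can be recovered later from the drug itself.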

outputs/hamronized/abricate.argannot.tsv

Lines changed: 66 additions & 66 deletions
Large diffs are not rendered by default.

outputs/hamronized/abricate.megares.tsv

Lines changed: 71 additions & 71 deletions
Large diffs are not rendered by default.

outputs/hamronized/abricate.resfinder.tsv

Lines changed: 31 additions & 31 deletions
Large diffs are not rendered by default.

outputs/hamronized/abricate.resfinderfg.tsv

Lines changed: 1 addition & 1 deletion
@@ -3953,7 +3953,7 @@ Unnamed: 0 input_file_name gene_symbol gene_name reference_database_id reference
 3950 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 84.47 GMGC10.269_887_786.DACB 598 1434 + 100.0
 3951 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 81.39 GMGC10.270_123_852.DACB 598 1430 + 99.52
 3952 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 81.12 GMGC10.280_419_486.DACB 598 1434 + 100.0
-3953 GMGC10.95nr_block_0294 ABC-F "ABC-F type ribosomal protection protein Msr(E)""|MG585948.1|pharmaceutical_effluent|AZM" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.93 GMGC10.280_792_741.UNKNOWN 1 1476 + 100.0 ARO:3003109 ARO:0000006 ARO:0000000
+3953 GMGC10.95nr_block_0294 ABC-F "ABC-F type ribosomal protection protein Msr(E)""|MG585948.1|pharmaceutical_effluent|AZM" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.93 GMGC10.280_792_741.UNKNOWN 1 1476 + 100.0 ARO:3003109 ARO:0000006,ARO:0000026 ARO:0000000,ARO:0000026
 3954 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.04 GMGC10.281_471_119.DACB 637 1473 + 100.0
 3955 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 80.05 GMGC10.281_560_329.DACB 598 1434 + 100.0
 3956 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 80.14 GMGC10.281_806_382.DACB 598 1433 + 99.88

outputs/hamronized/amrfinderplus.ncbi.orfs.tsv

Lines changed: 1 addition & 1 deletion
@@ -18,6 +18,6 @@ amrfinderplus.ncbi.orfs.tsv bexA multidrug efflux MATE transporter BexA NCBI Ref
 amrfinderplus.ncbi.orfs.tsv aad9 ANT(9) family aminoglycoside nucleotidyltransferase NCBI Reference Gene Database 2023-Nov-01 WP_002578722.1 amrfinderplus 3.10.30 gene_presence_detected AMINOGLYCOSIDE 100.0 AMINOGLYCOSIDE 68 841 258 k119_82797 258 - 99.61 ARO:3002630 ARO:0000039 ARO:0000016
 amrfinderplus.ncbi.orfs.tsv erm(B) 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(B) NCBI Reference Gene Database 2023-Nov-01 WP_002292226.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 24650 25384 245 k119_84636 245 - 100.0 ARO:3000375 ARO:0000006,ARO:0000027,ARO:0000046,ARO:0000057,ARO:0000065,ARO:0000066,ARO:3000145,ARO:3000156,ARO:3000158,ARO:3000176,ARO:3000583,ARO:3000584,ARO:3000669,ARO:3000672,ARO:3000673,ARO:3000674,ARO:3000675,ARO:3000677,ARO:3000678,ARO:3000679,ARO:3000680,ARO:3000681,ARO:3000682,ARO:3000867 ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000017,ARO:0000017,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026
 amrfinderplus.ncbi.orfs.tsv lnu(AN2) lincosamide nucleotidyltransferase Lnu(AN2) NCBI Reference Gene Database 2023-Nov-01 WP_004308783.1 amrfinderplus 3.10.30 gene_presence_detected LINCOSAMIDE 100.0 LINCOSAMIDE 1830 2339 170 k119_91457 170 - 100.0 ARO:3002835 ARO:0000046,ARO:3007169 ARO:0000017,ARO:0000017
-amrfinderplus.ncbi.orfs.tsv mef(En2) macrolide efflux MFS transporter Mef(En2) NCBI Reference Gene Database 2023-Nov-01 WP_063853729.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 2367 3569 401 k119_91457 401 - 99.5 ARO:3004659 ARO:0000046 ARO:0000017
+amrfinderplus.ncbi.orfs.tsv mef(En2) macrolide efflux MFS transporter Mef(En2) NCBI Reference Gene Database 2023-Nov-01 WP_063853729.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 2367 3569 401 k119_91457 401 - 99.5 ARO:3004659 ARO:0000000,ARO:0000046 ARO:0000000,ARO:0000017
 amrfinderplus.ncbi.orfs.tsv tet(W) tetracycline resistance ribosomal protection protein Tet(W) NCBI Reference Gene Database 2023-Nov-01 WP_002586627.1 amrfinderplus 3.10.30 gene_presence_detected TETRACYCLINE 100.0 TETRACYCLINE 21199 23115 639 k119_9485 639 + 100.0 ARO:3000194 ARO:0000051,ARO:0000069,ARO:3000152,ARO:3000528,ARO:3000667,ARO:3000668 ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050
 amrfinderplus.ncbi.orfs.tsv catA13 type A-13 chloramphenicol O-acetyltransferase NCBI Reference Gene Database 2023-Nov-01 WP_043774378.1 amrfinderplus 3.10.30 gene_presence_detected CHLORAMPHENICOL 100.0 PHENICOL 1975 2595 207 k119_95290 207 + 100.0 ARO:3004454 ARO:3000385 ARO:3000387
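
To see what the changed `mef(En2)` row gained, the two trailing comma-separated ARO fields can be compared before and after. The helper below is a hypothetical convenience for reading such diffs, not part of argnorm; the field values are copied from the row shown in this file's diff.

```python
def added_ids(before: str, after: str) -> list:
    """IDs present in the comma-separated `after` field but not in `before`."""
    return sorted(set(after.split(",")) - set(before.split(",")))


# Second-to-last field of the mef(En2) row, old vs. new.
print(added_ids("ARO:0000046", "ARO:0000000,ARO:0000046"))  # ['ARO:0000000']

# Last field of the same row, old vs. new.
print(added_ids("ARO:0000017", "ARO:0000000,ARO:0000017"))  # ['ARO:0000000']
```

In both fields the only new entry is `ARO:0000000`, and the previously reported IDs are unchanged.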
