Commit d95dc22

DOC: add docs for HamronizationNormalizer
1 parent 7555e0c commit d95dc22

5 files changed: +89 -118 lines changed

CHANGELOG.md

Lines changed: 11 additions & 0 deletions
@@ -2,6 +2,17 @@
 
 ## Unreleased
 
+#### Added `HamronizationNormalizer`
+- Removed the `is_hamronized` property from all normalizers and removed the `--hamronized` flag from the CLI.
+- All hamronized results now go through the `HamronizationNormalizer` class.
+- `HamronizationNormalizer` reads a hamronized file line by line, extracts the input genes, and loads all ARO mapping tables to support hamronized results that combine the outputs from multiple tools and databases.
+- On the CLI, hamronization commands look like:
+```bash
+argnorm hamronization -i PATH_TO_INPUT -o PATH_TO_OUTPUT
+```
+
+> Note: Updated preprocessing of resfinder genes. Entries from `gene_name` and `reference_accession` in hamronized results are now concatenated to form the input genes for `HamronizationNormalizer`. This improves ARO mapping accuracy (previously only `gene_symbol` was used, and several genes can share the same `gene_symbol`) and also simplifies preprocessing of resfinder inputs (if `gene_symbol` were used, two different preprocessing functions would be required for `resfinder` and `abricate` with the resfinder db).
+
 #### Update `confers_resistance_to()` to use `regulates`, `part_of`, and `participates_in` ARO relationships
 Previously, argNorm used the `is_a` ARO relationship along with `confers_resistance_to_drug_class` and `confers_resistance_to_antibiotic` to map ARGs to the drugs they confer resistance to. While this worked well for most genes, some ARGs such as those coding for efflux pumps/proteins (e.g. `ARO:3003548`, `ARO:3000826`, `ARO:3003066`) were previously not mapped to any drugs. This is because none of their superclasses mapped to drugs/antibiotics via `confers_resistance_to_antibiotic` or `confers_resistance_to_drug_class`. However, these genes were related to other ARGs that did map to drugs via the `regulates`, `part_of`, or `participates_in` ARO relationships. argNorm now also utilizes these three relationships to ensure that even if the superclasses (derived using `is_a`) of an ARG don't map to a drug, the gene can be assigned a drug mapping.
 
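
The `confers_resistance_to()` change in this changelog entry can be pictured as a small graph walk. The sketch below is not argNorm's implementation. The toy ontology, its placeholder term names, and the `drugs_for` helper are all invented for illustration, and it assumes the extra relationships are simply walked alongside `is_a`. Only the relationship names come from the entry itself.

```python
from collections import deque

# Toy ontology: term -> list of (relationship, target).
# Every term name here is a placeholder, not a real ARO accession.
TOY_ONTOLOGY = {
    "efflux_regulator": [("is_a", "regulator_family"), ("regulates", "efflux_pump")],
    "regulator_family": [],
    "efflux_pump": [("is_a", "pump_family")],
    "pump_family": [("confers_resistance_to_drug_class", "drug_class_x")],
}

# Relationships that point directly at a drug or drug class.
DRUG_EDGES = {"confers_resistance_to_drug_class", "confers_resistance_to_antibiotic"}
# Relationships followed before and after the change described above.
OLD_WALK = {"is_a"}
NEW_WALK = OLD_WALK | {"regulates", "part_of", "participates_in"}


def drugs_for(term, walk_edges, ontology=TOY_ONTOLOGY):
    """Collect drug targets reachable from `term` by following `walk_edges`."""
    seen, queue, drugs = {term}, deque([term]), set()
    while queue:
        current = queue.popleft()
        for relationship, target in ontology.get(current, []):
            if relationship in DRUG_EDGES:
                drugs.add(target)
            elif relationship in walk_edges and target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(drugs)


print(drugs_for("efflux_regulator", OLD_WALK))  # → []
print(drugs_for("efflux_regulator", NEW_WALK))  # → ['drug_class_x']
```

With `is_a` alone the placeholder regulator reaches no drug class. Once `regulates` is also followed, it inherits the mapping of the pump it regulates, which is the behaviour the entry describes for efflux-related genes.
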
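
The note in the `HamronizationNormalizer` changelog entry says that input genes are formed from `gene_name` and `reference_accession` rather than `gene_symbol`. The following is a minimal sketch of why that helps, not the code from this commit: the sample rows, the `input_gene` helper, and the `_` separator are assumptions made for illustration, and only the three column names come from the note.

```python
import csv
import io

# Two made-up rows shaped like a hamronized table. The gene and accession
# values are placeholders chosen so that both rows share one gene_symbol.
SAMPLE_TSV = (
    "gene_symbol\tgene_name\treference_accession\n"
    "geneX\tgeneX_1\tACC0001\n"
    "geneX\tgeneX_2\tACC0002\n"
)


def input_gene(row, sep="_"):
    """Join gene_name and reference_accession into one lookup key.

    The separator is an assumption of this sketch.
    """
    return f"{row['gene_name']}{sep}{row['reference_accession']}"


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
by_symbol = {row["gene_symbol"] for row in rows}
by_key = {input_gene(row) for row in rows}

print(len(by_symbol))  # → 1, both rows collapse onto the same symbol
print(len(by_key))     # → 2, the combined key keeps them apart
```
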

README.md

Lines changed: 23 additions & 31 deletions
@@ -51,18 +51,18 @@ The `resistance_to_drug_classes` column will contain ARO numbers of the broader
 If you use argNorm in a publication, please cite the preprint:
 > Ugarcina Perovic S, Ramji V et al. argNorm: Normalization of Antibiotic Resistance Gene Annotations to the Antibiotic Resistance Ontology (ARO). Queensland University of Technology ePrints, 2024. DOI: https://doi.org/10.5204/rep.eprints.252448 [Preprint] (Under review).
 
-## Supported tools and databases
+## Supported ARG annotation tools and databases
 
 | ARG database | Tool for ARG annotation |
 | ---------------------------------- | ------------------------------------------------------- |
-| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
-| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) |
-| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) |
-| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
-| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr) |
-| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/) |
-| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
-| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) |
+| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
+| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
 
 - Note: ARG database and ARG annotation tool versions can change. argNorm is only intended for supported versions listed above.
 - Note: the argNorm tool will be periodically updated to support the latest versions of databases and annotation tools if they undergo significant changes.
@@ -98,7 +98,7 @@ argNorm is readily available in the funcscan pipeline which can be accessed (her
 Here is a basic outline of calling argNorm.
 
 ```bash
-argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv] [--hamronized (if hAMRonization used)]
+argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv]
 ```
 
 ### `tool` (required)
@@ -109,6 +109,7 @@ The most important ***required positional*** argument is `tool` (see [here](#sup
 - `resfinder`
 - `amrfinderplus`
 - `groot`
+- `hamronization`
 
 ### I/O (required)
 - `-i` or `--input`: path to the annotation result
@@ -135,31 +136,26 @@ ARG annotation tools can use several ARG databases for annotation. Hence, the `t
 | `resfinder` | Not required |
 | `amrfinderplus` | Not required |
 | `groot` | Any from `groot-argannot`, `groot-resfinder`, `groot-db`, `groot-core-db`, or `groot-card` |
-
-### `--hamronized` (optional)
-Use this if the input is hamronized by [hAMRonization](https://github.com/pha4ge/hAMRonization)
+| `hamronization` | Not required |
 
 ### `-h` or `--help`
 Use `argnorm -h` or `argnorm --help` to see available options.
 
 ```bash
 >argnorm -h
-usage: argnorm [-h]
-[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}]
-[--hamronized] [-i INPUT] [-o OUTPUT]
-{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
+usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
+{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
 
 argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).
 
 positional arguments:
-{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
+{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
 The tool you used to do ARG annotation.
 
-optional arguments:
+options:
 -h, --help show this help message and exit
 --db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
 The database you used to do ARG annotation.
---hamronized Use this if the input is hamronized (processed using the hAMRonization tool)
 -i INPUT, --input INPUT
 The annotation result you have
 -o OUTPUT, --output OUTPUT
@@ -209,23 +205,19 @@ argnorm -h
 
 ```
 > argnorm -h
-usage: argnorm [-h]
-[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}]
-[--hamronized] [-i INPUT] [-o OUTPUT]
-{argsoap,abricate,deeparg,resfinder,amrfinderplus}
+usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
+{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
 
 argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).
 
 positional arguments:
-{argsoap,abricate,deeparg,resfinder,amrfinderplus}
+{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
 The tool you used to do ARG annotation.
 
-optional arguments:
+options:
 -h, --help show this help message and exit
---db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}
+--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
 The database you used to do ARG annotation.
---hamronized Use this if the input is hamronized (processed using
-the hAMRonization tool)
 -i INPUT, --input INPUT
 The annotation result you have
 -o OUTPUT, --output OUTPUT
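
The README text in this commit lists the accepted `tool` values and states that `--db` is required for `groot` and `abricate`, while `hamronization` needs none. A rough sketch of those rules as a command builder follows. `build_argnorm_command` is a hypothetical helper, not part of argNorm, and the file names are placeholders. It only assembles the argument list and does not run anything.

```python
import shlex

# Tool names as they appear in the usage text of this commit.
TOOLS = ("argsoap", "abricate", "deeparg", "resfinder",
         "amrfinderplus", "groot", "hamronization")
# Tools for which the README says --db must be given.
DB_REQUIRED = {"abricate", "groot"}


def build_argnorm_command(tool, input_path, output_path, db=None):
    """Return the argv list for one argnorm call (hypothetical helper)."""
    if tool not in TOOLS:
        raise ValueError(f"unsupported tool: {tool}")
    if tool in DB_REQUIRED and db is None:
        raise ValueError(f"--db is required for {tool}")
    command = ["argnorm", tool]
    if db is not None:
        command += ["--db", db]
    return command + ["-i", input_path, "-o", output_path]


# Placeholder file names; a hamronized input needs no --db.
command = build_argnorm_command("hamronization", "hamronized.tsv", "hamronized.normed.tsv")
print(shlex.join(command))
# → argnorm hamronization -i hamronized.tsv -o hamronized.normed.tsv
```

The returned list can be passed to `subprocess.run` if argNorm is installed. Building it separately keeps the `--db` check testable without invoking the CLI.
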
@@ -257,10 +249,10 @@ wget https://raw.githubusercontent.com/BigDataBiology/argNorm/main/examples/raw/
 Here is a basic outline of most argNorm commands:
 
 ```bash
-argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--hamronized]
+argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--db]
 ```
 
-Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to output file where the resulting table from argNorm will be stored. `--hamronized` is an option to indicate if the input data is a result of using the [hAMRonization package](https://github.com/pha4ge/hAMRonization). In our example, the input data is not a result of using the hAMRonization package, and so the `--hamronized` option can be omitted.
+Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to the output file where the resulting table from argNorm will be stored. `--db` is the ARG database used along with `tool` to perform annotation. ResFinder does not require `--db` (argNorm automatically loads the ResFinder database); however, `--db` is required for the ARG annotation tools `groot` and `abricate`.
 
 
 To run argNorm on the input data, use this command in your terminal:

docs/api.md

Lines changed: 7 additions & 20 deletions
@@ -84,9 +84,8 @@ print(drugs_to_drug_classes(['ARO:0000030', 'ARO:0000051', 'ARO:0000069', 'ARO:3
 
 Normalizer classes for specific tools, which normalize ARG annotation outputs. Same functionality as the CLI.
 
-All normalizers have 2 parameters:
+All normalizers have 1 optional parameter:
 * database (str): name of database. Can be: argannot, deeparg, megares, ncbi, resfinderfg, sarg, groot-db, groot-core-db, groot-card, groot-argannot, and groot-resfinder.
-* is_hamronized (bool, False by default): whether or not the ARG annotation output has been processed by the hamronization package.
 
 > Note: the database parameter only needs to be specified for AbricateNormalizer and GrootNormalizer. ncbi, deeparg, resfinder, sarg, megares, argannot, resfinderfg are the supported databases for AbricateNormalizer and groot-db, groot-core-db, groot-argannot, groot-resfinder, and groot-card are the supported databases for GrootNormalizer.
 
@@ -97,6 +96,7 @@ Available normalizers:
 * argnorm.normalizers.AMRFinderPlusNormalizer
 * argnorm.normalizers.AbricateNormalizer
 * argnorm.normalizers.GrootNormalizer
+* argnorm.normalizers.HamronizationNormalizer
 
 ### Methods
 
@@ -128,18 +128,7 @@ resfinder_normalizer.run('./resfinder.resfinder.orfs.tsv').to_csv('./resfinder.r
 
 This will create a file called `resfinder.resfinder.orfs.normed.tsv` with ARO mappings and drug categorization.
 
-### Example 2: using AbricteNormalizer with the ResFinderFG database
-
-The database parameter needs to be specified for the AbricateNormalizer. Supported databases are:
-* `ncbi`
-* `deeparg`
-* `resfinder`
-* `sarg`
-* `megares`
-* `argannot`
-* `resfinderfg`
-
-For this example, we will run the AbricateNormalizer with the [`resfinderfg` database option](https://www.big-data-biology.org/paper/2022_resfinderfgv2/).
+### Example 2: using HamronizationNormalizer
 
 Download the sample data [here](https://raw.githubusercontent.com/BigDataBiology/argNorm/7ee9d74c9fa51956ecb7706fa979cc0696ae305d/examples/hamronized/abricate.resfinderfg.tsv), and store it in a folder called `argnorm_normalizers_tutorial`.
 
@@ -151,13 +140,11 @@ wget https://raw.githubusercontent.com/BigDataBiology/argNorm/7ee9d74c9fa51956ec
 
 Save the following piece of Python code in the `argnorm_normalizers_tutorial` folder, and run the script.
 
-> Note: the data is hamronized, and so the `is_hamronized` parameter should be set to `True`.
-
 ```
-from argnorm.normalizers import AbricateNormalizer
+from argnorm.normalizers import HamronizationNormalizer
 
-abricate_normalizer = AbricateNormalizer(database='resfinderfg', is_hamronized=True)
-abricate_normalizer.run('./abricate.resfinderfg.tsv').to_csv('./abricate.resfinderfg.normed.tsv', sep='\t')
+normalizer = HamronizationNormalizer()
+normalizer.run('./abricate.resfinderfg.tsv').to_csv('./abricate.resfinderfg.normed.tsv', sep='\t')
 ```
 
-This will create a file called `abricate.resfinderfg.normed.tsv` with ARO mappings and drug categorization.
+This will create a file called `abricate.resfinderfg.normed.tsv` with ARO mappings and drug categorization.
