DOC: add docs for HamronizationNormalizer
Vedanth-Ramji committed Jan 30, 2025
1 parent 7555e0c commit d95dc22
Showing 5 changed files with 89 additions and 118 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,17 @@

## Unreleased

#### Added `HamronizationNormalizer`
- Removed the `is_hamronized` parameter from all normalizers and the `--hamronized` flag from the CLI.
- All hamronized results now go through the `HamronizationNormalizer` class.
- `HamronizationNormalizer` reads a hamronized file line by line, extracts the input genes, and loads all ARO mapping tables, so that hamronized results combining the outputs of multiple tools and databases are supported.
- On the CLI, hamronization commands look like:
```bash
argnorm hamronization -i PATH_TO_INPUT -o PATH_TO_OUTPUT
```

> Note: Updated preprocessing of ResFinder genes. Entries from `gene_name` and `reference_accession` in hamronized results are now concatenated to form the input genes for `HamronizationNormalizer`. This improves ARO mapping accuracy (previously only `gene_symbol` was used, and several genes can share the same `gene_symbol`) and also simplifies preprocessing of ResFinder inputs (if `gene_symbol` were used, two different preprocessing functions would be required: one for `resfinder` and one for `abricate` with the ResFinder database).
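
The preprocessing change in the note above can be illustrated with a minimal Python sketch. It is not argNorm's implementation: the helper names `resfinder_input_gene` and `read_input_genes` are hypothetical, the `_` separator between `gene_name` and `reference_accession` is an assumption, and the two input rows are made-up placeholders that share a `gene_symbol`, which is exactly the case where `gene_symbol` alone cannot tell the genes apart.

```python
import csv
import io


def resfinder_input_gene(row, sep="_"):
    """Hypothetical: build an input gene ID from one hamronized ResFinder row.

    Concatenates `gene_name` and `reference_accession`; the "_" separator
    is an assumption, not taken from argNorm.
    """
    return f"{row['gene_name']}{sep}{row['reference_accession']}"


def read_input_genes(handle):
    """Read a hamronized TSV line by line and return one input gene per row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [resfinder_input_gene(row) for row in reader]


# Placeholder rows: both share gene_symbol "blaX" but are different genes.
example = io.StringIO(
    "gene_symbol\tgene_name\treference_accession\n"
    "blaX\tblaX-1_1\tACC001\n"
    "blaX\tblaX-2_1\tACC002\n"
)
genes = read_input_genes(example)
print(genes)  # ['blaX-1_1_ACC001', 'blaX-2_1_ACC002']
```

Both rows would collapse to `blaX` if only `gene_symbol` were used, whereas the concatenated IDs stay distinct.
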
#### Update `confers_resistance_to()` to use `regulates`, `part_of`, and `participates_in` ARO relationships
Previously, argNorm used the `is_a` ARO relationship along with `confers_resistance_to_drug_class` and `confers_resistance_to_antibiotic` to map ARGs to the drugs they confer resistance to. While this worked well for most genes, some ARGs, such as those coding for efflux pumps/proteins (e.g. `ARO:3003548`, `ARO:3000826`, `ARO:3003066`), were not mapped to any drugs, because none of their superclasses mapped to drugs/antibiotics via `confers_resistance_to_antibiotic` or `confers_resistance_to_drug_class`. However, these genes are related to other ARGs that do map to drugs via the `regulates`, `part_of`, or `participates_in` ARO relationships. argNorm now also uses these three relationships, so that an ARG can be assigned a drug mapping even if none of its superclasses (derived using `is_a`) map to a drug.
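
The lookup described above can be sketched as a breadth-first walk over a toy ontology. This is an illustrative assumption, not argNorm's `confers_resistance_to()`: the ontology is a plain dict of `(relationship, target)` pairs, all term names are hypothetical placeholders rather than real ARO IDs, and the `follow_links` flag exists only to contrast the old `is_a`-only behavior with the extended one.

```python
from collections import deque

# Relationships whose targets are drugs or drug classes.
DRUG_RELS = {"confers_resistance_to_antibiotic", "confers_resistance_to_drug_class"}
# Relationships linking an ARG to other ARGs (the three newly used ones).
LINK_RELS = {"regulates", "part_of", "participates_in"}


def confers_resistance_to(term, edges, follow_links=True):
    """Collect drug/drug-class terms reachable from `term` in a toy ontology.

    `edges` maps each term to a list of (relationship, target) pairs.
    Always walks `is_a`; also walks LINK_RELS when `follow_links` is True.
    """
    walk = {"is_a"} | (LINK_RELS if follow_links else set())
    drugs = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for rel, target in edges.get(current, []):
            if rel in DRUG_RELS:
                drugs.add(target)
            elif rel in walk and target not in seen:
                seen.add(target)  # guard against cycles
                queue.append(target)
    return drugs


# Hypothetical toy ontology: subunit_A has no drug mapping via is_a,
# but is part_of pump_B, which does.
toy = {
    "subunit_A": [("is_a", "generic_subunit"), ("part_of", "pump_B")],
    "generic_subunit": [],
    "pump_B": [("is_a", "pump_family"),
               ("confers_resistance_to_antibiotic", "drug_X")],
    "pump_family": [("confers_resistance_to_drug_class", "class_Y")],
}

print(sorted(confers_resistance_to("subunit_A", toy, follow_links=False)))  # []
print(sorted(confers_resistance_to("subunit_A", toy)))  # ['class_Y', 'drug_X']
```

With `follow_links=False`, `subunit_A` reaches no drug because only its `is_a` superclass is visited; following `part_of` reaches `pump_B`, whose own drug mapping and superclass mapping are then collected. The `seen` set keeps the walk finite if the ontology contains cycles.
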

54 changes: 23 additions & 31 deletions README.md
@@ -51,18 +51,18 @@ The `resistance_to_drug_classes` column will contain ARO numbers of the broader
If you use argNorm in a publication, please cite the preprint:
> Ugarcina Perovic S, Ramji V et al. argNorm: Normalization of Antibiotic Resistance Gene Annotations to the Antibiotic Resistance Ontology (ARO). Queensland University of Technology ePrints, 2024. DOI: https://doi.org/10.5204/rep.eprints.252448 [Preprint] (Under review).
## Supported tools and databases
## Supported ARG annotation tools and databases

| ARG database | Tool for ARG annotation |
| ---------------------------------- | ------------------------------------------------------- |
| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) |
| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) |
| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr) |
| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/) |
| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) |
| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) |
| ARG-ANNOT v5.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| DeepARG v2 | [DeepARG v1.0.2](https://bench.cs.vt.edu/deeparg) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| Groot v1.1.2 | [GROOT v1.1.2](https://github.com/will-rowe/groot) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| MEGARes v3.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| NCBI Reference Gene Database v3.12 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [AMRFinderPlus v3.10.30](https://github.com/ncbi/amr), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| ResFinder v4.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate), [ResFinder v4.0](https://bitbucket.org/genomicepidemiology/resfinder/src/master/), & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| ResFinderFG v2.0 | [ABRicate v1.0.1](https://github.com/tseemann/abricate) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |
| SARG (reads mode) v3.2.1 | [ARGs-OAP v2.3](https://galaxyproject.org/use/args-oap/) & [hAMRonization](https://github.com/pha4ge/hAMRonization) |

- Note: ARG database and ARG annotation tool versions can change. argNorm is intended only for the supported versions listed above.
- Note: the argNorm tool will be periodically updated to support the latest versions of databases and annotation tools if they undergo significant changes.
@@ -98,7 +98,7 @@ argNorm is readily available in the funcscan pipeline which can be accessed (her
Here is a basic outline of calling argNorm.

```bash
argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv] [--hamronized (if hAMRonization used)]
argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv]
```

### `tool` (required)
@@ -109,6 +109,7 @@ The most important ***required positional*** argument is `tool` (see [here](#sup
- `resfinder`
- `amrfinderplus`
- `groot`
- `hamronization`

### I/O (required)
- `-i` or `--input`: path to the annotation result
@@ -135,31 +136,26 @@ ARG annotation tools can use several ARG databases for annotation. Hence, the `t
| `resfinder` | Not required |
| `amrfinderplus` | Not required |
| `groot` | Any from `groot-argannot`, `groot-resfinder`, `groot-db`, `groot-core-db`, or `groot-card` |

### `--hamronized` (optional)
Use this if the input is hamronized by [hAMRonization](https://github.com/pha4ge/hAMRonization)
| `hamronization` | Not required |

### `-h` or `--help`
Use `argnorm -h` or `argnorm --help` to see available options.

```bash
>argnorm -h
usage: argnorm [-h]
[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}]
[--hamronized] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}

argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).

positional arguments:
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot}
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
The tool you used to do ARG annotation.

optional arguments:
options:
-h, --help show this help message and exit
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
The database you used to do ARG annotation.
--hamronized Use this if the input is hamronized (processed using the hAMRonization tool)
-i INPUT, --input INPUT
The annotation result you have
-o OUTPUT, --output OUTPUT
@@ -209,23 +205,19 @@ argnorm -h
```
> argnorm -h
usage: argnorm [-h]
[--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}]
[--hamronized] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus}
usage: argnorm [-h] [--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}] [-i INPUT] [-o OUTPUT]
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}

argNorm normalizes ARG annotation results from different tools and databases to the same ontology, namely ARO (Antibiotic Resistance Ontology).

positional arguments:
{argsoap,abricate,deeparg,resfinder,amrfinderplus}
{argsoap,abricate,deeparg,resfinder,amrfinderplus,groot,hamronization}
The tool you used to do ARG annotation.

optional arguments:
options:
-h, --help show this help message and exit
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg}
--db {sarg,ncbi,resfinder,deeparg,megares,argannot,resfinderfg,groot-argannot,groot-resfinder,groot-db,groot-core-db,groot-card}
The database you used to do ARG annotation.
--hamronized Use this if the input is hamronized (processed using
the hAMRonization tool)
-i INPUT, --input INPUT
The annotation result you have
-o OUTPUT, --output OUTPUT
@@ -257,10 +249,10 @@ wget https://raw.githubusercontent.com/BigDataBiology/argNorm/main/examples/raw/
Here is a basic outline of most argNorm commands:
```bash
argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--hamronized]
argnorm [tool] -i [original_annotation.tsv] -o [argnorm_result.tsv] [--db]
```

Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to output file where the resulting table from argNorm will be stored. `--hamronized` is an option to indicate if the input data is a result of using the [hAMRonization package](https://github.com/pha4ge/hAMRonization). In our example, the input data is not a result of using the hAMRonization package, and so the `--hamronized` option can be omitted.
Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to the output file where the resulting table from argNorm will be stored. `--db` specifies the ARG database used with `tool` to perform the annotation. ResFinder does not require `--db` (argNorm automatically loads the ResFinder database); however, `--db` is required for the ARG annotation tools `groot` and `abricate`.
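
The argument rules above can be captured in a small Python helper that assembles an argNorm command line. `build_argnorm_command` is hypothetical and not part of argNorm; it only encodes what this section states (argument order `tool`, `--db`, `-i`, `-o`, and `--db` being required for `groot` and `abricate`) and does not check which databases each tool accepts.

```python
import shlex

# Hypothetical helper, not part of argNorm. It only enforces the rule that
# `--db` is required for `groot` and `abricate`; other tools pass through.
TOOLS_REQUIRING_DB = {"groot", "abricate"}


def build_argnorm_command(tool, input_path, output_path, db=None):
    """Return the argv list for an argNorm CLI call."""
    if tool in TOOLS_REQUIRING_DB and db is None:
        raise ValueError(f"--db is required for tool '{tool}'")
    cmd = ["argnorm", tool]
    if db is not None:
        cmd += ["--db", db]
    cmd += ["-i", input_path, "-o", output_path]
    return cmd


cmd = build_argnorm_command("resfinder", "original_annotation.tsv", "argnorm_result.tsv")
print(shlex.join(cmd))
# argnorm resfinder -i original_annotation.tsv -o argnorm_result.tsv
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to invoke the CLI, which avoids shell-quoting issues with paths that contain spaces.
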


To run argNorm on the input data, use this command in your terminal:
27 changes: 7 additions & 20 deletions docs/api.md
@@ -84,9 +84,8 @@ print(drugs_to_drug_classes(['ARO:0000030', 'ARO:0000051', 'ARO:0000069', 'ARO:3

Normalizer classes for specific tools, which normalize ARG annotation outputs. They provide the same functionality as the CLI.

All normalizers have 2 parameters:
All normalizers have 1 optional parameter:
* database (str): name of the database. Can be: argannot, deeparg, megares, ncbi, resfinder, resfinderfg, sarg, groot-db, groot-core-db, groot-card, groot-argannot, and groot-resfinder.
* is_hamronized (bool, False by default): whether or not the ARG annotation output has been processed by the hamronization package.

> Note: the database parameter only needs to be specified for AbricateNormalizer and GrootNormalizer. The supported databases for AbricateNormalizer are ncbi, deeparg, resfinder, sarg, megares, argannot, and resfinderfg; the supported databases for GrootNormalizer are groot-db, groot-core-db, groot-argannot, groot-resfinder, and groot-card.
@@ -97,6 +96,7 @@ Available normalizers:
* argnorm.normalizers.AMRFinderPlusNormalizer
* argnorm.normalizers.AbricateNormalizer
* argnorm.normalizers.GrootNormalizer
* argnorm.normalizers.HamronizationNormalizer

### Methods

@@ -128,18 +128,7 @@ resfinder_normalizer.run('./resfinder.resfinder.orfs.tsv').to_csv('./resfinder.r

This will create a file called `resfinder.resfinder.orfs.normed.tsv` with ARO mappings and drug categorization.

### Example 2: using AbricteNormalizer with the ResFinderFG database

The database parameter needs to be specified for the AbricateNormalizer. Supported databases are:
* `ncbi`
* `deeparg`
* `resfinder`
* `sarg`
* `megares`
* `argannot`
* `resfinderfg`

For this example, we will run the AbricateNormalizer with the [`resfinderfg` database option](https://www.big-data-biology.org/paper/2022_resfinderfgv2/).
### Example 2: using HamronizationNormalizer

Download the sample data [here](https://raw.githubusercontent.com/BigDataBiology/argNorm/7ee9d74c9fa51956ecb7706fa979cc0696ae305d/examples/hamronized/abricate.resfinderfg.tsv), and store it in a folder called `argnorm_normalizers_tutorial`.

@@ -151,13 +140,11 @@ wget https://raw.githubusercontent.com/BigDataBiology/argNorm/7ee9d74c9fa51956ec

Save the following piece of Python code in the `argnorm_normalizers_tutorial` folder, and run the script.

> Note: the data is hamronized, and so the `is_hamronized` parameter should be set to `True`.
```
from argnorm.normalizers import AbricateNormalizer
from argnorm.normalizers import HamronizationNormalizer
abricate_normalizer = AbricateNormalizer(database='resfinderfg', is_hamronized=True)
abricate_normalizer.run('./abricate.resfinderfg.tsv').to_csv('./abricate.resfinderfg.normed.tsv', sep='\t')
normalizer = HamronizationNormalizer()
normalizer.run('./abricate.resfinderfg.tsv').to_csv('./abricate.resfinderfg.normed.tsv', sep='\t')
```

This will create a file called `abricate.resfinderfg.normed.tsv` with ARO mappings and drug categorization.