CHANGELOG.md (11 additions & 0 deletions)
@@ -2,6 +2,17 @@

 ## Unreleased

+#### Added `HamronizationNormalizer`
+- Removed the `is_hamronized` property from all normalizers and removed the `--hamronized` flag from the CLI.
+- All hamronized results now go through the `HamronizationNormalizer` class.
+- `HamronizationNormalizer` reads a hamronized file line by line, extracts the input genes, and loads all ARO mapping tables to support hamronized results that combine the outputs of multiple tools and databases.
+> Note: Updated preprocessing of resfinder genes. Entries from `gene_name` and `reference_accession` in hamronized results are now concatenated to form the input genes for `HamronizationNormalizer`. This improves ARO mapping accuracy (previously only `gene_symbol` was used, and several genes can share the same `gene_symbol`) and simplifies preprocessing of resfinder inputs (with `gene_symbol`, two different preprocessing functions would be required for `resfinder` and for `abricate` with the resfinder db).
+
 #### Update `confers_resistance_to()` to use `regulates`, `part_of`, and `participates_in` ARO relationships
Previously, argNorm used the `is_a` ARO relationship along with `confers_resistance_to_drug_class` and `confers_resistance_to_antibiotic` to map ARGs to the drugs they confer resistance to. While this worked well for most genes, some ARGs such as those coding for efflux pumps/proteins (e.g. `ARO:3003548`, `ARO:3000826`, `ARO:3003066`) were previously not mapped to any drugs. This is because none of their superclasses mapped to drugs/antibiotics via `confers_resistance_to_antibiotic` or `confers_resistance_to_drug_class`. However, these genes were related to other ARGs that did map to drugs via the `regulates`, `part_of`, or `participates_in` ARO relationships. argNorm now also utilizes these three relationships to ensure that even if the superclasses (derived using `is_a`) of an ARG don't map to a drug, the gene can be assigned a drug mapping.
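
The lookup described above can be sketched as a small graph traversal. This is only an illustration under stated assumptions, not argNorm's implementation: the toy ontology, its `TOY:` identifiers, and the helper `drugs_for` are all hypothetical, and the real ARO is far larger.

```python
from collections import deque

# Relationships that point directly at a drug or drug class (named in the text above).
DRUG_RELS = ("confers_resistance_to_antibiotic", "confers_resistance_to_drug_class")
# The three additional relationships named in the changelog entry.
EXTRA_RELS = ("regulates", "part_of", "participates_in")

# Hypothetical toy ontology; every identifier here is invented for illustration.
TOY_ARO = {
    "TOY:regulator": {"is_a": ["TOY:efflux_component"], "regulates": ["TOY:pump"]},
    "TOY:efflux_component": {},
    "TOY:pump": {"confers_resistance_to_drug_class": ["TOY:drug_class_A"]},
}


def drugs_for(term, ontology, extra_rels=EXTRA_RELS):
    """Collect drug terms reachable from `term` via is_a plus `extra_rels`."""
    follow = ("is_a",) + tuple(extra_rels)
    seen, queue, drugs = {term}, deque([term]), set()
    while queue:
        rels = ontology.get(queue.popleft(), {})
        # Record any drugs the current term maps to directly.
        for rel in DRUG_RELS:
            drugs.update(rels.get(rel, []))
        # Queue related terms that have not been visited yet.
        for rel in follow:
            for nxt in rels.get(rel, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return sorted(drugs)


is_a_only = drugs_for("TOY:regulator", TOY_ARO, extra_rels=())
with_extra = drugs_for("TOY:regulator", TOY_ARO)
```

In this toy graph, following `is_a` alone finds no drug for `TOY:regulator`, while also following `regulates` reaches `TOY:pump` and its drug class, which mirrors the efflux-pump case the entry describes.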
README.md (23 additions & 31 deletions)
@@ -51,18 +51,18 @@ The `resistance_to_drug_classes` column will contain ARO numbers of the broader
 If you use argNorm in a publication, please cite the preprint:
 > Ugarcina Perovic S, Ramji V et al. argNorm: Normalization of Antibiotic Resistance Gene Annotations to the Antibiotic Resistance Ontology (ARO). Queensland University of Technology ePrints, 2024. DOI: https://doi.org/10.5204/rep.eprints.252448 [Preprint] (Under review).
 - Note: ARG database and ARG annotation tool versions can change. argNorm is only intended for supported versions listed above.
 - Note: the argNorm tool will be periodically updated to support the latest versions of databases and annotation tools if they undergo significant changes.
@@ -98,7 +98,7 @@ argNorm is readily available in the funcscan pipeline which can be accessed (her
 Here is a basic outline of calling argNorm.

 ```bash
-argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv] [--hamronized (if hAMRonization used)]
+argnorm [tool] [--db] -i [path to original_annotation.tsv] -o [path to annotation_result_with_aro.tsv]
 ```

 ### `tool` (required)
@@ -109,6 +109,7 @@ The most important ***required positional*** argument is `tool` (see [here](#sup
 - `resfinder`
 - `amrfinderplus`
 - `groot`
+- `hamronization`

 ### I/O (required)
 - `-i` or `--input`: path to the annotation result
@@ -135,31 +136,26 @@ ARG annotation tools can use several ARG databases for annotation. Hence, the `t
 |`resfinder`| Not required |
 |`amrfinderplus`| Not required |
 |`groot`| Any from `groot-argannot`, `groot-resfinder`, `groot-db`, `groot-core-db`, or `groot-card`|
-
-### `--hamronized` (optional)
-Use this if the input is hamronized by [hAMRonization](https://github.com/pha4ge/hAMRonization)
+|`hamronization`| Not required |

 ### `-h` or `--help`
 Use `argnorm -h` or `argnorm --help` to see available options.
-Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to output file where the resulting table from argNorm will be stored. `--hamronized` is an option to indicate if the input data is a result of using the [hAMRonization package](https://github.com/pha4ge/hAMRonization). In our example, the input data is not a result of using the hAMRonization package, and so the `--hamronized` option can be omitted.
+Here, `tool` refers to the ARG annotation tool used (ResFinder in this case). `original_annotation.tsv` is the path to the input data and `argnorm_result.tsv` is the path to the output file where the resulting table from argNorm will be stored. `--db` specifies the ARG database used along with `tool` to perform annotation. ResFinder does not require `--db` (argNorm automatically loads the ResFinder database); however, `--db` is required for the ARG annotation tools `groot` and `abricate`.

 To run argNorm on the input data, use this command in your terminal:
 Normalizer classes for specific tools that normalize ARG annotation outputs. Same functionality as the CLI.

-All normalizers have 2 parameters:
+All normalizers have 1 optional parameter:
 * database (str): name of database. Can be: argannot, deeparg, megares, ncbi, resfinderfg, sarg, groot-db, groot-core-db, groot-card, groot-argannot, and groot-resfinder.
-* is_hamronized (bool, False by default): whether or not the ARG annotation output has been processed by the hamronization package.

> Note: the database parameter only needs to be specified for AbricateNormalizer and GrootNormalizer. ncbi, deeparg, resfinder, sarg, megares, argannot, resfinderfg are the supported databases for AbricateNormalizer and groot-db, groot-core-db, groot-argannot, groot-resfinder, and groot-card are the supported databases for GrootNormalizer.
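
The rule in the note above can be expressed as a small lookup. The sketch below is a hypothetical helper written for illustration, not part of argNorm: the names `SUPPORTED_DATABASES` and `check_database` are assumptions, and only the database lists come from the note.

```python
# Databases listed in the note above, keyed by the tool that requires one.
SUPPORTED_DATABASES = {
    "abricate": {"ncbi", "deeparg", "resfinder", "sarg", "megares", "argannot", "resfinderfg"},
    "groot": {"groot-db", "groot-core-db", "groot-argannot", "groot-resfinder", "groot-card"},
}


def check_database(tool, database=None):
    """Return the database to use for `tool`, or None if the tool needs none.

    Hypothetical helper: raises ValueError when a tool that requires a
    database is given a missing or unsupported one.
    """
    supported = SUPPORTED_DATABASES.get(tool)
    if supported is None:
        # Tools outside the table (for example resfinder) take no database.
        return None
    if database not in supported:
        raise ValueError(
            f"{tool} requires one of {sorted(supported)}, got {database!r}"
        )
    return database
```

For example, `check_database("abricate", "resfinderfg")` returns `"resfinderfg"`, `check_database("resfinder")` returns `None`, and `check_database("groot", "ncbi")` raises `ValueError`.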
 This will create a file called `resfinder.resfinder.orfs.normed.tsv` with ARO mappings and drug categorization.

-### Example 2: using AbricateNormalizer with the ResFinderFG database
-
-The database parameter needs to be specified for the AbricateNormalizer. Supported databases are:
-* `ncbi`
-* `deeparg`
-* `resfinder`
-* `sarg`
-* `megares`
-* `argannot`
-* `resfinderfg`
-
-For this example, we will run the AbricateNormalizer with the [`resfinderfg` database option](https://www.big-data-biology.org/paper/2022_resfinderfgv2/).
+### Example 2: using HamronizationNormalizer

Download the sample data [here](https://raw.githubusercontent.com/BigDataBiology/argNorm/7ee9d74c9fa51956ecb7706fa979cc0696ae305d/examples/hamronized/abricate.resfinderfg.tsv), and store it in a folder called `argnorm_normalizers_tutorial`.
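
As a rough illustration of the preprocessing described in the changelog entry for this commit (forming input genes from `gene_name` and `reference_accession` rather than `gene_symbol`), the sketch below reads a tab-separated table row by row and builds one key per row. It is not argNorm's code: the helper `input_genes`, the `|` separator, and the sample rows are all assumptions made for the example.

```python
import csv
import io


def input_genes(tsv_text, sep="|"):
    """Yield one lookup key per row of a tab-separated hamronized table.

    The key joins `gene_name` and `reference_accession`; the separator
    is an assumption chosen for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        yield f"{row['gene_name']}{sep}{row['reference_accession']}"


# Invented sample rows: two entries share a gene_symbol but differ otherwise.
sample = (
    "gene_symbol\tgene_name\treference_accession\n"
    "geneX\tgeneX_1\tACC0001\n"
    "geneX\tgeneX_2\tACC0002\n"
)

keys = list(input_genes(sample))
```

Both sample rows share the same `gene_symbol`, yet the combined keys stay distinct, which is the ambiguity the changelog note says the concatenation avoids.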