ENH: Action to annotate MAGs and contigs with AMRFinderPlus #88

Merged
58 commits
d33e0cb
added new amrfinder directory and moved types to card directory
VinzentRisch Jun 28, 2024
32e8b7c
dirformat with filecollections
VinzentRisch Jul 1, 2024
d1b2ca6
dirformat with validating all filepaths
VinzentRisch Jul 1, 2024
facc75d
added test data to package data
VinzentRisch Jul 2, 2024
f948195
added amrprot.pot file to git
VinzentRisch Jul 2, 2024
c445800
merge main
VinzentRisch Jul 3, 2024
0670d7e
added new annotation format
VinzentRisch Jul 3, 2024
bfafdb8
added sampledata and feature data dir fmts
VinzentRisch Jul 4, 2024
bb9220c
register all formats
VinzentRisch Jul 4, 2024
317e5cb
using filecollections for the database format
VinzentRisch Jul 4, 2024
71a8da2
merge 80
VinzentRisch Jul 4, 2024
0bf7f20
renamed to dirfmt
VinzentRisch Jul 4, 2024
060f24d
merge 80
VinzentRisch Jul 4, 2024
8378b45
overwrite all pathmakers with code from busco moshpit
VinzentRisch Jul 4, 2024
82a1558
added field to annotation format
VinzentRisch Jul 4, 2024
f42d845
changed name of file in annotation format to allow other names
VinzentRisch Jul 4, 2024
7e31553
added mags action
VinzentRisch Jul 5, 2024
514688c
registered annotation types in plugin setup
VinzentRisch Jul 5, 2024
6deb616
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 5, 2024
a017eeb
changes
VinzentRisch Jul 5, 2024
07e9f52
Revert "overwrite all pathmakers with code from busco moshpit"
VinzentRisch Jul 5, 2024
78c4329
Merge branch '80_amrfinder_database_type' into 85_amrfinderplusannota…
VinzentRisch Jul 5, 2024
09156bb
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 5, 2024
9237c73
working action
VinzentRisch Jul 5, 2024
34a34b8
removed nested structure of annotation type
VinzentRisch Jul 5, 2024
c004300
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 5, 2024
4e38e21
working action with non nested output format
VinzentRisch Jul 5, 2024
16db485
changed magid to id in mags annotation
VinzentRisch Jul 8, 2024
8a902ed
moved run function into utils, added protein option
VinzentRisch Jul 8, 2024
0cb8492
changed utils to not include _ in filenames
VinzentRisch Jul 9, 2024
4f09ee7
changed type of featuredata one to also include mutations in name
VinzentRisch Jul 9, 2024
f10dcb3
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 9, 2024
bbaca6e
changed type and path_maker
VinzentRisch Jul 10, 2024
81f0fbb
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 10, 2024
7577214
added sampledata contigs as input
VinzentRisch Jul 10, 2024
b97152f
added validation positive for empty files
VinzentRisch Jul 10, 2024
ac8f92c
Merge branch '85_amrfinderplusannotation_type' into 87_annotate_mags_…
VinzentRisch Jul 10, 2024
fa019d5
fixed bug in mutations empty file creation
VinzentRisch Jul 10, 2024
86babac
fixed other bug in mutations empty file creation
VinzentRisch Jul 10, 2024
783870d
changed utils protein and nucleotide naming
VinzentRisch Jul 11, 2024
4b9c200
changed utils protein and nucleotide naming 2
VinzentRisch Jul 11, 2024
107c039
Revert "changed utils protein and nucleotide naming"
VinzentRisch Jul 11, 2024
92c8400
Revert "changed utils protein and nucleotide naming 2"
VinzentRisch Jul 11, 2024
3813a2b
changed mag and samplename addition to main function
VinzentRisch Jul 11, 2024
14a3b62
added tests for utils and sample data
VinzentRisch Jul 12, 2024
db37f81
changed the way manifest is loaded
VinzentRisch Jul 12, 2024
1b49424
merge main
VinzentRisch Jul 16, 2024
a0343d9
added database_format_version
VinzentRisch Jul 16, 2024
a1abc26
Merge branch '91_adding_database_format_version' into 87_annotate_mag…
VinzentRisch Jul 16, 2024
06e809a
added curated_ident as parameter
VinzentRisch Jul 16, 2024
bff9595
bugfix missing parameter
VinzentRisch Jul 16, 2024
26e66a6
Merge branch 'main' into 87_annotate_mags_amrfinderplus
VinzentRisch Jul 16, 2024
c9a12a9
bug parameter added in mocked function
VinzentRisch Jul 16, 2024
b215266
merge main
VinzentRisch Jul 16, 2024
1d94b24
renaming tests
VinzentRisch Jul 16, 2024
f642503
changes after review
VinzentRisch Jul 17, 2024
e4c6bfd
added s in utils sequences
VinzentRisch Jul 17, 2024
7a05e70
changed plugin setup description
VinzentRisch Jul 17, 2024
123 changes: 123 additions & 0 deletions q2_amr/amrfinderplus/sample_data.py
@@ -0,0 +1,123 @@
```python
import os
import shutil
import tempfile
from typing import Union

import pandas as pd
from q2_types.genome_data import GenesDirectoryFormat
from q2_types.per_sample_sequences import ContigSequencesDirFmt, MultiMAGSequencesDirFmt

from q2_amr.amrfinderplus.types import (
    AMRFinderPlusAnnotationsDirFmt,
    AMRFinderPlusDatabaseDirFmt,
)
from q2_amr.amrfinderplus.utils import run_amrfinderplus_n
from q2_amr.card.utils import create_count_table, read_in_txt


def annotate_sample_data_amrfinderplus(
    sequences: Union[MultiMAGSequencesDirFmt, ContigSequencesDirFmt],
    amrfinderplus_db: AMRFinderPlusDatabaseDirFmt,
    organism: str = None,
    plus: bool = False,
    report_all_equal: bool = False,
    ident_min: float = None,
    curated_ident: bool = False,
    coverage_min: float = 0.5,
    translation_table: str = "11",
    threads: int = None,
) -> (
    AMRFinderPlusAnnotationsDirFmt,
    AMRFinderPlusAnnotationsDirFmt,
    GenesDirectoryFormat,
    pd.DataFrame,
):
    annotations = AMRFinderPlusAnnotationsDirFmt()
    mutations = AMRFinderPlusAnnotationsDirFmt()
    genes = GenesDirectoryFormat()
    frequency_list = []

    # Create list of paths to all MAGs or contigs
    if isinstance(sequences, MultiMAGSequencesDirFmt):
        manifest = sequences.manifest.view(pd.DataFrame)
        files = manifest["filename"]
    else:
        files = [
            os.path.join(str(sequences), file) for file in os.listdir(str(sequences))
        ]

    with tempfile.TemporaryDirectory() as tmp:
        # Iterate over paths of MAGs or contigs
        for file in files:
            # Set sample and MAG IDs
            if isinstance(sequences, MultiMAGSequencesDirFmt):
                # The manifest index is a (sample-id, mag-id) MultiIndex
                index_value = manifest.query("filename == @file").index[0]
                sample_id = index_value[0]
                mag_id = index_value[1]
            else:
                # Contig files are named "<sample>_contigs.<ext>":
                # [:-8] strips the 8-character "_contigs" suffix from the stem
                sample_id = os.path.splitext(os.path.basename(file))[0][:-8]
                mag_id = ""

            # Run AMRFinderPlus
            run_amrfinderplus_n(
                working_dir=tmp,
                amrfinderplus_db=amrfinderplus_db,
                dna_sequences=file,
                protein_sequences=None,
                gff=None,
                organism=organism,
                plus=plus,
                report_all_equal=report_all_equal,
                ident_min=ident_min,
                curated_ident=curated_ident,
                coverage_min=coverage_min,
                translation_table=translation_table,
                threads=threads,
            )

            # Create frequency dataframe and append it to list
            frequency_df = read_in_txt(
                path=os.path.join(tmp, "amr_annotations.tsv"),
                samp_bin_name=str(os.path.join(sample_id, mag_id)),
                data_type="mags",
                colname="Gene symbol",
            )
            frequency_list.append(frequency_df)

            # Move mutations file. AMRFinderPlus only writes it when an
            # organism is given, so otherwise create an empty mutations file
            des_path_mutations = os.path.join(
                str(mutations),
                sample_id,
                f"{mag_id + '_' if mag_id else ''}amr_mutations.tsv",
            )
            os.makedirs(os.path.dirname(des_path_mutations), exist_ok=True)
            if organism:
                shutil.move(os.path.join(tmp, "amr_mutations.tsv"), des_path_mutations)
            else:
                with open(des_path_mutations, "w"):
                    pass

            # Move annotations file
            des_path_annotations = os.path.join(
                str(annotations),
                sample_id,
                f"{mag_id + '_' if mag_id else ''}amr_annotations.tsv",
            )
            os.makedirs(os.path.dirname(des_path_annotations), exist_ok=True)
            shutil.move(os.path.join(tmp, "amr_annotations.tsv"), des_path_annotations)

            # Move genes file (flat directory, named after the MAG or sample)
            shutil.move(
                os.path.join(tmp, "amr_genes.fasta"),
                os.path.join(
                    str(genes), f"{mag_id if mag_id else sample_id}_amr_genes.fasta"
                ),
            )

    feature_table = create_count_table(df_list=frequency_list)
    return (
        annotations,
        mutations,
        genes,
        feature_table,
    )
```
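The action writes annotation and mutation tables into one subdirectory per sample, prefixing file names with the MAG ID when there is one, and puts every gene FASTA file into a single flat directory. A minimal sketch of that naming logic pulled out into standalone helpers might look like this. `contig_sample_id` and `relative_output_paths` are hypothetical names for illustration, not functions in q2-amr, and the sketch assumes contig files follow the `<sample>_contigs.<ext>` naming used in the tests.

```python
import os
from typing import Dict


def contig_sample_id(path: str, suffix: str = "_contigs") -> str:
    """Return the sample ID encoded in a contig file name.

    Hypothetical helper: like the action's `[:-8]` slice it drops the
    "_contigs" suffix, but only when the file stem actually ends with it.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem[: -len(suffix)] if stem.endswith(suffix) else stem


def relative_output_paths(sample_id: str, mag_id: str = "") -> Dict[str, str]:
    """Return where each output lands, relative to its output directory.

    Hypothetical helper reproducing the action's f-strings: annotation and
    mutation tables are nested per sample and prefixed with the MAG ID (if
    any); gene FASTA files sit flat in one directory, named after the MAG ID
    or, for contigs, the sample ID.
    """
    prefix = f"{mag_id}_" if mag_id else ""
    return {
        "annotations": os.path.join(sample_id, f"{prefix}amr_annotations.tsv"),
        "mutations": os.path.join(sample_id, f"{prefix}amr_mutations.tsv"),
        "genes": f"{mag_id or sample_id}_amr_genes.fasta",
    }


# Usage: a contig file and a MAG from the same sample.
contig_paths = relative_output_paths(contig_sample_id("sample1_contigs.fasta"))
mag_paths = relative_output_paths("sample1", "mag1")
print(contig_paths["genes"], mag_paths["annotations"])
# on POSIX: sample1_amr_genes.fasta sample1/mag1_amr_annotations.tsv
```

Checking `endswith` instead of slicing a fixed eight characters is a swapped-in, more defensive variant: a file that does not follow the naming convention keeps its full stem rather than silently losing characters.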
101 changes: 101 additions & 0 deletions q2_amr/amrfinderplus/tests/test_sample_data.py
@@ -0,0 +1,101 @@
```python
import os
from unittest.mock import MagicMock, patch

from q2_types.per_sample_sequences import ContigSequencesDirFmt, MultiMAGSequencesDirFmt
from qiime2.plugin.testing import TestPluginBase

from q2_amr.amrfinderplus.sample_data import annotate_sample_data_amrfinderplus
from q2_amr.amrfinderplus.types import AMRFinderPlusDatabaseDirFmt


class TestAnnotateSampleDataAMRFinderPlus(TestPluginBase):
    package = "q2_amr.amrfinderplus.tests"

    def mock_run_amrfinderplus_n(
        self,
        working_dir,
        amrfinderplus_db,
        dna_sequences,
        protein_sequences,
        gff,
        organism,
        plus,
        report_all_equal,
        ident_min,
        curated_ident,
        coverage_min,
        translation_table,
        threads,
    ):
        with open(os.path.join(working_dir, "amr_annotations.tsv"), "w"):
            pass
        if organism:
            with open(os.path.join(working_dir, "amr_mutations.tsv"), "w"):
                pass
        if dna_sequences:
            with open(os.path.join(working_dir, "amr_genes.fasta"), "w"):
                pass

    files_contigs = [
        "amr_annotations.tsv",
        "amr_mutations.tsv",
        "sample1_amr_genes.fasta",
    ]

    files_mags = [
        "mag1_amr_annotations.tsv",
        "mag1_amr_mutations.tsv",
        "mag1_amr_genes.fasta",
    ]

    def test_annotate_sample_data_amrfinderplus_mags(self):
        sequences = MultiMAGSequencesDirFmt()
        with open(os.path.join(str(sequences), "MANIFEST"), "w") as file:
            file.write("sample-id,mag-id,filename\nsample1,mag1,sample1/mag1.fasta\n")
        self._helper(sequences=sequences, organism=None, files=self.files_mags)

    def test_annotate_sample_data_amrfinderplus_mags_organism(self):
        sequences = MultiMAGSequencesDirFmt()
        with open(os.path.join(str(sequences), "MANIFEST"), "w") as file:
            file.write("sample-id,mag-id,filename\nsample1,mag1,sample1/mag1.fasta\n")
        self._helper(sequences, "Escherichia", files=self.files_mags)

    def test_annotate_sample_data_amrfinderplus_contigs(self):
        sequences = ContigSequencesDirFmt()
        with open(os.path.join(str(sequences), "sample1_contigs.fasta"), "w"):
            pass
        self._helper(sequences=sequences, organism=None, files=self.files_contigs)

    def test_annotate_sample_data_amrfinderplus_contigs_organism(self):
        sequences = ContigSequencesDirFmt()
        with open(os.path.join(str(sequences), "sample1_contigs.fasta"), "w"):
            pass
        self._helper(
            sequences=sequences, organism="Escherichia", files=self.files_contigs
        )

    def _helper(self, sequences, organism, files):
        amrfinderplus_db = AMRFinderPlusDatabaseDirFmt()
        mock_create_count_table = MagicMock()
        mock_read_in_txt = MagicMock()
        with patch(
            "q2_amr.amrfinderplus.sample_data.run_amrfinderplus_n",
            side_effect=self.mock_run_amrfinderplus_n,
        ), patch(
            "q2_amr.amrfinderplus.sample_data.read_in_txt", mock_read_in_txt
        ), patch(
            "q2_amr.amrfinderplus.sample_data.create_count_table",
            mock_create_count_table,
        ):
            result = annotate_sample_data_amrfinderplus(
                sequences=sequences,
                amrfinderplus_db=amrfinderplus_db,
                organism=organism,
            )
        self.assertTrue(
            os.path.exists(os.path.join(str(result[0]), "sample1", files[0]))
        )
        self.assertTrue(
            os.path.exists(os.path.join(str(result[1]), "sample1", files[1]))
        )
        self.assertTrue(os.path.exists(os.path.join(str(result[2]), files[2])))
```
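These tests never call the real `amrfinder` binary: `run_amrfinderplus_n` is patched with a `side_effect` that only creates the empty files AMRFinderPlus would have written, so the action's file-moving logic is checked in isolation. Below is a stripped-down, standard-library-only sketch of the same pattern. `run_tool`, `collect_outputs` and `fake_run_tool` are hypothetical stand-ins for the wrapped CLI call, the action and the mock, not q2-amr code. So that the sketch runs as a single file, it swaps the name in its own module namespace with `patch.dict(globals(), ...)`; inside a package you would patch the dotted path instead, as the q2-amr tests do.

```python
import os
import shutil
import tempfile
import unittest
from unittest.mock import MagicMock, patch


def run_tool(working_dir, organism=None):
    """Hypothetical wrapper around an external CLI; never executed in the tests."""
    raise RuntimeError("the real tool is not available in unit tests")


def collect_outputs(out_dir, organism=None):
    """Hypothetical action: run the tool in a scratch dir, then move its outputs."""
    with tempfile.TemporaryDirectory() as tmp:
        run_tool(working_dir=tmp, organism=organism)
        for name in os.listdir(tmp):
            shutil.move(os.path.join(tmp, name), os.path.join(out_dir, name))
    return sorted(os.listdir(out_dir))


def fake_run_tool(working_dir, organism=None):
    """Side effect: create the empty files the real tool would have written."""
    with open(os.path.join(working_dir, "amr_annotations.tsv"), "w"):
        pass
    if organism:
        with open(os.path.join(working_dir, "amr_mutations.tsv"), "w"):
            pass


class TestCollectOutputs(unittest.TestCase):
    def _run(self, organism):
        mock_run = MagicMock(side_effect=fake_run_tool)
        # Replace the global name that collect_outputs looks up at call time.
        with tempfile.TemporaryDirectory() as out, patch.dict(
            globals(), {"run_tool": mock_run}
        ):
            produced = collect_outputs(out, organism=organism)
        mock_run.assert_called_once()
        return produced

    def test_without_organism(self):
        self.assertEqual(self._run(None), ["amr_annotations.tsv"])

    def test_with_organism(self):
        self.assertEqual(
            self._run("Escherichia"), ["amr_annotations.tsv", "amr_mutations.tsv"]
        )
```

The same split keeps the production tests fast and lets them cover both branches (with and without `organism`) without needing the AMRFinderPlus database.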
95 changes: 95 additions & 0 deletions q2_amr/amrfinderplus/tests/test_utils.py
@@ -0,0 +1,95 @@
```python
from unittest.mock import patch

from qiime2.plugin.testing import TestPluginBase

from q2_amr.amrfinderplus.utils import run_amrfinderplus_n


class TestAnnotateMagsCard(TestPluginBase):
    package = "q2_amr.amrfinderplus.tests"

    @patch("q2_amr.amrfinderplus.utils.run_command")
    def test_run_amrfinderplus_n(self, mock_run_command):
        run_amrfinderplus_n(
            working_dir="path_dir",
            amrfinderplus_db="amrfinderplus_db",
            dna_sequences="dna_sequences",
            protein_sequences="protein_sequences",
            gff="gff",
            organism="Escherichia",
            plus=True,
            report_all_equal=True,
            ident_min=1,
            curated_ident=False,
            coverage_min=1,
            translation_table="11",
            threads=4,
        )
        mock_run_command.assert_called_once_with(
            [
                "amrfinder",
                "--database",
                "amrfinderplus_db",
                "-o",
                "path_dir/amr_annotations.tsv",
                "--print_node",
                "-n",
                "dna_sequences",
                "--nucleotide_output",
                "path_dir/amr_genes.fasta",
                "-p",
                "protein_sequences",
                "--protein_output",
                "path_dir/amr_proteins.fasta",
                "-g",
                "gff",
                "--threads",
                "4",
                "--organism",
                "Escherichia",
                "--mutation_all",
                "path_dir/amr_mutations.tsv",
                "--plus",
                "--report_all_equal",
                "--ident_min",
                "1",
                "--coverage_min",
                "1",
                "--translation_table",
                "11",
            ],
            "path_dir",
            verbose=True,
        )

    @patch("q2_amr.amrfinderplus.utils.run_command")
    def test_run_amrfinderplus_n_minimal(self, mock_run_command):
        run_amrfinderplus_n(
            working_dir="path_dir",
            amrfinderplus_db="amrfinderplus_db",
            dna_sequences=None,
            protein_sequences=None,
            gff=None,
            organism=None,
            plus=False,
            report_all_equal=False,
            ident_min=None,
            curated_ident=True,
            coverage_min=None,
            translation_table=None,
            threads=None,
        )
        # With curated_ident=True the only optional flag expected is
        # "--ident_min -1", which tells AMRFinderPlus to use a curated
        # identity threshold where one exists (0.9 otherwise).
        mock_run_command.assert_called_once_with(
            [
                "amrfinder",
                "--database",
                "amrfinderplus_db",
                "-o",
                "path_dir/amr_annotations.tsv",
                "--print_node",
                "--ident_min",
                "-1",
            ],
            "path_dir",
            verbose=True,
        )
```
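`q2_amr/amrfinderplus/utils.py` itself is not shown in this part of the diff, but the two tests above pin down the argument list `run_amrfinderplus_n` hands to `run_command`. The following is a hypothetical reconstruction of that argument-building step, inferred only from those expectations. `build_amrfinder_command` is an illustrative name; the precedence of `curated_ident` over `ident_min`, and the exact position of any flag combination the tests do not exercise, are assumptions.

```python
import os
from typing import List, Optional


def build_amrfinder_command(
    working_dir: str,
    amrfinderplus_db: str,
    dna_sequences: Optional[str] = None,
    protein_sequences: Optional[str] = None,
    gff: Optional[str] = None,
    organism: Optional[str] = None,
    plus: bool = False,
    report_all_equal: bool = False,
    ident_min: Optional[float] = None,
    curated_ident: bool = False,
    coverage_min: Optional[float] = None,
    translation_table: Optional[str] = None,
    threads: Optional[int] = None,
) -> List[str]:
    """Hypothetical reconstruction of the argv asserted in test_utils.py."""
    cmd = [
        "amrfinder",
        "--database", str(amrfinderplus_db),
        "-o", os.path.join(working_dir, "amr_annotations.tsv"),
        "--print_node",
    ]
    if dna_sequences:
        cmd += ["-n", str(dna_sequences),
                "--nucleotide_output", os.path.join(working_dir, "amr_genes.fasta")]
    if protein_sequences:
        cmd += ["-p", str(protein_sequences),
                "--protein_output", os.path.join(working_dir, "amr_proteins.fasta")]
    if gff:
        cmd += ["-g", str(gff)]
    if threads:
        cmd += ["--threads", str(threads)]
    if organism:
        # Point-mutation screening needs --organism; report all mutation sites.
        cmd += ["--organism", organism,
                "--mutation_all", os.path.join(working_dir, "amr_mutations.tsv")]
    if plus:
        cmd.append("--plus")
    if report_all_equal:
        cmd.append("--report_all_equal")
    if curated_ident:
        # -1 asks AMRFinderPlus for its curated identity thresholds.
        cmd += ["--ident_min", "-1"]
    elif ident_min is not None:
        cmd += ["--ident_min", str(ident_min)]
    if coverage_min is not None:
        cmd += ["--coverage_min", str(coverage_min)]
    if translation_table:
        cmd += ["--translation_table", str(translation_table)]
    return cmd
```

Building the argument list as a plain function keeps it testable without invoking the binary: the two calls in `test_utils.py` map directly onto assertions against the returned list.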