Dev #421

Merged
merged 9 commits into from
Feb 14, 2025
46 changes: 23 additions & 23 deletions db/tbdb.bed
@@ -2,9 +2,9 @@ Chromosome 1 1524 Rv0001 dnaA isoniazid
 Chromosome 4933 7267 Rv0005 gyrB levofloxacin,moxifloxacin
 Chromosome 7068 9818 Rv0006 gyrA levofloxacin,moxifloxacin
 Chromosome 13133 13911 Rv0010c Rv0010c isoniazid
-Chromosome 490545 491793 Rv0407 fgd1 pretomanid,delamanid,clofazimine
-Chromosome 574479 576790 Rv0486 mshA isoniazid,ethionamide
-Chromosome 619500 620865 Rv0529 ccsA kanamycin,capreomycin,amikacin
+Chromosome 490545 491793 Rv0407 fgd1 clofazimine,delamanid,pretomanid
+Chromosome 574479 576790 Rv0486 mshA ethionamide,isoniazid
+Chromosome 619500 620865 Rv0529 ccsA amikacin,capreomycin,kanamycin
 Chromosome 656010 657739 Rv0565c Rv0565c ethionamide
 Chromosome 731680 732406 Rv0635 hadA isoniazid
 Chromosome 733853 734970 Rv0639 nusG rifampicin
@@ -16,58 +16,58 @@ Chromosome 778477 779624 Rv0677c mmpS5 bedaquiline,clofazimine
 Chromosome 778790 779487 Rv0678 mmpR5 bedaquiline,clofazimine
 Chromosome 781126 781934 Rv0682 rpsL streptomycin
 Chromosome 800106 801462 Rv0701 rplC linezolid
-Chromosome 1253074 1254783 Rv1129c Rv1129c levofloxacin,moxifloxacin,isoniazid,rifampicin
-Chromosome 1302606 1305501 Rv1173 fbiC pretomanid,delamanid,clofazimine
+Chromosome 1253074 1254783 Rv1129c Rv1129c isoniazid,levofloxacin,moxifloxacin,rifampicin
+Chromosome 1302606 1305501 Rv1173 fbiC clofazimine,delamanid,pretomanid
 Chromosome 1364162 1365186 Rv1221 sigE pyrazinamide
-Chromosome 1406081 1407604 Rv1258c Rv1258c streptomycin,pyrazinamide,isoniazid
+Chromosome 1406081 1407604 Rv1258c Rv1258c isoniazid,pyrazinamide,streptomycin
 Chromosome 1416181 1418048 Rv1267c embR ethambutol
 Chromosome 1460802 1461290 Rv1305 atpE bedaquiline
-Chromosome 1471498 1473382 EBG00000313325 rrs kanamycin,capreomycin,streptomycin,amikacin
+Chromosome 1471498 1473382 EBG00000313325 rrs amikacin,capreomycin,kanamycin,streptomycin
 Chromosome 1473408 1476795 EBG00000313339 rrl capreomycin,linezolid
-Chromosome 1673148 1675011 Rv1484 inhA isoniazid,ethionamide
+Chromosome 1673148 1675011 Rv1484 inhA ethionamide,isoniazid
 Chromosome 1833247 1834987 Rv1630 rpsA pyrazinamide
 Chromosome 1853358 1854388 Rv1644 tsnR linezolid
 Chromosome 1917506 1918746 Rv1694 tlyA capreomycin
-Chromosome 2062809 2065010 Rv1819c bacA kanamycin,capreomycin,streptomycin,amikacin
-Chromosome 2101651 2103337 Rv1854c ndh isoniazid,delamanid,ethionamide
+Chromosome 2062809 2065010 Rv1819c bacA amikacin,capreomycin,kanamycin,streptomycin
+Chromosome 2101651 2103337 Rv1854c ndh delamanid,ethionamide,isoniazid
 Chromosome 2153889 2156842 Rv1908c katG isoniazid
 Chromosome 2167649 2170934 Rv1918c PPE35 pyrazinamide
 Chromosome 2221719 2223825 Rv1979c Rv1979c bedaquiline,clofazimine
 Chromosome 2288681 2290323 Rv2043c pncA pyrazinamide
 Chromosome 2517915 2519365 Rv2245 kasA isoniazid
-Chromosome 2714124 2715832 Rv2416c eis kanamycin,amikacin
+Chromosome 2714124 2715832 Rv2416c eis amikacin,kanamycin
 Chromosome 2725899 2726780 Rv2428 ahpC isoniazid
 Chromosome 2746135 2747798 Rv2447c folC para-aminosalicylic_acid
-Chromosome 2782366 2786169 Rv2477c Rv2477c kanamycin,levofloxacin,moxifloxacin,streptomycin,rifampicin,ethambutol,amikacin
+Chromosome 2782366 2786169 Rv2477c Rv2477c amikacin,ethambutol,kanamycin,levofloxacin,moxifloxacin,rifampicin,streptomycin
 Chromosome 2859300 2860640 Rv2535c pepQ bedaquiline,clofazimine
 Chromosome 2986639 2987615 Rv2671 ribD para-aminosalicylic_acid
 Chromosome 2995772 2996737 Rv2680 Rv2680 capreomycin
 Chromosome 2996539 2998055 Rv2681 Rv2681 capreomycin
-Chromosome 3064515 3067372 Rv2752c Rv2752c levofloxacin,moxifloxacin,isoniazid,rifampicin,ethambutol
+Chromosome 3064515 3067372 Rv2752c Rv2752c ethambutol,isoniazid,levofloxacin,moxifloxacin,rifampicin
 Chromosome 3067193 3068161 Rv2754c thyX para-aminosalicylic_acid
 Chromosome 3073680 3074671 Rv2764c thyA para-aminosalicylic_acid
 Chromosome 3086620 3087935 Rv2780 ald cycloserine
-Chromosome 3338868 3339762 Rv2983 fbiD pretomanid,delamanid,clofazimine
+Chromosome 3338868 3339762 Rv2983 fbiD clofazimine,delamanid,pretomanid
 Chromosome 3448253 3449991 Rv3083 Rv3083 ethionamide
-Chromosome 3568401 3569280 Rv3197A whiB7 kanamycin,amikacin,streptomycin
+Chromosome 3568401 3569280 Rv3197A whiB7 amikacin,kanamycin,streptomycin
 Chromosome 3611959 3613847 Rv3236c Rv3236c pyrazinamide
-Chromosome 3623159 3625110 Rv3244c lpqB rifampicin,bedaquiline
-Chromosome 3624910 3626860 Rv3245c mtrB rifampicin,bedaquiline
-Chromosome 3626663 3627924 Rv3246c mtrA rifampicin,bedaquiline
-Chromosome 3640207 3641538 Rv3261 fbiA pretomanid,delamanid,clofazimine
-Chromosome 3641335 3642881 Rv3262 fbiB pretomanid,delamanid,clofazimine
+Chromosome 3623159 3625110 Rv3244c lpqB bedaquiline,rifampicin
+Chromosome 3624910 3626860 Rv3245c mtrB bedaquiline,rifampicin
+Chromosome 3626663 3627924 Rv3246c mtrA bedaquiline,rifampicin
+Chromosome 3640207 3641538 Rv3261 fbiA clofazimine,delamanid,pretomanid
+Chromosome 3641335 3642881 Rv3262 fbiB clofazimine,delamanid,pretomanid
 Chromosome 3840194 3841620 Rv3423c alr cycloserine
 Chromosome 3877464 3879240 Rv3457c rpoA rifampicin
-Chromosome 3986612 3987299 Rv3547 ddn pretomanid,delamanid
+Chromosome 3986612 3987299 Rv3547 ddn delamanid,pretomanid
 Chromosome 4038158 4041013 Rv3596c clpC1 pyrazinamide
 Chromosome 4043862 4046428 Rv3601c panD pyrazinamide
-Chromosome 4138202 4140002 Rv3696c glpK levofloxacin,moxifloxacin,isoniazid,streptomycin,rifampicin,ethambutol
+Chromosome 4138202 4140002 Rv3696c glpK ethambutol,isoniazid,levofloxacin,moxifloxacin,rifampicin,streptomycin
 Chromosome 4237683 4243147 Rv3793 embC ethambutol
 Chromosome 4242947 4246517 Rv3794 embA ethambutol
 Chromosome 4246314 4249810 Rv3795 embB ethambutol
 Chromosome 4266953 4269124 Rv3805c aftB ethambutol
 Chromosome 4268925 4270084 Rv3806c ubiA ethambutol
 Chromosome 4326004 4330174 Rv3854c ethA ethionamide
 Chromosome 4327328 4328199 Rv3855 ethR ethionamide
-Chromosome 4338171 4338961 Rv3862c whiB6 kanamycin,capreomycin,amikacin
+Chromosome 4338171 4338961 Rv3862c whiB6 amikacin,capreomycin,kanamycin
 Chromosome 4407528 4408481 Rv3919c gid streptomycin
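Every changed line in db/tbdb.bed differs only in the order of the comma-separated drug list in the last column, which is now alphabetical. A minimal sketch of that normalisation follows. It assumes tab-separated fields, and `sort_drug_column` is a hypothetical helper written for illustration, not a function from tb-profiler or the tbdb tooling.

```python
def sort_drug_column(bed_line: str) -> str:
    """Return the BED line with the drug list in its last column sorted alphabetically."""
    # Assumption: fields are tab-separated and the drug list is the last field.
    fields = bed_line.rstrip("\n").split("\t")
    fields[-1] = ",".join(sorted(fields[-1].split(",")))
    return "\t".join(fields)


line = "Chromosome\t490545\t491793\tRv0407\tfgd1\tpretomanid,delamanid,clofazimine"
print(sort_drug_column(line))
```

Keeping the list in one canonical order means that later regenerations of the file only show a diff when the set of drugs for a gene actually changes.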
2 changes: 0 additions & 2 deletions db/tbdb.dict

This file was deleted.

2 changes: 1 addition & 1 deletion db/tbdb.dr.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion db/tbdb.fasta.fai

This file was deleted.

4 changes: 1 addition & 3 deletions db/tbdb.mask.bed
@@ -51,9 +51,7 @@ Chromosome 99162 99174
 Chromosome 102100 102138
 Chromosome 102140 102150
 Chromosome 103743 103756
-Chromosome 103788 104164
-Chromosome 104317 104985
-Chromosome 104986 104987
+Chromosome 103788 104987
 Chromosome 106207 106343
 Chromosome 125830 125834
 Chromosome 126259 126260
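The mask change replaces three intervals with one interval that spans them, so the two gaps between the old intervals become masked as well. The sketch below quantifies that, assuming standard BED half-open coordinates; `masked_bases` is a hypothetical helper used only for this illustration.

```python
def masked_bases(intervals):
    """Total length of half-open [start, end) intervals; assumes they do not overlap."""
    return sum(end - start for start, end in intervals)


old = [(103788, 104164), (104317, 104985), (104986, 104987)]
new = [(103788, 104987)]

# Bases masked before, after, and the number of newly masked bases.
print(masked_bases(old), masked_bases(new), masked_bases(new) - masked_bases(old))
```

Under that assumption the merged interval masks 154 more bases than before: the 153 bp gap between 104164 and 104317 plus the 1 bp gap between 104985 and 104986.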
2 changes: 0 additions & 2 deletions db/tbdb.rules.txt

This file was deleted.

17 changes: 10 additions & 7 deletions db/tbdb.variables.json
@@ -1,5 +1,5 @@
 {
-    "db-schema-version": "1.0.0",
+    "db-schema-version": "1.1.0",
     "snpEff_db": "Mycobacterium_tuberculosis_h37rv",
     "drugs": [
         "rifampicin",
@@ -24,10 +24,14 @@
     "tb-profiler-version": ">=6.0.0,<7.0.0",
     "version": {
         "name": "tbdb",
-        "commit": "72ef6fa",
-        "Author": "Jody Phelan <[email protected]>",
-        "Date": "Tue Jul 16 16:56:19 2024 +0100",
-        "db-schema-version": "1.0.0"
+        "repo": "[email protected]:jodyphelan/tbdb.git",
+        "branch": "tbdb",
+        "commit": "7066eb43",
+        "status": "clean",
+        "author": "Jody Phelan",
+        "date": "Fri Feb 14 09:41:10 2025 +0100",
+        "db-schema-version": "1.1.0",
+        "tb-profiler-version": ">=6.0.0,<7.0.0"
     },
     "amplicon": false,
     "files": {
@@ -39,7 +43,6 @@
         "spoligotype_spacers": "tbdb.spoligotype_spacers.txt",
         "spoligotype_annotations": "tbdb.spoligotype_list.csv",
         "bedmask": "tbdb.mask.bed",
-        "barcode": "tbdb.barcode.bed",
-        "rules": "tbdb.rules.txt"
+        "barcode": "tbdb.barcode.bed"
     }
 }
18 changes: 8 additions & 10 deletions tb-profiler
@@ -35,6 +35,7 @@ discovered_plugins = {

 __softwarename__ = 'tbprofiler'
 __default_db_dir__ = f'{sys.base_prefix}/share/{__softwarename__}'
+__compatible_db_schema_version__ = '1.0.0'

 @atexit.register
 def cleanup():
@@ -246,8 +247,6 @@ def main_update_tbdb(args):


     extra_args = []
-    if os.path.isfile('rules.txt'):
-        extra_args.append("--rules rules.txt")
     if args.match_ref:
         extra_args.append("--match_ref %s" % os.path.abspath(args.match_ref))

@@ -260,8 +259,8 @@ def main_update_tbdb(args):

 def main_create_db(args):

-    version_string = json.load(open('variables.json'))['tb-profiler-version']
-    tbp.check_db_version(version_string,tbp.__version__)
+    version_string = json.load(open('variables.json'))['db-schema-version']
+    tbp.check_db_version(version_string,__compatible_db_schema_version__)

     if args.no_overwrite:
         dbs = pp.list_db(args.software_name)
@@ -277,8 +276,6 @@ def main_create_db(args):
     }
     if args.barcode:
         extra_files["barcode"] = args.barcode
-    if args.rules:
-        extra_files["rules"] = args.rules

     with TempFilePrefix() as tmpfile:
         args.csv = tbp.reformat_variant_csv_file(args.csv,f'{tmpfile}.variants.csv')
@@ -371,11 +368,11 @@ def main_batch(args):


 def main_list_db(args):
-    dbs = pp.list_db(args.software_name)
+    dbs = pp.list_db(args.db_dir)
     for db in dbs:
         if 'version' in db:
-            d = dict(**db['version'], location=f"{sys.base_prefix}/share/{args.software_name}/{db['version']['name']}")
-            sys.stdout.write("%(name)s\t%(commit)s\t%(Author)s\t%(Date)s\t%(location)s\n" % d)
+            d = dict(**db['version'], location=f"{args.db_dir}/{db['version']['name']}")
+            sys.stdout.write("%(name)s\t%(commit)s\t%(author)s\t%(date)s\t%(location)s\n" % d)
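The format string in `main_list_db` now looks up `author` and `date`, matching the lowercase keys introduced in the `version` block of `variables.json`. With `%`-style dictionary formatting, a missing key raises `KeyError`, so a version block that still carries the older `Author`/`Date` keys would not format with the new string. The sketch below shows one way to tolerate both spellings. `format_db_row` is a hypothetical helper, not part of tb-profiler, and the sample values are shortened for illustration.

```python
def format_db_row(version: dict, db_dir: str) -> str:
    """Format one database row, accepting old ("Author") and new ("author") key styles."""
    # Lowercase every key so both generations of the version block are handled.
    row = {key.lower(): value for key, value in version.items()}
    row["location"] = f"{db_dir}/{row['name']}"
    return "%(name)s\t%(commit)s\t%(author)s\t%(date)s\t%(location)s" % row


old_style = {"name": "tbdb", "commit": "72ef6fa",
             "Author": "Jody Phelan", "Date": "Tue Jul 16 16:56:19 2024 +0100"}
new_style = {"name": "tbdb", "commit": "7066eb43",
             "author": "Jody Phelan", "date": "Fri Feb 14 09:41:10 2025 +0100"}

print(format_db_row(old_style, "/opt/db"))
print(format_db_row(new_style, "/opt/db"))
```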



@@ -587,7 +584,6 @@ parser_sub.add_argument('--spoligotypes',default="spoligotype_spacers.txt",type=
 parser_sub.add_argument('--spoligotype_annotations','--spoligotype-annotations',default="spoligotype_list.csv")
 parser_sub.add_argument('--barcode',default="barcode.bed",type=str,help='A bed file containing lineage barcode SNPs')
 parser_sub.add_argument('--bedmask',default="mask.bed",type=str,help='A bed file containing a list of low-complexity regions')
-parser_sub.add_argument('--rules',type=str,default="rules.txt",help='A file containing python rules')
 parser_sub.add_argument('--amplicon_primers','--amplicon-primers',type=str,help='A file containing a list of amplicon primers')
 parser_sub.add_argument('--match_ref','--match-ref',type=str,help='Match the chromosome name to the given fasta file')
 parser_sub.add_argument('--custom',action="store_true",help='Tells the script this is a custom database, this is used to alter the generation of the version definition')
@@ -639,6 +635,7 @@ parser_sub.add_argument('--args',type=str, help='Arguments to use with tb-profil
 parser_sub.add_argument('--jobs','-j',default=1,help='Threads to use',type=int)
 parser_sub.add_argument('--threads_per_job','--threads-per-job','-t',default=1,help='Threads to use',type=int)
 parser_sub.add_argument('--dir','-d',default=".",help='Storage directory')
+parser_sub.add_argument('--db_dir',type=os.path.abspath,default=__default_db_dir__,help='Database directory')
 parser_sub.add_argument('--no_clean','--no-clean', action='store_true',help=argparse.SUPPRESS)
 parser_sub.add_argument('--temp',help="Temp directory to process all files",type=str,default=".")
 parser_sub.add_argument('--version', action='version', version="tb-profiler version %s" % tbp.__version__)
@@ -650,6 +647,7 @@ parser_sub = subparsers.add_parser('list_db', help='List loaded databases', form
 parser_sub.add_argument('--dir','-d',default=".",help='Storage directory')
 parser_sub.add_argument('--no_clean','--no-clean', action='store_true',help=argparse.SUPPRESS)
 parser_sub.add_argument('--temp',help="Temp directory to process all files",type=str,default=".")
+parser_sub.add_argument('--db_dir',type=os.path.abspath,default=__default_db_dir__,help='Database directory')
 parser_sub.add_argument('--version', action='version', version="tb-profiler version %s" % tbp.__version__)
 parser_sub.add_argument('--logging',type=str.upper,default="INFO",choices=["DEBUG","INFO","WARNING","ERROR","CRITICAL"],help='Logging level')
 parser_sub.add_argument('--debug',action='store_true',help=argparse.SUPPRESS)
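The new `--db_dir` option combines `type=os.path.abspath` with a string default. argparse applies the `type` callable to string defaults as well as to values given on the command line, so the attribute is always an absolute path. A minimal standalone sketch of that behaviour follows; the relative default `share/tbprofiler` is an assumption chosen only to make the conversion visible, whereas the PR uses `__default_db_dir__`.

```python
import argparse
import os

parser = argparse.ArgumentParser()
# Illustrative relative default; the real script passes __default_db_dir__ here.
parser.add_argument("--db_dir", type=os.path.abspath, default="share/tbprofiler",
                    help="Database directory")

# No flag given: the string default is still passed through os.path.abspath.
default_args = parser.parse_args([])
print(os.path.isabs(default_args.db_dir))

# Flag given with a relative path: it is resolved against the current directory.
custom_args = parser.parse_args(["--db_dir", "my_dbs"])
print(custom_args.db_dir == os.path.abspath("my_dbs"))
```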
6 changes: 3 additions & 3 deletions tbprofiler/reformat.py
@@ -3,7 +3,7 @@
 from typing import List, Tuple , Union, Optional
 from .utils import get_gene2drugs
 import argparse
-from pathogenprofiler.utils import shared_dict
+from pathogenprofiler.utils import shared_dict, get_software_used

 def get_main_lineage(lineages: List[Lineage],max_node_skip: int = 1) -> Tuple[str, str]:
     """
@@ -206,7 +206,7 @@ def create_lineage_result(
     pipeline = Pipeline(
         software_version=args.version,
         db_version=args.conf['version'],
-        software=[{'process':k,'software':v} for k,v in shared_dict.items()]
+        software=get_software_used()
     )
     data = {
         'id':args.prefix,
@@ -233,7 +233,7 @@ def create_resistance_result(
     pipeline = Pipeline(
         software_version=args.version,
         db_version=args.conf['version'],
-        software=[{'process':k,'software':v} for k,v in shared_dict.items()]
+        software=get_software_used()
     )
     if hasattr(qc, 'missing_positions'):
         qc.missing_positions = filter_missing_positions(qc.missing_positions)
24 changes: 10 additions & 14 deletions tbprofiler/utils.py
@@ -4,6 +4,8 @@
 import csv
 import logging
 import re
+from packaging.version import Version
+

 def process_tb_profiler_args(args: argparse.Namespace) -> None:
     if args.snp_dist:
@@ -93,17 +95,11 @@ def reformat_variant_csv_file(files: list, outfile: str) -> str:

     return outfile

-def check_db_version(db_version: str, tbprofiler_version: str) -> None:
-    for d in db_version.split(","):
-        r = re.search('([<>=]+)(.*)',d)
-        if r==None:
-            logging.error(f"Invalid version string: {d}")
-            quit(1)
-
-        d = f"{r.group(1)} '{r.group(2)}'"
-        if eval(f"'{tbprofiler_version}' {d}")==False:
-            if ">" in d:
-                logging.error(f"Your version of tb-profiler ({tbprofiler_version}) is too old to use this version of the database. Please update tb-profiler to {db_version}")
-            else:
-                logging.error(f"Your version of tb-profiler ({tbprofiler_version}) is too new to use this version of the database. Please update the database to {db_version}")
-            quit(1)
+def check_db_version(db_current_version_str: str, compatible_schema_version_str: str):
+    db_current_version = Version(db_current_version_str)
+    compatible_schema_version = Version(compatible_schema_version_str)
+    logging.debug(f"Database version: {db_current_version}")
+    logging.debug(f"Compatible schema version: {compatible_schema_version}")
+    if db_current_version.major != compatible_schema_version.major:
+        logging.error(f"Latest database schema version {db_current_version_str} is not compatible with this version of tb-profiler (requires {compatible_schema_version.major}.x.x). Please make sure you are using the latest software and database versions.")
+        quit(1)
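The rewritten `check_db_version` compares only the major component of the schema version, so a database at schema 1.1.0 passes against the script's `__compatible_db_schema_version__` of 1.0.0. The removed implementation built a comparison between quoted version strings and evaluated it, which compares the strings lexicographically. The sketch below illustrates both behaviours using only the standard library; `major` and `schemas_compatible` are hypothetical stand-ins for the `packaging.version.Version` logic in the diff and assume plain `X.Y.Z` version strings.

```python
def major(version: str) -> int:
    """Return the leading numeric component of an X.Y.Z version string."""
    return int(version.split(".")[0])


def schemas_compatible(db_schema: str, supported_schema: str) -> bool:
    """Two schema versions are treated as compatible when their major components match."""
    return major(db_schema) == major(supported_schema)


print(schemas_compatible("1.1.0", "1.0.0"))  # True: minor bump, same major
print(schemas_compatible("2.0.0", "1.0.0"))  # False: major versions differ

# Comparing version strings directly is lexicographic, so "10" sorts before "6".
print("10.0.0" >= "6.0.0")  # False
```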
13 changes: 2 additions & 11 deletions tests/run_test.py
@@ -13,15 +13,6 @@
 if not os.path.isdir("tb-profiler-test-data"):
     run_cmd("git clone https://github.com/jodyphelan/tb-profiler-test-data.git")

-# por5_dr_variants = [
-# ('rpoB', 'p.Ser450Leu'),
-# ('fabG1', 'c.-15C>T'),
-# ('inhA', 'p.Ile194Thr'),
-# ('pncA', 'p.Val125Gly'),
-# ('embB', 'p.Met306Val'),
-# ('embB', 'p.Met423Thr'),
-# ('gid', 'p.Ala80Pro')
-# ]

 por5_dr_variants = [
     ('rpoB', 'p.Ser450Leu'),
@@ -59,8 +50,8 @@ def test_vcf():
     check_assertations("results/por5_vcf.results.json")

 def test_nanopore():
-    run_cmd(f"tb-profiler profile --db {db} -1 tb-profiler-test-data/por5A.nanopore_reduced.fastq.gz --platform nanopore -p por5A_illumina_nanopore -t 4 --af '0.5,0.7' --depth '0,5' --txt --csv --docx")
-    check_assertations("results/por5A_illumina_nanopore.results.json")
+    run_cmd(f"tb-profiler profile --db {db} -1 tb-profiler-test-data/por5A.nanopore_reduced.fastq.gz --platform nanopore -p por5A_nanopore -t 4 --caller bcftools --af '0.5,0.7' --depth '0,5' --txt --csv --docx")
+    check_assertations("results/por5A_nanopore.results.json")

 def test_fasta():
     run_cmd(f"tb-profiler profile --db {db} -f tb-profiler-test-data/por5A1.fasta -p por5A_fasta --txt --csv --docx")