Commit

Remove unused variables and refactor GENE_LIST
Noted in previous PRs that the `GENES` and `GENES_SPACE_DELIMITED`
variables are not needed¹ or used in the workflow,² so refactor the
`GENE_LIST` to be a hardcoded list of genes.

If we want to ensure that we do not miss any genes from the Nextclade
dataset, we could parse out the gene names from the dataset's
genome_annotation.gff file. However, I think that will over-complicate
the Snakemake workflow so I'm leaving the hardcoded list.

¹ #372 (comment)
² #435 (comment)
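
For reference, the alternative described above (parsing gene names out of the dataset's annotation file) could look roughly like the following. This is a minimal sketch and not part of this commit: `parse_gene_names` is a hypothetical helper, and it assumes the annotation is standard tab-separated GFF3 in which `gene` features carry the gene name in a `Name`, `gene_name`, or `gene` attribute. That assumption has not been checked against the actual Nextclade file.

```python
def parse_gene_names(gff_text, feature_type="gene",
                     name_keys=("Name", "gene_name", "gene")):
    """Return the unique gene names found in GFF3 text, in file order.

    Hypothetical helper: the attribute keys in `name_keys` are an
    assumption about how the annotation file labels its genes.
    """
    names = []
    for line in gff_text.splitlines():
        line = line.strip()
        # Skip blank lines, directives (##...) and comments (#...)
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        # A GFF3 feature line has exactly 9 tab-separated columns;
        # column 3 is the feature type, column 9 the attributes.
        if len(columns) != 9 or columns[2] != feature_type:
            continue
        attributes = {}
        for field in columns[8].split(";"):
            key, sep, value = field.strip().partition("=")
            if sep:
                attributes[key] = value
        # Use the first attribute key that is present on this feature
        for key in name_keys:
            if key in attributes:
                if attributes[key] not in names:
                    names.append(attributes[key])
                break
    return names
```

With something like this, the workflow could compare the parsed names against the hardcoded `GENE_LIST` and fail early on a mismatch, at the cost of the extra complexity mentioned above.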
joverlee521 committed Feb 21, 2024
1 parent 5f0b4e2 commit 121b613
Showing 1 changed file with 6 additions and 3 deletions.
9 changes: 6 additions & 3 deletions Snakefile
@@ -5,9 +5,12 @@ import os
# Snakemake 7.7.0 introduced `retries` directive used in fetch_sequences
min_version("7.7.0")

-GENES = "E,M,N,ORF1a,ORF1b,ORF3a,ORF6,ORF7a,ORF7b,ORF8,ORF9b,S"
-GENES_SPACE_DELIMITED = GENES.replace(",", " ")
-GENE_LIST = GENES.split(",")
+# Hardcoded gene list used to create the DAG for both nextclade.smk and upload.smk
+# It does _not_ need to be supplied to the `nextclade run` invocations because
+# it matches the genes listed in the SARS-CoV-2 Nextclade dataset genome_annotations.gff
+# https://github.com/nextstrain/nextclade_data/blob/244058e7d599a8295d748b12cffdd25cec6d3e7b/data/nextstrain/sars-cov-2/wuhan-hu-1/orfs/genome_annotation.gff3
+# - Jover, 21 Feb 2024
+GENE_LIST = ['E', 'M', 'N', 'ORF1a', 'ORF1b', 'ORF3a', 'ORF6', 'ORF7a', 'ORF7b', 'ORF8', 'ORF9b', 'S']

#################################################################
####################### general setup ###########################
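
A quick sanity check that the refactor leaves `GENE_LIST` unchanged: splitting the removed `GENES` string yields the same list as the new hardcoded value. The names `old_gene_list` and `new_gene_list` are illustrative only and do not appear in the Snakefile.

```python
# Value of the removed GENES variable, copied from the old Snakefile
old_genes = "E,M,N,ORF1a,ORF1b,ORF3a,ORF6,ORF7a,ORF7b,ORF8,ORF9b,S"
old_gene_list = old_genes.split(",")

# New hardcoded GENE_LIST, copied from the updated Snakefile
new_gene_list = ['E', 'M', 'N', 'ORF1a', 'ORF1b', 'ORF3a', 'ORF6',
                 'ORF7a', 'ORF7b', 'ORF8', 'ORF9b', 'S']

# Same genes in the same order, so anything built from GENE_LIST is unaffected
lists_match = old_gene_list == new_gene_list
print(lists_match)
```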