Skip to content

Commit

Permalink
udpating to use gofasta rather than datafunk as a speedup for alignme…
Browse files Browse the repository at this point in the history
…nt. Also piping sam from minimap2 to gofasta now rather than writing to a file
  • Loading branch information
aineniamh committed Feb 17, 2021
1 parent 266c287 commit 3617ac1
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 31 deletions.
2 changes: 1 addition & 1 deletion environment.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ dependencies:
- pip=19.3.1
- python=3.7
- snakemake-minimal=5.13
- gofasta
- pip:
- pandas==1.0.1
- git+https://github.com/cov-ert/datafunk.git
- git+https://github.com/cov-lineages/pangoLEARN.git
46 changes: 16 additions & 30 deletions pangolin/scripts/pangolearn.smk
Original file line number Diff line number Diff line change
Expand Up @@ -65,44 +65,30 @@ rule parse_paf:
else:
unmapped.write(f"{record.id},failed to map\n")


rule minimap2_to_reference:
rule align_to_reference:
input:
fasta = rules.parse_paf.output.fasta,
reference = config["reference_fasta"]
output:
sam = os.path.join(config["tempdir"],"reference_mapped.sam")
log:
os.path.join(config["tempdir"], "logs/minimap2_sam.log")
shell:
"""
minimap2 -a -x asm5 -t {workflow.cores} {input.reference:q} {input.fasta:q} -o {output.sam:q} &> {log}
"""

rule datafunk_trim_and_pad:
input:
sam = rules.minimap2_to_reference.output.sam,
reference = config["reference_fasta"]
params:
trim_start = config["trim_start"],
trim_end = config["trim_end"],
insertions = os.path.join(config["tempdir"],"insertions.txt")
trim_start = 265,
trim_end = 29674
output:
fasta = os.path.join(config["aligndir"],"sequences.aln.fasta")
log:
os.path.join(config["outdir"], "logs/minimap2_sam.log")
shell:
"""
datafunk sam_2_fasta \
-s {input.sam:q} \
-r {input.reference:q} \
-o {output.fasta:q} \
-t [{params.trim_start}:{params.trim_end}] \
--pad \
--log-inserts
minimap2 -a -x asm5 -t {workflow.cores} {input.reference:q} {input.fasta:q} | \
gofasta sam toMultiAlign \
--reference {input.reference:q} \
--trimstart {params.trim_start} \
--trimend {params.trim_end} \
--pad > {output.fasta:q}
"""

rule pangolearn:
input:
fasta = rules.datafunk_trim_and_pad.output.fasta,
fasta = rules.align_to_reference.output.fasta,
model = config["trained_model"],
header = config["header_file"],
reference = config["reference_fasta"]
Expand Down Expand Up @@ -154,7 +140,7 @@ rule add_failed_seqs:

rule type_variants_b117:
input:
fasta = rules.datafunk_trim_and_pad.output.fasta,
fasta = rules.align_to_reference.output.fasta,
variants = config["b117_variants"],
reference = config["reference_fasta"]
output:
Expand All @@ -171,7 +157,7 @@ rule type_variants_b117:

rule type_variants_b1351:
input:
fasta = rules.datafunk_trim_and_pad.output.fasta,
fasta = rules.align_to_reference.output.fasta,
variants = config["b1351_variants"],
reference = config["reference_fasta"]
output:
Expand All @@ -188,7 +174,7 @@ rule type_variants_b1351:

rule type_variants_p2:
input:
fasta = rules.datafunk_trim_and_pad.output.fasta,
fasta = rules.align_to_reference.output.fasta,
variants = config["p2_variants"],
reference = config["reference_fasta"]
output:
Expand All @@ -206,7 +192,7 @@ rule type_variants_p2:

rule type_variants_p1:
input:
fasta = rules.datafunk_trim_and_pad.output.fasta,
fasta = rules.align_to_reference.output.fasta,
variants = config["p1_variants"],
reference = config["reference_fasta"]
output:
Expand Down

3 comments on commit 3617ac1

@niemasd
Copy link

@niemasd niemasd commented on 3617ac1 Feb 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the switch to gofasta may break some environments: we had a working pangolin installation, then updated using pangolin --update, and now it didn't work anymore, with no descriptive error. We had to dig through the verbose messages to find the "command not found" error

@aineniamh
Copy link
Member Author

@aineniamh aineniamh commented on 3617ac1 Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @niemasd, I don't think it'll 'break' the environment per se, I think they'll just have to be updated. I'll clarify in the docs that the --update flag only updates to latest versions of pangolin and pangoLEARN, doesn't deal with dependencies. If you follow the alternative update instructions (https://cov-lineages.org/pangolin_docs/updating.html) it should update the environment successfully! If there's problems with that and gofasta won't install for some reason, just let me know!

@niemasd
Copy link

@niemasd niemasd commented on 3617ac1 Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying! Perhaps one workaround would be to have checks for the dependencies at the beginning of running pangolin.

The primary issue was that the error message we received was about minimap2 crashing (with no other info), and when we tried running the minimap2 command, minimap2 complained that the FASTA file it was looking for didn't exist. We then reran pangolin in verbose mode and had to dig through the verbose output to notice the gofasta: command not found error. I coincidentally happened to also be looking at the pangolin GitHub repo and saw that the most recent commit mentioned switching to gofasta, which prompted me to think that perhaps a new dependency was introduced.

If pangolin were to do a quick dependency check at the beginning and provide feedback regarding what tool is missing (e.g. "Dependency gofasta does not seem to be installed. Please refer to the Pangolin installation instructions to view how to install it") or something, I think it could potentially help. Here are some examples of how I do this type of dependency check in my ViralMSA tool:

https://github.com/niemasd/ViralMSA/blob/master/ViralMSA.py#L119

Thanks for the awesome tool, and I hope you're doing well! 😄

Please sign in to comment.