This example is written for Themisto v3.0.0 or newer and mSWEEP v2.0.0 or newer.
Download a toy dataset from Zenodo.
Create a list containing paths to the input assemblies
ls -d $(pwd)"/assemblies/"*.fasta.gz > input_sequences.txt
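Before building the index, it can be worth confirming that every path in the list points to a real, non-empty file. The following is a minimal sketch; `check_input_list` is a hypothetical helper name and is not part of Themisto or mSWEEP.

```shell
# Hypothetical helper (not part of Themisto or mSWEEP):
# verify that every path in a list file exists and is non-empty.
check_input_list() {
    list="$1"
    [ -s "$list" ] || { echo "error: $list is missing or empty" >&2; return 1; }
    missing=0
    while IFS= read -r path; do
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "checked $(wc -l < "$list" | tr -d ' ') paths, $missing missing"
    # Return success only when no path was missing.
    [ "$missing" -eq 0 ]
}
```

For the list created above, this would be called as `check_input_list input_sequences.txt`.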
Index the input data with Themisto (v3.0.0 or newer)
mkdir tmp
themisto build -k 31 -i input_sequences.txt -o themisto_index --temp-dir tmp -t 2 --mem-gigas 4
This will create the `themisto_index.tcolors` and `themisto_index.tdbg` files.
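A quick way to confirm that the build finished is to test for both index files. This is only a sketch: `check_index` is a hypothetical helper, and it assumes nothing about the index beyond the two file extensions named above.

```shell
# Hypothetical helper: confirm both index files exist and are non-empty.
check_index() {
    prefix="$1"
    for ext in tdbg tcolors; do
        if [ ! -s "$prefix.$ext" ]; then
            echo "missing or empty: $prefix.$ext" >&2
            return 1
        fi
    done
    echo "index $prefix looks complete"
}
```

With the index prefix used in this example, the call would be `check_index themisto_index`.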
Correct errors in the reads using [fastp](https://github.com/opengene/fastp)
fastp --in1 215_1.fastq.gz --in2 215_2.fastq.gz --out1 corr_1.fastq.gz --out2 corr_2.fastq.gz -c --thread 2
Pseudoalign the reads with Themisto
themisto pseudoalign -q 215_1.fastq.gz -i themisto_index --temp-dir tmp -t 2 > 215_1.txt
themisto pseudoalign -q 215_2.fastq.gz -i themisto_index --temp-dir tmp -t 2 > 215_2.txt
Pseudoalign the reads as above and compress the alignment file with alignment-writer
ntargets=$(wc -l < clustering.txt)
nreads=$(( $(gunzip -c 215_1.fastq.gz | wc -l) / 4 ))
themisto pseudoalign -q 215_1.fastq.gz -i themisto_index --temp-dir tmp -t 1 | alignment-writer -n $ntargets -r $nreads > 215_1.aln
themisto pseudoalign -q 215_2.fastq.gz -i themisto_index --temp-dir tmp -t 1 | alignment-writer -n $ntargets -r $nreads > 215_2.aln
This is particularly useful for very large alignments. mSWEEP v2.0.0 and newer can automatically detect the file format if the alignment files are compressed.
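The read count passed to alignment-writer is taken from the forward reads only, so it may be useful to check that both files of the pair contain the same number of reads. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines); `count_reads` and `check_paired` are hypothetical helper names.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming exactly four lines per record.
count_reads() {
    lines=$(gunzip -c "$1" | wc -l)
    echo $((lines / 4))
}

# Hypothetical helper: print the shared read count if both mates agree,
# otherwise report the mismatch and fail.
check_paired() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read counts differ: $n1 vs $n2" >&2
        return 1
    fi
    echo "$n1"
}
```

For this example the check would be `nreads=$(check_paired 215_1.fastq.gz 215_2.fastq.gz)`.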
Estimate abundances with
mSWEEP --themisto-1 215_1.txt --themisto-2 215_2.txt -i clustering.txt -t 2
This will print the abundances after the estimation is finished. To write the abundances to `215_abundances.txt`, run
mSWEEP --themisto-1 215_1.txt --themisto-2 215_2.txt -i clustering.txt -t 2 -o 215
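To see the most abundant clusters first, the output file can be sorted by its abundance column. This sketch assumes that header lines in the file start with `#` and that the remaining lines are tab-separated with the cluster name in the first column and the abundance in the second; check this against your own output before relying on it. `top_clusters` is a hypothetical helper name.

```shell
# Hypothetical helper: list the N most abundant clusters (default 5).
# Assumes '#'-prefixed header lines and tab-separated "name<TAB>abundance" rows.
top_clusters() {
    tab=$(printf '\t')
    grep -v '^#' "$1" | LC_ALL=C sort -t "$tab" -k2,2gr | head -n "${2:-5}"
}
```

For example, `top_clusters 215_abundances.txt 3` would list the three clusters with the highest estimated abundance.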
Bin the reads by adding the `--bin-reads` toggle
mSWEEP --themisto-1 215_1.txt --themisto-2 215_2.txt -i clustering.txt -t 2 --bin-reads
This will create the `clust1.bin`, `clust2.bin`, `clust3.bin`, and `clust4.bin` files. These files can be used as input to mGEMS to extract the binned reads from the read files that were given to `themisto pseudoalign`.