| 1 | +--- |
| 2 | +title: "Ebola virus Nanopore sequencing bioinformatics protocol | amplicon, native barcoding" |
| 3 | +keywords: protocol |
| 4 | +layout: document |
| 5 | +last_updated: Dec 12, 2019 |
| 6 | +tags: [protocol] |
| 7 | +summary: |
| 8 | +permalink: ebov-bioinformatics-sop |
| 9 | +folder: ebov |
| 10 | +title_text: "Ebola virus bioinformatics protocol" |
| 11 | +subtitle_text: "Nanopore | bioinformatics" |
| 12 | +document_name: "ARTIC-EBOV-bioinformaticsSOP" |
| 13 | +version: v1.0.1 |
| 14 | +creation_date: 2018-05-26 |
| 15 | +revision_date: |
| 16 | +forked_from: |
| 17 | +author: Nick Loman |
| 18 | +citation: "Loman *et al.* In Prep." |
| 19 | +nav_menu: false |
| 20 | +show_tile: false |
| 21 | +category: ebov |
| 22 | +--- |
| 23 | + |
| 24 | +{% include callout.html |
| 25 | +type='default' |
| 26 | +content='**Overview:** A complete bioinformatics protocol to take the output from the [sequencing protocol](/ebov/ebov-seq-sop.html) to consensus genome sequences. Includes basecalling, de-multiplexing, mapping, polishing and consensus generation. |
| 27 | +' |
| 28 | +%} |
| 29 | + |
| 30 | +<br /> |
| 31 | + |
| 32 | +This document is part of the Ebola virus Nanopore sequencing protocol package: |
| 33 | +: [http://artic.network/ebov/](http://artic.network/ebov/) |
| 34 | + |
| 35 | +#### Related documents: |
| 36 | + |
| 37 | +Ebola virus Nanopore sequencing protocol: |
| 38 | +: [http://artic.network/ebov/ebov-seq-sop.html](/ebov/ebov-seq-sop.html) |
| 39 | + |
| 40 | +Setting up the laptop computing environment using Conda: |
| 41 | +: [http://artic.network/ebov/ebov-it-setup.html](http://artic.network/ebov/ebov-it-setup.html) |
| 42 | + |
| 43 | +Phylogenetic analysis and visualization: |
| 44 | +: [http://artic.network/ebov/ebov-phylogenetics-sop.html](http://artic.network/ebov/ebov-phylogenetics-sop.html) |
| 45 | + |
| 46 | + |
| 47 | +<br /><br /><br /> |
| 48 | + |
| 49 | +{% include wellcome-trust.html %} |
| 50 | + |
| 51 | +<div class="pagebreak"> </div> |
| 52 | + |
| 53 | +## Preparation |
| 54 | + |
| 55 | +Set up the computing environment as described in [ebov-it-setup](ebov-it-setup.html). Do this, and test it, before sequencing, particularly if you will be working somewhere without internet access or where the connection is slow or unreliable. Once the environment is set up, the bioinformatics can be performed largely offline. If you are already using lab-on-SSD, you can skip this step. |
| 56 | + |
| 57 | +## Make a new directory for analysis |
| 58 | + |
| 59 | +Give your analysis directory a meaningful name, e.g. `analysis/run_name`: |
| 60 | + |
| 61 | +```bash |
| 62 | +mkdir analysis |
| 63 | +cd analysis |
| 64 | + |
| 65 | +mkdir run_name |
| 66 | +cd run_name |
| 67 | +``` |
| 68 | + |
| 69 | +## Activate the ARTIC environment: |
| 70 | + |
| 71 | +All steps in this tutorial should be performed in the artic-ebov conda environment: |
| 72 | + |
| 73 | +```bash |
| 74 | +source activate artic-ebov |
| 75 | +``` |
| 76 | + |
| 77 | +## RAMPART |
| 78 | + |
| 79 | +To run RAMPART on a current run: |
| 80 | + |
| 81 | +```bash |
| 82 | +artic rampart |
| 83 | +``` |
| 84 | + |
| 85 | +Select your run and protocol, enter the names of your barcodes, then open http://localhost:3000 in your browser. |
| 86 | + |
| 87 | +### Basecalling with Guppy |
| 88 | + |
| 89 | +If you did basecalling with MinKNOW, skip this step. |
| 90 | + |
| 91 | +Run the Guppy basecaller on the new MinION run folder: |
| 92 | + |
| 93 | +For fast mode basecalling: |
| 94 | + |
| 95 | +```bash |
| 96 | +guppy_basecaller -c dna_r9.4.1_450bps_fast.cfg -i /path/to/reads -s run_name -x auto -r |
| 97 | +``` |
| 98 | + |
| 99 | +For high-accuracy mode basecalling: |
| 100 | + |
| 101 | +```bash |
| 102 | +guppy_basecaller -c dna_r9.4.1_450bps_hac.cfg -i /path/to/reads -s run_name -x auto -r |
| 103 | +``` |
| 104 | + |
| 105 | +Replace `/path/to/reads` with the folder that contains the FAST5 files from your |
| 106 | +run. Common locations are: |
| 107 | + |
| 108 | + - Mac: ```/Library/MinKNOW/data/run_name``` |
| 109 | + - Linux: ```/var/lib/MinKNOW/data/run_name``` |
| 110 | + - Windows: ```c:/data/reads``` |
| 111 | + |
| 112 | +This will create a folder called `run_name` with the base-called reads in it. |
| 113 | + |
| 114 | +### Consensus sequence generation |
| 115 | + |
| 116 | +We first collect all the FASTQ files (typically stored in files each containing 4000 reads) |
| 117 | +into a single file. |
| 118 | + |
| 119 | +```bash |
| 120 | +artic gather --min-length 400 --max-length 700 --prefix run_name |
| 121 | +``` |
| 122 | + |
| 123 | +The command will show you the runs in `/var/lib/MinKNOW/data` and ask you to select one. If you know the path to the reads, use: |
| 124 | + |
| 125 | +```bash |
| 126 | +artic gather --min-length 400 --max-length 700 --prefix run_name --directory /path/to/reads |
| 127 | +``` |
| 128 | + |
| 129 | +Here `/path/to/reads` should be the folder containing the base-called reads (i.e. `run_name` from the Guppy command above, or the MinKNOW output folder if MinKNOW did the basecalling). |
| 130 | + |
| 131 | +We use a length filter of between 400 and 700 bases here to remove obviously chimeric reads. |
| 132 | + |
| 133 | +You may need to change these numbers if your primer scheme produces amplicons of a different length. Try the length of the shortest amplicon as the |
| 134 | +minimum, and the length of the longest amplicon plus 200 as the maximum. |
| 135 | + |
| 136 | +E.g. if your amplicons are 300 base pairs long, use `--min-length 300 --max-length 500`. |
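
The rule of thumb above can be written as a small shell helper. This is only a sketch: `length_filter_args` is a hypothetical name, not part of the `artic` tool, and it assumes you already know the shortest and longest amplicon lengths in your scheme.

```bash
# Hypothetical helper (not part of artic): build the length-filter flags
# from the shortest ($1) and longest ($2) amplicon lengths in a scheme.
# Rule of thumb from the text: min = shortest amplicon, max = longest + 200.
length_filter_args() {
    echo "--min-length $1 --max-length $(($2 + 200))"
}

# Example: every amplicon is 300 base pairs long.
length_filter_args 300 300
```

The printed flags can be pasted into the `artic gather` command in place of the default `--min-length 400 --max-length 700`.
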
| 137 | + |
| 138 | +You will now have a file called: ``run_name_pass.fastq`` |
| 139 | +and a file called ``run_name_sequencing_summary.txt``, |
| 140 | +as well as individual files for each barcode (if previously demultiplexed). |
| 141 | + |
| 142 | +### Demultiplex with Porechop with stringent settings |
| 143 | + |
| 144 | +This stage is obligatory, even if you have already demultiplexed with Guppy, due to |
| 145 | +significant barcoding misassignments that can confound results: |
| 146 | + |
| 147 | +```bash |
| 148 | +artic demultiplex --threads 4 run_name_pass.fastq |
| 149 | +``` |
| 150 | + |
| 151 | +Now you will have new files called: |
| 152 | + |
| 153 | +```bash |
| 154 | +run_name_pass_NB01.fastq |
| 155 | +run_name_pass_NB02.fastq |
| 156 | +run_name_pass_NB03.fastq |
| 157 | +``` |
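
Before going further it can be useful to see how many reads ended up in each barcode file. The sketch below is not part of the ARTIC tooling; `count_fastq_reads` is a hypothetical helper, and it assumes each FASTQ record occupies exactly four lines.

```bash
# Hypothetical helper: count reads in a FASTQ file, assuming the usual
# four lines per record (header, sequence, "+", qualities).
count_fastq_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# Report the read count for every demultiplexed barcode file present.
for fq in run_name_pass_NB*.fastq; do
    [ -e "$fq" ] || continue    # nothing matched the pattern
    echo "$fq: $(count_fastq_reads "$fq") reads"
done
```

A barcode with very few reads compared with the others may not be worth running through the rest of the pipeline.
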
| 158 | + |
| 159 | +### Create the nanopolish index (once per sequencing run, not per sample) |
| 160 | + |
| 161 | +```bash |
| 162 | +nanopolish index -s run_name_sequencing_summary.txt -d /path/to/reads run_name_pass.fastq |
| 163 | +``` |
| 164 | + |
| 165 | +Again, alter ``/path/to/reads`` to point to the original location of the FAST5 files. |
| 166 | + |
| 167 | +## Run the MinION pipeline |
| 168 | + |
| 169 | +Run the following command once for each barcode you wish to process (e.g. 12 times for 12 barcodes), replacing the file name and sample name as appropriate. |
| 170 | + |
| 171 | +E.g. for NB01 |
| 172 | + |
| 173 | +```bash |
| 174 | +artic minion --normalise 200 --threads 4 --scheme-directory ~/artic/artic-ebov/primer-schemes --read-file run_name_pass_NB01.fastq --nanopolish-read-file run_name_pass.fastq IturiEbola/V1 samplename |
| 175 | +``` |
| 176 | + |
| 177 | +Replace ``samplename`` as appropriate. |
| 178 | + |
| 179 | +E.g. for NB02 |
| 180 | + |
| 181 | +```bash |
| 182 | +artic minion --normalise 200 --threads 4 --scheme-directory ~/artic/artic-ebov/primer-schemes --read-file run_name_pass_NB02.fastq --nanopolish-read-file run_name_pass.fastq IturiEbola/V1 samplename |
| 183 | +``` |
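
Rather than typing the command once per barcode, the commands can be generated in a loop. The following is a sketch under stated assumptions: `print_minion_commands` is a hypothetical helper, the barcode files follow the `run_name_pass_NBxx.fastq` naming shown earlier, and the barcode label (e.g. `NB01`) stands in for the sample name. The scheme is passed as the second argument, so use whichever scheme name and version matches your `primer-schemes` directory. The function only prints the commands; it does not run them.

```bash
# Hypothetical helper: print (do not run) one `artic minion` command per
# demultiplexed barcode file.
#   $1 = run name (prefix of the FASTQ files)
#   $2 = scheme name and version, as used in the commands above
# Assumption: the barcode label (e.g. NB01) is used as the sample name.
print_minion_commands() {
    for fq in "$1"_pass_NB*.fastq; do
        [ -e "$fq" ] || continue    # skip if no files match
        sample=${fq%.fastq}         # strip the extension
        sample=${sample##*_}        # keep the trailing barcode label
        echo "artic minion --normalise 200 --threads 4" \
             "--scheme-directory ~/artic/artic-ebov/primer-schemes" \
             "--read-file $fq --nanopolish-read-file ${1}_pass.fastq" \
             "$2 $sample"
    done
}
```

Call it with the run name and the scheme used in the commands above, review the printed commands, and edit the sample names if the barcode labels are not what you want in the output file names.
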
| 184 | + |
| 185 | +## Output files |
| 186 | + |
| 187 | + * ``samplename.primertrimmed.bam`` - BAM file for visualisation after primer-binding site trimming |
| 188 | + * ``samplename.vcf`` - detected variants in VCF format |
| 189 | + * ``samplename.variants.tab`` - detected variants |
| 190 | + * ``samplename.consensus.fasta`` - consensus sequence |
| 191 | + |
| 192 | +To put all the consensus sequences in one file called `my_consensus_genomes.fasta`, run: |
| 193 | + |
| 194 | +```bash |
| 195 | +cat *.consensus.fasta > my_consensus_genomes.fasta |
| 196 | +``` |
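
As a quick sanity check, the number of sequences in the combined file should match the number of samples processed. A minimal sketch, assuming each consensus file holds a single FASTA record; `count_fasta_records` is a hypothetical helper name:

```bash
# Hypothetical helper: count sequences in a FASTA file by counting
# header lines, which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Only run the check if the combined file exists.
if [ -f my_consensus_genomes.fasta ]; then
    count_fasta_records my_consensus_genomes.fasta
fi
```
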
| 197 | + |
| 198 | +## To visualise genomes in Tablet |
| 199 | + |
| 200 | +Open a new Terminal window: |
| 201 | + |
| 202 | +```bash |
| 203 | +conda activate tablet |
| 204 | +tablet |
| 205 | +``` |
| 206 | + |
| 207 | +Go to "Open Assembly" |
| 208 | + |
| 209 | +Load the BAM (binary alignment file) as the first file. |
| 210 | + |
| 211 | +Load the reference file (in `artic/artic-ebov/primer_schemes/IturiEbola/V1/IturiEbola.reference.fasta`) as the second file. |
| 212 | + |
| 213 | +Select Variants mode in Color Schemes for ease of viewing variants. |
| 214 | + |