Progress

mealser edited this page Jul 19, 2017 · 11 revisions

All Tasks [June 12th - current]

1. Concatenate all contigs for each genome and build the genome database (.fa file).
2. Extract unique reads.
3. Extract multimapped reads within genomes.
4. Extract multimapped reads across genomes.
5. Build a "coverage plot" for 1, 2, and 3.
6. Nominate the best multimapped read within a genome (based on edit distance, then alignment score, then randomly).
7. Nominate the best multimapped read across genomes (based on the genome that has the highest percentage of (total number of unique reads × their total length / reference length)).
8. Build a histogram of edit distance for 1, 2, and 3.
9. Build a histogram of matches (if read length=150 & CIGAR=29S100M1I10M10S, then matches%=110/150) for 1, 2, and 3.
10. Calculate relative abundance for each genome.
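
The tie-breaking rule for multimapped reads within a genome and the matches percentage derived from a CIGAR string can be sketched as below. This is only a minimal illustration, not the project's actual code: the function names, the dictionary keys `nm` (edit distance) and `as` (alignment score), and the choice to count only `M` operations as matches are assumptions made for the example.

```python
import random
import re

# One CIGAR operation, e.g. "100M" -> ("100", "M").
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def match_fraction(cigar, read_length):
    """Return the fraction of the read covered by M operations.

    Assumption: only 'M' counts as a match, as in the task description.
    """
    matched = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "M")
    return matched / read_length


def pick_best(alignments, rng=random):
    """Pick one alignment of a multimapped read.

    Order of criteria: lowest edit distance, then highest alignment
    score, then a random choice among the remaining ties.
    Each alignment is a dict with hypothetical keys 'nm' and 'as'.
    """
    best_nm = min(a["nm"] for a in alignments)
    candidates = [a for a in alignments if a["nm"] == best_nm]
    best_as = max(a["as"] for a in candidates)
    candidates = [a for a in candidates if a["as"] == best_as]
    return rng.choice(candidates)


# 100M + 10M = 110 matched bases out of a 150-base read.
fraction = match_fraction("29S100M1I10M10S", 150)  # 110 / 150

hits = [{"nm": 2, "as": 140}, {"nm": 1, "as": 120}, {"nm": 1, "as": 130}]
best = pick_best(hits)  # the hit with nm=1 and as=130
```

Passing a seeded `random.Random` instance as `rng` makes the random tie-break reproducible between runs.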

New Tasks

1. Build our comprehensive reference database (fungi, eukaryote, and plasmid).
2. For each fungal reference in the database, generate all substrings with a sliding window (overlapping or non-overlapping) and map them to all bacterial references. From the resulting .sam file, select the single best bacterial reference that each generated substring maps to. Plot the result as a new track on the same coverage plot (the inner circle of the plot).
3. Build interactive plots with Highcharts.
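
The sliding-window step for the fungal references could look roughly like the sketch below. The function name and the `window` and `step` parameters are hypothetical; a `step` smaller than `window` gives overlapping substrings, and a `step` equal to `window` gives non-overlapping ones. A trailing piece shorter than `window` is dropped in this version, which is an assumption rather than a decided behavior.

```python
def sliding_windows(sequence, window, step):
    """Yield (start, substring) pairs of length `window`, advancing by `step`."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Stop at the last position where a full-length window still fits.
    for start in range(0, len(sequence) - window + 1, step):
        yield start, sequence[start:start + window]


overlapping = list(sliding_windows("ACGTACGT", window=4, step=2))
# [(0, 'ACGT'), (2, 'GTAC'), (4, 'ACGT')]

non_overlapping = list(sliding_windows("ACGTACGT", window=4, step=4))
# [(0, 'ACGT'), (4, 'ACGT')]
```

Keeping the start offset with each substring makes it possible to place the best bacterial hit back on the fungal reference when drawing the extra track.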

What is done so far

Task #1
Task #2
Task #3
Task #4
Task #5
Task #6