Skip to content

Step 4: Extract Unique Reads

mealser edited this page Jul 14, 2017 · 2 revisions

Step 4: Extract Unique Reads:

  1. Extract the name and length of each reference in the database.
  2. Extract the Unique Reads.
  3. Build the Coverage Plot (it takes the plot window size and the read classifier mode).
$ grep '>' fungi_ConcatContigs.fa > GenomeInformation.txt
$ python ReadClassifier.py SRR3546361_MergedContigs_Sorted.sam 1 > Read_FullList.sam
$ python CoveragePlot.py Read_FullList.sam GenomeInformation.txt 100 1

This command also extracts the uniquereads.

$ samtools view SRR3546361_MergedContigs_Sorted.bam | awk 'BEGIN { FS="\t" } { c[$1]++; l[$1,c[$1]]=$0 } END { for (i in c) { if (c[i] == 1) for (j = 1; j <= c[i]; j++) print l[i,j] } }' | sort -t$'\t' -k 3,3 -V -k 1,1 > Read_FullList.sam

Check the correctness of the results:

To count the number of Unique reads, then you compare this number with the number of extracted unique reads:

$ samtools view SRR3546361_MergedContigs_Filtered.bam | awk '{print $1}'| uniq -c | awk -F ' ' '{if ($1==1) print $2}' | wc -l

or more accurately compare the reads:

$ samtools view SRR3546361_MergedContigs_Filtered.bam | awk '{print $1}'| uniq -c | awk -F ' ' '{if ($1==1) print $2}' | sort > test.txt
$ awk '{print $1}' UniqueReads_Sorted.sam | sort > test2.txt
$ diff test.txt test2.txt

alt text

Clone this wiki locally