-
Notifications
You must be signed in to change notification settings - Fork 0
Step 4: Extract Unique Reads
mealser edited this page Jul 14, 2017
·
2 revisions
- Extract the name and length of each reference in the database.
- Extract the Unique Reads.
- Build the Coverage Plot (it takes the plot window size and the read classifier mode).
$ grep '>' fungi_ConcatContigs.fa > GenomeInformation.txt
$ python ReadClassifier.py SRR3546361_MergedContigs_Sorted.sam 1 > Read_FullList.sam
$ python CoveragePlot.py Read_FullList.sam GenomeInformation.txt 100 1
This command also extracts the uniquereads.
$ samtools view SRR3546361_MergedContigs_Sorted.bam | awk 'BEGIN { FS="\t" } { c[$1]++; l[$1,c[$1]]=$0 } END { for (i in c) { if (c[i] == 1) for (j = 1; j <= c[i]; j++) print l[i,j] } }' | sort -t$'\t' -k 3,3 -V -k 1,1 > Read_FullList.sam
To count the number of Unique reads, then you compare this number with the number of extracted unique reads:
$ samtools view SRR3546361_MergedContigs_Filtered.bam | awk '{print $1}'| uniq -c | awk -F ' ' '{if ($1==1) print $2}' | wc -l
or more accurately compare the reads:
$ samtools view SRR3546361_MergedContigs_Filtered.bam | awk '{print $1}'| uniq -c | awk -F ' ' '{if ($1==1) print $2}' | sort > test.txt
$ awk '{print $1}' UniqueReads_Sorted.sam | sort > test2.txt
$ diff test.txt test2.txt