Multi-mapped reads when mapping to reference genome #9

Open
michael-kotliar opened this issue Apr 4, 2018 · 1 comment

Comments

@michael-kotliar

I believe that "$PARAM_BAM_PREFIX"_total.bam should be filtered either by mapping quality 255, like you do here, or by flag 0x100 (not primary alignment), similar to the way you do here. Otherwise your BAM file includes multi-mapped reads, which affects the scaling coefficient. Alternatively, you can set --outFilterMultimapNmax 1 for the STAR aligner and skip any filtering afterwards. By default, STAR allows up to 10 multiple alignments.
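As a rough illustration of the two filters mentioned above, here is a minimal Python sketch that applies them to SAM-formatted text lines. The function names and the toy records are made up for this example and are not taken from the pipeline; with samtools, the corresponding options would be `samtools view -q 255` and `samtools view -F 0x100`.

```python
SECONDARY = 0x100  # SAM flag bit: not primary alignment


def flag_and_mapq(line):
    """Return (FLAG, MAPQ) from one tab-separated SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[1]), int(fields[4])


def keep_unique(lines, unique_mapq=255):
    """Keep header lines and alignments whose MAPQ equals unique_mapq.

    255 is the MAPQ that STAR assigns to uniquely mapped reads by default.
    """
    for line in lines:
        if line.startswith("@") or flag_and_mapq(line)[1] == unique_mapq:
            yield line


def drop_secondary(lines):
    """Keep header lines and alignments without the 0x100 flag bit."""
    for line in lines:
        if line.startswith("@") or not (flag_and_mapq(line)[0] & SECONDARY):
            yield line


# Toy records (hypothetical): r1 maps uniquely, r2 maps to two loci.
records = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t255\t50M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t3\t50M\t*\t0\t0\t*\t*",
    "r2\t256\tchr2\t300\t3\t50M\t*\t0\t0\t*\t*",
]

unique_names = [l.split("\t")[0] for l in keep_unique(records)]
primary_names = [l.split("\t")[0] for l in drop_secondary(records)]
```

Note that the two filters are not equivalent in this sketch: the MAPQ filter drops r2 entirely, while the flag filter keeps the primary alignment of r2, so each multi-mapped read is still counted once.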

@julienrichardalbert
Owner

Filtering the reference-aligned BAM file with the same criteria used for the pseudogenome-aligned BAMs was considered; in fact, it is still an active topic in our group. One pragmatic reason it is coded this way is that we are interested in multi-mapped reads (especially those mapping to repetitive elements).

The main reason we keep multi-mapped reads is that "mappability" differs between the reference and the pseudogenomes. This matters at genes that are part of large families: when an SNV or INDEL is introduced, the gene becomes "more mappable" in the pseudogenome relative to the reference. Then, when creating the genomic tracks, if multi-mapped reads are removed from the reference BAM, allele-specific reads outnumber "total" reads at these repetitive genes. This badly distorts the visualization of allelic effects, which is something we want to avoid.
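To make the counting argument concrete, here is a toy Python sketch. The read counts are invented purely for illustration and do not come from real data; each read overlapping one member of a gene family is described only by how many loci it hits in the reference and in the pseudogenome.

```python
# Hypothetical reads overlapping one member of a gene family.
# Each tuple: (hits in the reference genome, hits in the pseudogenome).
# An introduced SNV makes three of the reads unique in the pseudogenome only.
reads = [(2, 1), (2, 1), (2, 1), (1, 1), (2, 2)]

# "Total" track if multi-mapped reads are kept in the reference BAM.
total_all = len(reads)

# "Total" track if the reference BAM is restricted to unique reads.
total_unique = sum(1 for ref_hits, _ in reads if ref_hits == 1)

# Allele-specific track: reads that are unique in the pseudogenome.
allelic = sum(1 for _, pseudo_hits in reads if pseudo_hits == 1)

print(total_all, total_unique, allelic)
```

With these made-up numbers, the allele-specific count exceeds the unique-only total but not the total that includes multi-mapped reads, which is the situation described above.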

A word of caution about specifying the number of multi-mapped alignments to report: when using TopHat2, setting -g/--max-multihits resulted in alignments reported as unique at repetitive loci. Instead of discarding all alignments with equal mapping quality, the aligner marks one of them as unique (high MAPQ). I am not sure whether this applies to STAR as well, but I am now wary of these parameters.

Happy to discuss this issue further.
