Hi, BamClipper has been really useful for the pipelines I work with, so thanks for the tool.
I came across an issue with some data from a tiling amplicon scheme that uses small amplicons (~280 bp) that are prepped using a tagmentation library prep and sequenced using Illumina, especially with 2 x 300 bp sequencing. The tagmentation means that many of the inserts of the actual sequencing library are parts of the original amplicon and often contain only one of the primer sites. For amplicons that are not too small, that isn't too much of an issue as the forward read won't frequently include the reverse primer and the reverse read won't frequently include the forward primer. But in a context of small amplicons and long sequencing reads this scenario can happen a lot.
When the main script checks to see if a read overlaps a primer site, it only checks for the forward primer on sense reads and only for the reverse primer for anti-sense reads. With the original setup described in the paper, with the entire amplicon as the insert, that approach makes sense. However, if the amplicons are subjected to Nextera or TruSeq-type library preps, you'll have inserts that include just one primer site. In that case only the sense or the anti-sense read will be clipped. This is not a huge issued for large amplicons but a combination of small-ish amplicons and long sequencing reads (e.g. 2 x 300) can be a problem.
The problematic lines:
clipprimer.pl lines 216-224:
if ($alignmentdirection eq "+") {
if (defined $position2amplicon_positive{$chrom}{$left_alignpos}) {
$primers = $position2amplicon_positive{$chrom}{$left_alignpos};
}
} elsif ($alignmentdirection eq "-") {
if (defined $position2amplicon_negative{$chrom}{$right_alignpos}) {
$primers = $position2amplicon_negative{$chrom}{$right_alignpos};
}
}
Changing to this will solve the issue:
if (defined $position2amplicon_positive{$chrom}{$left_alignpos}) {
$primers = $position2amplicon_positive{$chrom}{$left_alignpos};
} elsif (defined $position2amplicon_negative{$chrom}{$right_alignpos}) {
$primers = $position2amplicon_negative{$chrom}{$right_alignpos};
}
Essentially, just check to see if the read pair overlaps any primer site. The code afterwards will only clip the parts of the reads that actually overlap with the primer sites. For example, this properly clips a reverse read that covers the forward primer out to the middle of the amplicon, where the current check ignores such a read.
Hi, BamClipper has been really useful for the pipelines I work with, so thanks for the tool.
I came across an issue with some data from a tiling amplicon scheme that uses small amplicons (~280 bp) that are prepped using a tagmentation library prep and sequenced using Illumina, especially with 2 x 300 bp sequencing. The tagmentation means that many of the inserts of the actual sequencing library are parts of the original amplicon and often contain only one of the primer sites. For amplicons that are not too small, that isn't too much of an issue as the forward read won't frequently include the reverse primer and the reverse read won't frequently include the forward primer. But in a context of small amplicons and long sequencing reads this scenario can happen a lot.
When the main script checks to see if a read overlaps a primer site, it only checks for the forward primer on sense reads and only for the reverse primer for anti-sense reads. With the original setup described in the paper, with the entire amplicon as the insert, that approach makes sense. However, if the amplicons are subjected to Nextera or TruSeq-type library preps, you'll have inserts that include just one primer site. In that case only the sense or the anti-sense read will be clipped. This is not a huge issued for large amplicons but a combination of small-ish amplicons and long sequencing reads (e.g. 2 x 300) can be a problem.
The problematic lines:
clipprimer.pl lines 216-224:
Changing to this will solve the issue:
Essentially, just check to see if the read pair overlaps any primer site. The code afterwards will only clip the parts of the reads that actually overlap with the primer sites. For example, this properly clips a reverse read that covers the forward primer out to the middle of the amplicon, where the current check ignores such a read.