Description
I found this tool after trying to use the function vmatchPDict from the Bioconductor package Biostrings for barcode matching, which was horribly slow and took 100 GB of memory when matching with mismatches. guide-counter is amazingly fast and easy to use! 👍
I have one issue, however: the barcodes I'm matching are lower complexity than CRISPR sgRNAs would be, i.e. only 12 nucleotides instead of the 20 mentioned in #8.
As such, the automated offset detection for the library sequences yields many false positives, because a read can contain a barcode sequence by chance at another position. As a result, I get many more barcode matches than I have reads (on average 1.5-2 per read).
Would it be possible to restrict the offset matching to a single value shared by all reads, or to provide a custom offset via a command-line option?
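
To illustrate what I mean, here is a toy Python sketch of the difference. This is not guide-counter's actual code, and the function names, barcodes, and read are all made up for the example. It assumes exact matching and a known barcode start position of 5.

```python
def match_any_offset(read, barcodes):
    """Report every position in the read where any barcode occurs."""
    k = len(next(iter(barcodes)))  # assumes all barcodes share one length
    hits = []
    for i in range(len(read) - k + 1):
        window = read[i:i + k]
        if window in barcodes:
            hits.append((window, i))
    return hits


def match_fixed_offset(read, barcodes, offset):
    """Only look at the single window starting at the given offset."""
    k = len(next(iter(barcodes)))
    window = read[offset:offset + k]
    return [(window, offset)] if window in barcodes else []


# Two hypothetical 12-nt barcodes from the same library.
barcodes = {"ACGTTGCAAGCT", "TTGCAAGGCTTA"}

# The true barcode sits at offset 5; the second barcode's sequence
# also appears further downstream in the same read.
read = "GGGGG" + "ACGTTGCAAGCT" + "CC" + "TTGCAAGGCTTA" + "AAA"

any_hits = match_any_offset(read, barcodes)
fixed_hits = match_fixed_offset(read, barcodes, 5)

print(len(any_hits), "match(es) when every offset is allowed")
print(len(fixed_hits), "match(es) at the fixed offset")
```

With all offsets allowed, this single read is counted for both barcodes. With the offset fixed, only the barcode at the expected position is counted.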