Counting lower complexity barcodes yields many false positives #12

I found this tool after trying to use the function `vmatchPDict` from the Bioconductor package `Biostrings` for barcode matching (which was horribly slow and took 100 GB of memory when matching with mismatches). `guide-counter` is amazingly fast and easy to use! :+1:

I have one issue, however: the barcodes I'm matching are lower complexity than CRISPR sgRNAs would be, i.e. only 12 nucleotides instead of the 20 mentioned in https://github.com/fulcrumgenomics/guide-counter/issues/8.

As such, the automated offset detection for the library sequences yields many false positives, which stem from reads randomly containing a library sequence at a position other than the true barcode position. As a result, I get many more barcode matches than I have reads (on average 1.5-2 matches per read).
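To give a rough sense of why 12-mers are so much more prone to this than 20-mers, here is a back-of-the-envelope sketch in Python. It assumes reads are uniformly random sequences, that every offset in the read is checked for an exact match, and that all barcodes are distinct; the read length and library size in the usage lines are made-up illustration values, not my actual data, and this is not a description of how `guide-counter` works internally.

```python
def expected_chance_hits(read_len: int, barcode_len: int, n_barcodes: int) -> float:
    """Expected number of chance exact matches in one uniformly random read.

    Assumptions (illustrative only): bases are independent and equiprobable,
    every offset in the read is checked, and all barcodes are distinct.
    """
    # Number of start positions at which a barcode of this length fits.
    positions = read_len - barcode_len + 1
    # Probability that a given position holds any one of the library barcodes.
    p_hit_per_position = n_barcodes / 4 ** barcode_len
    return positions * p_hit_per_position


# Hypothetical numbers: 100 nt reads and a library of 100,000 barcodes.
hits_12mer = expected_chance_hits(read_len=100, barcode_len=12, n_barcodes=100_000)
hits_20mer = expected_chance_hits(read_len=100, barcode_len=20, n_barcodes=100_000)
print(f"12-mers: {hits_12mer:.3g} chance hits per read")
print(f"20-mers: {hits_20mer:.3g} chance hits per read")
```

Under these assumptions the 12-mer case gives a chance-hit rate many orders of magnitude higher than the 20-mer case, since each extra nucleotide divides the per-position probability by four.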

Would it be possible to restrict the offset matching to a common value for all reads, or provide a custom offset via a command-line option?
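To make the request concrete, this is roughly the behaviour I have in mind, written as a minimal Python sketch. The function name `count_at_fixed_offset` and its parameters are hypothetical and only for illustration; it does exact matching only and is not meant to reflect `guide-counter`'s actual code or any existing option.

```python
from collections import Counter


def count_at_fixed_offset(reads, barcodes, offset):
    """Count barcodes found at exactly one fixed offset in each read.

    `barcodes` is a set of equal-length sequences; `offset` is the 0-based
    position at which the barcode is expected to start in every read.
    (Hypothetical helper for illustration; exact matches only.)
    """
    barcode_len = len(next(iter(barcodes)))
    counts = Counter()
    for read in reads:
        # Only look at the single expected position, so each read can
        # contribute at most one match.
        candidate = read[offset:offset + barcode_len]
        if candidate in barcodes:
            counts[candidate] += 1
    return counts


# Toy example: the first read also contains the second barcode further
# downstream, but that occurrence is ignored because it is not at the offset.
barcodes = {"AAAACCCCGGGG", "TTTTGGGGCCCC"}
reads = [
    "ACGT" + "AAAACCCCGGGG" + "TTTTGGGGCCCC",
    "ACGT" + "TTTTGGGGCCCC" + "ACGTACGTACGT",
]
counts = count_at_fixed_offset(reads, barcodes, offset=4)
```

With a single fixed offset, the total number of matches can never exceed the number of reads, which is the property I'm missing at the moment.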
