Counting lower complexity barcodes yields many false positives #12

Open
@mschubert

I found this tool after trying to use the function vmatchPDict from the Bioconductor package Biostrings for barcode matching (which was horribly slow and took 100 GB of memory when matching with mismatches). guide-counter is amazingly fast and easy to use! 👍

I have one issue, however: the barcodes I'm matching are of lower complexity than CRISPR sgRNAs would be, i.e. only 12 nucleotides instead of the 20 mentioned in #8.

As such, the automated offset detection for the library sequences yields many false positives, because a read can contain a barcode sequence by chance at a position other than the true one. As a result, I get many more barcode matches than I have reads (on average 1.5-2 matches per read).
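To give a feel for why shorter barcodes are so much more prone to this, here is a rough back-of-the-envelope sketch. It assumes uniformly random bases and treats every window of the read as an independent candidate, so it is only an order-of-magnitude estimate and not a description of how guide-counter works internally. The function names, the read length of 150 and the library size of 10,000 are hypothetical values chosen for illustration.

```python
from math import comb


def neighborhood_size(k: int, mismatches: int) -> int:
    """Number of k-mers within Hamming distance <= mismatches of one k-mer."""
    return sum(comb(k, d) * 3**d for d in range(mismatches + 1))


def expected_chance_hits(read_len: int, barcode_len: int,
                         n_barcodes: int, mismatches: int = 0) -> float:
    """Rough upper bound on spurious barcode hits per read.

    Assumes uniformly random bases; overlapping mismatch neighborhoods
    are counted more than once, so this overestimates slightly.
    """
    if read_len < barcode_len:
        return 0.0
    positions = read_len - barcode_len + 1
    matching = n_barcodes * neighborhood_size(barcode_len, mismatches)
    p_hit = min(1.0, matching / 4**barcode_len)
    return positions * p_hit


# Hypothetical numbers: 150 nt reads, 10,000 library sequences
for k in (12, 20):
    for mm in (0, 1):
        hits = expected_chance_hits(150, k, 10_000, mismatches=mm)
        print(f"k={k} mismatches={mm}: ~{hits:.3g} chance hits per read")
```

Under these assumptions the estimate for 12-mers is several orders of magnitude higher than for 20-mers, and allowing a single mismatch multiplies it again by 37 for a 12-mer.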

Would it be possible to restrict the offset matching to a common value for all reads, or provide a custom offset via a command-line option?
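To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written in plain Python rather than against guide-counter's code. It assumes exact matching, barcodes of equal length held in a set, and reads held in a list; `find_common_offset` and `count_at_offset` are hypothetical names, not existing functions or options of the tool.

```python
from collections import Counter


def find_common_offset(reads, barcodes, sample_size=1000):
    """Return the offset at which barcodes are found most often in a sample of reads."""
    k = len(next(iter(barcodes)))
    offsets = Counter()
    for read in reads[:sample_size]:
        for i in range(len(read) - k + 1):
            if read[i:i + k] in barcodes:
                offsets[i] += 1
    if not offsets:
        return None
    return offsets.most_common(1)[0][0]


def count_at_offset(reads, barcodes, offset):
    """Count barcodes only at one fixed offset, so each read matches at most once."""
    k = len(next(iter(barcodes)))
    counts = Counter()
    for read in reads:
        candidate = read[offset:offset + k]
        if candidate in barcodes:
            counts[candidate] += 1
    return counts


barcodes = {"ACGTACGTACGT", "TTTTGGGGCCCC"}
reads = [
    "GGCC" + "ACGTACGTACGT" + "AAAA",
    # second barcode also occurs by chance further along this read
    "GGCC" + "TTTTGGGGCCCC" + "ACGTACGTACGT",
    "GGCC" + "ACGTACGTACGT" + "AAAA",
]
offset = find_common_offset(reads, barcodes)
counts = count_at_offset(reads, barcodes, offset)
```

With a single offset, the total number of matches can never exceed the number of reads, which is the property I am missing at the moment.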
