
CRAM integration #232

Open
IanSudbery opened this issue Aug 4, 2016 · 1 comment

Comments

@IanSudbery
Member

Has anyone been thinking about modifications to allow CRAM compatibility (i.e. CRAM output from the mapping pipeline, CRAM input to the transcript assembly and quantitation pipelines etc)?

Stripping sequences from BAM files does give a huge saving in storage space, but on many occasions it has turned out that we are actually interested in keeping the sequence. Sequence stripping is not lossless: although the positions of the mismatches are stored, the actual mismatched bases are not (this is mapper dependent).
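To illustrate why stripping is lossy, here is a minimal sketch. It assumes that the stripped record keeps an MD-style tag (as described in the SAM specification, the MD tag stores the *reference* base at each mismatch, not the read base) and, for simplicity, handles only ungapped alignments. The function names `md_tag` and `rebuild_from_md` are hypothetical helpers, not part of any pipeline or library.

```python
import re


def md_tag(read: str, ref: str) -> str:
    """Build an MD-style string for an ungapped alignment (simplifying assumption).

    Runs of matches are recorded as counts; each mismatch records the
    REFERENCE base, so the read's own base at that position is discarded.
    """
    if len(read) != len(ref):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    parts = []
    run = 0
    for read_base, ref_base in zip(read.upper(), ref.upper()):
        if read_base == ref_base:
            run += 1
        else:
            # Emit the match count so far (possibly 0), then the reference base.
            parts.append(str(run))
            parts.append(ref_base)
            run = 0
    parts.append(str(run))
    return "".join(parts)


def rebuild_from_md(ref: str, md: str) -> str:
    """Try to rebuild a read from the reference plus the MD-style string only."""
    out = []
    pos = 0
    for number, base in re.findall(r"(\d+)|([A-Z])", md):
        if number:
            n = int(number)
            out.append(ref[pos:pos + n])  # matched bases can be copied from ref
            pos += n
        else:
            out.append("N")  # the read's base here was never stored
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    read_t = "ACGTTCGTAC"  # mismatch at position 4: read has T
    read_g = "ACGTGCGTAC"  # mismatch at position 4: read has G
    print(md_tag(read_t, ref), md_tag(read_g, ref))  # both give the same tag
    print(rebuild_from_md(ref, md_tag(read_t, ref)))  # mismatch comes back as N
```

Two reads that differ only in their mismatched base produce the same tag, so once the sequence is gone the mismatch position survives but the base itself cannot be recovered.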

Lossless CRAM is about 50% smaller than BAM, and for high-depth runs this holds even if you embed the reference sequence in the CRAM file.

@AndreasHeger
Member

pysam can read/write CRAM, so it should not be an issue for our own tools, although they will require some patching here and there.

I don't know how many of the downstream tools we use can make use of CRAM. bedtools, for example, is not there yet:

https://twitter.com/aaronquinlan/status/738069194936188930

Changing the output of pipeline_mapping to CRAM should be a first, fairly independent step and we can then adapt over time?
