CRAM integration #232

IanSudbery · 2016-08-04T09:52:43Z

Has anyone been thinking about modifications to allow CRAM compatibility (i.e. CRAM output from the mapping pipeline, CRAM input to the transcript assembly and quantitation pipelines, etc.)?

Stripping sequences from BAM files does result in a huge storage space gain, but on so many occasions it has turned out that we are actually interested in keeping the sequence. Sequence stripping is not lossless: although the positions of the mismatches are stored, the actual mismatched bases are not (mapper dependent).

Lossless CRAM is about 50% smaller than BAM, and for high-depth runs this is true even if you include the reference sequence in the CRAM file.

AndreasHeger · 2016-08-22T10:06:07Z

pysam can read/write CRAM, so it should not be an issue for our own tools, though it will require some patching here and there.
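As a rough idea of what that patching might look like, here is a minimal sketch, assuming a pysam version with CRAM support. The helper names `alignment_mode` and `convert_alignments` and the `reference_fasta` argument are hypothetical and not part of our code base; the only pysam call used is `pysam.AlignmentFile` with its mode strings (`rb`/`wb` for BAM, `rc`/`wc` for CRAM) and the `template` and `reference_filename` arguments.

```python
import os


def alignment_mode(path, write=False):
    """Return the pysam mode string for a BAM or CRAM path, chosen by extension."""
    ext = os.path.splitext(path)[1].lower()
    formats = {".bam": "b", ".cram": "c"}
    if ext not in formats:
        raise ValueError("unsupported alignment format: %s" % path)
    return ("w" if write else "r") + formats[ext]


def convert_alignments(in_path, out_path, reference_fasta):
    """Copy every record from in_path to out_path (e.g. BAM -> CRAM).

    reference_fasta is the genome the reads were mapped against; CRAM
    uses it for reference-based compression.
    """
    # Imported here so that alignment_mode() is usable without pysam installed.
    import pysam

    with pysam.AlignmentFile(in_path, alignment_mode(in_path),
                             reference_filename=reference_fasta) as infile:
        with pysam.AlignmentFile(out_path,
                                 alignment_mode(out_path, write=True),
                                 template=infile,
                                 reference_filename=reference_fasta) as outfile:
            # until_eof=True iterates over all records without needing an index.
            for read in infile.fetch(until_eof=True):
                outfile.write(read)
```

Tools that currently hard-code `"rb"` when opening alignment files could call something like `alignment_mode(path)` instead, so that the same code path accepts either format.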

I don't know how many of the downstream tools we use can make use of CRAM. bedtools, for example, is not there yet:

https://twitter.com/aaronquinlan/status/738069194936188930

Changing the output of pipeline_mapping to CRAM should be a first, fairly independent step and we can then adapt over time?
