Add UMI support to FASTQ input/output. #1960
Open
The fastq_umi option (FASTQ_OPT_UMI enum) is used to enable UMI parsing in read names.
When reading FASTQ, it converts the 8th field of the Illumina read name to an aux tag (RX by default). We may need to amend this if people require it to work with earlier Illumina naming schemes, but for now we target the current software.
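As a rough illustration of the read side, the sketch below pulls the 8th colon-separated field out of an Illumina-style read name. `extract_umi` is a hypothetical helper written for this description, not a function in this PR or in htslib, and stopping the field at whitespace, `/` or `#` is an assumption about where trailing decorations begin.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not htslib API): copy the 8th colon-separated
 * field of an Illumina-style read name into umi[].
 * Returns the UMI length, or -1 if there is no 8th field or it does
 * not fit in the supplied buffer.
 */
static int extract_umi(const char *name, char *umi, size_t umi_sz) {
    const char *p = name;
    int colons = 0;

    /* Skip the first seven fields (instrument through y-pos). */
    while (*p && colons < 7) {
        if (*p == ':')
            colons++;
        p++;
    }
    if (colons < 7)
        return -1;

    /* Assumption: the UMI ends at whitespace, '/', '#' or end of string. */
    size_t len = strcspn(p, " \t/#");
    if (len == 0 || len >= umi_sz)
        return -1;

    memcpy(umi, p, len);
    umi[len] = '\0';
    return (int)len;
}
```

The extracted string would then be stored as the value of the chosen aux tag; names with fewer than eight fields are left alone.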
When writing FASTQ, we search for a series of tags and use the first one found. RX is the usual one, but users may also wish to use OX if RX potentially holds error-corrected data and they want to regenerate FASTQ from the original uncorrected tag.
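The "first tag found wins" rule can be sketched as below. This is a toy model only: `struct aux_tag` and `first_umi_tag` are hypothetical names invented for illustration, and htslib stores aux data quite differently.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a record's aux tags; not how htslib stores them. */
struct aux_tag {
    char key[3];        /* two-letter tag name, NUL terminated */
    const char *value;  /* string value of the tag */
};

/*
 * Hypothetical helper: walk the preference list in order and return the
 * value of the first preferred tag present on the record, or NULL if
 * none of them is present.
 */
static const char *first_umi_tag(const struct aux_tag *tags, size_t ntags,
                                 const char *const *prefs, size_t nprefs) {
    for (size_t i = 0; i < nprefs; i++)
        for (size_t j = 0; j < ntags; j++)
            if (strcmp(tags[j].key, prefs[i]) == 0)
                return tags[j].value;
    return NULL;
}
```

With a preference list of OX then RX, a record carrying both tags yields the OX value, while a record carrying only RX falls back to RX.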
Complexities arise when dealing with /1 or /2 suffixes and #num multiplexing strings, meaning we have to use temporary buffers rather than simply truncating the read names.
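To show why truncation alone is not enough, here is a minimal sketch that removes the UMI from the middle of a name while keeping any trailing `#num` or `/1` decoration, which requires rebuilding the name in a separate buffer. `split_umi_name` is a hypothetical helper, and the assumption that the UMI is removed from the stored name is mine for the purpose of the example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not htslib API): given "a:b:c:d:e:f:g:UMI#7/1",
 * write "a:b:c:d:e:f:g#7/1" to out[] and "UMI" to umi[].
 * Returns 0 on success, -1 if there is no 8th field or a buffer is
 * too small.
 */
static int split_umi_name(const char *name,
                          char *out, size_t out_sz,
                          char *umi, size_t umi_sz) {
    const char *p = name;
    int colons = 0;

    /* Advance to the character just past the 7th colon. */
    while (*p && colons < 7) {
        if (*p == ':')
            colons++;
        p++;
    }
    if (colons < 7)
        return -1;

    size_t base_len = (size_t)(p - name) - 1;  /* excludes the 7th colon */
    size_t umi_len  = strcspn(p, "/#");        /* UMI stops at a suffix  */
    const char *suffix = p + umi_len;          /* "#7/1", "/2" or ""     */
    size_t suf_len  = strlen(suffix);

    if (umi_len == 0 || umi_len >= umi_sz || base_len + suf_len >= out_sz)
        return -1;

    memcpy(umi, p, umi_len);
    umi[umi_len] = '\0';

    /* Rebuild the name in the temporary buffer: base then suffix. */
    memcpy(out, name, base_len);
    memcpy(out + base_len, suffix, suf_len + 1);  /* includes the NUL */
    return 0;
}
```

When no suffix is present the rebuilt name is just the first seven fields, which is the one case where plain truncation would have worked.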