Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update to fail on duplicate LN tags, with different LN values #1882

Open
wants to merge 2 commits into
base: develop
Choose a base branch
from

Conversation

vasudeva8
Copy link
Contributor

@vasudeva8 vasudeva8 commented Jan 27, 2025

Fixes #1866
Updated to use 1st LN tag value as in cram handling.
Accepts duplicate LN tags as long as they have same value.
If the values differ, returns failure.

sam_hdr_create updated for sam file checks and sam_hdr_sanitise for handling bam and cram.
sam_hdr_sanitise calls sam_hdr_fill_hrecs and in turn sam_hrecs_update_hashes, where duplicate check is made.

For cram, there is no added overhead as fill_hrecs is already in use.
For sam and bam, the sanitise update adds as overhead which is required to identify the duplicates.
For sam, the check occurs during normal parsing in sam_hdr_create and sanitise update is redundant.
For bam, fill_hrecs update is required to identify the duplicates. sanitise seem to be the best common place to have this.

@daviesrob daviesrob self-assigned this Jan 30, 2025
sam.c Outdated
@@ -1911,6 +1911,12 @@ static sam_hdr_t *sam_hdr_sanitise(sam_hdr_t *h) {
cp[h->l_text] = '\0';
}

if (sam_hdr_fill_hrecs(h) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we want to call sam_hdr_fill_hrecs() here. It will trigger a full parse of the header, which we may not want to do unless really needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

sam.c Outdated
hts_pos_t tmp = strtoll(q + 3, (char**)&q, 10);
if (ln != -1 && tmp != ln) {
//duplicate LN tag with different value
hts_log_error("111Header includes @SQ line \"%s\" with \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message looks a bit odd...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

@vasudeva8
Copy link
Contributor Author

updated to avoid hrec fill overhead.
bam has no change in processing and does not show any warning on duplicate LN tag in text part
cram shows the warning as earlier.
sam shows warning on duplicate LN tag and uses the 1st LN value as in CRAM processing.

bam processing is odd that it doesn't show the warning but avoids the overhead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistency in parsing @SQ headers with multiple LN: and SN: tags
2 participants