Skip to content

Conversation

cjw85
Copy link
Contributor

@cjw85 cjw85 commented Sep 30, 2025

This saves around 10-15% in programs calling the modified base APIs.

@jmarshall
Copy link
Member

Looks like a good idea.

If you lift the declarations of tmp (to the char *cp declaration) and of (as, say, failed, to the top) these can stay as one-liners, which would be nice (clearer).

@daviesrob
Copy link
Member

I'd agree that lofting particularly of could be useful. It could even be tested at the end to detect overflows, which isn't done at the moment.

It would also be a good idea to change the strtol() calls in bam_parse_basemod2(). bam_mods_at_next_pos and bam_parse_basemod2 rely somewhat on the strings being parsed in the same way, so I'd be a bit worried about subtle differences between strtol() and hts_str2uint() resulting in odd things happening if the input is not strictly spec-compliant.

sam_mods.c Outdated
if (cp != state->MM[i])
state->MMcount[i] = strtol(cp+1, NULL, 10);
else
if (cp != state->MM[i]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now there's no need to modify a bunch of surrounding lines by adding { … }.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

: 0;
if (!cp_end) {
// empty list
if (*cp == ',') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkbonfield This site made me think for a minute; the interplay of the ternary and the if(!cp_end) worried me. I believe I have the logic correct, it reads a little more direct now as a consequence.

Copy link
Contributor

@jkbonfield jkbonfield Oct 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This highlights a bug in the old code infact.

Earlier on we have:

            char *ms = cp, *me; // mod code start and end                       
            char *cp_end = NULL;                                                
            int chebi = 0;                                                      
            if (isdigit_c(*cp)) {                                               
                chebi = strtol(cp, &cp_end, 10);                                
                cp = cp_end;                                                    
                ms = cp-1;                                                      
            } else {                                                            
                while (*cp && isalpha_c(*cp))                                   
                    cp++;                                                       
                if (*cp == '\0')                                                
                    return -1;                                                  
            }                                                                   

However cp_end is non-NULL if we have a CHEBI code, and NULL otherwise.

Moving on to the code you commented on above: if we have no comma, and hence an empty list, then we're now affected by the CHEBI vs otherwise logic. Indeed this test file shows a problem:

$ cat /tmp/MM-chebi.sam
*	0	*	0	0	*	*	0	0	ACGCT	*	Mm:Z:C+m;C+76792;N+n;

$ ./test/test_mod /tmp/MM-chebi.sam  2>/dev/null
0	A
1	C	C+(76792).
2	G
3	C
4	T
---
Present: m. #-76792. n.
1	C	C+(76792).

If I put a cp_end = NULL after the above code block so it's always NULL regardless of code-vs-CHEBI then the rogue mod vanishes.

The revised code side-steps the issue and I agree it has easier to understand logic. A good spot.

@jkbonfield jkbonfield merged commit 72422ef into samtools:develop Oct 6, 2025
9 checks passed
@jkbonfield
Copy link
Contributor

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants