Forbid \C in UTF-8 patterns #21139

Open
arnaud-lb wants to merge 1 commit into php:PHP-8.4 from arnaud-lb:gh21134

arnaud-lb commented Feb 5, 2026

Possible fix for GH-21134.

\C results in undefined, potentially unsafe behavior in UTF-8 mode, so fail compilation with an error when the escape sequence is used in that mode.
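
To illustrate the hazard: in UTF-8 mode \C matches a single code unit (one byte), so a match can stop in the middle of a multi-byte character. The sketch below is neither PHP nor PCRE2 code. `is_utf8_continuation` and `is_char_boundary` are hypothetical helpers, written only to show that stepping one byte into a two-byte character lands on a continuation byte, and they assume the input is valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static bool is_utf8_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: true if `offset` falls on a character boundary
 * of the NUL-terminated string `s`, which is assumed to be valid UTF-8.
 */
static bool is_char_boundary(const char *s, size_t offset)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (offset >= len) {
        /* The end of the string is a boundary; anything past it is not. */
        return offset == len;
    }
    return !is_utf8_continuation((unsigned char) s[offset]);
}
```

For the string `"\xC3\xA9x"` (the two-byte character U+00E9 followed by `x`), offsets 0, 2 and 3 are boundaries but offset 1 is not. That is where a subject position ends up after a single \C consumes only the first byte of the character.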

arnaud-lb changed the base branch from master to PHP-8.4 February 5, 2026 14:25
arnaud-lb changed the title from "Disable \C in UTF-8 patterns" to "Forbid \C in UTF-8 patterns" Feb 5, 2026
arnaud-lb marked this pull request as ready for review February 5, 2026 18:45
coptions |= PCRE2_UCP;
#endif
/* The \C escape sequence is unsafe in PCRE2_UTF mode */
coptions |= PCRE2_NEVER_BACKSLASH_C;

devnexen commented Feb 5, 2026

This would need to be documented somehow (in UPGRADING?). What do you think?

}
pcre2_get_error_message(errnumber, error, sizeof(error));
if (errnumber == PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED) {
strlcpy((char*)error, "using \\C is incompatible with the 'u' modifier", sizeof(error));

Yes, the default message is a bit vague in this context, so this makes sense.

devnexen commented Feb 5, 2026

I know enough PCRE2 to say this LGTM, but cc @ndossche just in case :)

Comment on lines +792 to 797
if (errnumber == PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED) {
strlcpy((char*)error, "using \\C is incompatible with the 'u' modifier", sizeof(error));
} else {
pcre2_get_error_message(errnumber, error, sizeof(error));
}
php_error_docref(NULL,E_WARNING, "Compilation failed: %s at offset %zu", error, erroffset);

I think the strlcpy call is ugly, but I won't object. Why not something like this (and possibly rename error to error_buf):

Suggested change
- if (errnumber == PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED) {
-     strlcpy((char*)error, "using \\C is incompatible with the 'u' modifier", sizeof(error));
- } else {
-     pcre2_get_error_message(errnumber, error, sizeof(error));
- }
- php_error_docref(NULL,E_WARNING, "Compilation failed: %s at offset %zu", error, erroffset);
+ const char *err_msg = (const char *) error;
+ if (errnumber == PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED) {
+     err_msg = "using \\C is incompatible with the 'u' modifier";
+ } else {
+     pcre2_get_error_message(errnumber, error, sizeof(error));
+ }
+ php_error_docref(NULL,E_WARNING, "Compilation failed: %s at offset %zu", err_msg, erroffset);
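
The pattern in this suggestion can be tried in isolation. Below is a minimal, self-contained sketch, not the actual PHP code: `DEMO_ERROR_BACKSLASH_C_DISABLED`, `demo_get_error_message` and `demo_select_error_message` are hypothetical stand-ins for `PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED`, `pcre2_get_error_message` and the surrounding logic. It shows a message pointer that refers either to a string literal or to the caller's buffer, so the custom message is never copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical code standing in for PCRE2_ERROR_BACKSLASH_C_CALLER_DISABLED. */
enum { DEMO_ERROR_BACKSLASH_C_DISABLED = 1 };

/* Hypothetical stand-in for pcre2_get_error_message(): fills buf with a generic text. */
static void demo_get_error_message(int errnumber, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "generic error %d", errnumber);
}

/*
 * Returns the message to report. For the special-cased error the pointer
 * refers to a string literal and buf is left untouched; otherwise it
 * refers to buf, which is filled by the generic lookup.
 */
static const char *demo_select_error_message(int errnumber, char *buf, size_t buflen)
{
    const char *err_msg = buf;
    if (errnumber == DEMO_ERROR_BACKSLASH_C_DISABLED) {
        err_msg = "using \\C is incompatible with the 'u' modifier";
    } else {
        demo_get_error_message(errnumber, buf, buflen);
    }
    return err_msg;
}
```

The trade-off against the strlcpy version is small: the buffer write is skipped for the special case, at the cost of one extra pointer variable that callers must use instead of the buffer.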
