Commit 0109aa6

Simplify decoding filter for UTF-8
When decoding a 3-byte UTF-8 code unit, redundant checks for an overlong code unit and for illegal codepoints in U+D800-DFFF were included. Both of these conditions are already caught by the line which reads:

if ((c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0)) {

As such, there is no reason to check for the same error conditions again.

Likewise, when decoding a 4-byte UTF-8 code unit, there was a redundant check for an overlong code unit. That was already caught by the line which reads:

if ((c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90)) {
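The redundancy argument for the 3-byte case can be checked exhaustively. The following is a minimal sketch, not code from the commit: it assumes 3-byte lead bytes are 0xE0-0xEF, that the second-byte check is `(c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0)`, and that the third byte has already been validated as a continuation byte. The helper names `second_byte_rejected_3` and `count_redundant_hits_3byte` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the second-byte check assumed for 3-byte sequences.
 * Returns nonzero when the (lead, second byte) pair must be rejected. */
static int second_byte_rejected_3(unsigned c, unsigned c2)
{
    return (c2 & 0xC0) != 0x80 || (c == 0xE0 && c2 < 0xA0) || (c == 0xED && c2 >= 0xA0);
}

/* Enumerates every 3-byte sequence that passes the second-byte check and
 * counts how many would still have tripped the removed condition
 * (overlong or surrogate). Stores the number of accepted sequences
 * in *accepted. */
static unsigned count_redundant_hits_3byte(unsigned *accepted)
{
    unsigned hits = 0;
    *accepted = 0;
    for (unsigned c = 0xE0; c <= 0xEF; c++) {
        for (unsigned c2 = 0x00; c2 <= 0xFF; c2++) {
            if (second_byte_rejected_3(c, c2)) {
                continue;
            }
            /* Assumption: c3 was already validated as 0x80-0xBF */
            for (unsigned c3 = 0x80; c3 <= 0xBF; c3++) {
                uint32_t decoded = ((c & 0xF) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F);
                (*accepted)++;
                if (decoded < 0x800 || (decoded >= 0xD800 && decoded <= 0xDFFF)) {
                    hits++;
                }
            }
        }
    }
    return hits;
}
```

Under these assumptions the function should return zero hits, which is what makes it reasonable to replace the runtime branch with assertions.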
1 parent 50e3201 commit 0109aa6

1 file changed, +5 -6 lines changed

ext/mbstring/libmbfl/filters/mbfilter_utf8.c

@@ -249,11 +249,9 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
         p--;
       } else {
         uint32_t decoded = ((c & 0xF) << 12) | ((c2 & 0x3F) << 6) | (c3 & 0x3F);
-        if (decoded < 0x800 || (decoded >= 0xD800 && decoded <= 0xDFFF)) {
-          *out++ = MBFL_BAD_INPUT;
-        } else {
-          *out++ = decoded;
-        }
+        ZEND_ASSERT(decoded >= 0x800); /* Not an overlong code unit */
+        ZEND_ASSERT(decoded < 0xD800 || decoded > 0xDFFF); /* U+D800-DFFF are reserved, illegal code points */
+        *out++ = decoded;
       }
     } else {
       *out++ = MBFL_BAD_INPUT;
@@ -283,7 +281,8 @@ static size_t mb_utf8_to_wchar(unsigned char **in, size_t *in_len, uint32_t *buf
         p--;
       } else {
         uint32_t decoded = ((c & 0x7) << 18) | ((c2 & 0x3F) << 12) | ((c3 & 0x3F) << 6) | (c4 & 0x3F);
-        *out++ = (decoded < 0x10000) ? MBFL_BAD_INPUT : decoded;
+        ZEND_ASSERT(decoded >= 0x10000); /* Not an overlong code unit */
+        *out++ = decoded;
       }
     } else {
       *out++ = MBFL_BAD_INPUT;
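
The 4-byte assertion can be sanity-checked the same way. This is a sketch under stated assumptions rather than code from the commit: it assumes 4-byte lead bytes are restricted to 0xF0-0xF4 and that the third and fourth bytes have already been validated as continuation bytes. The helper names `second_byte_rejected_4` and `count_overlong_4byte` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: the second-byte check quoted in the commit message
 * for 4-byte sequences. Returns nonzero when the pair must be rejected. */
static int second_byte_rejected_4(unsigned c, unsigned c2)
{
    return (c2 & 0xC0) != 0x80 || (c == 0xF0 && c2 < 0x90) || (c == 0xF4 && c2 >= 0x90);
}

/* Enumerates every 4-byte sequence that passes the second-byte check and
 * counts how many decode below U+10000 (overlong). Stores the number of
 * accepted sequences in *accepted and the largest decoded value in *max. */
static unsigned count_overlong_4byte(uint32_t *accepted, uint32_t *max)
{
    unsigned overlong = 0;
    *accepted = 0;
    *max = 0;
    /* Assumption: lead bytes above 0xF4 are rejected before this point */
    for (unsigned c = 0xF0; c <= 0xF4; c++) {
        for (unsigned c2 = 0x00; c2 <= 0xFF; c2++) {
            if (second_byte_rejected_4(c, c2)) {
                continue;
            }
            /* Assumption: c3 and c4 were already validated as 0x80-0xBF */
            for (unsigned c3 = 0x80; c3 <= 0xBF; c3++) {
                for (unsigned c4 = 0x80; c4 <= 0xBF; c4++) {
                    uint32_t decoded = ((c & 0x7) << 18) | ((c2 & 0x3F) << 12)
                                     | ((c3 & 0x3F) << 6) | (c4 & 0x3F);
                    (*accepted)++;
                    if (decoded > *max) {
                        *max = decoded;
                    }
                    if (decoded < 0x10000) {
                        overlong++;
                    }
                }
            }
        }
    }
    return overlong;
}
```

Under these assumptions no accepted sequence decodes below U+10000, and the `c == 0xF4 && c2 >= 0x90` clause also keeps the largest decoded value at U+10FFFF.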