percent_encode and percent_encode_index are on the hot path for URL parsing. All three are scalar today
I have been prototyping NEON implementations:
percent_encode_index: 16-byte scan using vqtbl2q_u8 for character_set bitmask lookup, replacing the 8-byte unrolled scalar loop
percent_decode: vceqq_u8 to find %, bulk copy clean blocks, scalar for the 3-byte %XX decode
percent_encode: same character_set SIMD check inline, bulk append clean 16-byte blocks
percent_encode and percent_encode_index are on the hot path for URL parsing. All three are scalar today
I have been prototyping NEON implementations:
percent_encode_index: 16-byte scan using vqtbl2q_u8 for character_set bitmask lookup, replacing the 8-byte unrolled scalar loop
percent_decode: vceqq_u8 to find %, bulk copy clean blocks, scalar for the 3-byte %XX decode
percent_encode: same character_set SIMD check inline, bulk append clean 16-byte blocks