Conversation
|
Updated 10:49 PM PT - Apr 1st, 2026
❌ @Jarred-Sumner, your commit d8220c9 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 28767That installs a local version of the PR into your bun-28767 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughReplaced a static dyld symbol reference with a runtime dlsym lookup; added 4× unrolled SIMD prologues and new SIMD scan helpers; replaced byte-by-byte ANSI/OSC/CSI scanning with bulk SIMD; switched ANSI stripping to lazy vector allocation; expanded string-width benchmarks and tightened mismatch reporting. Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/bindings/ANSIHelpers.h`:
- Around line 302-322: Align the loop condition style in firstNonAsciiPrintable
with findEscapeCharacter by using a signed comparison: change the test that
currently reads static_cast<size_t>(end - it) >= stride4 to compare the signed
difference (end - it) against a signed stride (e.g.,
static_cast<ptrdiff_t>(stride4)), so the loop uses the same defensive signed
check as findEscapeCharacter; update the loop condition in
firstNonAsciiPrintable (and any analogous loop using
stride4/stride/stride2/stride3) to use the signed comparison with ptrdiff_t to
avoid unsigned wraparound.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9ab591d5-0214-4b6f-b81a-d66f521a691a
📒 Files selected for processing (2)
scripts/build/shims/asan-dyld-shim.csrc/bun.js/bindings/ANSIHelpers.h
There was a problem hiding this comment.
LGTM — clean 4x SIMD unroll and a straightforward dlsym fix for the private symbol link issue.
Extended reasoning...
Overview
Two files changed: scripts/build/shims/asan-dyld-shim.c (build shim, macOS-only) and src/bun.js/bindings/ANSIHelpers.h (SIMD string scanning helpers). The PR is already merged to main as commit afb9b0f.
Security risks
None. The dlsym call uses RTLD_DEFAULT to look up a well-known private Apple symbol, with an explicit NULL guard. The SIMD changes are pure computation with no I/O or allocation.
Level of scrutiny
Both changes are mechanical optimizations with clear correctness arguments. The asan shim change is a one-liner workaround for a linker issue (private symbol absent from .tbd stubs). The SIMD unroll follows a standard libc memchr pattern: OR four chunks, branch once, then narrow in order — correctness is straightforward to verify by inspection.
Other factors
No bugs were found by the automated bug hunting system. The code is well-commented, explaining both the motivation (NEON umov+cbnz hazard) and the correctness invariant (must narrow chunks in-order to find the first match). The tail handling (1x SIMD + scalar) is unchanged from the original and correctly handles remaining bytes.
Replace byte-by-byte state-machine loops in consumeANSI's CSI/OSC/ST parsing states with SIMD scans for terminator bytes. Also switch stripANSI from StringBuilder to a flat Vector<Char> with a manual cursor and direct buffer adoption to skip per-append bookkeeping.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/bindings/stripANSI.cpp`:
- Around line 42-60: The code resets cursor to buffer.begin() only when
foundANSI is false, but foundANSI is only set when a real ANSI sequence is
consumed, so a non-consuming hit (ANSI::consumeANSI returning escPos) can copy
bytes before the first real ANSI and later wipe them when the buffer is lazily
grown; to fix, ensure the output buffer is initialized and cursor set exactly
once on first ESC discovery (e.g., move buffer.grow(...) and cursor =
buffer.begin() into the branch that runs on the first ESC and mark foundANSI =
true at that time so subsequent literal-byte copies don't cause a later grow to
reset cursor), touching the foundANSI flag, buffer.grow, cursor,
ANSI::consumeANSI, escPos and start logic accordingly; also add a regression
test asserting that an input like "A\\x9C\\x1B[31mB" yields "A\\x9CB" (literal
U+009C before first ANSI) to prevent future regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 842ce53b-dd99-41c4-9850-5aa627201216
📒 Files selected for processing (2)
src/bun.js/bindings/ANSIHelpers.hsrc/bun.js/bindings/stripANSI.cpp
Mirror the C++ ANSI helper optimizations into the Zig stringWidth path. 4x-unroll the main loops in visibleLatin1Width and countPrintableAscii16 to amortize the per-chunk addv->fmov reduction hazard, and replace the byte-by-byte CSI/OSC walks in visibleLatin1WidthExcludeANSIColors with inline @vector range/multi-target scans.
stripANSI: guard the lazy buffer allocation on `cursor == nullptr` instead of `!foundANSI`. A broad-mask false positive (e.g. standalone 0x9c) sets the buffer up but leaves foundANSI=false, so the next iteration with a real escape would re-fire the guard and reset the cursor to buffer.begin(), discarding the previously written byte. stringWidth: refactor visibleUTF16WidthFn's per-codepoint ASCII loop to SIMD-scan inside the saw_csi/saw_osc states instead of stepping byte-by-byte. Generalize the existing scanByteInRange/scanAnyByte u8 helpers to scanLaneInRange/scanLaneAnyOf with a comptime element type so they work for both Latin-1 and UTF-16 lanes. Long OSC payloads (hyperlinks with URLs) now skip the entire payload in one bulk SIMD scan rather than walking each byte through the state machine. Update bench/snippets/string-width.mjs to actually exercise the escape parsing path: replace the misleading literal-bracket inputs with real \x1b ESC bytes, and add inputs for short SGR, truecolor SGR (long CSI body), OSC hyperlinks (long OSC body), bash-style dense SGR, CJK, and UTF-16 + escape combinations. The textLabel is now visible in the bench output instead of being truncated.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/string/immutable/visible.zig`:
- Around line 745-748: The OSC fast path dropped C1 ST (0x9C) which leaves OSC
mode unclosed; update the SIMD/target sets and UTF-16 checks to include 0x9C as
a valid OSC terminator. Specifically, restore 0x9C in the Latin-1 target array
used by scanLaneAnyOf (and the equivalent target lists around the other
occurrences you flagged) and update the UTF-16/CP handling that checks for BEL
or ESC '\\' to also treat cp == 0x9C as an OSC terminator so the parser exits
OSC mode and the printable suffix is included in width calculation; apply the
same change at the other instances noted (around the blocks you referenced:
~797-815 and ~1205-1214).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bed50880-7065-43a4-924f-88d7c6b24313
📒 Files selected for processing (3)
bench/snippets/string-width.mjssrc/bun.js/bindings/stripANSI.cppsrc/string/immutable/visible.zig
The Zig stringWidth's OSC parsing previously only recognized BEL (0x07) and 7-bit ST (ESC \\) as terminators, missing C1 ST (0x9C) — the standard ECMA-48 single-byte OSC terminator. The C++ ANSIHelpers consumeANSI used by stripANSI/sliceAnsi/wrapAnsi already handles 0x9C, so this also closes a behavioral gap between Bun's ANSI parsers. Add 0x9C to the Latin-1 OSC SIMD scan target list and to the UTF-16 non-ASCII codepoint OSC check. (0x9C is > 127 so it never appears in the UTF-16 ASCII per-codepoint region, no change needed there.) Update bench/snippets/string-width.mjs to register both bun and npm benchmarks side-by-side instead of picking one runtime, and put the impl marker (`bun`/`npm`) at the front of each label so it survives the runner's truncation.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bench/snippets/string-width.mjs`:
- Around line 25-26: The benchmark only uses BEL-terminated OSC strings (e.g.
the existing [`${ESC}]8;;${URL}\x07Bun${ESC}]8;;\x07`, "hyperlink"] entry), so
the ESC '\' (7-bit ST) and C1 ST (0x9C) OSC termination paths are never
exercised; add two more variants next to the existing "hyperlink" entry that use
ST termination: one replacing the \x07 terminator with the 7-bit ST sequence
(ESC + '\\') and one using the C1 ST byte (\x9C), e.g.
`${ESC}]8;;${URL}${ST}Bun${ESC}]8;;${ST}` and the same with `\x9C`, and make the
same additions for the other similar entries referenced around lines 39-40 so
those branches show up in perf runs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: be5fa0d7-d671-49a4-b854-72570d5efc6b
📒 Files selected for processing (2)
bench/snippets/string-width.mjssrc/string/immutable/visible.zig
The previous hyperlink bench inputs only used BEL terminators, so the ESC backslash and C1 ST 0x9C OSC termination branches in the SIMD scan path were never exercised in perf runs. Add two more variants for both the Latin-1 and UTF-16 hyperlink benchmarks, one with the 7-bit ST sequence (ESC + backslash) and one with the C1 ST byte (0x9C). Mark these as bun-only (skipping the npm bench and the cross-impl mismatch check) since npm string-width's ansi-regex only recognizes BEL terminators and would otherwise give incorrect widths and break the mismatch check.
The ban-words check flagged the two new comptime asserts in scanLaneInRange and scanLaneAnyOf. Switch them to bun.assert.
Three pre-existing flakes hit on this branch's CI run: - test/regression/issue/28632.test.ts: the MySQL leak threshold check (RSS growth < 12 MB) is too tight for ASAN runners — got 14.73 MB on one run. ASAN's shadow memory and noisier RSS measurements push the delta over the line. Double the threshold under isASAN. - test/js/third_party/@azure/service-bus/azure-service-bus.test.ts: third-party Azure SDK trips BUN_JSC_validateExceptionChecks with an unchecked exception in JSArray pushInline. Add to no-validate-exceptions.txt. - test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts: ASAN catches a worker/microtask cleanup leak on exit, same family as the existing broadcast-channel.test.ts entry. Add to no-validate-leaksan.txt next to it.
|
|
||
| const size_t reserved = buffer.size(); | ||
| const size_t outputLen = static_cast<size_t>(cursor - buffer.begin()); | ||
| const size_t waste = reserved - outputLen; | ||
| buffer.shrink(outputLen); | ||
|
|
||
| // Free the slack only if we wasted significantly: capacity > 2 * length OR | ||
| // waste > 1 KB. shrinkToFit() reallocates, so for small over-allocations | ||
| // the realloc cost outweighs the memory saved. | ||
| if (reserved > 2 * outputLen || waste * sizeof(Char) > 1024) { | ||
| buffer.shrinkToFit(); | ||
| } | ||
|
|
||
| return String::adopt(std::move(buffer)); |
There was a problem hiding this comment.
🟡 When stripANSI processes input containing only false-positive escape bytes (e.g. standalone 0x9C C1 ST bytes with no real ANSI sequences), it allocates a buffer and returns a String copy instead of std::nullopt, violating the optimization contract documented in the code comment. The fix is to add if (!foundANSI) return std::nullopt; at line 72 (after the while loop, before the buffer management code).
Extended reasoning...
What the bug is and how it manifests
The stripANSI function in stripANSI.cpp has a code comment that explicitly states: "For no-escape input we return std::nullopt and the caller reuses the original JSString with zero copies." The PR introduces this optimization, but it only fires on one exit path from the while loop. When all found escape-like bytes are false positives (e.g. standalone 0x9C C1 ST), the loop exits via the natural condition (start == end) rather than via the !escPos early-exit branch, and the !foundANSI guard is never reached. Execution falls through to return String::adopt(std::move(buffer)) at line 85, returning an allocated copy identical to the input instead of std::nullopt.
The specific code path that triggers it
findEscapeCharacter includes 0x9C in exactEscapeMatch (it appears explicitly in ANSIHelpers.h), so it returns a pointer to any standalone 0x9C byte. However, consumeANSI receives that pointer and enters State::start, where 0x9C is not listed as a valid introducer -- the default: branch fires and immediately returns escPos unchanged (a false positive). The false-positive path in stripANSI copies the byte literally and continues. When start reaches end this way, the while loop exits at its bottom without ever going through the if (!escPos) branch.
Why existing code does not prevent it
The only !foundANSI guard (lines 35-36) is inside the if (!escPos) block, which is only reachable when findEscapeCharacter returns nullptr -- i.e. when there were zero escape-like bytes at all. There is no equivalent guard after the while loop exits via start == end. So foundANSI == false after processing only false-positives, but the code skips straight to String::adopt(buffer) anyway.
Step-by-step proof for input "hello\x9C"
findEscapeCharacterreturns pointer to0x9Cat offset 5.cursor == nullptrtriggersbuffer.grow(6),cursor = buffer.begin().escPos > start:memcpy(cursor, "hello", 5), cursor advances to offset 5.consumeANSI(0x9C, end): entersState::start,default:branch, returnsescPosunchanged.- False-positive path:
*cursor++ = 0x9C,start = escPos + 1 = end. - While loop condition
start != endis now false -- loop exits at the bottom. foundANSIis stillfalse. No!foundANSIcheck is evaluated here.outputLen = 6.return String::adopt(buffer)-- returns allocated copy instead ofstd::nullopt.
Expected: std::nullopt (caller reuses original JSString). Actual: new String("hello\x9C").
Impact
The output is semantically correct (returned bytes match input). The impact is performance only: an unnecessary heap allocation and memcpy for any input that contains standalone 0x9C bytes but no real ANSI sequences. The PR's own code comment promises this allocation will not happen for no-ANSI inputs, but the promise is not kept when false-positive escape bytes are present. 0x9C is uncommon in practice, making this a minor missed optimization.
Addressing the pre-existing-behavior refutation
One verifier correctly notes that the old StringBuilder-based code also allocated and returned a copy for this case (it never returned nullopt at all). This is accurate -- the behavior is pre-existing. However, the PR introduces the std::nullopt optimization with an explicit code comment documenting the contract, and partially implements it: the optimization fires when there are zero escape-like bytes but not when all found bytes are false positives. The incomplete implementation is a nit rather than a regression, but it is a real gap in the new code.
Fix
Add if (!foundANSI) return std::nullopt; at line 72, immediately after the while loop and before the buffer management code. This handles the case where the loop exited via start == end after processing only false-positive bytes.
What does this PR do?
Speeds up
Bun.stripANSI,Bun.stringWidth, and the shared ANSI parsing helpers used byBun.sliceAnsi/Bun.wrapAnsi/ Node'sreadline.getStringWidth. Five changes:1. 4×-unrolled prologue in
findEscapeCharacterandfirstNonAsciiPrintable(C++)The single-chunk SIMD scan loop was bottlenecked on the per-chunk
umaxv → fmov → cbzchain (the NEON→GPR transfer that has to complete before the loop can branch). The 4× prologue ORs four chunks' hit masks together and only does the transfer + branch every 64 bytes (8-bit) or 32 halfwords (16-bit). The compiler fuses the four sequential 16-byte loads into twoldp(load pair) instructions for an efficient 32-byte fetch, and exact-match refinement is now lazy — only runs on a real broad-mask hit. Same pattern as glibc/muslmemchr.2. SIMD terminator scans in
consumeANSI(C++)The CSI/OSC/ST parsing states were stepping byte-by-byte through escape payloads. For OSC sequences (hyperlinks with long URLs), this was the dominant cost. Two new templated helpers —
scanForByteInRange<Lo, Hi>(CSI terminator in[0x40, 0x7E], wrapping subtract + ≤ compare) andscanForAnyByte<Targets...>(OSC/ST terminators via templatedSIMD::equal<a,b,c>) — replace the innerinCsi/inOsc/needStper-byte walks with bulk SIMD calls.3. Flat
Vector<Char>buffer instripANSI(C++)Replaced
WTF::StringBuilderwith aVector<Char>allocated lazily on first ESC (grow(input.size())skips per-element initialization for POD types), filled with rawmemcpyand a manual cursor pointer, then adopted directly into the outputStringviaString::adopt(std::move(buffer)). Skips the per-append bookkeeping and the finaltoString()shrink-copy. ConditionalshrinkToFit()only when over-allocation is significant. (Fixed a follow-up bug where the lazy-allocation guard was!foundANSIrather thancursor == nullptr, causing the cursor to reset on the iteration after a broad-mask false positive — flagged in code review.)4. Mirror the SIMD optimizations in Zig stringWidth (Zig)
Same 4× prologue + lazy reduction in
visibleLatin1WidthandcountPrintableAscii16. Generic SIMD scan helpers (scanLaneInRange<T>/scanLaneAnyOf<T>) replace the byte-by-byte CSI/OSC walks in bothvisibleLatin1WidthExcludeANSIColors(Latin-1) andvisibleUTF16WidthFn(UTF-16). The UTF-16 escape state machine's per-codepoint ASCII loop now bulk-scans insidesaw_csi/saw_oscinstead of stepping each byte through the switch. Also adds C1 ST (0x9C) as an OSC terminator, closing a behavioral gap with the C++consumeANSI(which already handled it) and conforming to the ECMA-48 spec.5. Better stringWidth bench inputs (bench/snippets/string-width.mjs)
Replaced misleading literal-bracket inputs (which had no real
\x1bESC bytes) with inputs that actually exercise the escape parsing path: short SGR, truecolor SGR (long CSI body), OSC hyperlinks (long OSC body, all three terminator variants: BEL / ESC\/ C1 ST0x9C), bash-style dense SGR, CJK, and UTF-16 + escape combinations. Side-by-side bun + npm benchmarks for each input.Benchmarks
stripANSI— vs 1.3.10 baseline and npmstrip-ansiHeadline win:
plainLong65.4ns → 16.9ns (~4x). That's the 4× prologue on the no-escape ASCII fast path — the most common case in real terminal output.hyperlink-24% is the OSC SIMD scan inconsumeANSI. bash benchmarks improved 4-9% — modest because bash output is dominated by short SGR codes (2-5 byte CSI bodies) where the SIMD scan barely fires its vector loop.stringWidth— vs 1.3.10 baselineHeadline win:
hyperlink+emoji~2.0ms → 180µs (~11x). This is the UTF-16 escape state machine refactor doing exactly what it was designed for — long OSC payloads (URLs in hyperlinks embedded in UTF-16 strings with emoji) previously walked byte-by-byte through the per-codepoint state machine, now skip the entire payload in one SIMD scan. The throughput improvement (0.22 → 2.4 chars/ns) brings the UTF-16 path in line with the Latin-1 path (3.4 chars/ns).Other escape-containing benchmarks improved 6-11% — modest because short SGR codes have ~3-byte bodies where the SIMD scan barely beats byte-by-byte. The win scales linearly with escape body length: short SGR codes give ~8%, long OSC payloads give 11x.
Pure-ASCII / no-escape paths (
ascii,emoji,cjk) are within ±5% — that's noise; those paths were already fast and weren't the target of this change.stringWidth— vs npmstring-widthLargest size of each input from
bench/snippets/string-width.mjs:† npm
string-width'sansi-regexonly recognizes BEL (0x07) as an OSC terminator. For ESC\and C1 ST (0x9C) terminators, npm gives wrong widths (it counts the URL bytes as visible content). The speedup column for these rows is "speed of producing wrong output" vs "speed of producing correct output" — not a strict apples-to-apples comparison, but reflects what users would see if they fed real terminal output into npmstring-width.For BEL-terminated input (where both implementations agree on output), bun is 4-822x faster depending on input size and content. The 822x on plain ASCII is npm's regex engine doing per-character work even when there's nothing to strip — bun's path returns immediately after the SIMD ASCII scan.
wrapAnsi— vs 1.3.10 baselineAll benchmarks within ±3% of 1.3.10 (noise). wrapAnsi has its own state machine and the shared ANSI helpers are not on its hot path; the bottleneck there is grapheme cluster + display-width tracking, not ANSI scanning. No regressions, no wins.
How did you verify your code works?
All 640 tests pass across the 7 ANSI/width test suites:
test/js/bun/util/stripANSI.test.tstest/js/bun/util/sliceAnsi.test.tstest/js/bun/util/sliceAnsi-fuzz.test.tstest/js/bun/util/wrapAnsi.test.tstest/js/bun/util/wrapAnsi.npm.test.tstest/js/bun/util/stringWidth.test.tstest/js/node/readline/getStringWidth.test.ts