Skip to content

Faster stripANSI#28767

Merged
Jarred-Sumner merged 8 commits intomainfrom
jarred/faster-strip-ansi
Apr 2, 2026
Merged

Faster stripANSI#28767
Jarred-Sumner merged 8 commits intomainfrom
jarred/faster-strip-ansi

Conversation

@Jarred-Sumner
Copy link
Copy Markdown
Collaborator

@Jarred-Sumner Jarred-Sumner commented Apr 2, 2026

What does this PR do?

Speeds up Bun.stripANSI, Bun.stringWidth, and the shared ANSI parsing helpers used by Bun.sliceAnsi / Bun.wrapAnsi / Node's readline.getStringWidth. Five changes:

1. 4×-unrolled prologue in findEscapeCharacter and firstNonAsciiPrintable (C++)

The single-chunk SIMD scan loop was bottlenecked on the per-chunk umaxv → fmov → cbz chain (the NEON→GPR transfer that has to complete before the loop can branch). The 4× prologue ORs four chunks' hit masks together and only does the transfer + branch every 64 bytes (8-bit) or 32 halfwords (16-bit). The compiler fuses the four sequential 16-byte loads into two ldp (load pair) instructions for an efficient 32-byte fetch, and exact-match refinement is now lazy — only runs on a real broad-mask hit. Same pattern as glibc/musl memchr.

2. SIMD terminator scans in consumeANSI (C++)

The CSI/OSC/ST parsing states were stepping byte-by-byte through escape payloads. For OSC sequences (hyperlinks with long URLs), this was the dominant cost. Two new templated helpers — scanForByteInRange<Lo, Hi> (CSI terminator in [0x40, 0x7E], wrapping subtract + ≤ compare) and scanForAnyByte<Targets...> (OSC/ST terminators via templated SIMD::equal<a,b,c>) — replace the inner inCsi/inOsc/needSt per-byte walks with bulk SIMD calls.

3. Flat Vector<Char> buffer in stripANSI (C++)

Replaced WTF::StringBuilder with a Vector<Char> allocated lazily on first ESC (grow(input.size()) skips per-element initialization for POD types), filled with raw memcpy and a manual cursor pointer, then adopted directly into the output String via String::adopt(std::move(buffer)). Skips the per-append bookkeeping and the final toString() shrink-copy. Conditional shrinkToFit() only when over-allocation is significant. (Fixed a follow-up bug where the lazy-allocation guard was !foundANSI rather than cursor == nullptr, causing the cursor to reset on the iteration after a broad-mask false positive — flagged in code review.)

4. Mirror the SIMD optimizations in Zig stringWidth (Zig)

Same 4× prologue + lazy reduction in visibleLatin1Width and countPrintableAscii16. Generic SIMD scan helpers (scanLaneInRange<T> / scanLaneAnyOf<T>) replace the byte-by-byte CSI/OSC walks in both visibleLatin1WidthExcludeANSIColors (Latin-1) and visibleUTF16WidthFn (UTF-16). The UTF-16 escape state machine's per-codepoint ASCII loop now bulk-scans inside saw_csi / saw_osc instead of stepping each byte through the switch. Also adds C1 ST (0x9C) as an OSC terminator, closing a behavioral gap with the C++ consumeANSI (which already handled it) and conforming to the ECMA-48 spec.

5. Better stringWidth bench inputs (bench/snippets/string-width.mjs)

Replaced misleading literal-bracket inputs (which had no real \x1b ESC bytes) with inputs that actually exercise the escape parsing path: short SGR, truecolor SGR (long CSI body), OSC hyperlinks (long OSC body, all three terminator variants: BEL / ESC \ / C1 ST 0x9C), bash-style dense SGR, CJK, and UTF-16 + escape combinations. Side-by-side bun + npm benchmarks for each input.

Benchmarks

stripANSI — vs 1.3.10 baseline and npm strip-ansi

Benchmark 1.3.10 this PR Δ vs 1.3.10 npm speedup vs npm
plain (13c) 8.26ns 8.58ns +4% 10.89ns 1.27x
plainLong (1000c, no escapes) 65.40ns 16.88ns -74% 32.21ns 1.91x
smallAnsi (17c) 45.20ns 43.17ns -4% 86.11ns 1.99x
hyperlink (45c, OSC 8) 59.57ns 45.12ns -24% 65.67ns 1.46x
mixedShort (69c) 90.48ns 80.02ns -12% 154.42ns 1.93x
bash5KB 2.63µs 2.53µs -4% 6.08µs 2.40x
bash50KB 24.71µs 23.45µs -5% 58.10µs 2.48x
bash150KB 78.97µs 71.56µs -9% 173.50µs 2.42x

Headline win: plainLong 65.4ns → 16.9ns (~4x). That's the 4× prologue on the no-escape ASCII fast path — the most common case in real terminal output. hyperlink -24% is the OSC SIMD scan in consumeANSI. bash benchmarks improved 4-9% — modest because bash output is dominated by short SGR codes (2-5 byte CSI bodies) where the SIMD scan barely fires its vector loop.

stringWidth — vs 1.3.10 baseline

Benchmark 1.3.10 this PR Δ
ascii (25Kc) 1.76µs 1.72µs -2%
short-sgr (70Kc) ~125µs 112.40µs -10%
truecolor (140Kc) ~135µs 120.72µs -10%
hyperlink Latin-1 (445Kc) ~135µs 119.66µs -11%
dense-sgr (205Kc) ~390µs 357.46µs -8%
emoji (60Kc) ~150µs 138.81µs -7%
ansi+emoji (150Kc) ~340µs 319.42µs -6%
hyperlink+emoji (440Kc) ~2.0ms 180.46µs ~-91% (~11x)
cjk (35Kc) ~660µs 633.04µs -4%

Headline win: hyperlink+emoji ~2.0ms → 180µs (~11x). This is the UTF-16 escape state machine refactor doing exactly what it was designed for — long OSC payloads (URLs in hyperlinks embedded in UTF-16 strings with emoji) previously walked byte-by-byte through the per-codepoint state machine, now skip the entire payload in one SIMD scan. The throughput improvement (0.22 → 2.4 chars/ns) brings the UTF-16 path in line with the Latin-1 path (3.4 chars/ns).

Other escape-containing benchmarks improved 6-11% — modest because short SGR codes have ~3-byte bodies where the SIMD scan barely beats byte-by-byte. The win scales linearly with escape body length: short SGR codes give ~8%, long OSC payloads give 11x.

Pure-ASCII / no-escape paths (ascii, emoji, cjk) are within ±5% — that's noise; those paths were already fast and weren't the target of this change.

stringWidth — vs npm string-width

Largest size of each input from bench/snippets/string-width.mjs:

Benchmark bun npm speedup
ascii (25Kc) 1.80µs 1.48ms 822x
short-sgr (70Kc) 116µs 1.84ms 16x
truecolor (140Kc) 123µs 2.09ms 17x
hyperlink-bel (445Kc) 122µs 1.38ms 11x
hyperlink-st (455Kc) † 122µs 25.5ms 209x
hyperlink-c1st (445Kc) † 74µs 25.5ms 344x
dense-sgr (205Kc) 370µs 5.28ms 14x
emoji (60Kc) 141µs 3.47ms 25x
ansi+emoji (150Kc) 336µs 4.10ms 12x
hyperlnk+emoji-bel (440Kc) 185µs 743µs 4x
hyperlnk+emoji-st (450Kc) † 189µs 25.3ms 134x
hyperlnk+emoji-c1st (440Kc) † 104µs 24.7ms 238x
cjk (35Kc) 623µs 2.57ms 4x

† npm string-width's ansi-regex only recognizes BEL (0x07) as an OSC terminator. For ESC \ and C1 ST (0x9C) terminators, npm gives wrong widths (it counts the URL bytes as visible content). The speedup column for these rows is "speed of producing wrong output" vs "speed of producing correct output" — not a strict apples-to-apples comparison, but reflects what users would see if they fed real terminal output into npm string-width.

For BEL-terminated input (where both implementations agree on output), bun is 4-822x faster depending on input size and content. The 822x on plain ASCII is npm's regex engine doing per-character work even when there's nothing to strip — bun's path returns immediately after the SIMD ASCII scan.

wrapAnsi — vs 1.3.10 baseline

All benchmarks within ±3% of 1.3.10 (noise). wrapAnsi has its own state machine and the shared ANSI helpers are not on its hot path; the bottleneck there is grapheme cluster + display-width tracking, not ANSI scanning. No regressions, no wins.

How did you verify your code works?

All 640 tests pass across the 7 ANSI/width test suites:

  • test/js/bun/util/stripANSI.test.ts
  • test/js/bun/util/sliceAnsi.test.ts
  • test/js/bun/util/sliceAnsi-fuzz.test.ts
  • test/js/bun/util/wrapAnsi.test.ts
  • test/js/bun/util/wrapAnsi.npm.test.ts
  • test/js/bun/util/stringWidth.test.ts
  • test/js/node/readline/getStringWidth.test.ts
640 pass
0 fail
6079 expect() calls
Ran 640 tests across 7 files. [2.35s]

@robobun
Copy link
Copy Markdown
Collaborator

robobun commented Apr 2, 2026

Updated 10:49 PM PT - Apr 1st, 2026

@Jarred-Sumner, your commit d8220c9 has 2 failures in Build #43199 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28767

That installs a local version of the PR into your bun-28767 executable, so you can run:

bun-28767 --bun

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Replaced a static dyld symbol reference with a runtime dlsym lookup; added 4× unrolled SIMD prologues and new SIMD scan helpers; replaced byte-by-byte ANSI/OSC/CSI scanning with bulk SIMD; switched ANSI stripping to lazy vector allocation; expanded string-width benchmarks and tightened mismatch reporting.

Changes

Cohort / File(s) Summary
Dynamic dyld lookup
scripts/build/shims/asan-dyld-shim.c
Added #include <dlfcn.h> and replaced a direct extern call to _dyld_get_dyld_header() with a dlsym(RTLD_DEFAULT, "_dyld_get_dyld_header") function-pointer lookup; shim now calls the resolved symbol conditionally and returns -1 when unavailable.
SIMD helpers & unrolled prologues (C++)
src/bun.js/bindings/ANSIHelpers.h
Added 4×‑unrolled SIMD prologues for findEscapeCharacter and firstNonAsciiPrintable; introduced scanForByteInRange and scanForAnyByte templates; replaced bytewise CSI/OSC/ST checks with bulk SIMD scans (keeps 1× SIMD tail).
ANSI stripping: lazy buffer (C++)
src/bun.js/bindings/stripANSI.cpp
Replaced WTF::StringBuilder with a lazy WTF::Vector<Char> that defers allocation until an escape is found; returns std::nullopt for inputs without escapes; copies segments via memcpy and applies shrink-to-fit heuristics before adopting the result.
SIMD visible-width & ANSI parsing (Zig)
src/string/immutable/visible.zig
Added visible.printableMaskLatin1, new SIMD scan helpers (scanLaneInRange, scanLaneAnyOf), and rewrote Latin‑1/ASCII width loops to use 4×16 (64‑byte) unrolled SIMD main loops plus SIMD tails; replaced bytewise CSI/OSC terminator scans with SIMD range/choice scans and adjusted OSC ESC \/stray-ESC handling.
Benchmark inputs & reporting (JS)
bench/snippets/string-width.mjs
Expanded benchmark cases (SGR variants, truecolor, dense SGR, OSC8 hyperlinks, UTF‑16/emoji/CJK cases), introduced ESC/URL constants, updated bench labeling to include textLabel, and improved mismatch errors to show label, repeat count, and both computed widths.
🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Faster stripANSI' is directly related to the main focus of the PR, which optimizes the stripANSI function. However, the PR also optimizes stringWidth, sliceAnsi, wrapAnsi, and shared ANSI parsing helpers, making the title somewhat narrow despite addressing the primary change.
Description check ✅ Passed The PR description is comprehensive and well-structured, covering all required sections with detailed explanations of changes, benchmarks, and test results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/ANSIHelpers.h`:
- Around line 302-322: Align the loop condition style in firstNonAsciiPrintable
with findEscapeCharacter by using a signed comparison: change the test that
currently reads static_cast<size_t>(end - it) >= stride4 to compare the signed
difference (end - it) against a signed stride (e.g.,
static_cast<ptrdiff_t>(stride4)), so the loop uses the same defensive signed
check as findEscapeCharacter; update the loop condition in
firstNonAsciiPrintable (and any analogous loop using
stride4/stride/stride2/stride3) to use the signed comparison with ptrdiff_t to
avoid unsigned wraparound.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ab591d5-0214-4b6f-b81a-d66f521a691a

📥 Commits

Reviewing files that changed from the base of the PR and between 1e178d1 and afb9b0f.

📒 Files selected for processing (2)
  • scripts/build/shims/asan-dyld-shim.c
  • src/bun.js/bindings/ANSIHelpers.h

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean 4x SIMD unroll and a straightforward dlsym fix for the private symbol link issue.

Extended reasoning...

Overview

Two files changed: scripts/build/shims/asan-dyld-shim.c (build shim, macOS-only) and src/bun.js/bindings/ANSIHelpers.h (SIMD string scanning helpers). The PR is already merged to main as commit afb9b0f.

Security risks

None. The dlsym call uses RTLD_DEFAULT to look up a well-known private Apple symbol, with an explicit NULL guard. The SIMD changes are pure computation with no I/O or allocation.

Level of scrutiny

Both changes are mechanical optimizations with clear correctness arguments. The asan shim change is a one-liner workaround for a linker issue (private symbol absent from .tbd stubs). The SIMD unroll follows a standard libc memchr pattern: OR four chunks, branch once, then narrow in order — correctness is straightforward to verify by inspection.

Other factors

No bugs were found by the automated bug hunting system. The code is well-commented, explaining both the motivation (NEON umov+cbnz hazard) and the correctness invariant (must narrow chunks in-order to find the first match). The tail handling (1x SIMD + scalar) is unchanged from the original and correctly handles remaining bytes.

Replace byte-by-byte state-machine loops in consumeANSI's CSI/OSC/ST
parsing states with SIMD scans for terminator bytes. Also switch
stripANSI from StringBuilder to a flat Vector<Char> with a manual cursor
and direct buffer adoption to skip per-append bookkeeping.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/bindings/stripANSI.cpp`:
- Around line 42-60: The code resets cursor to buffer.begin() only when
foundANSI is false, but foundANSI is only set when a real ANSI sequence is
consumed, so a non-consuming hit (ANSI::consumeANSI returning escPos) can copy
bytes before the first real ANSI and later wipe them when the buffer is lazily
grown; to fix, ensure the output buffer is initialized and cursor set exactly
once on first ESC discovery (e.g., move buffer.grow(...) and cursor =
buffer.begin() into the branch that runs on the first ESC and mark foundANSI =
true at that time so subsequent literal-byte copies don't cause a later grow to
reset cursor), touching the foundANSI flag, buffer.grow, cursor,
ANSI::consumeANSI, escPos and start logic accordingly; also add a regression
test asserting that an input like "A\\x9C\\x1B[31mB" yields "A\\x9CB" (literal
U+009C before first ANSI) to prevent future regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 842ce53b-dd99-41c4-9850-5aa627201216

📥 Commits

Reviewing files that changed from the base of the PR and between afb9b0f and 3bbef48.

📒 Files selected for processing (2)
  • src/bun.js/bindings/ANSIHelpers.h
  • src/bun.js/bindings/stripANSI.cpp

Mirror the C++ ANSI helper optimizations into the Zig stringWidth path.
4x-unroll the main loops in visibleLatin1Width and countPrintableAscii16
to amortize the per-chunk addv->fmov reduction hazard, and replace the
byte-by-byte CSI/OSC walks in visibleLatin1WidthExcludeANSIColors with
inline @vector range/multi-target scans.
stripANSI: guard the lazy buffer allocation on `cursor == nullptr`
instead of `!foundANSI`. A broad-mask false positive (e.g. standalone
0x9c) sets the buffer up but leaves foundANSI=false, so the next
iteration with a real escape would re-fire the guard and reset the
cursor to buffer.begin(), discarding the previously written byte.

stringWidth: refactor visibleUTF16WidthFn's per-codepoint ASCII loop to
SIMD-scan inside the saw_csi/saw_osc states instead of stepping
byte-by-byte. Generalize the existing scanByteInRange/scanAnyByte u8
helpers to scanLaneInRange/scanLaneAnyOf with a comptime element type so
they work for both Latin-1 and UTF-16 lanes. Long OSC payloads
(hyperlinks with URLs) now skip the entire payload in one bulk SIMD
scan rather than walking each byte through the state machine.

Update bench/snippets/string-width.mjs to actually exercise the escape
parsing path: replace the misleading literal-bracket inputs with real
\x1b ESC bytes, and add inputs for short SGR, truecolor SGR (long CSI
body), OSC hyperlinks (long OSC body), bash-style dense SGR, CJK, and
UTF-16 + escape combinations. The textLabel is now visible in the bench
output instead of being truncated.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/string/immutable/visible.zig`:
- Around line 745-748: The OSC fast path dropped C1 ST (0x9C) which leaves OSC
mode unclosed; update the SIMD/target sets and UTF-16 checks to include 0x9C as
a valid OSC terminator. Specifically, restore 0x9C in the Latin-1 target array
used by scanLaneAnyOf (and the equivalent target lists around the other
occurrences you flagged) and update the UTF-16/CP handling that checks for BEL
or ESC '\\' to also treat cp == 0x9C as an OSC terminator so the parser exits
OSC mode and the printable suffix is included in width calculation; apply the
same change at the other instances noted (around the blocks you referenced:
~797-815 and ~1205-1214).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bed50880-7065-43a4-924f-88d7c6b24313

📥 Commits

Reviewing files that changed from the base of the PR and between 09b12dd and e5f15e9.

📒 Files selected for processing (3)
  • bench/snippets/string-width.mjs
  • src/bun.js/bindings/stripANSI.cpp
  • src/string/immutable/visible.zig

The Zig stringWidth's OSC parsing previously only recognized BEL (0x07)
and 7-bit ST (ESC \\) as terminators, missing C1 ST (0x9C) — the standard
ECMA-48 single-byte OSC terminator. The C++ ANSIHelpers consumeANSI used
by stripANSI/sliceAnsi/wrapAnsi already handles 0x9C, so this also closes
a behavioral gap between Bun's ANSI parsers. Add 0x9C to the Latin-1 OSC
SIMD scan target list and to the UTF-16 non-ASCII codepoint OSC check.
(0x9C is > 127 so it never appears in the UTF-16 ASCII per-codepoint
region, no change needed there.)

Update bench/snippets/string-width.mjs to register both bun and npm
benchmarks side-by-side instead of picking one runtime, and put the impl
marker (`bun`/`npm`) at the front of each label so it survives the
runner's truncation.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bench/snippets/string-width.mjs`:
- Around line 25-26: The benchmark only uses BEL-terminated OSC strings (e.g.
the existing [`${ESC}]8;;${URL}\x07Bun${ESC}]8;;\x07`, "hyperlink"] entry), so
the ESC '\' (7-bit ST) and C1 ST (0x9C) OSC termination paths are never
exercised; add two more variants next to the existing "hyperlink" entry that use
ST termination: one replacing the \x07 terminator with the 7-bit ST sequence
(ESC + '\\') and one using the C1 ST byte (\x9C), e.g.
`${ESC}]8;;${URL}${ST}Bun${ESC}]8;;${ST}` and the same with `\x9C`, and make the
same additions for the other similar entries referenced around lines 39-40 so
those branches show up in perf runs.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: be5fa0d7-d671-49a4-b854-72570d5efc6b

📥 Commits

Reviewing files that changed from the base of the PR and between e5f15e9 and 4d2023d.

📒 Files selected for processing (2)
  • bench/snippets/string-width.mjs
  • src/string/immutable/visible.zig

The previous hyperlink bench inputs only used BEL terminators, so the
ESC backslash and C1 ST 0x9C OSC termination branches in the SIMD scan
path were never exercised in perf runs. Add two more variants for both
the Latin-1 and UTF-16 hyperlink benchmarks, one with the 7-bit ST
sequence (ESC + backslash) and one with the C1 ST byte (0x9C).

Mark these as bun-only (skipping the npm bench and the cross-impl
mismatch check) since npm string-width's ansi-regex only recognizes BEL
terminators and would otherwise give incorrect widths and break the
mismatch check.
The ban-words check flagged the two new comptime asserts in
scanLaneInRange and scanLaneAnyOf. Switch them to bun.assert.
Three pre-existing flakes hit on this branch's CI run:

- test/regression/issue/28632.test.ts: the MySQL leak threshold check
  (RSS growth < 12 MB) is too tight for ASAN runners — got 14.73 MB on
  one run. ASAN's shadow memory and noisier RSS measurements push the
  delta over the line. Double the threshold under isASAN.

- test/js/third_party/@azure/service-bus/azure-service-bus.test.ts:
  third-party Azure SDK trips BUN_JSC_validateExceptionChecks with an
  unchecked exception in JSArray pushInline. Add to
  no-validate-exceptions.txt.

- test/js/web/broadcastchannel/broadcast-channel-worker-gc.test.ts:
  ASAN catches a worker/microtask cleanup leak on exit, same family as
  the existing broadcast-channel.test.ts entry. Add to
  no-validate-leaksan.txt next to it.
@Jarred-Sumner Jarred-Sumner merged commit 4760d78 into main Apr 2, 2026
37 of 49 checks passed
@Jarred-Sumner Jarred-Sumner deleted the jarred/faster-strip-ansi branch April 2, 2026 05:03
Comment on lines +72 to +85

const size_t reserved = buffer.size();
const size_t outputLen = static_cast<size_t>(cursor - buffer.begin());
const size_t waste = reserved - outputLen;
buffer.shrink(outputLen);

// Free the slack only if we wasted significantly: capacity > 2 * length OR
// waste > 1 KB. shrinkToFit() reallocates, so for small over-allocations
// the realloc cost outweighs the memory saved.
if (reserved > 2 * outputLen || waste * sizeof(Char) > 1024) {
buffer.shrinkToFit();
}

return String::adopt(std::move(buffer));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 When stripANSI processes input containing only false-positive escape bytes (e.g. standalone 0x9C C1 ST bytes with no real ANSI sequences), it allocates a buffer and returns a String copy instead of std::nullopt, violating the optimization contract documented in the code comment. The fix is to add if (!foundANSI) return std::nullopt; at line 72 (after the while loop, before the buffer management code).

Extended reasoning...

What the bug is and how it manifests

The stripANSI function in stripANSI.cpp has a code comment that explicitly states: "For no-escape input we return std::nullopt and the caller reuses the original JSString with zero copies." The PR introduces this optimization, but it only fires on one exit path from the while loop. When all found escape-like bytes are false positives (e.g. standalone 0x9C C1 ST), the loop exits via the natural condition (start == end) rather than via the !escPos early-exit branch, and the !foundANSI guard is never reached. Execution falls through to return String::adopt(std::move(buffer)) at line 85, returning an allocated copy identical to the input instead of std::nullopt.

The specific code path that triggers it

findEscapeCharacter includes 0x9C in exactEscapeMatch (it appears explicitly in ANSIHelpers.h), so it returns a pointer to any standalone 0x9C byte. However, consumeANSI receives that pointer and enters State::start, where 0x9C is not listed as a valid introducer -- the default: branch fires and immediately returns escPos unchanged (a false positive). The false-positive path in stripANSI copies the byte literally and continues. When start reaches end this way, the while loop exits at its bottom without ever going through the if (!escPos) branch.

Why existing code does not prevent it

The only !foundANSI guard (lines 35-36) is inside the if (!escPos) block, which is only reachable when findEscapeCharacter returns nullptr -- i.e. when there were zero escape-like bytes at all. There is no equivalent guard after the while loop exits via start == end. So foundANSI == false after processing only false-positives, but the code skips straight to String::adopt(buffer) anyway.

Step-by-step proof for input "hello\x9C"

  1. findEscapeCharacter returns pointer to 0x9C at offset 5. cursor == nullptr triggers buffer.grow(6), cursor = buffer.begin().
  2. escPos > start: memcpy(cursor, "hello", 5), cursor advances to offset 5.
  3. consumeANSI(0x9C, end): enters State::start, default: branch, returns escPos unchanged.
  4. False-positive path: *cursor++ = 0x9C, start = escPos + 1 = end.
  5. While loop condition start != end is now false -- loop exits at the bottom.
  6. foundANSI is still false. No !foundANSI check is evaluated here.
  7. outputLen = 6. return String::adopt(buffer) -- returns allocated copy instead of std::nullopt.

Expected: std::nullopt (caller reuses original JSString). Actual: new String("hello\x9C").

Impact

The output is semantically correct (returned bytes match input). The impact is performance only: an unnecessary heap allocation and memcpy for any input that contains standalone 0x9C bytes but no real ANSI sequences. The PR's own code comment promises this allocation will not happen for no-ANSI inputs, but the promise is not kept when false-positive escape bytes are present. 0x9C is uncommon in practice, making this a minor missed optimization.

Addressing the pre-existing-behavior refutation

One verifier correctly notes that the old StringBuilder-based code also allocated and returned a copy for this case (it never returned nullopt at all). This is accurate -- the behavior is pre-existing. However, the PR introduces the std::nullopt optimization with an explicit code comment documenting the contract, and partially implements it: the optimization fires when there are zero escape-like bytes but not when all found bytes are false positives. The incomplete implementation is a nit rather than a regression, but it is a real gap in the new code.

Fix

Add if (!foundANSI) return std::nullopt; at line 72, immediately after the while loop and before the buffer management code. This handles the case where the loop exited via start == end after processing only false-positive bytes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants