Performance improvement #448
Problem & Rationale
Parsing defers field assignments until the winning branch is known. Each `Defer` call allocates a fresh `contextFieldSet` so those captured values survive branch backtracking. Benchmarks showed `parseContext.Defer`/`Branch` accounting for nearly half of total allocations (pprof: `Branch` ~25%, `Defer` ~19%) even on tiny inputs. These structs are short-lived, small, and have a fixed shape, so recycling them avoids steady heap pressure and reduces GC work without touching parser semantics.
Fix
This change adds a `sync.Pool` of `contextFieldSet` objects. `Defer` now grabs a zeroed struct from the pool, fills it, and `Apply` returns each struct to the pool after invoking `setField`. No other behaviour changes: branches still copy `apply` slices, and errors propagate the same way.
Benchmark
Both participle variants improved about 6–7% in wall time and shed ~350–400 KiB + ~150 allocations per parse (compared to the pre‑change baselines of 127 µs / 172 KB / 2053 allocs and 78 µs / 167 KB / 1817 allocs).
Extending the Technique
Assuming the technique is sound, it can be taken further: even after pooling `contextFieldSet`, profiling the Thrift benchmark still showed `parseContext.Branch` dominating allocations. Every speculative branch clones an entire `parseContext`, and failed branches keep their deferred captures alive until GC. `go tool pprof -alloc_space` attributed ~25% of bytes to `Branch` and ~19% to `Defer`, so eliminating those short-lived context copies promised another allocation drop.
Extending the fix
This change adds a `sync.Pool` for `parseContext` instances (`context.go:37-118`) plus small helpers: `discardDeferred` zeros and returns any unused capture records, and `recycle` hands the whole context back to the pool. `Accept` now recycles the accepted branch automatically. The parse nodes (`nodes.go:263-512`) now explicitly call `branch.recycle(false)` when a branch fails, ensuring both the context and any deferred captures are released immediately. `Stop`, `Accept`, and error tracking all behave exactly as before; the change only swaps raw allocations for pooled scratch structs.
Benchmark
With both optimisations:
Compared to the prior run (captures already pooled) at ~119 µs/op with 140 kB / 1902 allocs, branch pooling holds throughput steady while cutting another ~30% of heap use (98 kB, 1638 allocs) for the runtime-built parser; the generated parser sees a similar improvement (from 136 kB / 1666 allocs down to 94 kB / 1402 allocs). Go-thrift remains the same, so participle now wins clearly on allocation footprint while matching its earlier speed.
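The branch-recycling flow in "Extending the fix" can likewise be sketched. Again these names (`branchCtx`, `branch`, `recycle`) are illustrative stand-ins, not participle's API; the point is that a failed branch's context and its deferred captures are released eagerly rather than left for GC:

```go
package main

import (
	"fmt"
	"sync"
)

// branchCtx stands in for parseContext: it accumulates deferred captures
// while a speculative branch runs.
type branchCtx struct {
	deferred []*string // simplified; the real records are contextFieldSet
}

var ctxPool = sync.Pool{
	New: func() any { return &branchCtx{} },
}

// branch takes a scratch context from the pool for a speculative parse,
// standing in for parseContext.Branch.
func branch() *branchCtx {
	return ctxPool.Get().(*branchCtx)
}

// recycle releases a failed branch immediately: nil out each capture so the
// pool does not pin it (the role of discardDeferred), then return the
// context itself to the pool.
func (c *branchCtx) recycle() {
	for i := range c.deferred {
		c.deferred[i] = nil
	}
	c.deferred = c.deferred[:0]
	ctxPool.Put(c)
}

func main() {
	b := branch()
	s := "x = 1"
	b.deferred = append(b.deferred, &s)
	fmt.Println("captures before recycle:", len(b.deferred))

	b.recycle() // branch failed: context and captures are freed now

	b2 := branch() // the next branch reuses pooled, emptied storage
	fmt.Println("captures after recycle:", len(b2.deferred))
}
```

Reusing the `deferred` backing array (`[:0]` rather than `nil`) is what lets successive branches avoid reallocating their capture slices, which is where the extra ~30% heap reduction comes from.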