
Faster megular expressions #265

Merged (7 commits), Feb 20, 2025
Conversation

@MarcoPolo (Contributor) commented Feb 11, 2025

This PR still needs some cleanup, but I'm sharing it early to get feedback.

This is overall 4x faster than the original PR for the normal case (no preallocation).

The main changes are:

  • A faster Code() method that avoids an unnecessary copy for the Protocol type. (I feel like the compiler should have optimized this away, though.)
  • Capture state is now a linked list rather than a map. This lets us "fork" states cheaply with structural sharing.
  • MatchStates are allocated in a slice, and indices are used as handles. This is a big change that introduces a fair amount of complexity; I'll expand on it below.
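To illustrate the linked-list capture idea, here is a minimal sketch (the type and function names are illustrative, not the PR's actual code): forking a state only copies a head pointer, and both forks share the existing tail.

```go
package main

import "fmt"

// captureNode is an immutable node in a singly linked list of captured
// values. Forking a match state copies just the head pointer; both forks
// share the tail, so a fork is O(1) instead of copying a whole map.
type captureNode struct {
	value string
	prev  *captureNode
}

// push returns a new head without mutating the existing list.
func push(head *captureNode, v string) *captureNode {
	return &captureNode{value: v, prev: head}
}

// collect walks from the head back to the root, newest value first.
func collect(head *captureNode) []string {
	var out []string
	for n := head; n != nil; n = n.prev {
		out = append(out, n.value)
	}
	return out
}

func main() {
	base := push(push(nil, "ip4"), "udp")
	forkA := push(base, "quic")         // shares base's nodes
	forkB := push(base, "webtransport") // also shares base's nodes
	fmt.Println(collect(forkA)) // [quic udp ip4]
	fmt.Println(collect(forkB)) // [webtransport udp ip4]
}
```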

MatchStates and Index handles:

By allocating into a single slice rather than individually on the heap, we can prepare the memory upfront and reduce overall memory pressure. We also make MatchState much smaller (24 bytes, down from 48). The smaller size, plus keeping the states in contiguous memory, makes the whole thing more cache friendly.
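A simplified sketch of the index-handle pattern described above (the arena and field names here are illustrative, not the PR's actual types): states live in one contiguous slice, and callers hold small integer handles instead of pointers.

```go
package main

import "fmt"

// matchState is kept small (a couple of int32s) so many states fit per
// cache line and the backing slice stays compact.
type matchState struct {
	codeOrKind int32
	next       int32 // index of the next state in the arena; -1 for none
}

// arena allocates states into one contiguous slice. Callers hold int32
// indices ("handles") instead of pointers, so there is a single backing
// allocation and no per-state heap object.
type arena struct {
	states []matchState
}

// alloc appends a new state and returns its index as a handle.
func (a *arena) alloc(code, next int32) int32 {
	a.states = append(a.states, matchState{codeOrKind: code, next: next})
	return int32(len(a.states) - 1)
}

// get resolves a handle back to the state it names.
func (a *arena) get(h int32) *matchState { return &a.states[h] }

func main() {
	var a arena
	end := a.alloc(0, -1)
	start := a.alloc(42, end)
	fmt.Println(start, a.get(start).next) // 1 0
}
```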

On my machine, with these changes, using Megular expressions normally is about as fast as the ForEach style in v0.14.

Benchmark Results

Assume this system unless otherwise noted:

goos: darwin
goarch: arm64
pkg: github.com/multiformats/go-multiaddr
cpu: Apple M1 Max

go-multiaddr @ v0.14 using .ForEach (https://github.com/multiformats/go-multiaddr/blob/marco/v0.14.0-bench/bench_test.go):

BenchmarkIsWebTransportMultiaddrForEach-10          1499752               783.0 ns/op          1912 B/op         13 allocs/op

The original megular expressions PR:

BenchmarkIsWebTransportMultiaddr-10       422496              2503 ns/op            2888 B/op         74 allocs/op

All but the Index handles change (the first two points from above):

BenchmarkIsWebTransportMultiaddrPrealloc-10                      2924878               414.3 ns/op           160 B/op          9 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapturePrealloc-10             5895022               202.8 ns/op             0 B/op          0 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapture-10                     1276366               937.1 ns/op          1144 B/op         28 allocs/op
BenchmarkIsWebTransportMultiaddr-10                               716146              1659 ns/op            1656 B/op         59 allocs/op
BenchmarkIsWebTransportMultiaddrLoop-10                          4707390               252.7 ns/op           136 B/op         12 allocs/op

With the index handles change:

BenchmarkIsWebTransportMultiaddrPrealloc-10                      3011650               382.4 ns/op           160 B/op          9 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapturePrealloc-10             6963790               171.7 ns/op             0 B/op          0 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapture-10                     3364348               357.9 ns/op           472 B/op          2 allocs/op
BenchmarkIsWebTransportMultiaddr-10                              1377448               883.8 ns/op           920 B/op         25 allocs/op
BenchmarkIsWebTransportMultiaddrLoop-10                          4797229               250.4 ns/op           136 B/op         12 allocs/op

x/meg/meg.go Outdated
generation int
code int
capture captureFunc
next int
Member:

Use int64; int is implementation-dependent. The bitwise calculations won't work on 32-bit machines.

MarcoPolo (Contributor, Author):

I'm not relying on int64 bitwise calculations here. You may be thinking of the visited bitset, which is explicitly a []uint64. It's used to flip a bit when we've visited a MatchState in the array, keyed by its index, and that works whether the index is int32 or int64.
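For context, the bitset indexing being described can be sketched like this (helper names are illustrative; the real code inlines the expressions): the state index picks a word via i/64 and a bit within it via i%64, independent of the index's integer width.

```go
package main

import "fmt"

// markVisited sets the bit for state index i in a []uint64 bitset.
// Only the word index i/64 and the bit offset i%64 matter, so the
// width of the index type (int32 vs int64) is irrelevant.
func markVisited(bits []uint64, i int) {
	bits[i/64] |= 1 << (i % 64)
}

// isVisited reports whether the bit for state index i is set.
func isVisited(bits []uint64, i int) bool {
	return bits[i/64]&(1<<(i%64)) != 0
}

func main() {
	bits := make([]uint64, 2) // room for 128 state indices
	markVisited(bits, 70)
	fmt.Println(isVisited(bits, 70), isVisited(bits, 69)) // true false
}
```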

@sukunrt (Member) commented Feb 13, 2025

Changes LGTM!
Feel free to merge this to the main PR.

Comment on lines +177 to +178
stack = append(stack, task{splitIdx, t.cap})
stack = append(stack, task{s.next, t.cap})
Member:

IIRC, this used recursion before and now uses an explicit stack.
How much better is the stack compared to recursion here?

MarcoPolo (Contributor, Author):

Recursive

goos: darwin
goarch: arm64
pkg: github.com/multiformats/go-multiaddr/x/meg
cpu: Apple M1 Max
BenchmarkIsWebTransportMultiaddrPrealloc-10                      2778994               429.7 ns/op           160 B/op          9 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapturePrealloc-10             5164788               231.8 ns/op             0 B/op          0 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapture-10                     2945781               414.0 ns/op           472 B/op          2 allocs/op
BenchmarkIsWebTransportMultiaddr-10                              1296012               919.5 ns/op           920 B/op         25 allocs/op

Non-recursive

BenchmarkIsWebTransportMultiaddrPrealloc-10                      3170400               380.5 ns/op           160 B/op          9 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapturePrealloc-10             6977448               171.9 ns/op             0 B/op          0 allocs/op
BenchmarkIsWebTransportMultiaddrNoCapture-10                     3290984               353.9 ns/op           472 B/op          2 allocs/op
BenchmarkIsWebTransportMultiaddr-10                              1374780               863.3 ns/op           920 B/op         25 allocs/op

This is the recursive implementation:

func appendStateRecursive(arr statesAndCaptures, states []MatchState, stateIndex int, c *capture, visitedBitSet []uint64) statesAndCaptures {
	if stateIndex >= len(states) {
		return arr
	}
	s := states[stateIndex]

	if visitedBitSet[stateIndex/64]&(1<<(stateIndex%64)) != 0 {
		return arr
	}
	visitedBitSet[stateIndex/64] |= 1 << (stateIndex % 64)

	if s.codeOrKind < done {
		arr = appendStateRecursive(arr, states, s.next, c, visitedBitSet)
		arr = appendStateRecursive(arr, states, restoreSplitIdx(s.codeOrKind), c, visitedBitSet)
	} else {
		arr.states = append(arr.states, stateIndex)
		arr.captures = append(arr.captures, c)
	}
	return arr
}

It's a bit faster, but if you prefer the simplicity of the recursive approach I'm happy to change it back.
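For contrast, the explicit-stack traversal can be sketched in a simplified, self-contained form (the state encoding here is illustrative, not the PR's actual codeOrKind scheme): split states push both branches onto a stack, terminal states get collected, and the same bitset guards against revisits.

```go
package main

import "fmt"

// state is a simplified stand-in for MatchState: next/split >= 0 means
// "split" (follow both branches); otherwise the state is terminal.
type state struct {
	next, split int // -1 when absent
}

// appendStates mirrors appendStateRecursive with an explicit stack:
// split states push both successors, terminal states are collected.
func appendStates(states []state, start int) []int {
	var out []int
	visited := make([]uint64, (len(states)+63)/64)
	stack := []int{start}
	for len(stack) > 0 {
		i := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if i < 0 || i >= len(states) {
			continue
		}
		if visited[i/64]&(1<<(i%64)) != 0 {
			continue
		}
		visited[i/64] |= 1 << (i % 64)
		s := states[i]
		if s.next >= 0 || s.split >= 0 {
			// Push split first so next pops first, mirroring the
			// order of the recursive calls.
			stack = append(stack, s.split, s.next)
		} else {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	states := []state{
		{next: 1, split: 2},   // a split state
		{next: -1, split: -1}, // terminal
		{next: -1, split: -1}, // terminal
	}
	fmt.Println(appendStates(states, 0)) // [1 2]
}
```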

Member:

I do prefer the simplicity of the recursive approach

@sukunrt (Member) commented Feb 13, 2025

It would be interesting to have a benchmark that converts a multiaddr to a string and then runs a stdlib regexp on it.

@sukunrt (Member) commented Feb 13, 2025

Okay, this is much slower:

func BenchmarkIsWebRTCDirectMultiaddrString(b *testing.B) {
	addr := multiaddr.StringCast("/ip4/1.2.3.4/udp/1234/webrtc-direct/")
	// Compile the regexp before resetting the timer so compilation
	// isn't included in the measurement.
	reg := regexp.MustCompile(`^/ip4/(?P<ip>.+)/udp/(?P<port>.+)/webrtc-direct(/certhash/[^/]+)*$`)

	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		addrS := addr.String()
		res := reg.FindStringSubmatch(addrS)
		if len(res) != 4 {
			b.Fatal("unexpected result", addrS, len(res))
		}
	}
}

@MarcoPolo MarcoPolo marked this pull request as ready for review February 20, 2025 01:20
@MarcoPolo MarcoPolo merged commit ff4bf42 into marco/match-and-capture Feb 20, 2025
MarcoPolo added a commit that referenced this pull request Feb 26, 2025
* much cheaper copies of captures

* Add a benchmark

* allocate to a slice. Use indexes as handles

* cleanup

* Add nocapture loop benchmark

It's really fast. No surprise

* cleanup

* nits
@p-shahi p-shahi mentioned this pull request Feb 26, 2025