extract: speed up complete_ways strategy by yuiseki · Pull Request #313 · osmcode/osmium-tool

yuiseki · 2026-04-25T07:56:29Z

Two independent changes to reduce runtime when running --strategy=complete_ways with multiple extracts on large input files.

Async double-buffered writer

Extract now owns a background writer thread. The main thread fills m_fill_buffer; when it is full the two buffers are swapped and the writer thread flushes m_flush_buffer to osmium::io::Writer while the main thread immediately continues filling the new fill buffer. Exceptions thrown by the writer thread are captured and rethrown on the main thread at the next synchronisation point.

Single-pass way node scan in Pass 1

Pass1::eway() previously scanned way.nodes() up to twice per extract, and was called independently for each extract. For N extracts this meant up to 2N scans per way. The new eway_all() override in strategy_complete_ways scans way.nodes() at most twice regardless of N, using a uint64_t bitmask to track which extracts have claimed the way. This limits the number of supported extracts to 64, which matches the existing limit documented in the man page.

Benchmark

planet-latest.osm.pbf (~86 GB), 16-tile z=2 extraction, same machine and config as the issue:

Version	Elapsed	vs baseline
upstream	51m 36s	—
this PR	41m 18s	-20%

Output verified to be identical to the upstream result for all 16 tiles.

Note

Increasing the write buffer size beyond the current 10 MB improves performance further on large sustained workloads (planet-scale extraction improved to 37m 56s, -26%, with a 64 MB buffer). That is left as a separate follow-up as the right default and whether to make it configurable deserve their own discussion.

joto · 2026-04-28T07:04:24Z

You can please split this up into two PRs? I didn't yet have the time to look at these in detail, but the way node scanning is a much "easier" change, ,so chances are greater to get that merged separately. I am not sure about the asynchronous write output. Maybe this would be better handled in libosmium, so all writes would get the advantage, not just this one command. Also this would need extensive testing. Multithreading code in C++ is tricky to get right, especially if there are exceptions involved and all that, #311 shows some broken behaviour related to multithreading and exception where I can't figure out where the problem is. So this needs a lot more scrutiny.

yuiseki · 2026-04-29T06:22:18Z

Thank you for the detailed feedback and for taking the time to look at this.

I agree with splitting the changes. I'll open a separate PR with the way node scanning change only.

Regarding the async writer, your point about libosmium is well taken — putting it there would benefit all commands rather than just extract. I'll take a look at #311 first and see if I can reproduce and understand the issue before going further in that direction.

I'll close this PR once the new one is ready.

yuiseki · 2026-04-29T11:11:18Z

Close this PR to splitting

extract: scan way nodes once across all extracts in complete_ways pass 1 #314

yuiseki added 2 commits April 25, 2026 16:10

extract: write output asynchronously using a background thread

5f61871

extract: scan way nodes once across all extracts in complete_ways pass 1

83d5070

yuiseki closed this Apr 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

extract: speed up complete_ways strategy#313

extract: speed up complete_ways strategy#313
yuiseki wants to merge 2 commits intoosmcode:masterfrom
yuiseki:feat/extract-complete-ways-speedup

yuiseki commented Apr 25, 2026

Uh oh!

joto commented Apr 28, 2026

Uh oh!

yuiseki commented Apr 29, 2026

Uh oh!

yuiseki commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

yuiseki commented Apr 25, 2026

Async double-buffered writer

Single-pass way node scan in Pass 1

Benchmark

Note

Uh oh!

joto commented Apr 28, 2026

Uh oh!

yuiseki commented Apr 29, 2026

Uh oh!

yuiseki commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

2 participants