Skip to content

extract: speed up complete_ways strategy#313

Closed
yuiseki wants to merge 2 commits intoosmcode:masterfrom
yuiseki:feat/extract-complete-ways-speedup
Closed

extract: speed up complete_ways strategy#313
yuiseki wants to merge 2 commits intoosmcode:masterfrom
yuiseki:feat/extract-complete-ways-speedup

Conversation

@yuiseki
Copy link
Copy Markdown

@yuiseki yuiseki commented Apr 25, 2026

See #312.

Two independent changes to reduce runtime when running --strategy=complete_ways with multiple extracts on large input files.

Async double-buffered writer

Extract now owns a background writer thread. The main thread fills m_fill_buffer; when it is full the two buffers are swapped and the writer thread flushes m_flush_buffer to osmium::io::Writer while the main thread immediately continues filling the new fill buffer. Exceptions thrown by the writer thread are captured and rethrown on the main thread at the next synchronisation point.

Single-pass way node scan in Pass 1

Pass1::eway() previously scanned way.nodes() up to twice per extract, and was called independently for each extract. For N extracts this meant up to 2N scans per way. The new eway_all() override in strategy_complete_ways scans way.nodes() at most twice regardless of N, using a uint64_t bitmask to track which extracts have claimed the way. This limits the number of supported extracts to 64, which matches the existing limit documented in the man page.

Benchmark

planet-latest.osm.pbf (~86 GB), 16-tile z=2 extraction, same machine and config as the issue:

Version Elapsed vs baseline
upstream 51m 36s
this PR 41m 18s -20%

Output verified to be identical to the upstream result for all 16 tiles.

Note

Increasing the write buffer size beyond the current 10 MB improves performance further on large sustained workloads (planet-scale extraction improved to 37m 56s, -26%, with a 64 MB buffer). That is left as a separate follow-up as the right default and whether to make it configurable deserve their own discussion.

@joto
Copy link
Copy Markdown
Member

joto commented Apr 28, 2026

You can please split this up into two PRs? I didn't yet have the time to look at these in detail, but the way node scanning is a much "easier" change, ,so chances are greater to get that merged separately. I am not sure about the asynchronous write output. Maybe this would be better handled in libosmium, so all writes would get the advantage, not just this one command. Also this would need extensive testing. Multithreading code in C++ is tricky to get right, especially if there are exceptions involved and all that, #311 shows some broken behaviour related to multithreading and exception where I can't figure out where the problem is. So this needs a lot more scrutiny.

@yuiseki
Copy link
Copy Markdown
Author

yuiseki commented Apr 29, 2026

Thank you for the detailed feedback and for taking the time to look at this.

I agree with splitting the changes. I'll open a separate PR with the way node scanning change only.

Regarding the async writer, your point about libosmium is well taken — putting it there would benefit all commands rather than just extract. I'll take a look at #311 first and see if I can reproduce and understand the issue before going further in that direction.

I'll close this PR once the new one is ready.

@yuiseki
Copy link
Copy Markdown
Author

yuiseki commented Apr 29, 2026

@yuiseki yuiseki closed this Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants