extract: speed up complete_ways strategy #313
Can you please split this up into two PRs? I haven't had the time to look at these in detail yet, but the way node scanning is a much "easier" change, so chances are greater that it gets merged separately. I am not sure about the asynchronous write output. Maybe this would be better handled in libosmium, so that all writes would benefit, not just this one command. This would also need extensive testing. Multithreading code in C++ is tricky to get right, especially when exceptions are involved; #311 shows some broken behaviour related to multithreading and exceptions where I can't figure out where the problem is. So this needs a lot more scrutiny.
Thank you for the detailed feedback and for taking the time to look at this. I agree with splitting the changes, and I'll open a separate PR with only the way node scanning change. Regarding the async writer, your point about libosmium is well taken: putting it there would benefit all commands rather than just this one. I'll close this PR once the new one is ready.
Closing this PR in favour of splitting it up.
See #312.
Two independent changes to reduce runtime when running --strategy=complete_ways with multiple extracts on large input files.
Async double-buffered writer
Extract now owns a background writer thread. The main thread fills m_fill_buffer; when it is full, the two buffers are swapped and the writer thread flushes m_flush_buffer to osmium::io::Writer while the main thread immediately continues filling the new fill buffer. Exceptions thrown by the writer thread are captured and rethrown on the main thread at the next synchronisation point.
Single-pass way node scan in Pass 1
Pass1::eway() previously scanned way.nodes() up to twice per extract, and was called independently for each extract. For N extracts this meant up to 2N scans per way. The new eway_all() override in strategy_complete_ways scans way.nodes() at most twice regardless of N, using a uint64_t bitmask to track which extracts have claimed the way. This limits the number of supported extracts to 64, which matches the existing limit documented in the man page.
Benchmark
planet-latest.osm.pbf (~86 GB), 16-tile z=2 extraction, same machine and configuration as in the issue:
Output verified to be identical to the upstream result for all 16 tiles.
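For reference, the single-pass node scan from the second change can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in this PR: ExtractData, claim_way_for_all and the use of std::unordered_set are hypothetical stand-ins for osmium-tool's extract data and ID sets, and the sketch assumes that Pass 1 records a claimed way plus all of its nodes (as "extra" nodes) in each claiming extract.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the per-extract state in Pass 1; the real code
// uses osmium ID sets and osmium::Way, not these standard containers.
struct ExtractData {
    std::unordered_set<std::uint64_t> node_ids;        // nodes inside the extract
    std::unordered_set<std::uint64_t> way_ids;         // ways claimed by the extract
    std::unordered_set<std::uint64_t> extra_node_ids;  // nodes needed to complete claimed ways
};

// Hypothetical: claim the way for every extract that contains at least one of
// its nodes. The node list is scanned at most twice, however many extracts exist.
void claim_way_for_all(std::vector<ExtractData>& extracts, std::uint64_t way_id,
                       const std::vector<std::uint64_t>& way_nodes) {
    const std::size_t n = extracts.size();
    assert(n <= 64 && "one bit per extract in a uint64_t");  // limit assumed validated earlier
    if (n == 0) return;
    const std::uint64_t all = (n == 64) ? ~std::uint64_t{0} : ((std::uint64_t{1} << n) - 1);
    std::uint64_t claimed = 0;  // bit i set => extract i claims this way

    // Scan 1: find the extracts that contain any node of the way.
    for (const std::uint64_t node : way_nodes) {
        for (std::size_t i = 0; i < n; ++i) {
            const std::uint64_t bit = std::uint64_t{1} << i;
            if (!(claimed & bit) && extracts[i].node_ids.count(node) != 0) {
                claimed |= bit;
            }
        }
        if (claimed == all) break;  // every extract has claimed the way; stop early
    }
    if (claimed == 0) return;

    // Scan 2: record the way and all of its nodes in each claiming extract.
    for (std::size_t i = 0; i < n; ++i) {
        if (claimed & (std::uint64_t{1} << i)) extracts[i].way_ids.insert(way_id);
    }
    for (const std::uint64_t node : way_nodes) {
        for (std::size_t i = 0; i < n; ++i) {
            if (claimed & (std::uint64_t{1} << i)) extracts[i].extra_node_ids.insert(node);
        }
    }
}
```

In this sketch the saving comes from making fewer passes over the node list and stopping the first pass once every extract has claimed the way; each node is still looked up in every extract that has not claimed the way yet.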
Note
Increasing the write buffer size beyond the current 10 MB improves performance further on large sustained workloads (planet-scale extraction improved to 37m 56s, -26%, with a 64 MB buffer). That is left as a separate follow-up, as the right default and whether to make it configurable deserve their own discussion.
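The async double-buffered writer from the first change can be sketched as follows, with the buffer capacity as a constructor parameter so that the buffer-size question above becomes a single value. This is a minimal sketch under stated assumptions, not the code in this PR: DoubleBufferedWriter, hand_off() and the Sink callback are hypothetical, the callback stands in for osmium::io::Writer, and the buffers are plain byte strings rather than osmium buffers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <exception>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: Sink stands in for osmium::io::Writer.
class DoubleBufferedWriter {
public:
    using Sink = std::function<void(const std::string&)>;

    DoubleBufferedWriter(Sink sink, std::size_t capacity)
        : m_sink(std::move(sink)), m_capacity(capacity), m_thread([this] { run(); }) {}

    ~DoubleBufferedWriter() {
        try { close(); } catch (...) {}  // destructors must not throw; call close() explicitly
    }

    // Main thread: append to the fill buffer and hand it off once it is full.
    void write(const std::string& data) {
        assert(m_thread.joinable() && "write() after close()");
        m_fill_buffer += data;
        if (m_fill_buffer.size() >= m_capacity) hand_off();
    }

    // Flush what is left, stop the writer thread and rethrow any writer error.
    void close() {
        if (!m_thread.joinable()) return;
        std::exception_ptr error;
        try {
            if (!m_fill_buffer.empty()) hand_off();
        } catch (...) {
            error = std::current_exception();
        }
        {
            std::unique_lock<std::mutex> lock{m_mutex};
            m_cv.wait(lock, [this] { return !m_has_work; });
            m_done = true;
            if (!error) error = std::exchange(m_error, nullptr);
        }
        m_cv.notify_all();
        m_thread.join();
        if (error) std::rethrow_exception(error);
    }

private:
    // Synchronisation point: wait for the previous flush, surface its error,
    // then swap buffers so the main thread can keep filling immediately.
    void hand_off() {
        {
            std::unique_lock<std::mutex> lock{m_mutex};
            m_cv.wait(lock, [this] { return !m_has_work; });
            if (m_error) std::rethrow_exception(std::exchange(m_error, nullptr));
            std::swap(m_fill_buffer, m_flush_buffer);
            m_has_work = true;
        }
        m_cv.notify_all();
    }

    // Writer thread: flush whatever the main thread handed off.
    void run() {
        std::unique_lock<std::mutex> lock{m_mutex};
        for (;;) {
            m_cv.wait(lock, [this] { return m_has_work || m_done; });
            if (!m_has_work) return;  // close() was called and nothing is pending
            lock.unlock();
            std::exception_ptr error;
            try {
                m_sink(m_flush_buffer);  // main thread never touches this buffer while m_has_work
            } catch (...) {
                error = std::current_exception();  // captured here, rethrown on the main thread
            }
            m_flush_buffer.clear();
            lock.lock();
            if (error && !m_error) m_error = error;
            m_has_work = false;
            m_cv.notify_all();
        }
    }

    Sink m_sink;
    std::size_t m_capacity;
    std::string m_fill_buffer;   // filled by the main thread
    std::string m_flush_buffer;  // flushed by the writer thread
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_has_work = false;
    bool m_done = false;
    std::exception_ptr m_error;
    std::thread m_thread;  // declared last so it starts after all other members exist
};
```

In this sketch a writer-thread exception is stored in a std::exception_ptr and rethrown by the next hand_off() or by close(); the destructor swallows it, so callers should call close() explicitly to see a final write error.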