Conversation
0ed698b to
c41df9b
Compare
|
I should add some unit tests for the types in |
c41df9b to
dc9ac57
Compare
38e62dd to
1ffc670
Compare
|
The dependent CLs have been merged. There are two failures in CI. One issue is that this version of MSVC++ seems to be unable to instantiate |
83bfac8 to
29a0ca6
Compare
This was a bug in my code related to I'm still trying to find a workaround for the MSVC++ issue. |
29a0ca6 to
a4d656e
Compare
|
The Verific build is failing because libgmock-dev isn't installed on that system. I don't know how to fix that. |
01b8f9c to
b438afc
Compare
|
CI is clean except that the system that runs Verific tests needs gmock installed. |
|
@mmicko The test-verific job is running in an unmanaged custom environment right? Please try adding gmock, it should fix the unit tests here since they use the convenient matchers like |
|
@widlarizer Update CI docker, all green now |
widlarizer
left a comment
There was a problem hiding this comment.
I've been working on this review for a couple of days so here's an incomplete review. I've also put up a PR into this PR branch with comments that make the data flow clearer
| thread_state.next_batch.emplace_back(std::move(work)); | ||
| if (GetSize(thread_state.next_batch) < batch_size) | ||
| return; | ||
| bool was_empty; |
There was a problem hiding this comment.
Probably needs explicit initialization
There was a problem hiding this comment.
Does it? We initialize it unconditionally three lines later.
|
|
||
| template <typename V> | ||
| struct DefaultCollisionHandler { | ||
| void operator()(typename V::Accumulated &, typename V::Accumulated &) const {} |
There was a problem hiding this comment.
I would prefer for this to error out. It's not meeting the spec of "used to reduce two V::Accumulated values into a single value."
There was a problem hiding this comment.
Arguably it is :-). The default behavior is just "pick one of the two values" (and we pick the 'current' value because that's free). I'll add a comment to DefaultCollisionHandler and you can tell me if it's satisfying :-).
Would you prefer me to merge these changes into my commits (losing attribution) or keep your commit separate in my stack of commits? |
We've already talked about adding this as an alternative to `log_id()`, and we'll need it later in this PR.
`log_error()` causes an exit so we don't have to try too hard here. The main thing is to ensure that we normally are able to exit without causing a stack overflow due to recursive asserts about not being in a `Multithreaded` context.
This causes problems when compiling with fuzzing instrumenation enabled.
…dIndex` We'll use these later in this PR.
We'll use this later in the PR.
We'll use this later in the PR.
We'll use this later in the PR.
We'll use this later in the PR.
We'll use this later in the PR.
We'll use this later in the PR.
We will want to query `keep_cache` from parallel threads. If we compute the results on-demand, that means we need synchronization for cache access in those queries, which adds complexity and overhead. Instead, prefill the cache with the status of all relevant modules. Note that this doesn't actually do more work --- we always consult `keep_cache` for all cells of all selected modules, so scanning all those cells and determining the kept status of all dependency modules is always required. Later in this PR we're going to parallelize `scan_module` itself, and that's also much easier to do when no other parallel threads are running.
Turns out this is not strictly necessary for this PR but it's still a good thing to do and makes it clearer that the stats are not modified in a possibly racy way.
…cache_t::scan_module()` with it
…`, and parallelize `remove_temporary_cells`
b438afc to
7f1b4dc
Compare
If your work is part of a larger effort, please discuss your general plans on Discourse first to align your vision with maintainers.
This is the next step after parallelizing
opt_merge.This PR depends on #5621, #5629 and #5631. Once those are merged, I will clean up this PR. I'm putting it here now in case people want to see what I'm planning and so I can get CI clean.
opt_cleanis much more complex thanopt_mergeand the changes here are correspondingly greater. In particular I need to introduce several more parallel abstractions. I have done my best to preserve the original code structure. I've fuzzed millions of testcases to detect any differences in results between this and the original pass, and as far as I know I've fixed them all.The scalability on large flattened circuits is not quite as impressive as it was for

opt_mergebut it's still pretty good. For a circuit with 3.5M cells withopt_mergealready applied, runningopt_cleanonce removes about 10% of the wires. Then I runopt_cleanagain to measure the scalability when there is nothing to remove (a common case). The results:So with 40 cores we get a 6x speedup in the dirty case and a 9x speedup in the clean case. But the 1-core case is actually 1.6x faster than current yosys
main(a68fee1) and the clean case is 2.1x faster, so the clean case is actually >20x current yosysmain. (The dirty case doesn't parallelize as well because modifying RTLIL has to happen on the main thread and there are a lot of wires to remove in this case. This could be improved but it might not be worth the extra complexity --- removing this much of the design is probably rare.)For smaller circuits there is a penalty for the extra complexity, but I've done my best to mitigate that and offset it with optimizations. For the jpeg synth testcase (
read_verilog -sv -I~/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth), this is very slightly faster than current yosysmainwith or without multicore, on my system:This is a big PR so let me know what I can do to make it easier to swallow!