Parallelize `opt_clean` pass by rocallahan · Pull Request #5664 · YosysHQ/yosys

rocallahan · 2026-02-02T18:27:54Z

This is the next step after parallelizing opt_merge.

This PR depends on #5621, #5629 and #5631. Once those are merged, I will clean up this PR. I'm putting it here now in case people want to see what I'm planning and so I can get CI clean.

opt_clean is much more complex than opt_merge and the changes here are correspondingly greater. In particular I need to introduce several more parallel abstractions. I have done my best to preserve the original code structure. I've fuzzed millions of testcases to detect any differences in results between this and the original pass, and as far as I know I've fixed them all.

The scalability on large flattened circuits is not quite as impressive as it was for opt_merge but it's still pretty good. For a circuit with 3.5M cells with opt_merge already applied, running opt_clean once removes about 10% of the wires. Then I run opt_clean again to measure the scalability when there is nothing to remove (a common case). The results:

So with 40 cores we get a 6x speedup in the dirty case and a 9x speedup in the clean case. But the 1-core case is also faster than current yosys main (a68fee1): 1.6x faster in the dirty case and 2.1x faster in the clean case, so the clean case on 40 cores is actually >20x faster than current yosys main. (The dirty case doesn't parallelize as well because modifying RTLIL has to happen on the main thread and there are a lot of wires to remove in this case. This could be improved, but it might not be worth the extra complexity: removing this much of the design is probably rare.)
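To illustrate why the dirty case scales less well, here is a minimal sketch of the "scan in parallel, mutate on the main thread" shape described above. It is not code from this PR: `remove_unused_wires`, the `wires` vector and the `used` flags are hypothetical stand-ins for RTLIL data, and plain `std::thread` is used instead of the thread pool types added in `kernel/threading.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical toy model: wires[i] is a wire id and used[i] says whether it is
// still referenced. Returns the number of wires removed.
std::size_t remove_unused_wires(std::vector<int> &wires, const std::vector<char> &used, int num_threads)
{
	assert(num_threads > 0 && used.size() == wires.size());
	const std::size_t n = wires.size();
	const std::size_t workers = static_cast<std::size_t>(num_threads);

	// Phase 1 (parallel): each worker only reads shared data and appends the
	// indices it wants removed to its own private vector.
	std::vector<std::vector<std::size_t>> found(workers);
	std::vector<std::thread> threads;
	for (std::size_t t = 0; t < workers; t++) {
		threads.emplace_back([&found, &used, n, workers, t] {
			for (std::size_t i = n * t / workers; i < n * (t + 1) / workers; i++)
				if (!used[i])
					found[t].push_back(i);
		});
	}
	for (auto &thread : threads)
		thread.join();

	// Phase 2 (serial): only the calling thread mutates the container, so this
	// part does not get faster with more cores.
	std::vector<char> dead(n, 0);
	std::size_t removed = 0;
	for (const auto &shard : found)
		for (std::size_t i : shard) {
			dead[i] = 1;
			removed++;
		}
	std::size_t out = 0;
	for (std::size_t i = 0; i < n; i++)
		if (!dead[i])
			wires[out++] = wires[i];
	wires.resize(out);
	return removed;
}
```

When nothing is removed, phase 2 does very little work, which is consistent with the clean case scaling better than the dirty one.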

For smaller circuits there is a penalty for the extra complexity, but I've done my best to mitigate that and offset it with optimizations. For the jpeg synth testcase (read_verilog -sv -I~/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth), this is very slightly faster than current yosys main with or without multicore, on my system:

main YOSYS_MAX_THREADS=1:
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     17.094 s ±  0.707 s    [User: 16.256 s, System: 0.841 s]
  Range (min … max):   16.333 s … 17.816 s    10 runs
 
main:
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     16.888 s ±  0.421 s    [User: 16.038 s, System: 0.858 s]
  Range (min … max):   16.508 s … 17.952 s    10 runs

parallel-opt-clean YOSYS_MAX_THREADS=1:
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     16.924 s ±  0.495 s    [User: 15.606 s, System: 1.320 s]
  Range (min … max):   16.563 s … 18.245 s    10 runs
 
parallel-opt-clean:
Benchmark 1: ./yosys -p "read_verilog -sv -I/usr/local/google/home/rocallahan/OpenROAD-flow-scripts/flow/designs/src/jpeg/include ~/OpenROAD-flow-scripts/flow/designs/src/jpeg/*.v; synth"
  Time (mean ± σ):     16.778 s ±  0.150 s    [User: 15.634 s, System: 1.179 s]
  Range (min … max):   16.602 s … 17.101 s    10 runs

This is a big PR so let me know what I can do to make it easier to swallow!

rocallahan · 2026-02-02T19:40:07Z

I should add some unit tests for the types in threading.h.

rocallahan · 2026-02-05T17:39:45Z

The dependent CLs have been merged. There are two failures in CI. One issue is that this version of MSVC++ seems to be unable to instantiate std::unordered_set with move-only elements :-(. The other issue is a crash running tests/svtypes/typedef_package.sv with Verific. I'm not sure how this CL would affect that test specifically and there's no diagnostic information in the CI log.
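For context, this is roughly what a `std::unordered_set` with move-only elements looks like. The thread does not show the construct that MSVC++ rejects, so this is only an illustrative sketch of the general pattern: `MoveOnlyId`, `MoveOnlyIdHash` and `make_set` are hypothetical names, not types from this PR.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>

// Hypothetical element type that can be moved but not copied.
struct MoveOnlyId {
	int value;
	explicit MoveOnlyId(int v) : value(v) {}
	MoveOnlyId(MoveOnlyId &&) = default;
	MoveOnlyId &operator=(MoveOnlyId &&) = default;
	MoveOnlyId(const MoveOnlyId &) = delete;
	MoveOnlyId &operator=(const MoveOnlyId &) = delete;
	bool operator==(const MoveOnlyId &other) const { return value == other.value; }
};

struct MoveOnlyIdHash {
	std::size_t operator()(const MoveOnlyId &id) const { return std::hash<int>()(id.value); }
};

using MoveOnlySet = std::unordered_set<MoveOnlyId, MoveOnlyIdHash>;

MoveOnlySet make_set()
{
	MoveOnlySet set;
	set.insert(MoveOnlyId(1)); // rvalue insert: moves the element in
	set.emplace(2);            // constructs the element in place
	set.emplace(2);            // duplicate: the set keeps a single element
	MoveOnlySet moved = std::move(set); // moving the whole set needs no element copies
	return moved;
}
```

Operations that would copy elements, such as copy-constructing the set, are not available for such a type.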

rocallahan · 2026-02-05T19:25:55Z

> The other issue is a crash running tests/svtypes/typedef_package.sv with Verific. I'm not sure how this CL would affect that test specifically and there's no diagnostic information in the CI log.

This was a bug in my code related to init wire attributes which apparently Verific generates but Yosys does not. Fuzzing didn't catch it because the fuzzing grammar didn't generate init attributes. I've updated the PR to extend the grammar with init attributes and verified that fuzzing now catches this case. I've also added an RTLIL testcase for the bug that will catch it when tests are run without Verific.

I'm still trying to find a workaround for the MSVC++ issue.

rocallahan · 2026-02-06T00:14:22Z

The Verific build is failing because libgmock-dev isn't installed on that system. I don't know how to fix that.

rocallahan · 2026-02-08T23:44:16Z

CI is clean except that the system that runs Verific tests needs gmock installed.

widlarizer · 2026-02-11T22:49:29Z

@mmicko The test-verific job is running in an unmanaged custom environment, right? Please try adding gmock; it should fix the unit tests here, since they use convenient matchers like UnorderedElementsAre.

mmicko · 2026-02-12T07:35:22Z

@widlarizer Updated the CI Docker image, all green now.

widlarizer

I've been working on this review for a couple of days so here's an incomplete review. I've also put up a PR into this PR branch with comments that make the data flow clearer

kernel/threading.h

widlarizer · 2026-02-11T19:44:33Z

kernel/threading.h

+		thread_state.next_batch.emplace_back(std::move(work));
+		if (GetSize(thread_state.next_batch) < batch_size)
+			return;
+		bool was_empty;


Probably needs explicit initialization

Does it? We initialize it unconditionally three lines later.
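The pattern under discussion can be sketched as follows. This is a simplified, hypothetical `BatchingQueue`, not the queue from `kernel/threading.h`; it assumes a single mutex and a caller-owned local batch. It only shows that `was_empty` is declared before the locked scope and assigned on every path before it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical simplified queue: items are collected in a thread-local batch
// and only published to the shared queue once the batch is full.
template <typename T>
class BatchingQueue {
public:
	explicit BatchingQueue(std::size_t batch_size) : batch_size_(batch_size) { assert(batch_size > 0); }

	// Returns true if a batch was published while the shared queue was empty,
	// i.e. when a real implementation might need to wake waiting workers.
	bool push(std::vector<T> &local_batch, T work)
	{
		local_batch.emplace_back(std::move(work));
		if (local_batch.size() < batch_size_)
			return false;
		bool was_empty; // no initializer: assigned unconditionally under the lock below
		{
			std::lock_guard<std::mutex> lock(mutex_);
			was_empty = batches_.empty();
			batches_.emplace_back(std::move(local_batch));
		}
		local_batch.clear(); // put the moved-from vector back into a known state
		return was_empty;
	}

	std::size_t published_batches()
	{
		std::lock_guard<std::mutex> lock(mutex_);
		return batches_.size();
	}

private:
	std::size_t batch_size_;
	std::mutex mutex_;
	std::deque<std::vector<T>> batches_;
};
```

Whether to add an initializer anyway is a style choice; leaving it out lets compiler warnings and sanitizers flag a path that forgets the assignment.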

passes/opt/opt_clean.cc

widlarizer · 2026-02-13T14:06:03Z

kernel/threading.h


+template <typename V>
+struct DefaultCollisionHandler {
+	void operator()(typename V::Accumulated &, typename V::Accumulated &) const {}


I would prefer for this to error out. It's not meeting the spec of "used to reduce two V::Accumulated values into a single value."

Arguably it is :-). The default behavior is just "pick one of the two values" (and we pick the 'current' value because that's free). I'll add a comment to DefaultCollisionHandler and you can tell me if it's satisfying :-).
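A small sketch of the two readings of "reduce two values into one". Everything here is hypothetical illustration (`CountValue`, `KeepCurrentHandler`, `SumHandler` and `merge_shard` are not names from this PR, and `std::map` stands in for the sharded containers); it assumes the handler receives the current value first and the incoming value second.

```cpp
#include <map>
#include <string>

// Hypothetical value traits with an Accumulated type, mirroring V::Accumulated.
struct CountValue {
	using Accumulated = int;
};

// Same shape as the default handler in the diff: an empty body leaves the
// current value untouched, so the reduction is "keep the value already stored".
template <typename V>
struct KeepCurrentHandler {
	void operator()(typename V::Accumulated &, typename V::Accumulated &) const {}
};

// An alternative handler that really combines both values.
struct SumHandler {
	void operator()(int &current, int &incoming) const { current += incoming; }
};

// Merge `shard` into `target`, calling the handler whenever a key exists in both.
template <typename V, typename Handler>
void merge_shard(std::map<std::string, typename V::Accumulated> &target,
                 std::map<std::string, typename V::Accumulated> &shard,
                 const Handler &on_collision)
{
	for (auto &entry : shard) {
		auto result = target.emplace(entry.first, entry.second);
		if (!result.second)
			on_collision(result.first->second, entry.second);
	}
}
```

With `KeepCurrentHandler`, a colliding key keeps whatever value was inserted first; with `SumHandler`, both contributions survive in the merged value.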

rocallahan · 2026-02-17T01:49:31Z

> I've also put up a PR into this PR branch with comments that make the data flow clearer

Would you prefer me to merge these changes into my commits (losing attribution) or keep your commit separate in my stack of commits?

We will want to query `keep_cache` from parallel threads. If we compute the results on-demand, that means we need synchronization for cache access in those queries, which adds complexity and overhead. Instead, prefill the cache with the status of all relevant modules. Note that this doesn't actually do more work --- we always consult `keep_cache` for all cells of all selected modules, so scanning all those cells and determining the kept status of all dependency modules is always required. Later in this PR we're going to parallelize `scan_module` itself, and that's also much easier to do when no other parallel threads are running.
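The idea of prefilling so that parallel queries need no locking can be sketched like this. `ToyKeepCache` and `count_kept_cells` are hypothetical simplifications, not the `keep_cache_t` from `passes/opt/opt_clean.cc`; the sketch assumes every module that will be queried has been inserted during the serial phase.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical cache mapping a module name to its "keep" status.
struct ToyKeepCache {
	std::unordered_map<std::string, bool> kept;

	// Serial phase: fill in every module before any worker thread starts.
	void prefill(const std::vector<std::pair<std::string, bool>> &modules)
	{
		for (const auto &module : modules)
			kept[module.first] = module.second;
	}

	// Read-only lookup: safe to call from many threads at once as long as
	// nothing modifies `kept` concurrently. Throws if the module is missing.
	bool query(const std::string &name) const { return kept.at(name); }
};

// Parallel phase: workers only read the cache, so no mutex is needed.
std::size_t count_kept_cells(const ToyKeepCache &cache, const std::vector<std::string> &cell_types, int num_threads)
{
	assert(num_threads > 0);
	const std::size_t n = cell_types.size();
	const std::size_t workers = static_cast<std::size_t>(num_threads);
	std::vector<std::size_t> counts(workers, 0);
	std::vector<std::thread> threads;
	for (std::size_t t = 0; t < workers; t++) {
		threads.emplace_back([&cache, &cell_types, &counts, n, workers, t] {
			for (std::size_t i = n * t / workers; i < n * (t + 1) / workers; i++)
				if (cache.query(cell_types[i]))
					counts[t]++;
		});
	}
	for (auto &thread : threads)
		thread.join();
	std::size_t total = 0;
	for (std::size_t count : counts)
		total += count;
	return total;
}
```

An on-demand cache would instead have to guard both the lookup and the insertion, which is the synchronization cost the commit message wants to avoid.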

rocallahan force-pushed the parallel-opt-clean branch from 0ed698b to c41df9b Compare February 2, 2026 19:25

rocallahan force-pushed the parallel-opt-clean branch from c41df9b to dc9ac57 Compare February 2, 2026 21:47

widlarizer self-requested a review February 2, 2026 23:07

widlarizer self-assigned this Feb 2, 2026

rocallahan force-pushed the parallel-opt-clean branch from 38e62dd to 1ffc670 Compare February 4, 2026 18:42

rocallahan force-pushed the parallel-opt-clean branch 2 times, most recently from 83bfac8 to 29a0ca6 Compare February 5, 2026 19:24

rocallahan force-pushed the parallel-opt-clean branch from 29a0ca6 to a4d656e Compare February 5, 2026 21:25

rocallahan force-pushed the parallel-opt-clean branch 2 times, most recently from 01b8f9c to b438afc Compare February 8, 2026 22:52

rocallahan marked this pull request as ready for review February 8, 2026 23:43

widlarizer requested changes Feb 16, 2026

rocallahan added 10 commits February 17, 2026 03:24

Add IdString::unescape() method

577191e

We've already talked about adding this as an alternative to `log_id()`, and we'll need it later in this PR.

Make log_error() work in a Multithreaded context.

af3bb97

`log_error()` causes an exit so we don't have to try too hard here. The main thing is to ensure that we normally are able to exit without causing a stack overflow due to recursive asserts about not being in a `Multithreaded` context.

Work around std::reverse miscompilation with empty range

ae56948

This causes problems when compiling with fuzzing instrumentation enabled.

Add work_pool_size, IntRange, item_range_for_worker, and `ThreadIndex`

61482d3

We'll use these later in this PR.

Add ParallelDispatchThreadPool

d711cf6

We'll use this later in the PR.

Add ShardedVector

8a30051

We'll use this later in the PR.

Add ShardedHashSet

6182db6

We'll use this later in the PR.

Add ConcurrentWorkQueue

1a461f9

We'll use this later in the PR.

Add MonotonicFlag

8ced93b

We'll use this later in the PR.

Add FfInitVals::set_parallel() method

937c7ce

We'll use this later in the PR.

rocallahan added 15 commits February 17, 2026 03:24

Parallelize collect_garbage()

704d110

Parallelize Design::check()

6bf9fd3

Introduce RmStats struct to encapsulate removal statistics

2018946

Turns out this is not strictly necessary for this PR but it's still a good thing to do and makes it clearer that the stats are not modified in a possibly racy way.

Create a toplevel ParallelDispatchThreadPool and parallelize `keep_cache_t::scan_module()` with it

0a64402

Pass the toplevel thread pool to rmunused_module, create a `Subpool`, and parallelize `remove_temporary_cells`

2659f32

Pass the module Subpool to rmunused_module_init and parallelize that function

e213437

Pass the module Subpool to rmunused_module_cells and parallelize that function

cbb5d8f

Add test that connects a wire with init to a constant

e2345d1

Pass the module Subpool to rmunused_module_signals and parallelize that function

f234063

Add unit-tests for ParallelDispatchThread and friends

722a4fc

Add unit tests for ConcurrentQueue and ThreadPool

71c5b62

Add some tests for ShardedHashSet

e2f939a

Add unit tests for ConcurrentWorkQueue

6b9f152

Add 'init' attributes to RTLIL fuzzing

7f1b4dc

rocallahan force-pushed the parallel-opt-clean branch from b438afc to 7f1b4dc Compare February 17, 2026 03:28
