Commit 0196107

Auto merge of rust-lang#82127 - tgnottingham:tune-ahead-of-time-codegen, r=varkor
rustc_codegen_ssa: tune codegen according to available concurrency

This change tunes ahead-of-time codegen according to the amount of concurrency available, rather than according to the number of CPUs on the system. This can lower memory usage by reducing the number of compiled LLVM modules in memory at once, particularly across several rustc instances.

Previously, each rustc instance would assume that it should codegen ahead of time to meet the demand of number-of-CPUs workers. But often, a rustc instance doesn't have nearly that much concurrency available to it, because the available concurrency is split, via the jobserver, across all active rustc instances spawned by the driving cargo process, and is further limited by the `-j` flag argument. As a result, each rustc might have kept several times as many LLVM modules in memory as it really needed to meet demand. If the modules were large, the effect on memory usage would be noticeable.

With this change, the required amount of ahead-of-time codegen scales with the actual number of workers running within a rustc instance. Note that the number of workers running can be less than the concurrency actually available to that instance. However, if more concurrency becomes available, workers are spun up quickly as job tokens are acquired, and the ahead-of-time codegen scales up quickly as well.
2 parents 446d453 + 5f243d3 commit 0196107
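As a rough illustration of the new demand calculation (a simplified, standalone rendering of what the diff below does with `tokens`, `running`, and `work_items`; the helper function and sample numbers here are made up for the sketch, not compiler code):

// Simplified, standalone sketch of the new demand calculation (illustrative;
// the real logic lives in the coordinator loop in back/write.rs below).
fn anticipated_running(tokens_held: usize, workers_running: usize, queued_items: usize) -> usize {
    // Tokens held beyond the workers already running can start new workers,
    // but only if there is queued work for them; plus one for the main thread.
    let extra_tokens = tokens_held.saturating_sub(workers_running);
    let additional_running = extra_tokens.min(queued_items);
    workers_running + additional_running + 1
}

fn main() {
    // Hypothetical snapshot: 2 workers running, 3 jobserver tokens held,
    // 1 work item queued. Anticipated demand is 4, regardless of CPU count.
    println!("anticipated workers: {}", anticipated_running(3, 2, 1));
}

The point is that demand is derived from jobserver tokens actually held, so it never exceeds the concurrency granted to this rustc, no matter how many CPUs the machine has.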

File tree

3 files changed: +64 -11 lines changed

Cargo.lock (-1)

@@ -3654,7 +3654,6 @@ dependencies = [
  "jobserver",
  "libc",
  "memmap",
- "num_cpus",
  "pathdiff",
  "rustc_apfloat",
  "rustc_ast",

compiler/rustc_codegen_ssa/Cargo.toml (-1)

@@ -11,7 +11,6 @@ test = false
 bitflags = "1.2.1"
 cc = "1.0.1"
 itertools = "0.9"
-num_cpus = "1.0"
 memmap = "0.7"
 tracing = "0.1"
 libc = "0.2.50"

compiler/rustc_codegen_ssa/src/back/write.rs (+64 -9)

@@ -1193,7 +1193,6 @@ fn start_executing_work<B: ExtraBackendMethods>(
     // necessary. There's already optimizations in place to avoid sending work
     // back to the coordinator if LTO isn't requested.
     return thread::spawn(move || {
-        let max_workers = num_cpus::get();
         let mut worker_id_counter = 0;
         let mut free_worker_ids = Vec::new();
         let mut get_worker_id = |free_worker_ids: &mut Vec<usize>| {
@@ -1253,7 +1252,17 @@ fn start_executing_work<B: ExtraBackendMethods>(
             // For codegenning more CGU or for running them through LLVM.
             if !codegen_done {
                 if main_thread_worker_state == MainThreadWorkerState::Idle {
-                    if !queue_full_enough(work_items.len(), running, max_workers) {
+                    // Compute the number of workers that will be running once we've taken as many
+                    // items from the work queue as we can, plus one for the main thread. It's not
+                    // critically important that we use this instead of just `running`, but it
+                    // prevents the `queue_full_enough` heuristic from fluctuating just because a
+                    // worker finished up and we decreased the `running` count, even though we're
+                    // just going to increase it right after this when we put a new worker to work.
+                    let extra_tokens = tokens.len().checked_sub(running).unwrap();
+                    let additional_running = std::cmp::min(extra_tokens, work_items.len());
+                    let anticipated_running = running + additional_running + 1;
+
+                    if !queue_full_enough(work_items.len(), anticipated_running) {
                         // The queue is not full enough, codegen more items:
                         if codegen_worker_send.send(Message::CodegenItem).is_err() {
                             panic!("Could not send Message::CodegenItem to main thread")
@@ -1529,13 +1538,59 @@ fn start_executing_work<B: ExtraBackendMethods>(
 
     // A heuristic that determines if we have enough LLVM WorkItems in the
    // queue so that the main thread can do LLVM work instead of codegen
-    fn queue_full_enough(
-        items_in_queue: usize,
-        workers_running: usize,
-        max_workers: usize,
-    ) -> bool {
-        // Tune me, plz.
-        items_in_queue > 0 && items_in_queue >= max_workers.saturating_sub(workers_running / 2)
+    fn queue_full_enough(items_in_queue: usize, workers_running: usize) -> bool {
+        // This heuristic scales ahead-of-time codegen according to available
+        // concurrency, as measured by `workers_running`. The idea is that the
+        // more concurrency we have available, the more demand there will be for
+        // work items, and the fuller the queue should be kept to meet demand.
+        // An important property of this approach is that we codegen ahead of
+        // time only as much as necessary, so as to keep fewer LLVM modules in
+        // memory at once, thereby reducing memory consumption.
+        //
+        // When the number of workers running is less than the max concurrency
+        // available to us, this heuristic can cause us to instruct the main
+        // thread to work on an LLVM item (that is, tell it to "LLVM") instead
+        // of codegen, even though it seems like it *should* be codegenning so
+        // that we can create more work items and spawn more LLVM workers.
+        //
+        // But this is not a problem. When the main thread is told to LLVM,
+        // according to this heuristic and how work is scheduled, there is
+        // always at least one item in the queue, and therefore at least one
+        // pending jobserver token request. If there *is* more concurrency
+        // available, we will immediately receive a token, which will upgrade
+        // the main thread's LLVM worker to a real one (conceptually), and free
+        // up the main thread to codegen if necessary. On the other hand, if
+        // there isn't more concurrency, then the main thread working on an LLVM
+        // item is appropriate, as long as the queue is full enough for demand.
+        //
+        // Speaking of which, how full should we keep the queue? Probably less
+        // full than you'd think. A lot has to go wrong for the queue not to be
+        // full enough and for that to have a negative effect on compile times.
+        //
+        // Workers are unlikely to finish at exactly the same time, so when one
+        // finishes and takes another work item off the queue, we often have
+        // ample time to codegen at that point before the next worker finishes.
+        // But suppose that codegen takes so long that the workers exhaust the
+        // queue, and we have one or more workers that have nothing to work on.
+        // Well, it might not be so bad. Of all the LLVM modules we create and
+        // optimize, one has to finish last. It's not necessarily the case that
+        // by losing some concurrency for a moment, we delay the point at which
+        // that last LLVM module is finished and the rest of compilation can
+        // proceed. Also, when we can't take advantage of some concurrency, we
+        // give tokens back to the job server. That enables some other rustc to
+        // potentially make use of the available concurrency. That could even
+        // *decrease* overall compile time if we're lucky. But yes, if no other
+        // rustc can make use of the concurrency, then we've squandered it.
+        //
+        // However, keeping the queue full is also beneficial when we have a
+        // surge in available concurrency. Then items can be taken from the
+        // queue immediately, without having to wait for codegen.
+        //
+        // So, the heuristic below tries to keep one item in the queue for every
+        // four running workers. Based on limited benchmarking, this appears to
+        // be more than sufficient to avoid increasing compilation times.
+        let quarter_of_workers = workers_running - 3 * workers_running / 4;
+        items_in_queue > 0 && items_in_queue >= quarter_of_workers
     }
 
     fn maybe_start_llvm_timer<'a>(
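For a quick feel for the new threshold, here is a small standalone sketch (not part of the commit) that reuses the `queue_full_enough` body from the diff above and prints, for a range of worker counts, the smallest queue length it accepts; `workers_running - 3 * workers_running / 4` is an integer ceiling of `workers_running / 4`, i.e. one queued item per four running workers, rounded up:

// Standalone sketch: same heuristic as in the diff above, exercised on its own.
fn queue_full_enough(items_in_queue: usize, workers_running: usize) -> bool {
    let quarter_of_workers = workers_running - 3 * workers_running / 4;
    items_in_queue > 0 && items_in_queue >= quarter_of_workers
}

fn main() {
    for workers_running in 1..=16 {
        // Smallest queue length the heuristic already considers "full enough".
        let threshold = (1usize..)
            .find(|&n| queue_full_enough(n, workers_running))
            .unwrap();
        println!("{workers_running:2} workers running -> keep at least {threshold} item(s) queued");
    }
}

For example, with 8 workers running the main thread keeps codegenning only until 2 items are queued, whereas the old heuristic measured fullness against the machine's CPU count.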
