Commit bc74dd7

Auto merge of #77727 - thomcc:mach-info-order, r=Amanieu
Avoid SeqCst or static mut in mach_timebase_info and QueryPerformanceFrequency caches

This patch went through a couple of iterations, but the end result is replacing a pattern where an `AtomicUsize` (updated with many SeqCst ops) guards a `static mut` with a single `AtomicU64` that is known to use 0 as a value indicating that it is not initialized.

The code in both places exists to cache values used in the conversion of Instants to Durations on macOS, iOS, and Windows.

I have no numbers to prove that this improves performance (it seems a little futile to benchmark something like this), but it's much simpler, safer, and in practice we'd expect it to be faster everywhere where Relaxed operations on AtomicU64 are cheaper than SeqCst operations on AtomicUsize, which is a lot of places.

Anyway, it also removes a bunch of unsafe code and greatly simplifies the logic, so IMO that alone would be worth it unless it was a regression.

If you want to take a look at the assembly output though, see https://godbolt.org/z/rbr6vn for x86_64, https://godbolt.org/z/cqcbqv for aarch64 (note that this is just the output of the mac side, but I'd expect the windows part to be the same and don't feel like doing another godbolt for it).

There are several versions of this function in the godbolt:

- `info_new`: version in the current patch
- `info_less_new`: version in initial PR
- `info_original`: version currently in the tree
- `info_orig_but_better_orderings`: a version that just tries to change the original code's orderings from SeqCst to the (probably) minimal orderings required for soundness/correctness.

The biggest concern I have here is whether we can use AtomicU64, or whether this code supports targets that don't have it. AFAICT: no. (If that changes in the future, it's easy enough to do something different for them.)

r? `@Amanieu` because he caught a couple issues last time I tried to do a patch reducing orderings 😅

---

<details>
<summary>I rewrote this whole message so the original is inside here</summary>

I happened to notice the code we use for caching the result of mach_timebase_info uses SeqCst exclusively.

However, thinking a little more, it's actually pretty easy to avoid the static mut by packing the timebase info into an AtomicU64. This entirely avoids needing to do the compare_exchange. The AtomicU64 can be read/written using Relaxed ops, which on current macos/ios platforms (x86_64/aarch64) have no overhead compared to direct loads/stores. This simplifies the code and makes it a lot safer too.

I have no numbers to prove that this improves performance (it seems a little futile to benchmark something like this), although it should do that on both targets it applies to.

That said, it also removes a bunch of unsafe code and simplifies the logic (arguably at least; there are only two states now, initialized or not), so I think it's a net win even without concrete numbers.

If you want to take a look at the assembly output though, see below. It has the new version, the original, and a version of the original with lower Orderings (which is still worse than the version in this PR).

- godbolt.org/z/obfqf9 x86_64-apple-darwin
- godbolt.org/z/Wz5cWc aarch64-unknown-linux-gnu (godbolt can't do aarch64-apple-ios but that doesn't matter here)

A different (and more efficient) option than this would be to just use the AtomicU64 and use the knowledge that after initialization the denominator should be nonzero... That felt like it's relying on too many things I'm not confident in, so I didn't want to do that.

</details>
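The pattern described in the message can be tried outside of `std`. The following is only a minimal sketch: `expensive_query`, `QUERIES`, `cached_value`, and the value `24_000_000` are hypothetical stand-ins, not names from the patch. It assumes, as the patch does, that the queried value is never 0 and never changes while the process runs.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::thread;

// Hypothetical counter, only here to observe how often the slow path runs.
static QUERIES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a system query that always returns the same
// nonzero value for the lifetime of the process.
fn expensive_query() -> u64 {
    QUERIES.fetch_add(1, Ordering::Relaxed);
    24_000_000
}

fn cached_value() -> u64 {
    // 0 means "not initialized yet"; every valid value is nonzero.
    static CACHE: AtomicU64 = AtomicU64::new(0);

    let cached = CACHE.load(Ordering::Relaxed);
    if cached != 0 {
        return cached;
    }
    // Several threads may reach this point at once. That is acceptable here:
    // they all compute the same value, so it does not matter whose store
    // lands last.
    let value = expensive_query();
    CACHE.store(value, Ordering::Relaxed);
    value
}

fn main() {
    let handles: Vec<_> = (0..4).map(|_| thread::spawn(cached_value)).collect();
    for handle in handles {
        assert_eq!(handle.join().unwrap(), 24_000_000);
    }
    // All threads have been joined, so the cache is populated and this call
    // takes the fast path without querying again.
    let before = QUERIES.load(Ordering::Relaxed);
    assert_eq!(cached_value(), 24_000_000);
    assert_eq!(QUERIES.load(Ordering::Relaxed), before);
}
```

The slow path may run more than once when threads race, which is the trade-off for dropping the `compare_exchange` state machine: redundant queries are harmless as long as every query yields the same result.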
2 parents c38f001 + 4f37220 commit bc74dd7

2 files changed: +50 -42 lines changed

library/std/src/sys/unix/time.rs

+33 -23
@@ -117,8 +117,7 @@ impl Hash for Timespec {
 #[cfg(any(target_os = "macos", target_os = "ios"))]
 mod inner {
     use crate::fmt;
-    use crate::mem;
-    use crate::sync::atomic::{AtomicUsize, Ordering::SeqCst};
+    use crate::sync::atomic::{AtomicU64, Ordering};
     use crate::sys::cvt;
     use crate::sys_common::mul_div_u64;
     use crate::time::Duration;
@@ -233,31 +232,42 @@ mod inner {
     }

     fn info() -> mach_timebase_info {
-        static mut INFO: mach_timebase_info = mach_timebase_info { numer: 0, denom: 0 };
-        static STATE: AtomicUsize = AtomicUsize::new(0);
-
-        unsafe {
-            // If a previous thread has filled in this global state, use that.
-            if STATE.load(SeqCst) == 2 {
-                return INFO;
-            }
+        // INFO_BITS conceptually is an `Option<mach_timebase_info>`. We can do
+        // this in 64 bits because we know 0 is never a valid value for the
+        // `denom` field.
+        //
+        // Encoding this as a single `AtomicU64` allows us to use `Relaxed`
+        // operations, as we are only interested in in the effects on a single
+        // memory location.
+        static INFO_BITS: AtomicU64 = AtomicU64::new(0);
+
+        // If a previous thread has initialized `INFO_BITS`, use it.
+        let info_bits = INFO_BITS.load(Ordering::Relaxed);
+        if info_bits != 0 {
+            return info_from_bits(info_bits);
+        }

-            // ... otherwise learn for ourselves ...
-            let mut info = mem::zeroed();
-            extern "C" {
-                fn mach_timebase_info(info: mach_timebase_info_t) -> kern_return_t;
-            }
+        // ... otherwise learn for ourselves ...
+        extern "C" {
+            fn mach_timebase_info(info: mach_timebase_info_t) -> kern_return_t;
+        }

+        let mut info = info_from_bits(0);
+        unsafe {
             mach_timebase_info(&mut info);
-
-            // ... and attempt to be the one thread that stores it globally for
-            // all other threads
-            if STATE.compare_exchange(0, 1, SeqCst, SeqCst).is_ok() {
-                INFO = info;
-                STATE.store(2, SeqCst);
-            }
-            return info;
         }
+        INFO_BITS.store(info_to_bits(info), Ordering::Relaxed);
+        info
+    }
+
+    #[inline]
+    fn info_to_bits(info: mach_timebase_info) -> u64 {
+        ((info.denom as u64) << 32) | (info.numer as u64)
+    }
+
+    #[inline]
+    fn info_from_bits(bits: u64) -> mach_timebase_info {
+        mach_timebase_info { numer: bits as u32, denom: (bits >> 32) as u32 }
     }
 }
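The packing helpers in this hunk can be checked in isolation. Below is a small sketch that copies their bit layout (`denom` in the high 32 bits, `numer` in the low 32 bits) onto a hypothetical `TimebaseInfo` struct standing in for the C `mach_timebase_info` type; the sample values are arbitrary.

```rust
// Hypothetical stand-in for the C `mach_timebase_info` struct.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TimebaseInfo {
    numer: u32,
    denom: u32,
}

// Same layout as the patch: `denom` in the high half, `numer` in the low half.
fn info_to_bits(info: TimebaseInfo) -> u64 {
    ((info.denom as u64) << 32) | (info.numer as u64)
}

fn info_from_bits(bits: u64) -> TimebaseInfo {
    // `bits as u32` truncates to the low 32 bits, which hold `numer`.
    TimebaseInfo { numer: bits as u32, denom: (bits >> 32) as u32 }
}

fn main() {
    let info = TimebaseInfo { numer: 125, denom: 3 };
    let bits = info_to_bits(info);

    // The two fields occupy disjoint halves of the word.
    assert_eq!(bits, (3u64 << 32) | 125);
    // Packing and unpacking round-trips.
    assert_eq!(info_from_bits(bits), info);

    // Any value with a nonzero `denom` packs to a nonzero word, even when
    // `numer` is 0, so 0 stays free to mean "not initialized".
    assert_ne!(info_to_bits(TimebaseInfo { numer: 0, denom: 1 }), 0);
    // Decoding the sentinel yields the zeroed struct, which is what
    // `info_from_bits(0)` relies on in the patch.
    assert_eq!(info_from_bits(0), TimebaseInfo { numer: 0, denom: 0 });
}
```

Because both fields travel in one 64-bit word, a reader can never observe a `numer` from one write paired with a `denom` from another, which is what makes a single `Relaxed` load sufficient.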

library/std/src/sys/windows/time.rs

+17 -19
@@ -165,7 +165,7 @@ fn intervals2dur(intervals: u64) -> Duration {

 mod perf_counter {
     use super::NANOS_PER_SEC;
-    use crate::sync::atomic::{AtomicUsize, Ordering::SeqCst};
+    use crate::sync::atomic::{AtomicU64, Ordering};
     use crate::sys::c;
     use crate::sys::cvt;
     use crate::sys_common::mul_div_u64;
@@ -197,27 +197,25 @@ mod perf_counter {
     }

     fn frequency() -> c::LARGE_INTEGER {
-        static mut FREQUENCY: c::LARGE_INTEGER = 0;
-        static STATE: AtomicUsize = AtomicUsize::new(0);
-
+        // Either the cached result of `QueryPerformanceFrequency` or `0` for
+        // uninitialized. Storing this as a single `AtomicU64` allows us to use
+        // `Relaxed` operations, as we are only interested in the effects on a
+        // single memory location.
+        static FREQUENCY: AtomicU64 = AtomicU64::new(0);
+
+        let cached = FREQUENCY.load(Ordering::Relaxed);
+        // If a previous thread has filled in this global state, use that.
+        if cached != 0 {
+            return cached as c::LARGE_INTEGER;
+        }
+        // ... otherwise learn for ourselves ...
+        let mut frequency = 0;
         unsafe {
-            // If a previous thread has filled in this global state, use that.
-            if STATE.load(SeqCst) == 2 {
-                return FREQUENCY;
-            }
-
-            // ... otherwise learn for ourselves ...
-            let mut frequency = 0;
             cvt(c::QueryPerformanceFrequency(&mut frequency)).unwrap();
-
-            // ... and attempt to be the one thread that stores it globally for
-            // all other threads
-            if STATE.compare_exchange(0, 1, SeqCst, SeqCst).is_ok() {
-                FREQUENCY = frequency;
-                STATE.store(2, SeqCst);
-            }
-            frequency
         }
+
+        FREQUENCY.store(frequency as u64, Ordering::Relaxed);
+        frequency
     }

     fn query() -> c::LARGE_INTEGER {
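The module imports `NANOS_PER_SEC` and `mul_div_u64`, which suggests how the cached frequency is consumed: scaling a tick count to nanoseconds. The sketch below is an assumption about that usage, not code from this commit. `ticks_to_nanos` is a hypothetical name, and this `mul_div_u64` is an illustrative reimplementation that may differ from the one in `sys_common`.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

// Computes (value * numer) / denom without overflowing on the intermediate
// product, by splitting `value` into a quotient and remainder by `denom`.
// Illustrative reimplementation; panics if `denom` is 0.
fn mul_div_u64(value: u64, numer: u64, denom: u64) -> u64 {
    let q = value / denom;
    let r = value % denom;
    // value = q * denom + r, so value * numer / denom
    //       = q * numer + (r * numer) / denom, with r < denom.
    q * numer + r * numer / denom
}

// Hypothetical helper: converts performance-counter ticks to nanoseconds
// given the counter frequency in ticks per second.
fn ticks_to_nanos(ticks: u64, frequency: u64) -> u64 {
    mul_div_u64(ticks, NANOS_PER_SEC, frequency)
}

fn main() {
    // With a 10 MHz counter, each tick is 100 ns.
    assert_eq!(ticks_to_nanos(25, 10_000_000), 2_500);

    // A tick count for which the naive `ticks * NANOS_PER_SEC` overflows u64.
    let ticks: u64 = 20_000_000_000_000;
    assert!(ticks.checked_mul(NANOS_PER_SEC).is_none());
    assert_eq!(ticks_to_nanos(ticks, 10_000_000), 2_000_000_000_000_000);
}
```

A frequency of 0 would make this conversion divide by zero, so 0 cannot be a usable cached value, which is consistent with the patch reserving it as the "uninitialized" marker.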
