
Commit b373f97

Auto merge of rust-lang#105840 - saethlin:ord-cmp, r=<try>
Micro-optimize Ord::cmp for primitives

I originally started looking into this because in MIR, `PartialOrd::cmp` is _huge_, and even for trivial types like `u32`, which are theoretically a single statement to compare, the `PartialOrd::cmp` impl doesn't inline. A significant contributor to the size of the implementation is that it has two comparisons. And this actually follows through to the final x86_64 codegen too, which is... strange: we don't need two `cmp` instructions in order to do a single Rust-level comparison.

So I started tweaking the implementation and came up with the same thing as rust-lang#64082 (which I didn't know about at the time). I ran `llvm-mca` on it per the issue linked in the code to establish that it looked better, and submitted it for a benchmark run.

The initial benchmark run regressed basically everything. By looking through the cachegrind diffs in the perf report, then the `perf annotate` output for regressed functions, I was able to identify one source of the regression: `Ord::min` and `Ord::max` no longer optimize well. Tweaking them to bypass `Ord::cmp` removed some regressions, but not many.

Diving back into the cachegrind diffs and disassembly, I found one huge, widespread issue: the codegen for `Span`'s `hash_stable` regressed because `span_data_to_lines_and_cols` no longer inlined into it, because that function does a lot of `Range<BytePos>::contains`. The implementation of `Range::contains` uses `PartialOrd` multiple times, and we had massively regressed the codegen of `Range::contains`. The root problem here seems to be that `PartialOrd` is derived on `BytePos`, which is a simple wrapper around a `u32`. So for `BytePos`, `PartialOrd::{le, lt, ge, gt}` use the default impls, which go through `PartialOrd::partial_cmp` and thus `Ord::cmp`, and LLVM fails to optimize these combinations of methods with the new `Ord::cmp` implementation.
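The single-comparison idea discussed above can be sketched as a standalone function (a hypothetical free function for illustration, not the actual library code): build -1/0/+1 from the two boolean comparisons instead of chaining `if` branches.

```rust
use std::cmp::Ordering;

// Illustrative sketch of the branchless strategy this PR is based around.
// `cmp_branchless` is a hypothetical helper, not the real `<u32 as Ord>::cmp`.
fn cmp_branchless(a: u32, b: u32) -> Ordering {
    let mut res = 0i8;
    res -= (a < b) as i8; // -1 if a < b
    res += (a > b) as i8; // +1 if a > b
    // Safe stand-in for the library's discriminant transmute:
    // `Ordering` is Less = -1, Equal = 0, Greater = 1.
    match res {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        _ => Ordering::Greater,
    }
}
```

The intent is that the backend can lower both boolean comparisons from a single flags-setting instruction, rather than the two chained compares the old `if`/`else if` shape produced.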
At a guess, the new implementation makes LLVM totally lose track of the fact that `<Ord for u32>::cmp` is an elaborate way to compare two integers. So I have low hopes for this overall, because my strategy (which is working) to recover the regressions is to avoid the "faster" implementation that this PR is based around. If we have to settle for an implementation of `Ord::cmp` which is on its own sub-optimal but is optimized better in combination with functions that use its return value in specific ways, so be it.

However, one of the runs had an improvement in `coercions`. I don't know if that is jitter or relevant. But I'm still finding threads to pull here, so I'm going to keep at it.

For the moment I am hacking up the implementations on `BytePos` instead of modifying the code that `derive(PartialOrd, Ord)` expands to, because that would be hard, and it would also mean that we would just expand to more code, perhaps regressing compile time for that reason, even if the generated assembly is more efficient.

---

Hacking up the remainder of the `PartialOrd`/`Ord` methods on `BytePos` took us down to 3 regressions and 6 improvements, which is interesting. All the improvements are in `coercions`, so I'm sure this improved _something_, but whether it matters... hard to say.

Based on the findings of `@joboet`, I'm going to cherry-pick rust-lang#106065 onto this branch, because that strategy seems to improve `PartialOrd::lt` and `PartialOrd::ge` back to the original codegen, even when they are using our new `Ord::cmp` impl. If the remaining perf regressions are due to de-optimizing a `PartialOrd::lt` not on `BytePos`, this might be a further improvement.

---

Okay, that cherry-pick brought us down to 2 regressions, but that might be noise. We still have the same 6 improvements, all on `coercions`.
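The regression mechanism described above — a newtype whose `PartialOrd` only provides `partial_cmp`, leaving `lt`/`le`/`gt`/`ge` to the trait's defaults that route through the `Ordering` value — can be reproduced in miniature. The `Pos` type and `contains` helper here are illustrative stand-ins, not the real `BytePos` or `Range::contains`.

```rust
use std::cmp::Ordering;
use std::ops::Range;

#[derive(PartialEq, Eq)]
struct Pos(u32);

impl PartialOrd for Pos {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
    // No `lt`/`le`/`gt`/`ge` overrides: the default methods call
    // `partial_cmp` and inspect the returned `Ordering`, which is the
    // pattern LLVM must see through inside `Range::contains`.
}

impl Ord for Pos {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.cmp(&other.0)
    }
}

// Same shape as the standard `RangeBounds::contains`: two `PartialOrd`
// operator calls whose `Ordering` intermediates need to optimize away.
fn contains(range: &Range<Pos>, item: &Pos) -> bool {
    *item >= range.start && *item < range.end
}
```

Directly implementing the comparison operators on the newtype (as the `BytePos` diff below does) sidesteps the `Ordering` round-trip entirely, which is why it recovers the codegen.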
I think the next thing to try here is modifying the implementation of `derive(PartialOrd)` to automatically emit the modifications that I made to `BytePos` (directly implementing all the methods for newtypes). But even if that works, I think the effect of this change is so mixed that it's probably not worth merging with current LLVM. What I'm afraid of is that this change currently pessimizes matching on `Ordering`, and that is the most natural thing to do with an enum. So I'm not closing this yet, but without a change from LLVM, I have other priorities at the moment.

r? `@ghost`
2 parents ee9c7c9 + dcea8b1

File tree

3 files changed: +142 −24 lines changed

compiler/rustc_span/src/lib.rs (+47 −3)

@@ -2191,6 +2191,50 @@ macro_rules! impl_pos {
         $(#[$attr])*
         $vis struct $ident($inner_vis $inner_ty);

+        impl ::std::cmp::Ord for $ident {
+            #[inline(always)]
+            fn cmp(&self, other: &Self) -> ::std::cmp::Ordering {
+                self.0.cmp(&other.0)
+            }
+
+            #[inline(always)]
+            fn min(self, other: Self) -> Self {
+                Self(self.0.min(other.0))
+            }
+
+            #[inline(always)]
+            fn max(self, other: Self) -> Self {
+                Self(self.0.max(other.0))
+            }
+        }
+
+        impl ::std::cmp::PartialOrd for $ident {
+            #[inline(always)]
+            fn partial_cmp(&self, other: &Self) -> Option<::std::cmp::Ordering> {
+                self.0.partial_cmp(&other.0)
+            }
+
+            #[inline(always)]
+            fn lt(&self, other: &Self) -> bool {
+                self.0.lt(&other.0)
+            }
+
+            #[inline(always)]
+            fn le(&self, other: &Self) -> bool {
+                self.0.le(&other.0)
+            }
+
+            #[inline(always)]
+            fn gt(&self, other: &Self) -> bool {
+                self.0.gt(&other.0)
+            }
+
+            #[inline(always)]
+            fn ge(&self, other: &Self) -> bool {
+                self.0.ge(&other.0)
+            }
+        }
+
         impl Pos for $ident {
             #[inline(always)]
             fn from_usize(n: usize) -> $ident {

@@ -2238,19 +2282,19 @@ impl_pos! {
     /// A byte offset.
     ///
     /// Keep this small (currently 32-bits), as AST contains a lot of them.
-    #[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Debug)]
+    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
     pub struct BytePos(pub u32);

     /// A byte offset relative to file beginning.
-    #[derive(Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord, Debug)]
+    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
     pub struct RelativeBytePos(pub u32);

     /// A character offset.
     ///
     /// Because of multibyte UTF-8 characters, a byte offset
     /// is not equivalent to a character offset. The [`SourceMap`] will convert [`BytePos`]
     /// values to `CharPos` values as necessary.
-    #[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
+    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
     pub struct CharPos(pub usize);
 }

library/core/src/cmp.rs (+46 −21)

@@ -401,12 +401,18 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_eq(), true);
     /// assert_eq!(Ordering::Greater.is_eq(), false);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_eq(self) -> bool {
-        matches!(self, Equal)
+        // Implementation note: It appears (as of 2022-12) that LLVM has an
+        // easier time with a comparison against zero like this, as opposed
+        // to looking for the `Less` value (-1) specifically, maybe because
+        // it's not always obvious to it that -2 isn't possible.
+        // Thus this and its siblings below are written this way, rather
+        // than the potentially-more-obvious `matches!` version.
+        (self as i8) == 0
     }

     /// Returns `true` if the ordering is not the `Equal` variant.
@@ -420,12 +426,12 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_ne(), false);
     /// assert_eq!(Ordering::Greater.is_ne(), true);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_ne(self) -> bool {
-        !matches!(self, Equal)
+        (self as i8) != 0
     }

     /// Returns `true` if the ordering is the `Less` variant.
@@ -439,12 +445,12 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_lt(), false);
     /// assert_eq!(Ordering::Greater.is_lt(), false);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_lt(self) -> bool {
-        matches!(self, Less)
+        (self as i8) < 0
     }

     /// Returns `true` if the ordering is the `Greater` variant.
@@ -458,12 +464,12 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_gt(), false);
     /// assert_eq!(Ordering::Greater.is_gt(), true);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_gt(self) -> bool {
-        matches!(self, Greater)
+        (self as i8) > 0
     }

     /// Returns `true` if the ordering is either the `Less` or `Equal` variant.
@@ -477,12 +483,12 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_le(), true);
     /// assert_eq!(Ordering::Greater.is_le(), false);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_le(self) -> bool {
-        !matches!(self, Greater)
+        (self as i8) <= 0
     }

     /// Returns `true` if the ordering is either the `Greater` or `Equal` variant.
@@ -496,12 +502,12 @@ impl Ordering {
     /// assert_eq!(Ordering::Equal.is_ge(), true);
     /// assert_eq!(Ordering::Greater.is_ge(), true);
     /// ```
-    #[inline]
+    #[inline(always)]
     #[must_use]
     #[rustc_const_stable(feature = "ordering_helpers", since = "1.53.0")]
     #[stable(feature = "ordering_helpers", since = "1.53.0")]
     pub const fn is_ge(self) -> bool {
-        !matches!(self, Less)
+        (self as i8) >= 0
     }

     /// Reverses the `Ordering`.
@@ -1169,7 +1175,7 @@ pub trait PartialOrd<Rhs: ?Sized = Self>: PartialEq<Rhs> {
     #[must_use]
     #[stable(feature = "rust1", since = "1.0.0")]
     fn lt(&self, other: &Rhs) -> bool {
-        matches!(self.partial_cmp(other), Some(Less))
+        if let Some(ordering) = self.partial_cmp(other) { ordering.is_lt() } else { false }
     }

     /// This method tests less than or equal to (for `self` and `other`) and is used by the `<=`
@@ -1186,7 +1192,7 @@ pub trait PartialOrd<Rhs: ?Sized = Self>: PartialEq<Rhs> {
     #[must_use]
     #[stable(feature = "rust1", since = "1.0.0")]
     fn le(&self, other: &Rhs) -> bool {
-        matches!(self.partial_cmp(other), Some(Less | Equal))
+        if let Some(ordering) = self.partial_cmp(other) { ordering.is_le() } else { false }
     }

     /// This method tests greater than (for `self` and `other`) and is used by the `>` operator.
@@ -1202,7 +1208,7 @@ pub trait PartialOrd<Rhs: ?Sized = Self>: PartialEq<Rhs> {
     #[must_use]
     #[stable(feature = "rust1", since = "1.0.0")]
     fn gt(&self, other: &Rhs) -> bool {
-        matches!(self.partial_cmp(other), Some(Greater))
+        if let Some(ordering) = self.partial_cmp(other) { ordering.is_gt() } else { false }
    }

     /// This method tests greater than or equal to (for `self` and `other`) and is used by the `>=`
@@ -1219,7 +1225,7 @@ pub trait PartialOrd<Rhs: ?Sized = Self>: PartialEq<Rhs> {
     #[must_use]
     #[stable(feature = "rust1", since = "1.0.0")]
     fn ge(&self, other: &Rhs) -> bool {
-        matches!(self.partial_cmp(other), Some(Greater | Equal))
+        if let Some(ordering) = self.partial_cmp(other) { ordering.is_ge() } else { false }
     }
 }

@@ -1563,12 +1569,31 @@ mod impls {
         impl Ord for $t {
             #[inline]
             fn cmp(&self, other: &$t) -> Ordering {
-                // The order here is important to generate more optimal assembly.
-                // See <https://github.com/rust-lang/rust/issues/63758> for more info.
-                if *self < *other { Less }
-                else if *self == *other { Equal }
-                else { Greater }
+                let mut res = 0i8;
+                res -= (*self < *other) as i8;
+                res += (*self > *other) as i8;
+                // SAFETY: The discriminants of Ordering were chosen to permit this
+                unsafe { crate::mem::transmute(res) }
             }
+
+            #[inline]
+            fn max(self, other: Self) -> Self {
+                if self > other {
+                    self
+                } else {
+                    other
+                }
+            }
+
+            #[inline]
+            fn min(self, other: Self) -> Self {
+                if self > other {
+                    other
+                } else {
+                    self
+                }
+            }
         }
     )*)
 }
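The `transmute` in the `cmp` diff above is only sound because `Ordering` is `#[repr(i8)]` with discriminants `Less = -1`, `Equal = 0`, `Greater = 1`. A small sketch checking that invariant from stable Rust (the helper function name is ours, for illustration only):

```rust
use std::cmp::Ordering;

// Read back `Ordering`'s discriminants via `as` casts, which is the same
// numeric view the zero-comparison `is_*` helpers above rely on.
fn ordering_discriminants() -> [i8; 3] {
    [
        Ordering::Less as i8,
        Ordering::Equal as i8,
        Ordering::Greater as i8,
    ]
}
```

This is also why `is_lt` can be written as `(self as i8) < 0` and friends: the sign of the discriminant encodes the comparison result directly.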
New codegen test (+49 −0; file path not shown in this capture)

@@ -0,0 +1,49 @@
+// The `derive(PartialOrd)` for a newtype doesn't override `lt`/`le`/`gt`/`ge`.
+// This double-checks that the `Option<Ordering>` intermediate values used
+// in the operators for such a type all optimize away.
+
+// compile-flags: -C opt-level=1
+// min-llvm-version: 15.0
+
+#![crate_type = "lib"]
+
+use std::cmp::Ordering;
+
+#[derive(PartialOrd, PartialEq)]
+pub struct Foo(u16);
+
+// CHECK-LABEL: @check_lt
+// CHECK-SAME: (i16 %[[A:.+]], i16 %[[B:.+]])
+#[no_mangle]
+pub fn check_lt(a: Foo, b: Foo) -> bool {
+    // CHECK: %[[R:.+]] = icmp ult i16 %[[A]], %[[B]]
+    // CHECK-NEXT: ret i1 %[[R]]
+    a < b
+}
+
+// CHECK-LABEL: @check_le
+// CHECK-SAME: (i16 %[[A:.+]], i16 %[[B:.+]])
+#[no_mangle]
+pub fn check_le(a: Foo, b: Foo) -> bool {
+    // CHECK: %[[R:.+]] = icmp ule i16 %[[A]], %[[B]]
+    // CHECK-NEXT: ret i1 %[[R]]
+    a <= b
+}
+
+// CHECK-LABEL: @check_gt
+// CHECK-SAME: (i16 %[[A:.+]], i16 %[[B:.+]])
+#[no_mangle]
+pub fn check_gt(a: Foo, b: Foo) -> bool {
+    // CHECK: %[[R:.+]] = icmp ugt i16 %[[A]], %[[B]]
+    // CHECK-NEXT: ret i1 %[[R]]
+    a > b
+}
+
+// CHECK-LABEL: @check_ge
+// CHECK-SAME: (i16 %[[A:.+]], i16 %[[B:.+]])
+#[no_mangle]
+pub fn check_ge(a: Foo, b: Foo) -> bool {
+    // CHECK: %[[R:.+]] = icmp uge i16 %[[A]], %[[B]]
+    // CHECK-NEXT: ret i1 %[[R]]
+    a >= b
+}
