
Commit 796e6ac

rust-demangler tool strips crate disambiguators with < 16 digits
Addresses Issue #77615.
1 parent 6f62766 commit 796e6ac

1 file changed: +62 -4 lines changed

src/tools/rust-demangler/main.rs

@@ -21,6 +21,41 @@
 //! $ "${TARGET}"/llvm/bin/llvm-cov show --Xdemangler="${TARGET}"/stage0-tools-bin/rust-demangler \
 //! --instr-profile=main.profdata ./main --show-line-counts-or-regions
 //! ```
+//!
+//! Note regarding crate disambiguators:
+//!
+//! Some demangled symbol paths can include "crate disambiguator" suffixes, represented as a large
+//! hexadecimal value enclosed in square brackets, and appended to the name of the crate. For
+//! example, the `core` crate, here, includes a disambiguator:
+//!
+//! ```rust
+//! <generics::Firework<f64> as core[a7a74cee373f048]::ops::drop::Drop>::drop
+//! ```
+//!
+//! These disambiguators are known to vary depending on environmental circumstances. As a result,
+//! tests that compare results including demangled names can fail across development environments,
+//! particularly with cross-platform testing. Also, the resulting crate paths are not syntactically
+//! valid, and don't match the original source symbol paths, which can impact development tools.
+//!
+//! For these reasons, by default, `rust-demangler` uses a heuristic to remove crate disambiguators
+//! from their original demangled representation before printing them to standard output. If crate
+//! disambiguators are required, add the `-d` (or `--disambiguators`) flag, and the disambiguators
+//! will not be removed.
+//!
+//! Also note that the disambiguators are stripped by a Regex pattern that is tolerant of some
+//! variation in the number of hexadecimal digits. The disambiguators come from a hash value, which
+//! typically generates a 16-digit hex representation on a 64-bit architecture; however, leading
+//! zeros are not included, which can shorten the hex digit length, and a different hash algorithm
+//! that might also be dependent on the architecture might shorten the length even further. A
+//! minimum length of 5 digits is assumed, which should be more than sufficient to support hex
+//! representations that generate only 8 digits of precision with an extremely rare (but not
+//! impossible) result with up to 3 leading zeros.
+//!
+//! Using a minimum number of digits less than 5 risks the possibility of stripping demangled name
+//! components with a similar pattern. For example, some closures instantiated multiple times
+//! include their own disambiguators, demangled as non-hashed zero-based indexes in square brackets.
+//! These disambiguators seem to have more analytical value (for instance, in coverage analysis), so
+//! they are not removed.
 
 use regex::Regex;
 use rustc_demangle::demangle;
@@ -29,7 +64,25 @@ use std::io::{self, Read, Write};
 const REPLACE_COLONS: &str = "::";
 
 fn main() -> io::Result<()> {
-let mut strip_crate_disambiguators = Some(Regex::new(r"\[[a-f0-9]{16}\]::").unwrap());
+// FIXME(richkadel): Issue #77615 discussed updating the `rustc-demangle` library, to provide
+// an option to generate demangled names without including crate disambiguators. If that
+// happens, update this tool to use that option (if the `-d` flag is not set) instead of
+// stripping them via the Regex heuristic. Then update the doc comments and help.
+
+// Strip hashed hexadecimal crate disambiguators. Leading zeros are not enforced, and can be
+// different across different platform/architecture types, so while 16 hex digits are common,
+// they can also be shorter.
+//
+// Also note that a demangled symbol path may include the `[<digits>]` pattern, with zero-based
+// indexes (such as for closures, and possibly for types defined in anonymous scopes). Preferably
+// these should not be stripped.
+//
+// The minimum length of 5 digits supports the possibility that some target architecture (maybe
+// a 32-bit or smaller architecture) could generate a hash value with a maximum of 8 digits,
+// and more than three leading zeros should be extremely unlikely. Conversely, it should be
+// sufficient to assume the zero-based indexes for closures and anonymous scopes will never
+// exceed the value 9999.
+let mut strip_crate_disambiguators = Some(Regex::new(r"\[[a-f0-9]{5,16}\]::").unwrap());
 
 let mut args = std::env::args();
 let progname = args.next().unwrap();
@@ -41,14 +94,19 @@ fn main() -> io::Result<()> {
 eprintln!("Usage: {} [-d|--disambiguators]", progname);
 eprintln!();
 eprintln!(
-"This tool converts a list of Rust mangled symbols (one per line) into a\n
+"This tool converts a list of Rust mangled symbols (one per line) into a\n\
 corresponding list of demangled symbols."
 );
 eprintln!();
 eprintln!(
 "With -d (--disambiguators), Rust symbols mangled with the v0 symbol mangler may\n\
-include crate disambiguators (a 16 character hex value in square brackets).\n\
-Crate disambiguators are removed by default."
+include crate disambiguators (a hexadecimal hash value, typically up to 16 digits\n\
+long, enclosed in square brackets)."
+);
+eprintln!();
+eprintln!(
+"By default, crate disambiguators are removed, using a heuristics-based regular\n\
+expression. (See the `rust-demangler` doc comments for more information.)"
 );
 eprintln!();
 std::process::exit(1)
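
The effect of the `{5,16}` bound described in the comments can be illustrated with a small standalone sketch. It is not the tool's code: the tool uses the `regex` crate, while the hypothetical helper `strip_crate_disambiguators` below is a hand-written scanner intended to behave like replacing `\[[a-f0-9]{5,16}\]::` with `::`, so that it needs only the standard library. The `outer[...]::inner` paths are made-up inputs, not real demangler output.

```rust
/// Hypothetical stand-in for the tool's Regex: replaces `[<5 to 16 lowercase hex digits>]::`
/// with `::` and leaves everything else untouched.
fn strip_crate_disambiguators(symbol: &str) -> String {
    let bytes = symbol.as_bytes();
    let mut out = String::with_capacity(symbol.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'[' {
            // Measure the run of lowercase hex digits after the opening bracket.
            let start = i + 1;
            let mut end = start;
            while end < bytes.len() && matches!(bytes[end], b'0'..=b'9' | b'a'..=b'f') {
                end += 1;
            }
            // Strip only when the run is 5..=16 digits long and is followed by `]::`.
            if (5..=16).contains(&(end - start)) && symbol[end..].starts_with("]::") {
                out.push_str("::");
                i = end + 3; // skip past `]::`
                continue;
            }
        }
        // Copy one character through unchanged (`i` is always on a char boundary).
        let ch = symbol[i..].chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

fn main() {
    // The example from the doc comments: a 15-digit disambiguator is removed.
    let path = "<generics::Firework<f64> as core[a7a74cee373f048]::ops::drop::Drop>::drop";
    println!("{}", strip_crate_disambiguators(path));

    // Made-up paths with short zero-based indexes: these are kept.
    println!("{}", strip_crate_disambiguators("outer[0]::inner"));
    println!("{}", strip_crate_disambiguators("outer[9999]::inner"));

    // Why 15 digits can appear: formatting a 64-bit value as hex drops leading zeros.
    println!("{:x}", 0x0a7a_74ce_e373_f048_u64);
}
```

In this sketch an index of 10000 or more would be stripped like a hash, which is the trade-off the comments accept by assuming such indexes never exceed 9999.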
