Use U+03BC μ instead of U+00B5 µ in `impl Debug for Duration` #120415

Jules-Bertholet · 2024-01-27T06:28:09Z

Section 2.5 of https://www.unicode.org/reports/tr25/ calls U+03BC "the preferred character in a Unicode context".

@rustbot label A-unicode
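A quick way to see which "micro" code point the current `impl Debug for Duration` emits is to format a one-microsecond `Duration` and print the code point of every non-ASCII character. This is a minimal sketch; the exact output depends on the toolchain, and at the time of this PR it was U+00B5.

```rust
use std::time::Duration;

fn main() {
    // Debug-format a one-microsecond Duration, e.g. "1µs".
    let s = format!("{:?}", Duration::from_micros(1));
    println!("{s}");

    // Print the code point of each non-ASCII char to tell U+00B5 from U+03BC.
    for c in s.chars().filter(|c| !c.is_ascii()) {
        println!("U+{:04X}", c as u32);
    }
}
```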

rustbot · 2024-01-27T06:28:16Z

(rustbot has picked a reviewer for you, use r? to override)

Noratrieb

Unicode does name the one character "MICRO SIGN", but the Unicode report does say that the new one is preferred, so changing this makes sense to me. It will surely confuse anyone who relies on the exact output and sees breakage 😆 (not that this was ever guaranteed).

There's been lots of bikeshedding about this character before; see #75065. A comment in #75065 suggested that the old byte sequence is a lot more compatible than the new one, which may be worth considering.

cc @Manishearth as our Unicode expert^^

library/core/src/time.rs

the8472 · 2024-01-27T11:36:34Z

Note that some standard keyboard layouts have a key combo (often AltGr+M) that produces U+00B5 µ MICRO SIGN, so it is easier to type. We also don't use U+2212 − MINUS SIGN for operators.


Manishearth · 2024-01-28T00:18:18Z

> suggested that the old byte sequence is a lot more compatible than the new one

Yeah, so it was added in the era when Unicode was trying hard to provide 1:1 mappings with other encodings. Quite often these encodings were 8-bit encodings using the upper "unused" half of the byte range above ASCII, and Unicode often encoded these characters with the same lower byte as in their original encoding.

In particular, this comes from Latin-1 and Windows-1252, where the micro sign is the single byte B5. When the UTF-8 encoding of U+00B5 MICRO SIGN (C2 B5) is displayed by a system expecting either of these encodings, it renders as Âµ. These encodings also happen to be the most common way for a system to be configured when it's not configured for Unicode.
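To make the compatibility argument concrete, the sketch below compares the two characters' UTF-8 bytes and shows what a terminal decoding those bytes as Latin-1 would display. It models ISO-8859-1 (each byte maps to the code point of the same value); Windows-1252 differs only in the 0x80–0x9F range, so it agrees for the bytes used here. The helper names `decode_latin1` and `fits_latin1` are hypothetical, written just for this illustration.

```rust
/// Hypothetical helper: decode bytes as ISO-8859-1, where byte b maps to U+00bb.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Hypothetical helper: true if every char exists in Latin-1 (code point <= U+00FF).
fn fits_latin1(s: &str) -> bool {
    s.chars().all(|c| (c as u32) <= 0xFF)
}

fn main() {
    let micro = "\u{00B5}"; // µ MICRO SIGN
    let mu = "\u{03BC}"; // μ GREEK SMALL LETTER MU

    // UTF-8 encodings of each character.
    println!("{:02X?}", micro.as_bytes()); // [C2, B5]
    println!("{:02X?}", mu.as_bytes()); // [CE, BC]

    // What a Latin-1 terminal shows when handed those UTF-8 bytes.
    println!("{}", decode_latin1(micro.as_bytes())); // Âµ (still contains a µ)
    println!("{}", decode_latin1(mu.as_bytes())); // Î¼ (no mu at all)

    // Only the micro sign can be represented in Latin-1 at all.
    println!("{} {}", fits_latin1(micro), fits_latin1(mu)); // true false
}
```

The asymmetry is the point: a misconfigured Latin-1 display still shows a µ for U+00B5, while U+03BC degrades to unrelated characters.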

Unicode is currently working on publishing data for "do not emit" code points and sequences, which lists code points and sequences that software should recognize but not produce. The draft is here. It doesn't currently include the micro sign, but that could be an oversight, since it does include other deprecated characters. It was compiled from various scattered Unicode sources, and it's likely that UTR #25 was not one of them. I brought it up with the author of that file.

I'm ambivalent here: I think ideally we should not produce legacy characters, but "wanting better compat with legacy encodings" is exactly why these exist (and aren't marked as being banned entirely), and from that other thread this is clearly something people care about.

So it's a matter of how much "we should not produce legacy characters" weighs against "we want stuff to still work okayishly well on misconfigured terminals", and also whether Âµ counts as "okayishly well".

Personally, I'm not a huge fan of trying hard to appease misconfigurations, but at the same time I have often been frustrated by how hard it is to configure things correctly, especially when multiple steps in the pipeline need configuring. I've mentioned this before in #75065, but "I am using screen over ssh over my local terminal emulator" is a situation I've often been in, and figuring out what needs to be configured correctly can take a lot of time.

So, anyway, with my unicode expert hat on, I think there are arguments for going both ways, it's up to the relevant team to weigh them.

Manishearth · 2024-01-30T02:38:06Z

I checked: UTR #25 is not a normative standard, and it's focused largely on mathematical contexts.

Overall, I would say that while UTR #25 does consider it a legacy character within the scope it covers, Unicode as a whole does not consider it a legacy character. I think it's fine to use the Greek letter for micro, but it's also fine to use the micro sign.

It is unlikely the micro sign will make it into the DoNotEmit file I mentioned earlier.

m-ou-se · 2024-02-08T10:26:02Z

@Jules-Bertholet What is the motivation for this change? Did you run into any situation where U+00B5 didn't work as well as U+03BC?

(As far as I know, U+00B5 works in more situations than U+03BC, which is why I think we should prefer U+00B5.)

Jules-Bertholet · 2024-02-08T14:02:50Z

> What is the motivation for this change?

Just trying to follow UTR 25's recommendation.

m-ou-se · 2024-02-09T16:53:15Z

> Just trying to follow UTR 25's recommendation.

In that case, I'm going to close this, since there are concrete downsides to using U+03BC. Feel free to re-open if you think Rust users experience any practical downsides from using U+00B5.

rustbot assigned m-ou-se Jan 27, 2024

rustbot added the S-waiting-on-review and T-libs labels Jan 27, 2024

Noratrieb reviewed Jan 27, 2024


library/core/src/time.rs (outdated, resolved)

Use U+03BC μ instead of U+00B5 µ in impl Debug for Duration

a62a0fc

U+03BC is specified as the compatibility decomposition of U+00B5, and https://www.unicode.org/reports/tr25/ section 2.5 deems the former as "the preferred character in a Unicode context".

Jules-Bertholet force-pushed the unicode-greek-mu branch from b85c331 to a62a0fc January 27, 2024 14:45

rustbot added the A-Unicode Area: Unicode label Feb 6, 2024

m-ou-se closed this Feb 9, 2024

Jules-Bertholet deleted the unicode-greek-mu branch February 9, 2024 17:35
