Use U+03BC μ instead of U+00B5 µ in impl Debug for Duration #120415

Closed

Conversation

Jules-Bertholet
Contributor

@Jules-Bertholet Jules-Bertholet commented Jan 27, 2024

https://www.unicode.org/reports/tr25/ section 2.5 deems the former as "the preferred character in a Unicode context".

@rustbot label A-unicode
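
For context, the character in question can be inspected directly. The sketch below Debug-formats a microsecond-scale `Duration` and lists the code points it contains; `debug_micros` is a hypothetical helper introduced only for illustration, and the exact output depends on the standard library version in use.

```rust
use std::time::Duration;

/// Hypothetical helper: Debug-format a duration given in microseconds.
fn debug_micros(n: u64) -> String {
    format!("{:?}", Duration::from_micros(n))
}

fn main() {
    let out = debug_micros(5);

    // List each code point so the two look-alike characters can be told apart.
    for c in out.chars() {
        println!("U+{:04X} {}", c as u32, c);
    }

    let micro_sign = '\u{00B5}'; // MICRO SIGN
    let greek_mu = '\u{03BC}'; // GREEK SMALL LETTER MU
    println!("contains MICRO SIGN: {}", out.contains(micro_sign));
    println!("contains GREEK SMALL LETTER MU: {}", out.contains(greek_mu));
}
```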

@rustbot
Collaborator

rustbot commented Jan 27, 2024

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 27, 2024
Member

@Noratrieb Noratrieb left a comment

Unicode does call the one character "MICRO SIGN" but the Unicode report does say that the new one is preferred, so changing this makes sense to me. Will surely confuse anyone relying on it seeing breakage 😆 (not that this was ever guaranteed).

There's been lots of bikeshedding about this character before, see #75065. #75065 (comment) suggested that the old byte sequence is a lot more compatible than the new one, which may be worth considering.

cc @Manishearth as our Unicode expert^^

@the8472
Member

the8472 commented Jan 27, 2024

Note that some standard keyboard layouts have a key combination (often AltGr+M) that produces U+00B5 µ MICRO SIGN, so that character is easier to type. We also don't use U+2212 − MINUS SIGN for operators.

U+03BC is specified as the compatibility decomposition of U+00B5, and
https://www.unicode.org/reports/tr25/ section 2.5 deems the former as "the preferred character in a Unicode context".
@Manishearth
Member

Manishearth commented Jan 28, 2024

> suggested that the old byte sequence is a lot more compatible than the new one

Yeah, so it was added in the era when Unicode was trying hard for 1:1 mappings with other encodings. Quite often these were 8-bit encodings that used the upper half of the byte range left "unused" by ASCII, and Unicode often encoded these characters with the same low byte as in their original encoding.

In particular, this comes from Latin-1 and Windows-1252: U+00B5 MICRO SIGN, encoded in UTF-8, will render as Âµ when decoded with either of them (since it's C2 B5). These encodings also happen to be the most common configuration for a system that is not configured for Unicode.
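
The byte-level behaviour described here can be checked with a small sketch. `misread_as_latin1` is a hypothetical helper that stands in for a terminal decoding UTF-8 output as Latin-1; it is an illustration under that assumption, not a model of any particular terminal.

```rust
/// Hypothetical helper: reinterpret the UTF-8 bytes of `s` as Latin-1.
/// Latin-1 byte values map one-to-one onto U+0000..=U+00FF, so `b as char`
/// performs the decoding.
fn misread_as_latin1(s: &str) -> String {
    s.bytes().map(|b| b as char).collect()
}

fn main() {
    let candidates = [
        ("U+00B5 MICRO SIGN", "\u{00B5}"),
        ("U+03BC GREEK SMALL LETTER MU", "\u{03BC}"),
    ];

    for (name, s) in candidates {
        println!(
            "{}: UTF-8 bytes {:02X?}, misread as {:?}",
            name,
            s.as_bytes(),
            misread_as_latin1(s)
        );
    }
}
```

U+00B5 is encoded as C2 B5, and byte B5 is itself the micro sign in Latin-1, so the misread text still ends in a recognizable µ. U+03BC is encoded as CE BC, which a Latin-1 decoder shows as two unrelated characters.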

Unicode is currently working on publishing data for "do not emit" code points and sequences, which lists code points and sequences that software should recognize but not produce. The draft is here; it doesn't currently include the micro sign, but that could be an oversight, since it does include other deprecated characters. It was compiled from various scattered Unicode sources, and it's likely that UTR 25 was not one of them. I brought it up with the author of that file.


I'm ambivalent here: I think ideally we should not produce legacy characters, but "wanting better compat with legacy encodings" is exactly why these exist (and aren't marked as being banned entirely), and from that other thread this is clearly something people care about.

So it's a matter of how much "we should not produce legacy characters" weighs against "we want stuff to still work okayishly well on misconfigured terminals". And also whether Âµ counts as "okayishly well".

Personally, I'm not a huge fan of trying hard to appease misconfigurations, but at the same time I have often been frustrated with how hard it is to configure stuff correctly, especially when multiple steps in the pipeline need configuring. I've mentioned this before in #75065, but "I am using screen over ssh over my local terminal emulator" is a situation I've often been in, where figuring out what needs to be configured correctly can take a lot of time.

So, anyway, with my Unicode expert hat on, I think there are arguments for going both ways; it's up to the relevant team to weigh them.

@Manishearth
Member

I checked: UTR 25 is not a normative standard, and it is largely focused on mathematical contexts only.

I would say that, while UTR 25 does consider it a legacy character within the scope of what it covers, Unicode does not consider it a legacy character overall. I think it's fine to use the Greek letter for micro, but it's also fine to use the micro sign.

It is unlikely the micro sign will make it into the DoNotEmit file I mentioned earlier.

@rustbot rustbot added the A-Unicode Area: Unicode label Feb 6, 2024
@m-ou-se
Member

m-ou-se commented Feb 8, 2024

@Jules-Bertholet What is the motivation for this change? Did you run into any situation where U+00B5 didn't work as well as U+03BC?

(As far as I know, U+00B5 works in more situations than U+03BC, which is why I think we should prefer U+00B5.)
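
One concrete case where the two characters behave differently is conversion to a legacy single-byte encoding. The sketch below assumes a consumer that re-encodes text as Latin-1; `to_latin1` is a hypothetical helper written for illustration and is not part of the standard library.

```rust
/// Hypothetical helper: encode `s` as Latin-1, or return `None` if any
/// character falls outside U+0000..=U+00FF and so has no Latin-1 byte.
fn to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF {
                Some(cp as u8)
            } else {
                None
            }
        })
        .collect()
}

fn main() {
    let with_micro_sign = "5\u{00B5}s"; // uses U+00B5 MICRO SIGN
    let with_greek_mu = "5\u{03BC}s"; // uses U+03BC GREEK SMALL LETTER MU

    println!("U+00B5 variant: {:02X?}", to_latin1(with_micro_sign));
    println!("U+03BC variant: {:02X?}", to_latin1(with_greek_mu));
}
```

The U+00B5 variant fits in three Latin-1 bytes, while the U+03BC variant cannot be represented at all under this assumption.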

@m-ou-se m-ou-se added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs Relevant to the library team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 8, 2024
@Jules-Bertholet
Contributor Author

> What is the motivation for this change?

Just trying to follow UTR 25's recommendation.

@m-ou-se
Member

m-ou-se commented Feb 9, 2024

> Just trying to follow UTR 25's recommendation.

In that case, I'm going to close this, since there are concrete downsides to using U+03BC. Feel free to re-open if you think Rust users experience any practical downsides from using U+00B5.

@m-ou-se m-ou-se closed this Feb 9, 2024
@Jules-Bertholet Jules-Bertholet deleted the unicode-greek-mu branch February 9, 2024 17:35
Labels
A-Unicode Area: Unicode S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs Relevant to the library team, which will review and decide on the PR/issue.
6 participants