Use U+03BC μ instead of U+00B5 µ in impl Debug for Duration
#120415
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override)
Unicode does call the one character "MICRO SIGN" but the Unicode report does say that the new one is preferred, so changing this makes sense to me. Will surely confuse anyone relying on it seeing breakage 😆 (not that this was ever guaranteed).
There's been lots of bikeshedding about this character before, see #75065. #75065 (comment) suggested that the old byte sequence is a lot more compatible than the new one, which may be worth considering.
cc @Manishearth as our Unicode expert^^
Note that some standard keyboard layouts have a key combo (often altgr+m) that produces the |
U+03BC is specified as the compatibility decomposition of U+00B5, and https://www.unicode.org/reports/tr25/ section 2.5 deems the former as "the preferred character in a Unicode context".
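To see which of the two code points a given toolchain emits, a minimal sketch along these lines can help. The helper `micro_code_point` is a hypothetical name introduced here for illustration; only `Duration::from_micros` and the `Debug` formatting come from the standard library, and the exact `Debug` output is not a stability guarantee.

```rust
use std::time::Duration;

/// Hypothetical helper: returns the first micro-like code point in `text`,
/// either U+00B5 MICRO SIGN or U+03BC GREEK SMALL LETTER MU.
fn micro_code_point(text: &str) -> Option<char> {
    text.chars().find(|&c| c == '\u{00B5}' || c == '\u{03BC}')
}

fn main() {
    // Debug-format a duration that is displayed in microseconds.
    let shown = format!("{:?}", Duration::from_micros(5));
    match micro_code_point(&shown) {
        Some(c) => println!("{shown} uses U+{:04X}", c as u32),
        None => println!("{shown} contains no micro prefix"),
    }
}
```

The two characters usually look identical on screen, so comparing code points is the only reliable way to tell them apart.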
b85c331 to a62a0fc
Yeah so it was added in the era Unicode was trying hard for 1:1 mappings with other encodings. Quite often these encodings were 8-bit encodings using the upper "unused" half of ASCII, and Unicode often encoded these characters with the same lower byte as their original encoding. In particular, this comes from Latin-1 and Windows-1252: In U+00B5 MICRO SIGN, in UTF-8, will render as µ (since it's

Unicode is actually at this moment working on publishing data for "do not emit" code points and sequences, which lists code points and sequences that should be recognized by software but not produced. The draft is here, it doesn't currently include the micro sign but that could be an oversight since it does include other deprecated characters. It was compiled from various scattered Unicode sources and it's likely that UTR 25 was not one of them. I brought it up with the author of that file.

I'm ambivalent here: I think ideally we should not produce legacy characters, but "wanting better compat with legacy encodings" is exactly why these exist (and aren't marked as being banned entirely), and from that other thread this is clearly something people care about. So it's a matter of how much "we should not produce legacy characters" weighs up against "we want stuff to still work okayishly well on misconfigured terminals". And also whether

Personally I'm not a huge fan of trying hard to appease misconfigurations but at the same time I have often been frustrated with how hard it is to correctly configure stuff, especially when there are multiple steps in the pipeline that need configuring. I've mentioned this before in #75065, but "i am using

So, anyway, with my Unicode expert hat on, I think there are arguments for going both ways, it's up to the relevant team to weigh them.
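The legacy-encoding point can be made concrete with a small sketch. It encodes both code points as UTF-8 and then views the bytes the way a terminal misconfigured for Latin-1 would. The helper `decode_latin1` is a hypothetical name for illustration; it relies on the fact that Latin-1 maps each byte to the code point of the same value.

```rust
/// Hypothetical helper: decodes bytes as Latin-1 (ISO-8859-1), where every
/// byte maps to the Unicode code point with the same numeric value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    for c in ['\u{00B5}', '\u{03BC}'] {
        // Encode the character as UTF-8 into a stack buffer.
        let mut buf = [0u8; 4];
        let utf8 = c.encode_utf8(&mut buf).as_bytes();
        println!(
            "U+{:04X}: utf8 bytes {:02X?}, seen as Latin-1: {:?}",
            c as u32,
            utf8,
            decode_latin1(utf8)
        );
    }
}
```

U+00B5 encodes as the bytes C2 B5, and because B5 is the micro sign in Latin-1, the mis-decoded text still ends in a recognizable µ. U+03BC encodes as CE BC, which contains no such byte.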
I checked, UTR 25 is not a normative standard, and it's largely focused on the context of mathematical stuff only. I would say that overall while UTR 25 does consider it a legacy character in the scope of stuff it is talking about, Unicode does not consider it a legacy character overall. I think it's fine to use the Greek letter for micro, but it's also fine to use the micro sign. It is unlikely the micro sign will make it into the DoNotEmit file I mentioned earlier.
@Jules-Bertholet What is the motivation for this change? Did you run into any situation where U+00B5 didn't work as well as U+03BC? (As far as I know, U+00B5 works in more situations than U+03BC, which is why I think we should prefer U+00B5.)
Just trying to follow UTR 25's recommendation.
In that case, I'm going to close this, since there are concrete downsides to using U+03BC. Feel free to re-open if you think Rust users experience any practical downsides from using U+00B5.
@rustbot label A-unicode