Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexer accidentally(?) does not use is_ascii_whitespace for literal whitespace in string continuations #136600

Open
hkBst opened this issue Feb 5, 2025 · 2 comments
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@hkBst
Copy link
Member

hkBst commented Feb 5, 2025

#108403 proposed to fix this, but it was claimed that the current behavior was documented in the reference in this comment. Incorrectly, as far as I can see, as that page only describes whitespace escapes as being \r, \t, and \n and the fix was about literal whitespace in string continuations. Now https://doc.rust-lang.org/reference/expressions/literal-expr.html#string-continuation-escapes does describe this behavior, but this was added later in Jan 2024. Indeed, this PR shows the reference documented skipping all whitespace, until Jun 13, 2022.

Current behavior has this ui test. It seems like this behavior was once implemented like it is now, then got claimed to be canon then got documented as canon. Anyway, I'm not sure why not all unicode whitespace is skipped, but just almost all ascii whitespace, but it seems important to pick an existing whitespace set, instead of using an old bad manual implementation of is_ascii_whitespace...

Perhaps we can see a crater run at least...

@hkBst hkBst added the C-bug Category: This is a bug. label Feb 5, 2025
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 5, 2025
@jieyouxu jieyouxu added A-parser Area: The parsing of Rust source code to an AST T-lang Relevant to the language team, which will review and decide on the PR/issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 6, 2025
@jieyouxu
Copy link
Member

jieyouxu commented Feb 6, 2025

cc @petrochenkov as I think you're familiar with this specific quirk

@petrochenkov
Copy link
Contributor

I'm aware of it, but don't have any opinion on whether the behavior here should be changed or not (I also didn't do the archeology to find out why it is like it is).

@saethlin saethlin removed the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Feb 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants