Commit 2e7034e

Auto merge of #106505 - Nilstrieb:format-args-string-literal-episode-2, r=petrochenkov
Properly allow macro-expanded `format_args` invocations to use captures

Originally, this was only half-allowed. There were some primitive checks in place that looked at the span to see whether the input was likely a literal. These "source literal" checks are needed because the spans created during `format_args` parsing only make sense when the format string is indeed a literal written directly in the source code.

This is orthogonal to the restriction that the first argument must be a "direct literal", i.e. not expanded from macros. This restriction was imposed by [RFC 2795] on the grounds that anything else would be too confusing. But the RFC was only concerned with the argument of the invocation being a literal, not with whether it was a source literal (perhaps in spirit it meant a source literal; this is not clear to me).

Since the original check only really cared about source literals (which is good enough to deny the `format_args!(concat!())` example), macros expanding to `format_args` invocations were able to use implicit captures if they spanned the string in a way that led back to a source string.

The "source literal" checks were not strict enough and caused ICEs in certain cases (see #106191). So I tightened them in #106195 to only work if the format string is a direct source literal. This broke the `indoc` crate. `indoc` transforms the source literal by removing whitespace, which made it no longer a "source literal" (which is required to fix the ICE). But since `indoc` spanned the literal in ways that made the old check think it was a literal, it was able to use implicit captures (which is useful and nice for the users of `indoc`).

This commit properly separates the previously introduced concepts of "source literal" and "direct literal", and therefore allows `indoc` invocations, which don't create "source literals", to use implicit captures again.

Fixes #106191

[RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
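A minimal sketch of the user-facing "direct literal" rule, assuming a toolchain with implicit captures (stable since Rust 1.58). It only illustrates the RFC 2795 half of the distinction; the `indoc` case needs a proc macro that respans a rewritten literal, which a single self-contained file cannot show. The macro and function names (`shout`, `greet`, `greet_loudly`, `greet_concat`) are hypothetical.

```rust
// Forwards its tokens unchanged, the same way `println!` forwards `$($arg:tt)*`
// to `format_args_nl!`. The string literal token reaches `format!` as-is,
// so it is still a direct literal and implicit captures work.
macro_rules! shout {
    ($($arg:tt)*) => {
        format!($($arg)*).to_uppercase()
    };
}

pub fn greet(name: &str) -> String {
    // Direct source literal: `{name}` is captured from the enclosing scope.
    format!("hello, {name}")
}

pub fn greet_loudly(name: &str) -> String {
    // Token forwarding through `macro_rules!` keeps the literal direct.
    shout!("hello, {name}")
}

pub fn greet_concat(name: &str) -> String {
    // `format!(concat!("hello, {name}"))` would be rejected: the format string
    // comes from eager macro expansion, not a direct literal.
    // Passing the argument explicitly is the supported alternative.
    format!(concat!("hello, ", "{name}"), name = name)
}
```

The explicit `name = name` in `greet_concat` is what the compiler's note ("`format_args!` cannot capture variables when the format string is expanded from a macro") asks for.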
2 parents 669e751 + 427aceb commit 2e7034e

11 files changed: 233 lines added, 63 lines removed

compiler/rustc_builtin_macros/src/format.rs

+44-35
@@ -36,6 +36,21 @@ enum PositionUsedAs {
 }
 use PositionUsedAs::*;
 
+struct MacroInput {
+    fmtstr: P<Expr>,
+    args: FormatArguments,
+    /// Whether the first argument was a string literal or a result from eager macro expansion.
+    /// If it's not a string literal, we disallow implicit arugment capturing.
+    ///
+    /// This does not correspond to whether we can treat spans to the literal normally, as the whole
+    /// invocation might be the result of another macro expansion, in which case this flag may still be true.
+    ///
+    /// See [RFC 2795] for more information.
+    ///
+    /// [RFC 2795]: https://rust-lang.github.io/rfcs/2795-format-args-implicit-identifiers.html#macro-hygiene
+    is_direct_literal: bool,
+}
+
 /// Parses the arguments from the given list of tokens, returning the diagnostic
 /// if there's a parse error so we can continue parsing other format!
 /// expressions.
@@ -45,11 +60,7 @@ use PositionUsedAs::*;
 /// ```text
 /// Ok((fmtstr, parsed arguments))
 /// ```
-fn parse_args<'a>(
-    ecx: &mut ExtCtxt<'a>,
-    sp: Span,
-    tts: TokenStream,
-) -> PResult<'a, (P<Expr>, FormatArguments)> {
+fn parse_args<'a>(ecx: &mut ExtCtxt<'a>, sp: Span, tts: TokenStream) -> PResult<'a, MacroInput> {
     let mut args = FormatArguments::new();
 
     let mut p = ecx.new_parser_from_tts(tts);
@@ -59,25 +70,21 @@ fn parse_args<'a>(
     }
 
     let first_token = &p.token;
-    let fmtstr = match first_token.kind {
-        token::TokenKind::Literal(token::Lit {
-            kind: token::LitKind::Str | token::LitKind::StrRaw(_),
-            ..
-        }) => {
-            // If the first token is a string literal, then a format expression
-            // is constructed from it.
-            //
-            // This allows us to properly handle cases when the first comma
-            // after the format string is mistakenly replaced with any operator,
-            // which cause the expression parser to eat too much tokens.
-            p.parse_literal_maybe_minus()?
-        }
-        _ => {
-            // Otherwise, we fall back to the expression parser.
-            p.parse_expr()?
-        }
+
+    let fmtstr = if let token::Literal(lit) = first_token.kind && matches!(lit.kind, token::Str | token::StrRaw(_)) {
+        // This allows us to properly handle cases when the first comma
+        // after the format string is mistakenly replaced with any operator,
+        // which cause the expression parser to eat too much tokens.
+        p.parse_literal_maybe_minus()?
+    } else {
+        // Otherwise, we fall back to the expression parser.
+        p.parse_expr()?
     };
 
+    // Only allow implicit captures to be used when the argument is a direct literal
+    // instead of a macro expanding to one.
+    let is_direct_literal = matches!(fmtstr.kind, ExprKind::Lit(_));
+
     let mut first = true;
 
     while p.token != token::Eof {
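The new flag is derived purely from the shape of the parsed first argument. The toy model below shows that decision; `Expr` and its variants are hypothetical stand-ins for rustc's AST, and `MacroInput` here only mirrors the shape of the struct added in this commit, with simplified field types.

```rust
/// Hypothetical, heavily simplified stand-in for rustc's `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A string literal token such as `"{x}"`.
    Lit(String),
    /// Any other expression, e.g. an eagerly expanded `concat!(...)` call.
    MacCall(String),
}

/// Mirrors the shape of the new `MacroInput` struct (field types simplified).
#[derive(Debug)]
pub struct MacroInput {
    pub fmtstr: Expr,
    pub args: Vec<String>,
    pub is_direct_literal: bool,
}

/// Computes the flag from the parsed first argument, analogous to
/// `matches!(fmtstr.kind, ExprKind::Lit(_))` in the diff.
pub fn make_input(fmtstr: Expr, args: Vec<String>) -> MacroInput {
    let is_direct_literal = matches!(fmtstr, Expr::Lit(_));
    MacroInput { fmtstr, args, is_direct_literal }
}
```

Bundling the flag with the parsed pieces, rather than returning a tuple, lets later stages consult it without re-inspecting the expression.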
@@ -147,17 +154,19 @@ fn parse_args<'a>(
             }
         }
     }
-    Ok((fmtstr, args))
+    Ok(MacroInput { fmtstr, args, is_direct_literal })
 }
 
-pub fn make_format_args(
+fn make_format_args(
     ecx: &mut ExtCtxt<'_>,
-    efmt: P<Expr>,
-    mut args: FormatArguments,
+    input: MacroInput,
     append_newline: bool,
 ) -> Result<FormatArgs, ()> {
     let msg = "format argument must be a string literal";
-    let unexpanded_fmt_span = efmt.span;
+    let unexpanded_fmt_span = input.fmtstr.span;
+
+    let MacroInput { fmtstr: efmt, mut args, is_direct_literal } = input;
+
     let (fmt_str, fmt_style, fmt_span) = match expr_to_spanned_string(ecx, efmt, msg) {
         Ok(mut fmt) if append_newline => {
             fmt.0 = Symbol::intern(&format!("{}\n", fmt.0));
@@ -208,11 +217,11 @@ pub fn make_format_args(
         }
     }
 
-    let is_literal = parser.is_literal;
+    let is_source_literal = parser.is_source_literal;
 
     if !parser.errors.is_empty() {
         let err = parser.errors.remove(0);
-        let sp = if is_literal {
+        let sp = if is_source_literal {
             fmt_span.from_inner(InnerSpan::new(err.span.start, err.span.end))
         } else {
             // The format string could be another macro invocation, e.g.:
@@ -230,7 +239,7 @@ pub fn make_format_args(
         if let Some(note) = err.note {
             e.note(&note);
         }
-        if let Some((label, span)) = err.secondary_label && is_literal {
+        if let Some((label, span)) = err.secondary_label && is_source_literal {
             e.span_label(fmt_span.from_inner(InnerSpan::new(span.start, span.end)), label);
         }
         if err.should_be_replaced_with_positional_argument {
@@ -256,7 +265,7 @@ pub fn make_format_args(
     }
 
     let to_span = |inner_span: rustc_parse_format::InnerSpan| {
-        is_literal.then(|| {
+        is_source_literal.then(|| {
             fmt_span.from_inner(InnerSpan { start: inner_span.start, end: inner_span.end })
         })
     };
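`to_span` produces a precise sub-span only when the format string is a source literal, using `bool::then`; otherwise callers fall back to the span of the whole format string. A sketch with a hypothetical byte-range `Span` type; rustc's real `Span` and `from_inner` are more involved.

```rust
use std::ops::Range;

/// Hypothetical simplified span: a byte range into the source file.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Span {
    pub lo: usize,
    pub hi: usize,
}

impl Span {
    /// Narrows this span to a sub-range given relative to its start,
    /// loosely analogous to rustc's `Span::from_inner`.
    pub fn from_inner(self, inner: Range<usize>) -> Span {
        Span { lo: self.lo + inner.start, hi: self.lo + inner.end }
    }
}

/// Returns a precise sub-span only for source literals; `None` otherwise,
/// because offsets into a non-source string would point at the wrong text.
pub fn to_span(is_source_literal: bool, fmt_span: Span, inner: Range<usize>) -> Option<Span> {
    is_source_literal.then(|| fmt_span.from_inner(inner))
}
```

The closure form defers computing the sub-span until the flag is known to be true.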
@@ -304,7 +313,7 @@ pub fn make_format_args(
                     // Name not found in `args`, so we add it as an implicitly captured argument.
                     let span = span.unwrap_or(fmt_span);
                     let ident = Ident::new(name, span);
-                    let expr = if is_literal {
+                    let expr = if is_direct_literal {
                         ecx.expr_ident(span, ident)
                     } else {
                         // For the moment capturing variables from format strings expanded from macros is
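The capture decision, now keyed on `is_direct_literal` instead of the span-related flag, can be modeled as a three-way lookup. This is a hypothetical simplification: explicit named arguments live in a `HashMap`, and the `Resolved` enum is invented for illustration; the error text matches the `.stderr` expectations in this commit.

```rust
use std::collections::HashMap;

/// Hypothetical outcome of resolving a `{name}` placeholder.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// Refers to an explicit `name = expr` argument at this index.
    Explicit(usize),
    /// Implicitly captures the variable `name` from the surrounding scope.
    Captured(String),
    /// Rejected: the format string did not come from a direct literal.
    Error(String),
}

pub fn resolve_named(
    name: &str,
    explicit: &HashMap<String, usize>,
    is_direct_literal: bool,
) -> Resolved {
    // Explicit named arguments always win.
    if let Some(&index) = explicit.get(name) {
        return Resolved::Explicit(index);
    }
    // Not found: only a direct literal may capture. Whether spans can point
    // into the string (`is_source_literal`) plays no role here.
    if is_direct_literal {
        Resolved::Captured(name.to_string())
    } else {
        Resolved::Error(format!("there is no argument named `{name}`"))
    }
}
```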
@@ -814,7 +823,7 @@ fn report_invalid_references(
         // for `println!("{7:7$}", 1);`
         indexes.sort();
         indexes.dedup();
-        let span: MultiSpan = if !parser.is_literal || parser.arg_places.is_empty() {
+        let span: MultiSpan = if !parser.is_source_literal || parser.arg_places.is_empty() {
             MultiSpan::from_span(fmt_span)
         } else {
             MultiSpan::from_spans(invalid_refs.iter().filter_map(|&(_, span, _, _)| span).collect())
@@ -855,8 +864,8 @@ fn expand_format_args_impl<'cx>(
 ) -> Box<dyn base::MacResult + 'cx> {
     sp = ecx.with_def_site_ctxt(sp);
     match parse_args(ecx, sp, tts) {
-        Ok((efmt, args)) => {
-            if let Ok(format_args) = make_format_args(ecx, efmt, args, nl) {
+        Ok(input) => {
+            if let Ok(format_args) = make_format_args(ecx, input, nl) {
                 MacEager::expr(ecx.expr(sp, ExprKind::FormatArgs(P(format_args))))
             } else {
                 MacEager::expr(DummyResult::raw_expr(sp, true))

compiler/rustc_parse_format/src/lib.rs

+45-9
@@ -14,6 +14,7 @@
 // We want to be able to build this crate with a stable compiler, so no
 // `#![feature]` attributes should be added.
 
+use rustc_lexer::unescape;
 pub use Alignment::*;
 pub use Count::*;
 pub use Piece::*;
@@ -234,8 +235,10 @@ pub struct Parser<'a> {
     last_opening_brace: Option<InnerSpan>,
     /// Whether the source string is comes from `println!` as opposed to `format!` or `print!`
     append_newline: bool,
-    /// Whether this formatting string is a literal or it comes from a macro.
-    pub is_literal: bool,
+    /// Whether this formatting string was written directly in the source. This controls whether we
+    /// can use spans to refer into it and give better error messages.
+    /// N.B: This does _not_ control whether implicit argument captures can be used.
+    pub is_source_literal: bool,
     /// Start position of the current line.
     cur_line_start: usize,
     /// Start and end byte offset of every line of the format string. Excludes
@@ -262,7 +265,7 @@ impl<'a> Iterator for Parser<'a> {
                     } else {
                         let arg = self.argument(lbrace_end);
                         if let Some(rbrace_pos) = self.must_consume('}') {
-                            if self.is_literal {
+                            if self.is_source_literal {
                                 let lbrace_byte_pos = self.to_span_index(pos);
                                 let rbrace_byte_pos = self.to_span_index(rbrace_pos);
 
@@ -302,7 +305,7 @@ impl<'a> Iterator for Parser<'a> {
                 _ => Some(String(self.string(pos))),
             }
         } else {
-            if self.is_literal {
+            if self.is_source_literal {
                 let span = self.span(self.cur_line_start, self.input.len());
                 if self.line_spans.last() != Some(&span) {
                     self.line_spans.push(span);
@@ -322,8 +325,8 @@ impl<'a> Parser<'a> {
         append_newline: bool,
         mode: ParseMode,
     ) -> Parser<'a> {
-        let input_string_kind = find_width_map_from_snippet(snippet, style);
-        let (width_map, is_literal) = match input_string_kind {
+        let input_string_kind = find_width_map_from_snippet(s, snippet, style);
+        let (width_map, is_source_literal) = match input_string_kind {
             InputStringKind::Literal { width_mappings } => (width_mappings, true),
             InputStringKind::NotALiteral => (Vec::new(), false),
         };
@@ -339,7 +342,7 @@ impl<'a> Parser<'a> {
             width_map,
             last_opening_brace: None,
             append_newline,
-            is_literal,
+            is_source_literal,
             cur_line_start: 0,
             line_spans: vec![],
         }
@@ -532,13 +535,13 @@ impl<'a> Parser<'a> {
                 '{' | '}' => {
                     return &self.input[start..pos];
                 }
-                '\n' if self.is_literal => {
+                '\n' if self.is_source_literal => {
                     self.line_spans.push(self.span(self.cur_line_start, pos));
                     self.cur_line_start = pos + 1;
                     self.cur.next();
                 }
                 _ => {
-                    if self.is_literal && pos == self.cur_line_start && c.is_whitespace() {
+                    if self.is_source_literal && pos == self.cur_line_start && c.is_whitespace() {
                         self.cur_line_start = pos + c.len_utf8();
                     }
                     self.cur.next();
@@ -890,6 +893,7 @@ impl<'a> Parser<'a> {
 /// written code (code snippet) and the `InternedString` that gets processed in the `Parser`
 /// in order to properly synthesise the intra-string `Span`s for error diagnostics.
 fn find_width_map_from_snippet(
+    input: &str,
     snippet: Option<string::String>,
     str_style: Option<usize>,
 ) -> InputStringKind {
@@ -902,8 +906,27 @@ fn find_width_map_from_snippet(
         return InputStringKind::Literal { width_mappings: Vec::new() };
     }
 
+    // Strip quotes.
     let snippet = &snippet[1..snippet.len() - 1];
 
+    // Macros like `println` add a newline at the end. That technically doens't make them "literals" anymore, but it's fine
+    // since we will never need to point our spans there, so we lie about it here by ignoring it.
+    // Since there might actually be newlines in the source code, we need to normalize away all trailing newlines.
+    // If we only trimmed it off the input, `format!("\n")` would cause a mismatch as here we they actually match up.
+    // Alternatively, we could just count the trailing newlines and only trim one from the input if they don't match up.
+    let input_no_nl = input.trim_end_matches('\n');
+    let Some(unescaped) = unescape_string(snippet) else {
+        return InputStringKind::NotALiteral;
+    };
+
+    let unescaped_no_nl = unescaped.trim_end_matches('\n');
+
+    if unescaped_no_nl != input_no_nl {
+        // The source string that we're pointing at isn't our input, so spans pointing at it will be incorrect.
+        // This can for example happen with proc macros that respan generated literals.
+        return InputStringKind::NotALiteral;
+    }
+
     let mut s = snippet.char_indices();
     let mut width_mappings = vec![];
     while let Some((pos, c)) = s.next() {
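The new check in `find_width_map_from_snippet` boils down to: strip the quotes, unescape the snippet, drop trailing newlines on both sides, and compare with the string the parser actually sees. A minimal sketch under stated assumptions: it handles only plain `"..."` literals (no raw strings), uses a hand-written, hypothetical `unescape_simple` supporting a few escapes instead of `rustc_lexer::unescape`, returns a `bool` instead of `InputStringKind`, and relies on `let ... else` (stable since Rust 1.65).

```rust
/// Hypothetical simplified unescaper for `\n`, `\t`, `\\`, `\"`, `\'` and `\0`.
/// Returns `None` on an unknown or dangling escape, like `unescape_string`.
fn unescape_simple(s: &str) -> Option<String> {
    let mut out = String::new();
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // A trailing backslash makes `chars.next()` return `None`.
        let escaped = match chars.next()? {
            'n' => '\n',
            't' => '\t',
            '\\' => '\\',
            '"' => '"',
            '\'' => '\'',
            '0' => '\0',
            _ => return None,
        };
        out.push(escaped);
    }
    Some(out)
}

/// Decides whether `snippet` (the quoted source text the span points at)
/// really is the string `input` that the format parser sees.
pub fn is_source_literal(input: &str, snippet: &str) -> bool {
    // Strip quotes; anything not shaped like `"..."` is rejected.
    if snippet.len() < 2 || !snippet.starts_with('"') || !snippet.ends_with('"') {
        return false;
    }
    let snippet = &snippet[1..snippet.len() - 1];
    let Some(unescaped) = unescape_simple(snippet) else {
        return false;
    };
    // `println!` appends a newline that is not in the source, so trailing
    // newlines are normalized away on both sides before comparing.
    unescaped.trim_end_matches('\n') == input.trim_end_matches('\n')
}
```

A span that still points at a literal whose text differs from the parser's input, as with `indoc` removing leading whitespace, fails the comparison, so intra-string spans are disabled while captures remain governed separately by `is_direct_literal`.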
@@ -986,6 +1009,19 @@ fn find_width_map_from_snippet(
     InputStringKind::Literal { width_mappings }
 }
 
+fn unescape_string(string: &str) -> Option<string::String> {
+    let mut buf = string::String::new();
+    let mut ok = true;
+    unescape::unescape_literal(string, unescape::Mode::Str, &mut |_, unescaped_char| {
+        match unescaped_char {
+            Ok(c) => buf.push(c),
+            Err(_) => ok = false,
+        }
+    });
+
+    ok.then_some(buf)
+}
+
 // Assert a reasonable size for `Piece`
 #[cfg(all(target_arch = "x86_64", target_pointer_width = "64"))]
 rustc_data_structures::static_assert_size!(Piece<'_>, 16);
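`unescape_string` collects the output of a callback-based unescaper and turns "any error seen" into `None` with `bool::then_some`. The sketch below reproduces that collection pattern with a hypothetical `for_each_unescaped` scanner that understands only `\n` and `\\`; it stands in for `rustc_lexer::unescape::unescape_literal`, whose real signature and error type may differ.

```rust
use std::ops::Range;

/// Hypothetical callback-driven scanner: reports each source range together
/// with either the decoded char or an error. Only `\n` and `\\` are valid escapes.
fn for_each_unescaped(src: &str, callback: &mut impl FnMut(Range<usize>, Result<char, ()>)) {
    let mut iter = src.char_indices();
    while let Some((start, c)) = iter.next() {
        if c != '\\' {
            callback(start..start + c.len_utf8(), Ok(c));
            continue;
        }
        match iter.next() {
            Some((i, 'n')) => callback(start..i + 1, Ok('\n')),
            Some((i, '\\')) => callback(start..i + 1, Ok('\\')),
            Some((i, other)) => callback(start..i + other.len_utf8(), Err(())),
            // Dangling backslash at the end of the input.
            None => callback(start..start + 1, Err(())),
        }
    }
}

/// Same collection pattern as `unescape_string`: push every decoded char,
/// remember whether any error occurred, and return `None` if one did.
fn unescape_string(string: &str) -> Option<String> {
    let mut buf = String::new();
    let mut ok = true;
    for_each_unescaped(string, &mut |_, unescaped_char| match unescaped_char {
        Ok(c) => buf.push(c),
        Err(()) => ok = false,
    });
    ok.then_some(buf)
}
```

Keeping the loop going after an error and checking a flag at the end keeps the callback simple; the cost is that partially decoded text is built and then discarded.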

tests/ui/fmt/auxiliary/format-string-proc-macro.rs

+26-10
@@ -28,25 +28,41 @@ pub fn err_with_input_span(input: TokenStream) -> TokenStream {
     TokenStream::from(TokenTree::Literal(lit))
 }
 
+fn build_format(args: impl Into<TokenStream>) -> TokenStream {
+    TokenStream::from_iter([
+        TokenTree::from(Ident::new("format", Span::call_site())),
+        TokenTree::from(Punct::new('!', Spacing::Alone)),
+        TokenTree::from(Group::new(Delimiter::Parenthesis, args.into())),
+    ])
+}
 
 #[proc_macro]
 pub fn respan_to_invalid_format_literal(input: TokenStream) -> TokenStream {
     let mut s = Literal::string("{");
     s.set_span(input.into_iter().next().unwrap().span());
-    TokenStream::from_iter([
-        TokenTree::from(Ident::new("format", Span::call_site())),
-        TokenTree::from(Punct::new('!', Spacing::Alone)),
-        TokenTree::from(Group::new(Delimiter::Parenthesis, TokenTree::from(s).into())),
-    ])
+
+    build_format(TokenTree::from(s))
 }
 
 #[proc_macro]
 pub fn capture_a_with_prepended_space_preserve_span(input: TokenStream) -> TokenStream {
     let mut s = Literal::string(" {a}");
     s.set_span(input.into_iter().next().unwrap().span());
-    TokenStream::from_iter([
-        TokenTree::from(Ident::new("format", Span::call_site())),
-        TokenTree::from(Punct::new('!', Spacing::Alone)),
-        TokenTree::from(Group::new(Delimiter::Parenthesis, TokenTree::from(s).into())),
-    ])
+
+    build_format(TokenTree::from(s))
+}
+
+#[proc_macro]
+pub fn format_args_captures(_: TokenStream) -> TokenStream {
+    r#"{ let x = 5; format!("{x}") }"#.parse().unwrap()
+}
+
+#[proc_macro]
+pub fn bad_format_args_captures(_: TokenStream) -> TokenStream {
+    r#"{ let x = 5; format!(concat!("{x}")) }"#.parse().unwrap()
+}
+
+#[proc_macro]
+pub fn identity_pm(input: TokenStream) -> TokenStream {
+    input
 }
@@ -0,0 +1,21 @@
+// aux-build:format-string-proc-macro.rs
+
+#[macro_use]
+extern crate format_string_proc_macro;
+
+macro_rules! identity_mbe {
+    ($tt:tt) => {
+        $tt
+        //~^ ERROR there is no argument named `a`
+    };
+}
+
+fn main() {
+    let a = 0;
+
+    format!(identity_pm!("{a}"));
+    //~^ ERROR there is no argument named `a`
+    format!(identity_mbe!("{a}"));
+    format!(concat!("{a}"));
+    //~^ ERROR there is no argument named `a`
+}
@@ -0,0 +1,30 @@
+error: there is no argument named `a`
+  --> $DIR/format-args-capture-first-literal-is-macro.rs:16:26
+   |
+LL |     format!(identity_pm!("{a}"));
+   |                          ^^^^^
+   |
+   = note: did you intend to capture a variable `a` from the surrounding scope?
+   = note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
+
+error: there is no argument named `a`
+  --> $DIR/format-args-capture-first-literal-is-macro.rs:8:9
+   |
+LL |         $tt
+   |         ^^^
+   |
+   = note: did you intend to capture a variable `a` from the surrounding scope?
+   = note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
+
+error: there is no argument named `a`
+  --> $DIR/format-args-capture-first-literal-is-macro.rs:19:13
+   |
+LL |     format!(concat!("{a}"));
+   |             ^^^^^^^^^^^^^^
+   |
+   = note: did you intend to capture a variable `a` from the surrounding scope?
+   = note: to avoid ambiguity, `format_args!` cannot capture variables when the format string is expanded from a macro
+   = note: this error originates in the macro `concat` (in Nightly builds, run with -Z macro-backtrace for more info)
+
+error: aborting due to 3 previous errors
+