Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Format multi-line strings and string interpolation. #1362

Merged
merged 3 commits into from
Jan 23, 2024

Conversation

munificent
Copy link
Member

@munificent munificent commented Jan 17, 2024

In the old style, the formatter has some special code to discard line splits that occur inside string interpolation expressions. That's largely for historical reasons because the formatter initially didn't support formatting of string interpolation expressions at all and I didn't want too much churn when adding support for formatting them.

In the new style here, we don't do that: The contents of a string interpolation expression are split like any other expression. In practice, it doesn't matter much since users generally reorganize their code to avoid long strings and splits in string interpolation. This way leads to less special case code in the formatter.

This change is somewhat large because I also reorganized how newlines inside lexemes are handled in general. Previously, TextPiece stored a list of "lines" to handle things like line comments preceding or following a token. But it was also possible for a single "line" string in that list to internally contain newline characters because of multi-line strings or block comments.

But those internal newlines also need to force surrounding code to split, so there was this "_containsNewline" bit that had to be plumbed through and tracked. Even so, there were latent bugs where the column calculation in CodeWriter would be incorrect if a line contained internal newlines because it just used to the length of the entire "line" string.

With this change, the "_lines" list in TextPiece really is a list of lines. We eagerly split any incoming lexeme into multiple lines before writing it to the TextPiece. I think the resulting code is simpler, it fixes the column calculation in CodeWriter, and it means the formatter will correctly normalize line endings even when they occur inside block comments or multiline strings.

This was a good time to test the line ending code, so I copied those existing tests over from short_format_test.dart. I went ahead and copied all of the unit tests from that file, even the ones not related to line endings, since they're all working and passing now.

This PR does not handle adjacent strings. Those have a decent amount of special handling not related to what's going on here, so I'll do those separately.

In the old style, the formatter has some special code to discard line
splits that occur inside string interpolation expressions. That's
largely for historical reasons because the formatter initially didn't
support formatting of string interpolation expressions *at all* and I
didn't want too much churn when adding support for formatting them.

In the new style here, we don't do that: The contents of a string
interpolation expression are split like any other expression. In
practice, it doesn't matter much since users generally reorganize their
code to avoid long strings and splits in string interpolation. This way
leads to less special case code in the formatter.

This change is somewhat large because I also reorganized how newlines
inside lexemes are handled in general. Previously, TextPiece stored a
list of "lines" to handle things like line comments preceding or
following a token. But it was also possible for a single "line" string
in that list to internally contain newline characters because of
multi-line strings or block comments.

But those internal newlines also need to force surrounding code to
split, so there was this "_containsNewline" bit that had to be plumbed
through and tracked. Even so, there were latent bugs where the column
calculation in CodeWriter would be incorrect if a line contained
internal newlines because it just used to the length of the entire
"line" string.

With this change, the "_lines" list in TextPiece really is a list of
lines. We eagerly split any incoming lexeme into multiple lines before
writing it to the TextPiece. I think the resulting code is simpler, it
fixes the column calculation in CodeWriter, and it means the formatter
will correctly normalize line endings even when they occur inside block
comments or multiline strings.

This was a good time to test the line ending code, so I copied those
existing tests over from short_format_test.dart. I went ahead and
copied all of the unit tests from that file, even the ones not related
to line endings, since they're all working and passing now.

This PR does *not* handle adjacent strings. Those have a decent amount
of special handling not related to what's going on here, so I'll do
those separately.
@munificent munificent merged commit 94f81dd into main Jan 23, 2024
7 checks passed
@munificent munificent deleted the format-multiline-and-interpolated-strings branch January 23, 2024 03:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants