Format multi-line strings and string interpolation. #1362

munificent · 2024-01-17T22:54:59Z

In the old style, the formatter has some special code to discard line splits that occur inside string interpolation expressions. That's largely for historical reasons: the formatter initially didn't support formatting of string interpolation expressions *at all*, and I didn't want too much churn when adding support for formatting them.

In the new style here, we don't do that: the contents of a string interpolation expression are split like any other expression. In practice, it doesn't matter much, since users generally reorganize their code to avoid long strings and splits inside string interpolation. This approach leads to less special-case code in the formatter.

This change is somewhat large because I also reorganized how newlines inside lexemes are handled in general. Previously, TextPiece stored a list of "lines" to handle things like line comments preceding or following a token. But it was also possible for a single "line" string in that list to internally contain newline characters because of multi-line strings or block comments.

But those internal newlines also need to force surrounding code to split, so there was this "_containsNewline" bit that had to be plumbed through and tracked. Even so, there were latent bugs where the column calculation in CodeWriter would be incorrect if a line contained internal newlines, because it just used the length of the entire "line" string.
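To make the column bug concrete, here is a minimal Python sketch (dart_style itself is written in Dart, and these function names are hypothetical, not CodeWriter's API). It assumes a writer that tracks the current column as an integer and compares "add the whole string's length" with "restart counting after the last newline". It only looks for `\n`, which is enough for `\r\n` endings too, since the `\r` comes before the last `\n`.

```python
def naive_column(start_column: int, text: str) -> int:
    # Buggy approach (hypothetical): treats the whole lexeme as if it
    # sat on a single line, even when it contains newline characters.
    return start_column + len(text)


def correct_column(start_column: int, text: str) -> int:
    # If the text contains a newline, the column restarts after the
    # last one; otherwise it simply advances by the text's length.
    last_newline = text.rfind("\n")
    if last_newline == -1:
        return start_column + len(text)
    return len(text) - last_newline - 1


# A multi-line string lexeme that starts at column 4.
lexeme = '"""a\nbc"""'
print(naive_column(4, lexeme))    # 14: counts characters from the first line
print(correct_column(4, lexeme))  # 5: only 'bc"""' is on the final line
```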

With this change, the "_lines" list in TextPiece really is a list of lines. We eagerly split any incoming lexeme into multiple lines before writing it to the TextPiece. I think the resulting code is simpler, it fixes the column calculation in CodeWriter, and it means the formatter will correctly normalize line endings even when they occur inside block comments or multi-line strings.
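The "eagerly split into real lines" idea can be sketched in Python as follows. `TextPieceSketch` is a hypothetical stand-in, not the actual Dart `TextPiece` class, and the sketch assumes `\r\n`, `\r`, and `\n` all count as line breaks. Once every stored line is newline-free, the "contains a newline" fact falls out of the line count instead of being a separate flag, and line-ending normalization is just a matter of which separator is used when joining.

```python
import re

# Assumption: \r\n, \r, and \n are all treated as line breaks.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


class TextPieceSketch:
    """Hypothetical text piece whose lines never contain newlines."""

    def __init__(self) -> None:
        self._lines: list[str] = [""]

    def append(self, lexeme: str) -> None:
        # Split eagerly: the first segment continues the current line,
        # and each remaining segment starts a new line.
        parts = _LINE_BREAK.split(lexeme)
        self._lines[-1] += parts[0]
        self._lines.extend(parts[1:])

    @property
    def contains_newline(self) -> bool:
        # Derived from the line count; nothing extra to plumb through.
        return len(self._lines) > 1

    @property
    def end_column(self) -> int:
        # The column after this piece is the length of its final line.
        return len(self._lines[-1])

    def write(self, line_ending: str = "\n") -> str:
        # Every break gets the requested ending, including breaks that
        # came from inside block comments or multi-line strings.
        return line_ending.join(self._lines)


piece = TextPieceSketch()
piece.append("var s = '''a\r\nb\nc''';")
print(repr(piece.write("\n")))  # "var s = '''a\nb\nc''';"
print(piece.end_column)         # 5
```

Mixed `\r\n` and `\n` endings inside the string come out uniform, and the end column is measured from the last real line rather than from the whole lexeme.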

This was a good time to test the line ending code, so I copied those existing tests over from short_format_test.dart. I went ahead and copied all of the unit tests from that file, even the ones not related to line endings, since they're all working and passing now.

This PR does *not* handle adjacent strings. Those have a decent amount of special handling not related to what's going on here, so I'll do those separately.


test/expression/string.stmt

lib/src/ast_extensions.dart

lib/src/back_end/code_writer.dart

lib/src/front_end/piece_writer.dart

lib/src/piece/piece.dart

test/tall_format_test.dart

munificent assigned natebosch and kallentu Jan 17, 2024

munificent requested review from natebosch and kallentu January 18, 2024 15:28

munificent unassigned natebosch and kallentu Jan 18, 2024

kallentu approved these changes Jan 18, 2024


test/expression/string.stmt (outdated)

Add test description.

f09bdc1

natebosch approved these changes Jan 23, 2024


Apply review feedback.

f914d0f

munificent merged commit 94f81dd into main Jan 23, 2024
7 checks passed

munificent deleted the format-multiline-and-interpolated-strings branch January 23, 2024 03:32
