Unclosed string literal not properly handled by StdLexical #397
Comments
Have been debugging this with @JakobLyngPetersen and seem to have found the cause. Will make a pull request.
@SethTisue do you use milestones for the project and, if so, which should be assigned to this issue (something like 2.0.1)?
We're not using milestones currently; we're just merging pull requests as they come in and doing releases when someone asks.
@SethTisue we now have a fix for unterminated string literals.

Note about EofCh

First of all, it seems

Fix for Unclosed Strings

To fix the unclosed strings, instead of having these token rules…

class StdLexical extends Lexical with StdTokens {
// see `token` in `Scanners`
def token: Parser[Token] =
...
| '\'' ~ rep( chrExcept('\'', '\n', EofCh) ) ~ '\'' ^^ { case '\'' ~ chars ~ '\'' => StringLit(chars mkString "") }
| '\"' ~ rep( chrExcept('\"', '\n', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
| EofCh ^^^ EOF
| '\'' ~> failure("unclosed string literal")
| '\"' ~> failure("unclosed string literal")
...
)
...
}

…we now have:

class StdLexical extends Lexical with StdTokens {
// see `token` in `Scanners`
def token: Parser[Token] =
...
| '\'' ~> rep( chrExcept('\'', '\n') ) >> { chars => stringEnd('\'', chars) }
| '\"' ~> rep( chrExcept('\"', '\n') ) >> { chars => stringEnd('\"', chars) }
| EofCh ^^^ EOF
...
)
...
/** Parses the final quote of a string literal or fails if it is unterminated. */
protected def stringEnd(quoteChar: Char, chars: List[Char]): Parser[Token] = {
{ elem(quoteChar) ^^^ StringLit(chars mkString "") } | err("unclosed string literal")
}
...
}

This works, breaks no existing tests, makes our new tests happy, and

Fix for Unclosed Multi-Line Comments

We have tried to implement a similar fix for unclosed multi-line comments by changing this

def whitespace: Parser[Any] = rep[Any](
whitespaceChar
| '/' ~ '*' ~ comment
| '/' ~ '/' ~ rep( chrExcept(EofCh, '\n') )
| '/' ~ '*' ~ failure("unclosed comment")
)

to this

def whitespace: Parser[Any] = rep[Any](
whitespaceChar
| '/' ~ '*' ~ comment
| '/' ~ '/' ~ rep( chrExcept(EofCh, '\n') )
| '/' ~ '*' ~ rep( elem("", _ => true) ) ~> err("unclosed comment")
)

This works, makes our new tests happy, and the

However, this breaks test
expects:
but now gets:
Basically the position of the error is reported at the end of the comment instead of at the beginning – probably because the whitespace rule previously did not return an

Test

The question is whether the test behaviour with the changed comment handling in

What to do?

We could leave out the comment fix, put it in a separate issue, and only make a PR for the string issue for now. What do you think?
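The string fix described in this comment can be tried outside the library. What follows is a minimal sketch, assuming scala-parser-combinators is on the classpath. SketchLexer and its tokens helper are hypothetical names made up for illustration, and the lexer only knows identifiers and double-quoted strings. Only stringEnd mirrors the proposed change; the rest is scaffolding and not the real StdLexical.

```scala
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.combinator.token.StdTokens
import scala.util.parsing.input.CharArrayReader.EofCh

// Hypothetical cut-down lexer for illustration only (not the real StdLexical).
object SketchLexer extends Lexical with StdTokens {
  def token: Parser[Token] =
    ( rep1(letter) ^^ { cs => Identifier(cs.mkString) }
    // Consume the opening quote and the string body, then let stringEnd decide.
    | '"' ~> rep(chrExcept('"', '\n')) >> { cs => stringEnd('"', cs) }
    | EofCh ^^^ EOF
    | failure("illegal character")
    )

  // Parses the closing quote, or reports a non-backtracking error if it is missing.
  def stringEnd(quoteChar: Char, chars: List[Char]): Parser[Token] =
    elem(quoteChar) ^^^ StringLit(chars.mkString) | err("unclosed string literal")

  def whitespace: Parser[Any] = rep(whitespaceChar)

  // Hypothetical helper: collects all tokens the Scanner produces for an input.
  def tokens(input: String): List[Token] = {
    def loop(s: Scanner): List[Token] =
      if (s.atEnd) Nil else s.first :: loop(s.rest)
    loop(new Scanner(input))
  }
}
```

Calling SketchLexer.tokens on an input such as abc "unclosed should yield the identifier followed by an error token carrying the message unclosed string literal, because err produces an Error result that the surrounding alternatives do not override.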
@SethTisue I have created a separate issue for unclosed comment handling in
fixed by #402
In scala.util.parsing.combinator.lexical.StdLexical, handling of unclosed / unterminated string literals does not seem to work as expected.

The token parser declared at the top of the code looks like this and is supposed to handle unclosed string literals by returning the value ErrorToken(unclosed string literal):

Here is a simple setup trying to use StdLexical:

Now, calling that with legal input works, here recognising an identifier and a string literal:
Passing an illegal character also works as expected:
However, the rule for an unterminated double- (and single-) quoted string does not seem to work, and the lexer produces an ErrorToken(end of input) instead of the expected ErrorToken(unclosed string literal):
I guessed the problem was that the rules for unterminated strings use the failure parser, which allows backtracking, but that should have sent us to the last failure("illegal character"). By the way, inserting a cut (~!) or alternatively using err instead of failure doesn't fix the issue.

EDIT
Parsing a string with a single quote character at the very end returns the expected token:
But adding any character to the unclosed string literal causes ErrorToken(end of input) to be emitted.
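The behaviour reported here can be reproduced with a cut-down lexer. The sketch below assumes scala-parser-combinators is on the classpath; OldRulesLexer and its tokens helper are hypothetical names, and the lexer keeps only the original double-quoted string rules rather than the full StdLexical. One possible explanation, offered as an assumption and not a confirmed diagnosis: when several alternatives joined with | all fail, the combinators keep the failure that got furthest into the input, so the "end of input" failure from the regular string rule wins over the later failure("unclosed string literal"), which fails right after the opening quote.

```scala
import scala.util.parsing.combinator.lexical.Lexical
import scala.util.parsing.combinator.token.StdTokens
import scala.util.parsing.input.CharArrayReader.EofCh

// Hypothetical cut-down lexer (not the real StdLexical) that keeps the
// original rule order for double-quoted strings.
object OldRulesLexer extends Lexical with StdTokens {
  def token: Parser[Token] =
    ( rep1(letter) ^^ { cs => Identifier(cs.mkString) }
    // Regular string rule: fails with "end of input" when the closing quote is missing.
    | '"' ~ rep(chrExcept('"', '\n', EofCh)) ~ '"' ^^ { case _ ~ cs ~ _ => StringLit(cs.mkString) }
    | EofCh ^^^ EOF
    // Intended fallback: fails immediately after the opening quote.
    | '"' ~> failure("unclosed string literal")
    | failure("illegal character")
    )

  def whitespace: Parser[Any] = rep(whitespaceChar)

  // Hypothetical helper: collects all tokens the Scanner produces for an input.
  def tokens(input: String): List[Token] = {
    def loop(s: Scanner): List[Token] =
      if (s.atEnd) Nil else s.first :: loop(s.rest)
    loop(new Scanner(input))
  }
}
```

With these rules, OldRulesLexer.tokens applied to a lone quote character should give the unclosed string literal error, while an opening quote followed by any further characters should give end of input, which matches the two cases described in this report.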