Unclosed string literal not properly handled by StdLexical #397

In `scala.util.parsing.combinator.lexical.StdLexical`, the handling of unclosed (unterminated) string literals does not seem to work as expected.

The `token` parser declared near the top of the `StdLexical` source looks like this. It is supposed to handle unclosed string literals by returning `ErrorToken(unclosed string literal)`:

```scala
def token: Parser[Token] =
  ( identChar ~ rep( identChar | digit )              ^^ { case first ~ rest => processIdent(first :: rest mkString "") }
  | digit ~ rep( digit )                              ^^ { case first ~ rest => NumericLit(first :: rest mkString "") }
  | '\'' ~ rep( chrExcept('\'', '\n', EofCh) ) ~ '\'' ^^ { case '\'' ~ chars ~ '\'' => StringLit(chars mkString "") }
  | '\"' ~ rep( chrExcept('\"', '\n', EofCh) ) ~ '\"' ^^ { case '\"' ~ chars ~ '\"' => StringLit(chars mkString "") }
  | EofCh                                             ^^^ EOF
  | '\'' ~> failure("unclosed string literal")
  | '\"' ~> failure("unclosed string literal")
  | delim
  | failure("illegal character")
  )
```

Here is a simple setup trying to use `StdLexical`:

```scala
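import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.Reader

object Lexer extends App {
    // Prints every token that StdLexical's scanner produces for `input`.
    def lex(input: String): Unit = {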
        val lexer = new StdLexical
        var scanner: Reader[lexer.Token] = new lexer.Scanner(input)
        while (!scanner.atEnd) {
            println(scanner.first)
            scanner = scanner.rest
        }
    }
}
```

Calling `lex` with legal input works; here it recognises an identifier and a string literal:

```
> lex(""" hello "world" """)
identifier hello
"world"
```

Passing an illegal character also works as expected:

```
> lex(""" hello € "world" """)
identifier hello
ErrorToken(illegal character)
"world"
```

However, the rule for an unterminated double-quoted (or single-quoted) string does not seem to work: the lexer produces `ErrorToken(end of input)` instead of the expected `ErrorToken(unclosed string literal)`:

```
> lex(""" hello € "unterminated """)
identifier hello
ErrorToken(illegal character)
ErrorToken(end of input)
```

My first guess was that the unclosed-string rules use the `failure` parser, which allows backtracking. But backtracking should then have ended at the last alternative, `failure("illegal character")`, not at `end of input`. Inserting a cut (`~!`), or using `err` instead of `failure`, does not fix the issue either.
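
A likely explanation, based on reading `Failure.append` in scala-parser-combinators rather than on a confirmed diagnosis: when every alternative of `|` fails, the failure whose input position is furthest along is kept, and on a tie the later alternative wins. The terminated-string rule reads all the way to the end of the input before failing, so its `end of input` failure outranks the `unclosed string literal` failure, which is raised right after the opening quote. Swapping in `err` presumably does not help because `Failure.append` also discards an `Error` from an earlier position, and a cut (`~!`) in the string rule only turns that same `end of input` failure into a non-backtracking `Error`. The following sketch applies `StdLexical.token` once and reports which failure was kept; the object `TokenDiagnostics` and the helper `probe` are hypothetical names.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader

object TokenDiagnostics {
  val lexer = new StdLexical

  // Applies StdLexical's `token` parser once to `input`. For a failure it
  // returns the message and the offset of the failure that `|` kept; this
  // message is what the Scanner wraps in an ErrorToken.
  def probe(input: String): (String, Int) =
    lexer.token(new CharArrayReader(input.toCharArray)) match {
      case ns: lexer.NoSuccess => (ns.msg, ns.next.offset)
      case ok                  => ("token " + ok.get, ok.next.offset)
    }

  def main(args: Array[String]): Unit = {
    println(probe("\"abc")) // opening quote followed by more characters
    println(probe("\""))    // lone opening quote at the end of the input
  }
}
```

If this reading is right, the first probe keeps `end of input` at offset 4 (the end of the input), while the second keeps `unclosed string literal`, because there both failures sit at offset 1 and the later alternative wins the tie.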

**EDIT**

Input that ends with a lone `"` character (an opening quote with nothing after it) does produce the expected token:

```
> lex(""" hello € """")
identifier hello
ErrorToken(illegal character)
ErrorToken(unclosed string literal)
```

But as soon as at least one character follows the opening quote, `ErrorToken(end of input)` is emitted instead.
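
One possible workaround, sketched here as a hypothetical subclass rather than a fix taken from the library, is to make the closing quote optional so the string rule itself succeeds and turns a missing closing quote into an `ErrorToken`, instead of relying on a later `failure` alternative. The names `UnclosedAwareLexical`, `quoted` and `UnclosedDemo` are illustrative; everything other than quoted literals is delegated to `super.token`.

```scala
import scala.util.parsing.combinator.lexical.StdLexical
import scala.util.parsing.input.CharArrayReader.EofCh
import scala.util.parsing.input.Reader

// Hypothetical subclass; not part of scala-parser-combinators.
class UnclosedAwareLexical extends StdLexical {
  // Lexes a literal delimited by `q`. The closing quote is optional, so once
  // the opening quote matches, this rule succeeds: a missing closing quote
  // becomes an ErrorToken instead of a parse failure that a failure from
  // further along the input can outrank.
  private def quoted(q: Char): Parser[Token] =
    elem(q) ~ rep(chrExcept(q, '\n', EofCh)) ~ opt(elem(q)) ^^ {
      case _ ~ chars ~ Some(_) => StringLit(chars.mkString)
      case _ ~ _ ~ None        => ErrorToken("unclosed string literal")
    }

  // String rules first; everything else is handled by StdLexical.token.
  override def token: Parser[Token] =
    quoted('\'') | quoted('"') | super.token
}

object UnclosedDemo {
  // Same loop as `lex`, but collects each token's string form into a list.
  def tokens(input: String): List[String] = {
    val lexer = new UnclosedAwareLexical
    var scanner: Reader[lexer.Token] = new lexer.Scanner(input)
    val out = List.newBuilder[String]
    while (!scanner.atEnd) {
      out += scanner.first.toString
      scanner = scanner.rest
    }
    out.result()
  }

  def main(args: Array[String]): Unit =
    println(tokens(""" hello "unterminated """))
}
```

One design consequence: the unclosed literal consumes the rest of the line, so lexing resumes after it instead of one character past the error position, as the stock `Scanner` does after a failure.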
