Tight binding of SwiLex to SwiParse makes some jobs difficult. #1

@dabrahams

Description

For example, I want to keep track of source location (line, column, etc.) and bundle that in with my tokens. One way I can do that is to have no separators and instead recognize whitespace with explicit rules:

  case horizontal_whitespace = #"[ \t\r]+"#
  case newlines =              #"\n+"#
  case illegal =               #"."#
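
For reference, here's roughly the full enum those cases live in (a sketch: the SwiLexable conformance with an empty separators set and the required none/eof cases follow SwiLex's README, and the one_line_comment pattern is my guess at the rule referenced below):

enum Terminal: String, SwiLexable {
  // No separator characters: whitespace is matched by explicit rules.
  static var separators: Set<Character> = []

  case one_line_comment =      #"//[^\n]*"#  // hypothetical pattern
  case horizontal_whitespace = #"[ \t\r]+"#
  case newlines =              #"\n+"#
  case illegal =               #"."#

  // Cases required by SwiLexable.
  case none
  case eof
}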

Then I filter the tokens myself:

/// Returns a `Token` for each lexically significant unit in `sourceText`.
func tokenize(sourceText: String) -> [MyToken] {
  var r: [MyToken] = []
  var tokenStart = PositionInSourceFile(line: 1, column: 1)
  var scanner = SwiLex<Terminal>()
  
  for t in try! scanner.lex(input: sourceText) {
    switch t.type {
    case .one_line_comment, .eof, .none:
      () // ignored
    case .horizontal_whitespace:
      tokenStart.column += t.value.count
    case .newlines:
      tokenStart.column = 1
      tokenStart.line += t.value.count // the token is all "\n"s; one per line break
    default:
      var tokenEnd = tokenStart
      tokenEnd.column += t.value.count
      r.append(
        MyToken(kind: t.type, text: t.value, location: tokenStart..<tokenEnd))
      tokenStart = tokenEnd // advance past the token just emitted
    }
  }
  return r
}
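
For instance (hypothetical input, and assuming the full grammar also has identifier and operator cases that fall into the default branch):

let tokens = tokenize(sourceText: "a +\n b")
// tokens[0] is "a" starting at line 1, column 1;
// tokens[1] is "+" starting at line 1, column 3;
// tokens[2] is "b" starting at line 2, column 2.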

But there's really no way to interpose any of this into SwiParse's pipeline. Ideally the parser would be able to operate on a generic token type and take a token stream rather than a string as input.
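
Concretely, something along these lines would do (entirely hypothetical; none of it is SwiParse's current API):

/// A parser input that only needs to expose its kind; text, source
/// location, etc. ride along on the concrete token type.
protocol ParserToken {
  associatedtype Kind: Hashable
  var kind: Kind { get }
}

/// A parser generic over the token type, fed a token stream rather
/// than a raw string.
protocol TokenStreamParser {
  associatedtype Token: ParserToken
  associatedtype Output
  func parse<S: Sequence>(_ tokens: S) throws -> Output where S.Element == Token
}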

I can probably get out of this particular jam by mapping substring indices back into source locations, but it's quite awkward, and I'm pretty sure people will want to make their own lexers.
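
(That mapping would look roughly like this; a sketch, assuming each token's value is a Substring sharing indices with the original source, and using the same PositionInSourceFile struct as above:)

/// Recovers the (line, column) at which `value` starts within `source`.
/// O(length of prefix) per call; real code would precompute line starts.
func location(of value: Substring, in source: String) -> PositionInSourceFile {
  let prefix = source[..<value.startIndex]
  let line = 1 + prefix.lazy.filter { $0 == "\n" }.count
  let column = 1 + prefix.reversed().prefix { $0 != "\n" }.count
  return PositionInSourceFile(line: line, column: column)
}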
