For example, I want to keep track of source location (line, column, etc.) and bundle that in with my tokens. One way I can do that is to not have any separators and to recognize whitespace explicitly:

case horizontal_whitespace = #"[ \t\r]+"#
case newlines = #"\n+"#
case illegal = #"."#

and then filter the tokens myself:
/// Returns a `MyToken` for each lexically significant unit in `sourceText`.
func tokenize(sourceText: String) -> [MyToken] {
  var r: [MyToken] = []
  var tokenStart = PositionInSourceFile(line: 1, column: 1)
  var scanner = SwiLex<Terminal>()
  for t in try! scanner.lex(input: sourceText) {
    switch t.type {
    case .one_line_comment, .eof, .none:
      () // ignored
    case .horizontal_whitespace:
      tokenStart.column += t.value.count
    case .newlines:
      tokenStart.column = 1
      tokenStart.line += t.value.utf8.count // one line per matched '\n'
    default:
      var tokenEnd = tokenStart
      tokenEnd.column += t.value.count
      r.append(
        MyToken(kind: t.type, text: t.value, location: tokenStart..<tokenEnd))
      tokenStart = tokenEnd // advance past the token just appended
    }
  }
  return r
}

But there's really no way to interpose any of this into the process. Ideally the parser would be able to operate on a generic token type and could take a token stream rather than a string as input.
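To make the desired shape concrete, here is a rough sketch of the kind of interface I mean; `TokenProtocol`, `GenericParser`, and `parse(_:)` are hypothetical names for illustration, not existing SwiLex or SwiParse API:

/// Hypothetical sketch only: a parser that is generic over the token type and
/// consumes an already-lexed token stream instead of a String, so a custom
/// lexer (like the one above) could feed it directly.
protocol TokenProtocol {
  associatedtype Kind: Hashable
  var kind: Kind { get }
  var text: Substring { get }
}

struct GenericParser<Token: TokenProtocol, Output> {
  /// Accepts any sequence of tokens, however they were produced.
  func parse<S: Sequence>(_ tokens: S) throws -> Output where S.Element == Token {
    // grammar-driven reductions over `tokens` would go here
    fatalError("illustrative only")
  }
}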
I can probably get out of this particular jam by mapping substring indices back into source locations, but it's quite awkward, and I'm pretty sure people will want to make their own lexers.
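For completeness, that workaround might look something like this minimal sketch; it assumes each token's `value` is a Substring of the original input (so its indices can be interpreted against `sourceText`), and it reuses the `PositionInSourceFile` type from `tokenize` above:

/// Computes the (line, column) of `index` in `source` by scanning the prefix.
/// O(n) per lookup unless line-start indices are precomputed.
func position(of index: String.Index, in source: String) -> PositionInSourceFile {
  var line = 1
  var column = 1
  var i = source.startIndex
  while i < index {
    if source[i] == "\n" {
      line += 1
      column = 1
    } else {
      column += 1
    }
    i = source.index(after: i)
  }
  return PositionInSourceFile(line: line, column: column)
}

// Usage: recover a token's source range after lexing.
// let range = position(of: t.value.startIndex, in: sourceText)
//   ..< position(of: t.value.endIndex, in: sourceText)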