Releases: ruby/prism
v0.6.0
v0.5.0
v0.4.0
This release brings us very close to being able to parse most of our test targets. The remaining failures are tracked in #562. As of this tag we are at:
- ruby/spec - 4662/4663 - 99.98%
- rails/rails - 3081/3083 - 99.94%
- discourse/discourse - 4423/4434 - 99.75%
- ruby/ruby - 7823/7848 - 99.68%
The only syntax feature we have left is pattern matching. The rest of the issues are bug fixes. At this point we feel comfortable beginning the work of experimenting with integrating this parser into existing tools while we continue to finish the other pieces.
v0.3.0
New stuff
- We now have a `MissingNode` that is inserted when an expression is expected but not present
- We now keep a stack of contexts around as we're parsing. If a token is found that is unexpected that would close out a parent context, we put the parser into "recovering" mode and insert missing nodes until we get back up to the parent node
- We now expose `CommentNode` nodes in the parse result, along with handling `__END__` and `=begin..=end` syntax
- We now support multiple encodings, and parse encoding magic comments to determine the encoding
- Much more documentation has been added
- Parse symbols in `SymbolNode` and `InterpolatedSymbolNode`
- There's now a pack parser baked into YARP
- We now template out Java classes based on the serialization
- We now parse regular expressions to get out locals generated from capture groups
- `AliasNode` and `UndefNode`
- `RestParameterNode`, `KeywordParameterNode`, `BlockParameterNode`, `ForwardingParameterNode`, `NoKeywordsParameterNode`
- `BeginNode`
- `SClassNode`
- `ForNode`
- `MultiTargetNode` which is the beginnings of parsing multiple assignment
- `ParenthesesNode`
- We now support providing the escaped version of a string
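To illustrate the item about locals generated from capture groups: in Ruby, a regular expression literal with named groups on the left-hand side of `=~` assigns those groups to local variables, which is why a parser has to look inside the regex. The snippet below is a small plain-Ruby illustration of that language behavior, not YARP code, and the method name `parse_version` is made up for the example.

```ruby
# Named capture groups in a regex literal on the left of `=~` become locals.
def parse_version(text)
  if /(?<major>\d+)\.(?<minor>\d+)/ =~ text
    # `major` and `minor` are ordinary local variables created by the match.
    [major.to_i, minor.to_i]
  end
end

parse_version("v0.3.0") # => [0, 3]
parse_version("none")   # => nil
```

This only happens when the regex is a literal on the left-hand side; with the operands swapped, no locals are created.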
Changed stuff
- String lists with interpolation are now part of `ArrayNode`
- `IfNode` now also functions in the `elsif` context
- Now we parse `else` after `unless`
- Use 32 bits for locations and lengths to shorten serialized string
- We now support endless method definitions
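For reference, here is what a few of the constructs named in this list look like in Ruby source. This is only an illustration of the syntax (endless methods need Ruby 3.0 or later); the method names are invented for the example.

```ruby
# Endless method definition: a single expression body, no `end` keyword.
def square(x) = x * x

# `if` with an `elsif` branch.
def sign(n)
  if n > 0
    :positive
  elsif n < 0
    :negative
  else
    :zero
  end
end

# `unless` followed by `else`.
def describe(list)
  unless list.empty?
    "has items"
  else
    "empty"
  end
end

square(4)    # => 16
sign(-2)     # => :negative
describe([]) # => "empty"
```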
Fixed stuff
- Handle more underscores in number literals
- Better handle identifiers that end in `=`
- Better handle \r\n line terminators
- Handle escaped terminators in strings
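As a rough guide to the kinds of source these fixes concern, the snippet below shows the constructs in plain Ruby. It is not taken from the YARP test suite, and `ToySettings` and the constant names are hypothetical.

```ruby
# Underscores act as digit separators in numeric literals.
MILLION = 1_000_000
MASK = 0xff_ff

# An identifier ending in `=` names a setter method.
class ToySettings
  attr_reader :name

  def name=(value)
    @name = value.strip
  end
end

# A backslash lets the terminator character appear inside the string.
APOSTROPHE = 'it\'s'
QUOTED = "say \"hi\""
```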
v0.2.0
Lots of updates since the last release 2 weeks ago, so I felt like it was time for another tag.
- I updated the codebase to try to enforce a little consistency in the function naming. Basically each struct should have an `_alloc`, `_init`, and `_free` variant. By separating out `_alloc` and `_init` it makes it a little more clear what's going on when some structs can be either nested or living on their own.
- I tried to simplify the config a bit to make it more maintainable going forward. The list of nodes is growing and it's getting more difficult to read at a glance. I also added comments to all of the nodes that we have so far to hopefully better communicate what they each mean semantically.
- I've added tokens that can be optional on nodes, which means we now have a `YP_TOKEN_NOT_PROVIDED`. This token type is meant to indicate that a token could be here but wasn't found in the source and it's not an error (think parentheses on method calls).
- I added some code to walk the tree in C and prettyprint the nodes/tokens/strings. It makes it much easier to debug what's going on when developing.
- We're now using `rake` to handle everything, which makes for a much more streamlined development experience. Thanks @tenderlove!
- We now correctly parse superclasses, after fixing a binding power issue.
- I started work on encodings. We have ASCII and UTF-8 baked in now. I'm not 100% sure if this is the way I actually want to go, but it works for now. Lots more experimentation needed on this.
Now for the new nodes:
- `BreakNode`, `NextNode`, `SuperNode`, `YieldNode`
- `ForwardingSuperNode`
- `StringListNode`, `StringNode`, `InterpolatedStringNode`, `StringInterpolatedNode`
- `ConstantPathWriteNode`
- `RangeNode`
- `DefinedNode`
- `ElseNode`
- `DefNode`
- `RequiredParameterNode`, `OptionalParameterNode`, `KeywordParameterNode`
And a couple of changes:
- `CharacterLiteralNode` no longer exists, as it's just a `StringNode`.
- `CallNode` now works for methods with `?` and `!`, operators of `::`, and the `.()` shorthand
- `CallNode` now tracks its parentheses
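The call forms mentioned for `CallNode` look like this in Ruby source. This is a plain-Ruby illustration; `Counter` and `demo_calls` are names invented for the example.

```ruby
class Counter
  def initialize
    @count = 0
  end

  # Predicate method: the name ends in `?`.
  def zero?
    @count.zero?
  end

  # Bang method: the name ends in `!`.
  def increment!
    @count += 1
    self
  end

  def self.build
    new
  end
end

def demo_calls
  counter = Counter::build    # `::` used as a call operator
  before = counter.zero?      # call without parentheses
  counter.increment!()        # call with explicit, empty parentheses
  doubler = ->(x) { x * 2 }
  [before, counter.zero?, doubler.(21)] # `.()` is shorthand for `.call`
end

demo_calls # => [true, false, 42]
```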
v0.1.0
Creating a new release to give a status update of where we're at, and where we're going.
I want to say first that there are a lot of things in progress, and a lot of things that will be removed/changed by the time we move forward. Those things include:
- Currently the entire parser is modeled as a Pratt parser, which is straight up incorrect. The Ruby grammar doesn't work that way. That being said, it's much easier to add new nodes and test things without getting into the intricate details of where they are allowed to show up in the tree. So for now, it's staying. But it will eventually be pared down to just what is known as the "arg" production rule in the current CRuby grammar.
- We're generating a lot of code at the moment. Some of that code I'm going to want to manage manually. For example, we're currently generating the functions that allocate every node type, and they have some basic knowledge of how to store their location information. This is all well and good, but the details of how to create these nodes is a little more complicated than is allowed by our templating engine. Eventually I'd like to not be generating that stuff. But since it's very easy to generate new nodes right now, I'm leaving it.
- There are numerous bugs that I'm just ignoring at the moment. For example when you parse a class node, if you put a superclass the constant path gets parsed as a `<` method call. There are lots of these examples; we'll get to them all eventually.
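For readers unfamiliar with the term, a Pratt parser decides how to group operators by comparing numeric binding powers. The sketch below is a toy written in Ruby for illustration only; it is not how YARP is implemented, the `ToyPratt` class and its binding-power table are assumptions, and it has no error handling for malformed input.

```ruby
# Toy Pratt (binding-power) expression parser producing nested arrays.
class ToyPratt
  # Binding power per infix operator; a higher number binds tighter.
  BINDING_POWER = { "<" => 10, "+" => 20, "-" => 20, "*" => 30, "/" => 30 }.freeze

  def initialize(source)
    @tokens = source.scan(/\d+|[A-Za-z_]\w*|[-+*\/<()]/)
    @pos = 0
  end

  def parse
    parse_expression(0)
  end

  private

  def peek
    @tokens[@pos]
  end

  def advance
    token = @tokens[@pos]
    @pos += 1
    token
  end

  # Keep consuming infix operators while they bind tighter than min_power.
  def parse_expression(min_power)
    left = parse_prefix
    while (power = BINDING_POWER[peek]) && power > min_power
      operator = advance
      # Passing the operator's own power makes equal operators left-associative.
      right = parse_expression(power)
      left = [operator.to_sym, left, right]
    end
    left
  end

  def parse_prefix
    token = advance
    case token
    when "("
      inner = parse_expression(0)
      advance # consume the closing ")"
      inner
    when /\A\d+\z/
      token.to_i
    else
      token.to_sym # bare names are treated as identifiers
    end
  end
end

ToyPratt.new("1 + 2 * 3").parse     # => [:+, 1, [:*, 2, 3]]
ToyPratt.new("1 - 2 - 3").parse     # => [:-, [:-, 1, 2], 3]
ToyPratt.new("Foo < Bar + 1").parse # => [:<, :Foo, [:+, :Bar, 1]]
```

The last call hints at the superclass issue above: a parser that treats everything as an expression will happily read `<` as a binary operator wherever it appears.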
That being said, here are some features that have been added since this project was started that are the beginnings of what we're going to have in the final product:
- We're generating a shared library that has no dependencies on external projects or libraries. This is in place, being generated by a custom makefile. At the moment it's called librubyparser, but I'm open to literally any other name.
- The shared library basically has two workflows:
  - `yp_parse` - accepts a parser, returns a pointer to the root node of the tree that was parsed. Parse errors are added to the parser's error list as parsing is performed. The user is then free to use this node as they please.
  - `yp_serialize` - accepts a parser, a node, and a buffer and serializes the node to a binary string on the buffer. The user is then free to use this buffer as they please.
- We're generating a Ruby native extension library that allows interacting with the shared library from a Ruby context. We're using this for testing our parsing, which makes it not only helpful but necessary. This library includes definitions for all of the nodes in the tree much like syntax tree. All of the nodes can be queried, walked, and deconstructed. With the nodes in place, the library also provides a deserialization procedure for reading the binary string dumped by `yp_serialize`.
- We've begun providing more documentation, and I intend on adding a lot more. Some nodes have documentation now, but I want every node to have clear, concise examples before this thing gets shipped. We've also been adding documentation to all of the C functions.
- We have some basic error recovery, with plans for much more as we get further underway. At the moment, if a token is expected in a particular position, we can recover from that by replacing it with a missing token. We have decent error reporting now, which can be accessed through the C and Ruby interfaces.
- We're comparing our lex output to ripper's at the moment and getting close to parity. There's a lot of state stored in the CRuby lexer (probably to make it easier to interface with bison) that we're not storing, which means it's difficult to get full compatibility, but we'll get there eventually.
- We're tracking the scope of local variables in various scope nodes through the tree (currently 1 at the top level, 1 for each class node, and one for each module node). This is necessary for parsing super complicated examples like the ones we saw in tric this year.
- A lot of the nodes in the tree have been simplified and made more semantic than their ripper/syntax tree counterparts. For example, `Assignment` is a node in syntax tree, but in YARP it has been split into `{Class,Global,Local,Instance}VariableWrite` and `CallNode` where appropriate. On the other hand, some nodes have been eliminated by collapsing them into common ones, like `If` and `IfModifier` being a part of the same node. The #1 goal here is making it easier and faster to compile once this is integrated into CRuby.
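The point about scope tracking rests on a Ruby rule: class and module bodies each start a fresh local variable scope, so a parser must know which names are locals at any point to tell a variable read from a method call. The snippet below shows that rule in plain Ruby; `ScopeDemo`, `ScopeDemoModule`, and the variable names are invented for the example.

```ruby
top_level_value = 10

class ScopeDemo
  # `top_level_value` is not a local in here, so `defined?` returns nil.
  SEES_TOP_LEVEL = !defined?(top_level_value).nil?
end

module ScopeDemoModule
  # The same holds for a module body.
  SEES_TOP_LEVEL = !defined?(top_level_value).nil?
end

ScopeDemo::SEES_TOP_LEVEL       # => false
ScopeDemoModule::SEES_TOP_LEVEL # => false
```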
There are a ton of things we're still working on, but top-of-mind for the near future includes:
- Error recovery at the node level. This involves tracking context as a stack and allowing parent nodes to recover from unexpected tokens if one is found. This largely mirrors the approach described here.
- Just more nodes. We have a ton of stuff still to implement. I'd like to get method definitions in place soon, because that'll start to allow us to parse very basic full files. Also because method definitions involve a ton of different subnodes like *, **, &, massign, etc.
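As a rough picture of what context-stack recovery could look like, here is a toy Ruby sketch for a grammar of numbers and nested `(...)` / `[...]` groups. It is an assumption-laden illustration, not the planned implementation: `ToyRecoveringParser`, its node shapes, and its error messages are all hypothetical.

```ruby
class ToyRecoveringParser
  CLOSERS = { "(" => ")", "[" => "]" }.freeze

  attr_reader :errors

  def initialize(tokens)
    @tokens = tokens
    @pos = 0
    @contexts = [] # closing tokens of the groups we are currently inside
    @errors = []
  end

  def parse
    parse_expression
  end

  private

  def parse_expression
    token = @tokens[@pos]
    if CLOSERS.key?(token)
      @pos += 1
      closer = CLOSERS[token]
      @contexts.push(closer)
      inner = parse_expression
      @contexts.pop
      if @tokens[@pos] == closer
        @pos += 1
      else
        # Do not consume: the token may belong to a parent context.
        @errors << "expected #{closer}"
      end
      [:group, token + closer, inner]
    elsif token&.match?(/\A\d+\z/)
      @pos += 1
      [:number, token.to_i]
    else
      @errors << "expected an expression"
      # Skip junk, but leave a token that closes some parent context in place.
      @pos += 1 unless token.nil? || @contexts.include?(token)
      [:missing]
    end
  end
end

parser = ToyRecoveringParser.new(["[", "(", "]"])
parser.parse  # => [:group, "[]", [:group, "()", [:missing]]]
parser.errors # => ["expected an expression", "expected )"]
```

The unexpected `]` is left in place because it closes a parent context, so the outer group still finishes normally while the inner one is filled in with a missing node.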