Releases · ruby/prism

This release brings us very close to being able to parse most of our test targets. The remaining failures are tracked in #562. As of this tag we are at:

ruby/spec - 4662/4663 - 99.98%
rails/rails - 3081/3083 - 99.94%
discourse/discourse - 4423/4434 - 99.75%
ruby/ruby - 7823/7848 - 99.68%

The only syntax feature we have left is pattern matching. The rest of the issues are bug fixes. At this point we feel comfortable beginning the work of experimenting with integrating this parser into existing tools while we continue to finish the other pieces.


v0.3.0

Released 06 Jan 21:23 by kddnewton (commit ec22bc7)

New stuff

We now have a MissingNode that is inserted when an expression is expected but not present
We now keep a stack of contexts as we parse. If we find an unexpected token that would close a parent context, we put the parser into "recovering" mode and insert missing nodes until we get back up to the parent node
We now expose CommentNode nodes in the parse result, along with handling __END__ and =begin..=end syntax
We now support multiple encodings, and parse encoding magic comments to determine the encoding
Much more documentation has been added
Parse symbols in SymbolNode and InterpolatedSymbolNode
There's now a pack parser baked into YARP
We now template out Java classes based on the serialization
We now parse regular expressions to get out locals generated from capture groups
AliasNode and UndefNode
RestParameterNode, KeywordParameterNode, BlockParameterNode, ForwardingParameterNode, NoKeywordsParameterNode
BeginNode
SClassNode
ForNode
MultiTargetNode which is the beginnings of parsing multiple assignment
ParenthesesNode
We now support providing the escaped version of a string
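On the regular-expression item above: in Ruby, when a regular expression literal with named capture groups is used on the left-hand side of =~, each group name becomes a local variable (for example, /(?<year>\d+)/ =~ input defines year). Below is a minimal C sketch of only the name-extraction step, not YARP's implementation. `scan_named_captures` and `name_span_t` are hypothetical names, and the scanner deliberately simplifies escapes and character classes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: a name found in the regex source, as a pointer into it plus a length. */
typedef struct {
    const char *start;
    size_t length;
} name_span_t;

static int is_name_byte(char c) {
    return isalnum((unsigned char) c) || c == '_';
}

/*
 * Scan the body of a regular expression literal (without the slashes) for
 * named capture groups written as (?<name>...). Up to `capacity` names are
 * stored in `out`; the total number found is returned.
 * Simplifications of this sketch: an escape skips exactly one byte,
 * character classes are not nested, and the (?'name'...) spelling is ignored.
 */
static size_t scan_named_captures(const char *source, size_t length,
                                  name_span_t *out, size_t capacity) {
    assert(out != NULL || capacity == 0);
    size_t count = 0;
    int in_class = 0;

    for (size_t i = 0; i < length; i++) {
        char c = source[i];

        if (c == '\\') {
            i++; /* skip the escaped byte */
        } else if (in_class) {
            if (c == ']') in_class = 0; /* "(" inside [...] is literal */
        } else if (c == '[') {
            in_class = 1;
        } else if (c == '(' && i + 3 < length && source[i + 1] == '?' &&
                   source[i + 2] == '<' && source[i + 3] != '=' &&
                   source[i + 3] != '!') {
            /* (?<= and (?<! are lookbehind assertions, not named groups. */
            size_t start = i + 3;
            size_t end = start;
            while (end < length && is_name_byte(source[end])) end++;

            /* Require a non-empty name that doesn't start with a digit and ends at '>'. */
            if (end > start && end < length && source[end] == '>' &&
                !isdigit((unsigned char) source[start])) {
                if (count < capacity) {
                    out[count].start = source + start;
                    out[count].length = end - start;
                }
                count++;
            }
        }
    }
    return count;
}
```

For the pattern (?<year>\d+)-(?<month>\d+) this reports year and month, while lookbehinds such as (?<=x) and anything inside [...] are skipped. Returning spans into the source avoids copying names, which suits a parser that records locations rather than strings.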

Changed stuff

String lists with interpolation are now part of ArrayNode
IfNode now also functions in the elsif context
We now parse else after unless
Use 32 bits for locations and lengths to shorten the serialized string
We now support endless method definitions

Fixed stuff

Handle more underscores in number literals
Better handle identifiers that end in =
Better handle \r\n line terminators
Handle escaped terminators in strings
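Ruby accepts underscores in numeric literals only as single separators between digits, so 1_000 and 0xff_ff are valid while 1__000 and 1_ are syntax errors. Here is a minimal sketch of that rule for the digit portion of an integer literal, assuming the base prefix (such as 0x) has already been consumed. `valid_digit_run` is a hypothetical helper, not part of YARP.

```c
#include <stddef.h>

/* Value of `c` as a digit (0-9, a-z, A-Z), or -1 if it is not a digit. */
static int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical check for the digits of an integer literal in `base` (2..36),
 * prefix already stripped. Underscores are allowed only as single separators
 * between two digits: "1_000" passes, while "_1", "1_", and "1__0" fail.
 */
static int valid_digit_run(const char *digits, size_t length, int base) {
    int previous_was_digit = 0;

    for (size_t i = 0; i < length; i++) {
        if (digits[i] == '_') {
            /* An underscore must follow a digit... */
            if (!previous_was_digit) return 0;
            previous_was_digit = 0;
        } else {
            int value = digit_value(digits[i]);
            if (value < 0 || value >= base) return 0;
            previous_was_digit = 1;
        }
    }

    /* ...and be followed by one, so the run can't end on '_' (or be empty). */
    return previous_was_digit;
}
```

Tracking only whether the previous byte was a digit is enough to reject leading, trailing, and doubled underscores in one pass. Floats and exponents would apply the same check to each of their digit runs.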


v0.2.0

Released 21 Oct 20:47 by kddnewton (commit 906b5b3)

Lots of updates since the last release 2 weeks ago, so I felt like it was time for another tag.

I updated the codebase to enforce a little more consistency in function naming. Each struct should now have an _alloc, _init, and _free variant. Separating _alloc from _init makes it clearer what's going on, since some structs can either be nested inside another struct or live on their own.
I tried to simplify the config a bit to make it more maintainable going forward. The list of nodes is growing and it's getting more difficult to read at a glance. I also added comments to all of the nodes that we have so far to hopefully better communicate what they each mean semantically.
Nodes can now have optional tokens, so we now have a YP_TOKEN_NOT_PROVIDED token type. It indicates that a token could appear at that position but wasn't found in the source, and that its absence is not an error (think parentheses on method calls).
I added some code to walk the tree in C and pretty-print the nodes, tokens, and strings. It makes debugging much easier during development.
We're now using rake to handle everything, which makes for a much more streamlined development experience. Thanks @tenderlove!
We now correctly parse superclasses, after fixing a binding power issue.
I started work on encodings. We have ASCII and UTF-8 baked in now. I'm not 100% sure if this is the way I actually want to go, but it works for now. Lots more experimentation needed on this.
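The _alloc/_init/_free naming convention mentioned above can be sketched as follows. The struct and function names are hypothetical, and one choice is an assumption of this sketch: _free releases only the memory the struct owns, so the same function works for a struct nested inside another struct and for one returned by _alloc (which then also needs a plain free).

```c
#include <stdlib.h>

/* Hypothetical growable list of byte offsets, used only to illustrate the convention. */
typedef struct {
    size_t *offsets;
    size_t size;
    size_t capacity;
} offset_list_t;

/* _init: set up a struct whose storage already exists (stack or nested in another struct). */
static void offset_list_init(offset_list_t *list) {
    list->offsets = NULL;
    list->size = 0;
    list->capacity = 0;
}

/* _alloc: allocate a standalone struct on the heap, then initialize it. */
static offset_list_t *offset_list_alloc(void) {
    offset_list_t *list = malloc(sizeof(offset_list_t));
    if (list != NULL) offset_list_init(list);
    return list;
}

/* Append an offset, doubling the backing array as needed. Returns 0 on allocation failure. */
static int offset_list_append(offset_list_t *list, size_t offset) {
    if (list->size == list->capacity) {
        size_t capacity = list->capacity == 0 ? 4 : list->capacity * 2;
        size_t *offsets = realloc(list->offsets, capacity * sizeof(size_t));
        if (offsets == NULL) return 0;
        list->offsets = offsets;
        list->capacity = capacity;
    }
    list->offsets[list->size++] = offset;
    return 1;
}

/* _free (as assumed here): release what the struct owns and reset it, but not the
 * struct itself, so it is safe for nested structs. Heap structs also need free(list). */
static void offset_list_free(offset_list_t *list) {
    free(list->offsets);
    offset_list_init(list);
}
```

A struct embedded by value in a larger struct is set up with offset_list_init and torn down with offset_list_free; only standalone instances go through offset_list_alloc.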

Now for the new nodes:

BreakNode, NextNode, SuperNode, YieldNode
ForwardingSuperNode
StringListNode, StringNode, InterpolatedStringNode, StringInterpolatedNode
ConstantPathWriteNode
RangeNode
DefinedNode
ElseNode
DefNode
RequiredParameterNode, OptionalParameterNode, KeywordParameterNode

And a couple of changes:

CharacterLiteralNode no longer exists, as it's just a StringNode.
CallNode now works for method names ending in ? and !, the :: call operator, and the .() shorthand
CallNode now tracks its parentheses
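Tracking parentheses on CallNode pairs naturally with the optional tokens introduced in this release: a call node can hold opening and closing tokens that are marked as not provided when the parentheses are omitted. Here is a hypothetical sketch of that idea. TOKEN_NOT_PROVIDED stands in for YP_TOKEN_NOT_PROVIDED, and the toy parse_call only recognizes `foo` and `foo()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token types for this sketch only. */
typedef enum {
    TOKEN_NOT_PROVIDED,      /* could appear here but is absent from the source; not an error */
    TOKEN_IDENTIFIER,
    TOKEN_PARENTHESIS_LEFT,
    TOKEN_PARENTHESIS_RIGHT
} token_type_t;

typedef struct {
    token_type_t type;
    const char *start; /* NULL when not provided */
    const char *end;
} token_t;

/* Hypothetical call node that remembers its (optional) parentheses. */
typedef struct {
    token_t message;
    token_t opening; /* "(" or not provided */
    token_t closing; /* ")" or not provided */
} call_node_t;

static const token_t not_provided = { TOKEN_NOT_PROVIDED, NULL, NULL };

static bool token_provided(const token_t *token) {
    return token->type != TOKEN_NOT_PROVIDED;
}

/* Toy parser: an identifier optionally followed by "()". */
static call_node_t parse_call(const char *source, size_t length) {
    call_node_t node;
    size_t i = 0;
    while (i < length && source[i] != '(') i++;
    node.message = (token_t) { TOKEN_IDENTIFIER, source, source + i };

    if (i + 1 < length && source[i] == '(' && source[i + 1] == ')') {
        node.opening = (token_t) { TOKEN_PARENTHESIS_LEFT, source + i, source + i + 1 };
        node.closing = (token_t) { TOKEN_PARENTHESIS_RIGHT, source + i + 1, source + i + 2 };
    } else {
        /* Omitted parentheses are valid Ruby, so record them as not provided. */
        node.opening = not_provided;
        node.closing = not_provided;
    }
    return node;
}
```

Using a dedicated token type rather than a null pointer keeps every node field the same shape, so code that walks or serializes nodes can check a single type tag to tell "absent" from "present".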

Contributors

tenderlove


v0.1.0

Released 07 Oct 17:36 by kddnewton (commit 4ae838d)

Creating a new release to give a status update of where we're at, and where we're going.

I want to say first that there are a lot of things in progress, and a lot of things that will be removed/changed by the time we move forward. Those things include:

Currently the entire parser is modeled as a Pratt parser, which is straight up incorrect: the Ruby grammar doesn't work that way. That being said, it makes it much easier to add new nodes and test things without getting into the intricate details of where they are allowed to show up in the tree. So for now, it's staying. Eventually, though, it will be pared down to just what is known as the "arg" production rule in the current CRuby grammar.
We're generating a lot of code at the moment, and some of it I'm going to want to manage manually. For example, we're currently generating the functions that allocate every node type, and they have some basic knowledge of how to store their location information. This is all well and good, but the details of how to create these nodes are a little more complicated than our templating engine allows. Eventually I'd like to stop generating that code, but since generating new nodes is so easy right now, I'm leaving it.
There are numerous bugs that I'm ignoring at the moment. For example, when you parse a class node that has a superclass, the constant path gets parsed as a call to the < method. There are lots of examples like this; we'll get to them all eventually.
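For readers unfamiliar with Pratt parsing, the core idea is that each infix operator has a binding power, and the parser only consumes an operator that binds more tightly than its caller's minimum. Below is a minimal C sketch with a made-up operator set and made-up powers; it evaluates single-digit expressions as it parses and assumes well-formed input. None of it is YARP code.

```c
/* Hypothetical Pratt parser over single digits and the infix operators <, + and *. */
typedef struct {
    const char *cursor;
} parser_t;

/* Left binding power of an infix operator; 0 means "not an infix operator". */
static int binding_power(char op) {
    switch (op) {
        case '<': return 10;
        case '+': return 20;
        case '*': return 30;
        default:  return 0;
    }
}

/* Parse an expression, consuming only operators that bind tighter than min_bp. */
static int parse_expression(parser_t *parser, int min_bp) {
    /* Prefix position: this sketch only supports a single digit. */
    int left = *parser->cursor++ - '0';

    for (;;) {
        char op = *parser->cursor;
        int bp = binding_power(op);

        /* Stop at end of input, or at an operator the caller should handle. */
        if (bp == 0 || bp <= min_bp) break;
        parser->cursor++;

        /* Passing bp as the new minimum makes the operators left-associative. */
        int right = parse_expression(parser, bp);
        switch (op) {
            case '<': left = left < right; break;
            case '+': left = left + right; break;
            case '*': left = left * right; break;
        }
    }
    return left;
}
```

Calling parse_expression with a minimum at or above the power of < stops right before that operator. That is the kind of threshold that keeps a superclass marker from being swallowed as a < method call, shown here only as an illustration of the mechanism.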

That being said, here are some features that have been added since this project was started that are the beginnings of what we're going to have in the final product:

We're generating a shared library that has no dependencies on external projects or libraries. It is generated by a custom makefile. At the moment it's called librubyparser, but I'm open to literally any other name.
The shared library basically has two workflows:
- yp_parse - accepts a parser, returns a pointer to the root node of the tree that was parsed. Parse errors are added to the parser's error list as parsing is performed. The user is then free to use this node as they please.
- yp_serialize - accepts a parser, a node, and a buffer and serializes the node to a binary string on the buffer. The user is then free to use this buffer as they please.
We're generating a Ruby native extension library that allows interacting with the shared library from a Ruby context. We use it to test our parsing, which makes it not only helpful but necessary. This library includes definitions for all of the nodes in the tree, much like Syntax Tree. All of the nodes can be queried, walked, and deconstructed. With the nodes in place, the library also provides a deserialization procedure for reading the binary string dumped by yp_serialize.
We've begun providing more documentation, and I intend to add a lot more. Some nodes have documentation now, but I want every node to have clear, concise examples before this thing ships. We've also been adding documentation to all of the C functions.
We have some basic error recovery, with plans for much more as we get further underway. At the moment, if a token is expected in a particular position, we can recover from that by replacing it with a missing token. We have decent error reporting now, which can be accessed through the C and Ruby interfaces.
We're comparing our lex output to ripper's at the moment and getting close to parity. There's a lot of state stored in the CRuby lexer (probably to make it easier to interface with bison) that we're not storing, which means it's difficult to get full compatibility, but we'll get there eventually.
We're tracking the scope of local variables through the tree using scope nodes (currently one at the top level, one for each class node, and one for each module node). This is necessary for parsing super complicated examples like the ones we saw in TRICK this year.
A lot of the nodes in the tree have been simplified and made more semantic than their Ripper/Syntax Tree counterparts. For example, Assignment is a node in Syntax Tree, but in YARP it has been split into {Class,Global,Local,Instance}VariableWrite and CallNode where appropriate. On the other hand, some nodes have been eliminated by collapsing them into common ones, like If and IfModifier becoming part of the same node. The #1 goal here is to make the tree easier and faster to compile once this is integrated into CRuby.
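The yp_serialize workflow described above (walk a node tree and append a binary encoding to a caller-provided buffer) can be sketched like this. The node layout, type tags, and field order are invented for illustration and are not YARP's serialization format. The fixed-width, little-endian 32-bit fields are an assumption that keeps the output independent of the host machine.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer that serialization appends to. */
typedef struct {
    uint8_t *bytes;
    size_t length;
    size_t capacity;
} buffer_t;

static int buffer_append(buffer_t *buffer, const void *data, size_t size) {
    if (buffer->length + size > buffer->capacity) {
        size_t capacity = buffer->capacity == 0 ? 64 : buffer->capacity;
        while (capacity < buffer->length + size) capacity *= 2;
        uint8_t *bytes = realloc(buffer->bytes, capacity);
        if (bytes == NULL) return 0;
        buffer->bytes = bytes;
        buffer->capacity = capacity;
    }
    memcpy(buffer->bytes + buffer->length, data, size);
    buffer->length += size;
    return 1;
}

/* Write a 32-bit value in little-endian order regardless of host byte order. */
static int buffer_append_u32(buffer_t *buffer, uint32_t value) {
    uint8_t bytes[4] = {
        (uint8_t) value, (uint8_t) (value >> 8), (uint8_t) (value >> 16), (uint8_t) (value >> 24)
    };
    return buffer_append(buffer, bytes, sizeof(bytes));
}

/* Hypothetical node: a type tag, a location in the source, and child nodes. */
typedef struct node {
    uint8_t type;
    uint32_t start;   /* byte offset into the source */
    uint32_t length;  /* byte length of the node's source range */
    struct node **children;
    uint32_t child_count;
} node_t;

/* Pre-order serialization: type, start, length, child count, then each child. */
static int serialize_node(const node_t *node, buffer_t *buffer) {
    if (!buffer_append(buffer, &node->type, 1)) return 0;
    if (!buffer_append_u32(buffer, node->start)) return 0;
    if (!buffer_append_u32(buffer, node->length)) return 0;
    if (!buffer_append_u32(buffer, node->child_count)) return 0;
    for (uint32_t i = 0; i < node->child_count; i++) {
        if (!serialize_node(node->children[i], buffer)) return 0;
    }
    return 1;
}
```

Because every node header has the same size and children follow their parent, a reader such as a deserializer on the Ruby side can rebuild the tree by reading one header and then recursing child_count times.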

There are a ton of things we're still working on, but top-of-mind for the near future includes:

Error recovery at the node level. This involves tracking context as a stack and allowing parent nodes to recover from unexpected tokens if one is found. This largely mirrors the approach described here.
Just more nodes. We have a ton of stuff still to implement. I'd like to get method definitions in place soon, because that'll start to allow us to parse very basic full files. Also because method definitions involve a ton of different subnodes like *, **, &, massign, etc.
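Node-level recovery with a context stack can be sketched as a toy in which single characters stand in for tokens (c for class, d for def, ( and ) for parentheses, e for end). When a closing token doesn't match the innermost context but does match an outer one, every context in between is unwound and counted as missing its terminator. That is the point where a real parser would insert missing nodes. All names here are hypothetical, and the sketch only counts rather than building nodes.

```c
#include <stddef.h>

/* Hypothetical contexts; a real parser would track many more. */
typedef enum { CONTEXT_CLASS, CONTEXT_DEF, CONTEXT_PARENS } context_t;

typedef struct {
    context_t stack[64];
    size_t depth;
    size_t missing;     /* terminators treated as missing while recovering */
    size_t unexpected;  /* tokens that closed no open context and were skipped */
} recovery_parser_t;

/* Whether a closing token terminates a context: ")" closes parens, 'e' (end) closes class/def. */
static int closes(char token, context_t context) {
    if (token == ')') return context == CONTEXT_PARENS;
    if (token == 'e') return context == CONTEXT_CLASS || context == CONTEXT_DEF;
    return 0;
}

/* Push a context; this toy silently ignores nesting deeper than 64. */
static void push(recovery_parser_t *parser, context_t context) {
    if (parser->depth < 64) parser->stack[parser->depth++] = context;
}

/* Feed one token: 'c' = class, 'd' = def, '(' = open paren, 'e' = end, ')' = close paren. */
static void feed(recovery_parser_t *parser, char token) {
    switch (token) {
        case 'c': push(parser, CONTEXT_CLASS); return;
        case 'd': push(parser, CONTEXT_DEF); return;
        case '(': push(parser, CONTEXT_PARENS); return;
        default: break;
    }

    /* Find the nearest open context that this token can close. */
    size_t index = parser->depth;
    while (index > 0 && !closes(token, parser->stack[index - 1])) index--;

    if (index == 0) {
        parser->unexpected++;
        return;
    }

    /* Recovering: each context above the match never got its terminator,
     * so count one missing terminator per context while unwinding to the parent. */
    parser->missing += parser->depth - index;
    parser->depth = index - 1;
}
```

For c(d)e, the ) closes the parentheses even though the def is still open, so one missing end is recorded and the final e closes the class normally. A token that matches no open context is skipped and counted as unexpected.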

