Implementing Structural Types on the JVM #19
Byte Representations

For the record, I also wanted to clarify some other ideas I've had. Another option would be to compact records into
Right, ok, there is one possibility. We implement e.g.
There are two main advantages of this: it's more compact than the uniform representation above; and, it's still a generic uniform representation (i.e. we don't have boundless

Wow, this actually could work!!!
These old runtime classes are no longer needed. At the moment, we just have the uniform type "Struct".
Improved Non-Uniform Representation

Another idea occurred. One of the problems with the non-uniform representation is that there are so many potential combinations that we cannot pregenerate them all. In particular, the space of field names is gigantic. But, actually, we can attack this problem by employing uniform field names. For example, a record

We can further reduce the space of
So, the collection of
This really reduces the space of possible implementations to the point where we could pre-generate all implementations of interest. Some issues:
But, overall, this approach has clear merit.
The current pain points for the Java backend are record and union types. I'm going to discuss the former here, since it is particularly awkward. The basic question is how to efficiently implement the following on the JVM:
The challenge, of course, is that Java does not give us any concept similar to a structural type. An `interface` is perhaps the closest thing we have. There are two main schools of thought here:

Uniform Representation

In this case, we have a single type (e.g. `Struct`) for representing all record types. Under the hood, this could use a `HashMap`, for example. Thus, our above program generates this Java:

It should be pretty obvious that this is not particularly efficient. Some points:
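- Every field access requires a `HashMap` lookup against a `String` key
- Primitive values such as `int` must be boxed as `Integer` when stored in the `Struct`
- No coercion is needed between `Point` and the anonymous record `{int x, int y}`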
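
The uniform representation described above can be sketched roughly as follows. This is only an illustration, not the backend's actual generated code: the class `UniformStructDemo`, the `put`/`get` method names and the helper `point` are all assumptions made for the example. It shows the two costs listed above (a `String`-keyed lookup per access, and boxing of `int` values) as well as the fact that one class serves both `Point` and `{int x, int y}`.

```java
import java.util.HashMap;
import java.util.Map;

class UniformStructDemo {
    /** Hypothetical uniform record type: one class stands in for every record type. */
    static final class Struct {
        private final Map<String, Object> fields = new HashMap<>();

        /** Every read is a hash lookup against a String key. */
        Object get(String name) {
            if (!fields.containsKey(name)) {
                throw new IllegalArgumentException("no such field: " + name);
            }
            return fields.get(name);
        }

        /** Stores a field value and returns this Struct so calls can be chained. */
        Struct put(String name, Object value) {
            fields.put(name, value);
            return this;
        }
    }

    /** Builds a record with fields x and y; the int arguments are autoboxed to Integer. */
    static Struct point(int x, int y) {
        return new Struct().put("x", x).put("y", y);
    }

    /** Accepts any record with int fields x and y, named or anonymous, without coercion. */
    static int sum(Struct r) {
        return (Integer) r.get("x") + (Integer) r.get("y"); // unboxing on every read
    }

    public static void main(String[] args) {
        Struct p = point(1, 2);
        System.out.println(sum(p));
    }
}
```

Since every record is a `Struct`, nothing needs converting when a `Point` flows into `{int x, int y}`; the price is paid on each field access instead.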
Improvements. We can try to improve performance by avoiding an underlying `HashMap`, for example by using an `Object[]` where each field position determines the array index. Thus, in `{int x, int y}`, field `x` has index `0` and field `y` has index `1`. This means a coercion is necessary when flowing into a type `{int y, int x}`, but no coercion is needed when e.g. `Point` flows into `{int x, int y}` (note: sorting helps here). However, open records are now more complicated as, by definition, these do not correspond to an access with a known offset. We can mitigate this by including a `String[]` reference which identifies each field name. Then, accessing an open record is a linear search through this array. We could even sort this array to reduce the lookup time.

Class Representation
Instead of a `struct` type, Java provides the `class` type. Therefore, it makes sense to try and use this by generating classes on demand. Under this scheme, our above program becomes:

Here, `Point` is generated as expected with appropriate fields. Furthermore, `Struct0` is generated to represent the anonymous record. It should be clear that this representation offers potential efficiency gains:

- To coerce between `Point` and the anonymous record `{int x, int y}`, we must turn an instance of `Point` into an instance of `Struct0`. This is particularly problematic when identical records are used across modules. For example, `Struct0` in this module is not the same as `Struct0` in another module.
- Open records can be supported with `get`/`set` methods which accept `String` arguments. Then, lookup is performed on an underlying `String[]`, which is not too expensive as this can be shared, though it means every record has an extra reference.

Whilst it seems this approach is potentially the most efficient, there are some genuine challenges. For example, how does this translate:
Do we create a new class for `Position` or just reuse `Point`? Likewise, what does this translate into:

It seems the best translation we could do would be this:
Then, the type `LinkedList` just translates to an anonymous `Union`.

Non-Uniform Representation
Finally, we come to what is potentially the ideal solution. The basic problem with the class-based representation is simply the non-uniformity between named and unnamed types. That is, we get `class Point` for `Point` but `Union` for `LinkedList`. Notice that, for most other data types, we get a "structural" representation. For example, this:

Translates to something like this (depending on how records are handled):
Therefore, what we want is a "structural" representation for records. This then means that all datatypes have a corresponding "structural" representation, and this is what we see in the Java source. Whilst Java doesn't give us such a structural representation, we can attempt to mimic one. In fact, we already did that above by generating the `Struct0` and `Struct1` types. There are two main parts:

To understand how this works, we initially make a whole program assumption. That is, we assume at compile time we have access to the whole program and, in particular, all datatypes used. The compiler then emits a "runtime" file (e.g. `wy.java` or similar) which contains the structural representation of all records used within the program. This provides a structural representation for every type. The user can potentially control the name of the runtime file, etc. Either way, we need to compile the program against it.

We now relax the whole program assumption. We assume that dependencies form a tree. Thus, the runtime file provided with each dependency gives the representations needed for that dependency. For all those defined in an earlier dependency we can just reuse their implementations (as they must exist by definition). This generally works well, though when the dependencies form a DAG rather than a tree we can encounter some inefficiencies.
We now consider a concrete example for illustration. Let us consider how the type `{int x, int y}` is implemented. Each of our structural representations extends a common base class:

Then, our implementation looks like this:
Whilst this is a fairly heavyweight example, it does seem to work. We might also include all valid coercions between types in the runtime, though this could grow exponentially.
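
As a rough illustration of the shape being discussed, here is one way a common base class and the structural representation of `{int x, int y}` might look. It is a sketch under assumptions, not the code from the proposal: the names `StructBase`, `fieldNames` and `get`, and the reuse of the name `Struct0` for this record, are all hypothetical. It combines direct, unboxed field access for closed records with the shared `String[]` lookup suggested earlier for open records.

```java
import java.util.Arrays;

class StructuralRecordDemo {
    /** Hypothetical common base class for all generated structural representations. */
    abstract static class StructBase {
        /** Field names in sorted order; one array shared by all instances of a record type. */
        abstract String[] fieldNames();

        /** Reads the field at a known index, boxing primitives. */
        abstract Object get(int index);

        /** Name-based access for open records: binary search over the sorted names. */
        final Object get(String name) {
            int i = Arrays.binarySearch(fieldNames(), name);
            if (i < 0) {
                throw new IllegalArgumentException("no such field: " + name);
            }
            return get(i);
        }
    }

    /** Sketch of the structural representation of {int x, int y}. */
    static final class Struct0 extends StructBase {
        private static final String[] NAMES = {"x", "y"}; // sorted, shared

        int x;
        int y;

        Struct0(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        String[] fieldNames() {
            return NAMES;
        }

        @Override
        Object get(int index) {
            switch (index) {
                case 0: return x;
                case 1: return y;
                default: throw new IndexOutOfBoundsException("index: " + index);
            }
        }
    }

    public static void main(String[] args) {
        Struct0 p = new Struct0(3, 4);
        System.out.println(p.x + p.y);      // closed record: direct field access, no boxing
        StructBase open = p;                // viewed as an open record
        System.out.println(open.get("y"));  // open record: lookup by name
    }
}
```

The trade-off in this sketch is that closed-record code pays nothing extra, while open-record code pays a search over the name array plus boxing on each access.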
Finally, we might want a suitable naming scheme for our `Structs`. For `{int x, int y}` we could use `Struct$ix$iy` as a uniquely identifying name. This actually works quite well, to be fair.
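
A naming scheme along these lines could be computed mechanically. The sketch below is a guess at how that might work rather than a description of the compiler: it reads `Struct$ix$iy` as one `$`-separated segment per field, made of a type tag (`i` for `int`) followed by the field name. The class `StructNames`, the `bool` tag and the choice to sort fields by name are assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

class StructNames {
    /** Type tag for a field type. Only int -> "i" is taken from the text; "b" is assumed. */
    static String tag(String type) {
        switch (type) {
            case "int": return "i";
            case "bool": return "b"; // assumed tag, for illustration only
            default: throw new IllegalArgumentException("no tag defined for type: " + type);
        }
    }

    /**
     * Builds a class name from a map of field name to type name.
     * Copying into a TreeMap orders the fields by name (an assumption), so that
     * {int y, int x} and {int x, int y} receive the same name.
     */
    static String mangle(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("Struct");
        for (Map.Entry<String, String> e : new TreeMap<>(fields).entrySet()) {
            sb.append('$').append(tag(e.getValue())).append(e.getKey());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new TreeMap<>();
        fields.put("y", "int");
        fields.put("x", "int");
        System.out.println(mangle(fields)); // prints Struct$ix$iy
    }
}
```

Because the name is derived purely from the record's structure, two modules that both use `{int x, int y}` would arrive at the same class name independently.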