/operations/{operationId} output doesn't contain textual transformation expressions #801
Comments
Hi @metametametameta,
So instead of capturing expression text we capture its semantics to be able to render the most suitable human friendly representation on the UI depending on the use-case.
Hi @wajda Thanks for explaining the rationale behind the structured representation vs. string representation. I can possibly do some tree -> text translation on my end, but just wanted to know if there's a metamodel I can work with? You mention structural attribute/expression model and ASG model which I'm not familiar with - is there a spec of some sort available for that? P.S. I looked around in the spline-ui code, and I'm guessing this is your metamodel for OpExpression?
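A tree -> text translation of the kind mentioned above could be sketched roughly as follows. This is only an illustration: the node shape (`kind`, `name`, `value`, `children`, `alias`, `child`) is a hypothetical stand-in and not the actual Spline expression metamodel, which this thread does not spell out. The `render` function and the sample tree are likewise made up for the example.

```python
import json
from typing import Any, Dict

# Hypothetical node shapes (assumptions, NOT the real Spline schema):
#   {"kind": "attr", "name": "lname"}
#   {"kind": "literal", "value": ", "}
#   {"kind": "call", "name": "concat", "children": [...]}
#   {"kind": "alias", "alias": "name", "child": {...}}
Node = Dict[str, Any]


def render(node: Node) -> str:
    """Recursively turn a structured expression node into readable text."""
    kind = node["kind"]
    if kind == "attr":
        # Attribute reference: show the attribute name.
        return node["name"]
    if kind == "literal":
        # Literal: json.dumps gives a double-quoted, escaped form for strings.
        return json.dumps(node["value"])
    if kind == "call":
        # Function call: render each argument, then join with commas.
        args = ", ".join(render(child) for child in node["children"])
        return f"{node['name']}({args})"
    if kind == "alias":
        # Alias: render the wrapped expression and append the alias.
        return f"{render(node['child'])} AS {node['alias']}"
    raise ValueError(f"unknown node kind: {kind!r}")


# Sample tree for a "name" column built from lname and fname.
name_expr: Node = {
    "kind": "alias",
    "alias": "name",
    "child": {
        "kind": "call",
        "name": "concat",
        "children": [
            {"kind": "attr", "name": "lname"},
            {"kind": "literal", "value": ", "},
            {"kind": "attr", "name": "fname"},
        ],
    },
}

print(render(name_expr))  # concat(lname, ", ", fname) AS name
```

Raising on an unknown `kind` is a deliberate choice here: if the model gains new node types in a later version, the translation fails loudly instead of silently producing misleading text.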
Yes, pretty much. But please keep in mind that this will change as of Spline 0.6. We're going to introduce an
I don't expect to be dealing with anything other than Spark lineage right now, and I think I can do something simple with the existing JSON model in 0.5.5 for the operation expressions. I interpreted your response to mean that new fields (or totally new REST endpoints) will be added in 0.6.0, which is great. But the REST output itself in 0.6.0 (for a Spark use case) will be backward compatible with 0.5.5, right? I aim to use any new expression-level REST API output when it appears, but I am also assuming that simply upgrading to 0.6.0 won't break my REST calls and JSON analysis. Is that a safe assumption?
We always strive to maintain backward compatibility, although it's not always easy, especially when introducing breaking changes to the domain model. Server-wise, it will be able to communicate with any agent starting from ver 0.4+. For the database, there is a Spline admin tool that will take care of the database migration, so your existing lineage data should be safe. Minor discrepancies could occur in some places, but it should at least be safe at the datasets, jobs, and operations level. We'll do our best to preserve as many details as possible, but if some part of the migration is too difficult or takes too long to implement, we might decide to cut it there. After all, we start our version numbers with 0 for a reason :) Agent-wise, 0.6 will be able to talk to 0.5 servers, but expression details will probably be sacrificed (we won't implement model downgrade). The Producer REST model should be (fully or almost) identical up to the operation level.
@wajda Thanks. I'll keep the expected changes in mind when upgrading to 0.6 |
Background
When a field in a project list is created or transformed in some fashion, there tends to be some sort of an original "expression" string that represents the transformation. For example, here is a simple field that gets created from two existing fields via a concat expression (Java sample)
So basically "name" is derived from "lname" and "fname" via some transformation. The attribute lineage itself is easy enough to figure out by analyzing the result of the /operations/{operationId} REST call.
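As a rough illustration of why the attribute lineage part is straightforward, the source attributes can be collected by walking the expression tree and picking out the attribute references. The node shape used here (`kind`, `name`, `children`, `child`) is a hypothetical simplification for the sketch, not the actual JSON returned by /operations/{operationId}, and `referenced_attributes` is a made-up helper name.

```python
from typing import Any, Dict, List

# Hypothetical node shape (an assumption, not the real REST output):
# attribute references look like {"kind": "attr", "name": ...}; other nodes
# may carry a "children" list and/or a single "child".
Node = Dict[str, Any]


def referenced_attributes(node: Node) -> List[str]:
    """Return the attribute names an expression depends on, left to right."""
    found: List[str] = []
    stack: List[Node] = [node]
    while stack:
        current = stack.pop()
        if current.get("kind") == "attr" and current["name"] not in found:
            found.append(current["name"])
        # Gather nested nodes from both the list-style and single-child fields.
        nested = list(current.get("children", []))
        if "child" in current:
            nested.append(current["child"])
        # Push in reverse so the leftmost child is visited first.
        stack.extend(reversed(nested))
    return found


# Sample tree: "name" derived from lname and fname via concat.
name_expr: Node = {
    "kind": "alias",
    "alias": "name",
    "child": {
        "kind": "call",
        "name": "concat",
        "children": [
            {"kind": "attr", "name": "lname"},
            {"kind": "literal", "value": ", "},
            {"kind": "attr", "name": "fname"},
        ],
    },
}

print(referenced_attributes(name_expr))  # ['lname', 'fname']
```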
However, it's not so easy to get back the human readable "text" of the transformation.
Example
In the JSON returned by /operations/{operationId}, I see something like
Proposed Solution
Solution Ideas
As you can see, the original expression concat(df.col("lname"), lit(", "), df.col("fname")) is only represented in the REST JSON by its complex, structured version, which is not suitable for display to the user viewing the lineage. There is nothing else in the JSON I could find that has a textual version of the field-level transformation.
P.S. Note that in the Web UI for spline - you already seem to do something along these lines by showing an expression.
(Shown in Spline UI - but doesn't appear to be there in REST API output)
Transformations
λ = lname
λ = fname
λ = concatlname, , , fname AS name
So why not make the textual expressions above available via the REST API itself? That would make it possible to build a custom UI based purely on the REST API. I imagine adding an "expression" attribute (see my JSON snippet above) or similar would do the trick. "concatlname, , , fname AS name" above is obviously useful, but having the original concat(df.col("lname"), lit(", "), df.col("fname")) would be ideal if available (or even both versions).