-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set up PyDough subcollection nodes in AST module #19
Set up PyDough subcollection nodes in AST module #19
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! See my comments and possible typos I found, but overall this looks good.
): | ||
super().__init__(subcollection_property.other_collection) | ||
self._parent: PyDoughAST = parent | ||
self._subcollection_property: SubcollectionRelationshipMetadata = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid this issue of being too verbose I think we could split this into declaration and assignment. Something like
_subcollection_property: SubcollectionRelationshipMetadata
def __init__(
self,
parent: PyDoughCollectionAST,
subcollection_property: SubcollectionRelationshipMetadata,
):
_subcollection_property = subcollection_property
Not urgent but might help with readability, especially since some of these assignments are split onto multiple lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would make it a class attribute rather than an instance-level one (unless we use @dataclass
). I think the best I can do is:
def __init__(
self,
parent: PyDoughCollectionAST,
subcollection_property: SubcollectionRelationshipMetadata,
):
self._subcollection_property: SubcollectionRelationshipMetadata
self._subcollection_property = subcollection_property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UGH... I just realized that doing the way I suggested causes problems with the type annotation propagation if I do it to other things, like this:
self._collection: CollectionMetadata = collection
self._ancestor: PyDoughCollectionAST = ancestor
self._predecessor: PyDoughCollectionAST | None = predecessor
self._properties: Dict[str, Tuple[int | None, PyDoughAST]] | None = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best we can do maybe is just split it up sometimes :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay this is a very minor issue. No need to waste time on this.
the subcollection chain, as well as the name it had within that | ||
regular collection. | ||
""" | ||
self.properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a hacky trick to ensure that the lazy evaluation of self.properties
was completed before this property was accessed, since the first time self.properties
is evaluated it populates this property.
Do you have a suggestion of an alternative way to do this that isn't so funky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to push back on this, but I think I would remove _inheritance_sources
all together. If you have a common function that "computes" all the properties I would just have that return a map/json and do something like return self.compute_properties["inheritance_sources"]
Then I would change @property
to @cached_property
since what you are doing is basically just caching the output, so it better for readability that it happens behind the scenes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that it is only derivable as part of the datastructures that are calculated during the derivation of the entirety of self.properties
; I don't think it can be calculated standalone without repeating all of the recursive computations.
The list of subcollection accesses used to define the compound | ||
relationship. | ||
""" | ||
self.properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a typo.
Adding more collections to the AST module for subcollection relationships (both regulars & compounds).
SubCollection
which is a subclass ofTableCollection
(inherits the same lazy scheme of property evaluation) with an associated property & parent collection AST node (the ancestor)CompoundSubCollection
which is a subclass ofSubCollection
for compound relationships. It fully unravels the compound relationship & any sub-compounds within it into a sequence ofSubCollection
invocations, with each inherited property having its location in the sequence & original name at that location defined. The key to this unravelling is a recursive methodpopulate_subcollection_chain
which is only called as part of the lazy evaluation ofproperties
.TableCollection
to now handle subcollections & compounds (still lazy, to avoid infinite cycles)SubCollectionInfo
(works for both regular & compounds)The handling of BACK references, including to inherited properties, will be handled in another PR. For now, the inherited properties are accessed "normally", but the information is available to be able to derive where they came from.