Set up PyDough subcollection nodes in AST module #19

Merged on Nov 4, 2024 (93 commits)

Contributor

@knassre-bodo knassre-bodo commented Oct 29, 2024

Adding more collections to the AST module for subcollection relationships (both regular and compound).

  • A new Collection AST class SubCollection, a subclass of TableCollection (it inherits the same lazy property-evaluation scheme), with an associated subcollection property and a parent collection AST node (the ancestor).
  • A new Collection AST class CompoundSubCollection, a subclass of SubCollection for compound relationships. It fully unravels the compound relationship and any sub-compounds within it into a sequence of SubCollection invocations, recording for each inherited property its location in the sequence and its original name at that location. The key to this unravelling is a recursive method, populate_subcollection_chain, which is only called as part of the lazy evaluation of properties.
  • A way to build subcollections via the node builder.
  • Changes to the lazy evaluation of TableCollection so that it now handles subcollections and compounds (still lazy, to avoid infinite cycles).
  • A new TestInfo class SubCollectionInfo (works for both regular and compound subcollections).

BACK references, including references to inherited properties, will be handled in another PR. For now, inherited properties are accessed "normally", but the information needed to derive where they came from is available.
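
To make the unravelling concrete, here is a rough sketch of how a compound relationship might be flattened recursively. It is not the code in this PR: SimpleRel, CompoundRel, unravel, and the (position, original name) tuples are hypothetical stand-ins, and the assumption that inherited properties come from the collection reached by the primary hop is only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleRel:
    """Hypothetical stand-in for a regular subcollection relationship."""
    name: str


@dataclass
class CompoundRel:
    """Hypothetical stand-in for a compound relationship made of two hops."""
    name: str
    primary: "SimpleRel | CompoundRel"
    secondary: "SimpleRel | CompoundRel"
    # Maps an inherited property's alias to its original name (assumed layout).
    inherited: dict = field(default_factory=dict)


def unravel(rel, chain, sources):
    """Flatten rel into chain and record where each inherited alias comes from."""
    if isinstance(rel, SimpleRel):
        chain.append(rel.name)
        return
    # Recurse into the primary hop first; it may itself be a compound.
    unravel(rel.primary, chain, sources)
    # Assumption: inherited properties live on the collection reached by the
    # primary hop, which is the last entry appended to the chain so far.
    position = len(chain) - 1
    for alias, original in rel.inherited.items():
        sources[alias] = (position, original)
    unravel(rel.secondary, chain, sources)


inner = CompoundRel("a_to_c", SimpleRel("a_to_b"), SimpleRel("b_to_c"), {"x": "p"})
outer = CompoundRel("a_to_d", inner, SimpleRel("c_to_d"), {"y": "q"})

chain, sources = [], {}
unravel(outer, chain, sources)
print(chain)    # ['a_to_b', 'b_to_c', 'c_to_d']
print(sources)  # {'x': (0, 'p'), 'y': (1, 'q')}
```

In the PR this work only happens as part of lazy property evaluation; the sketch calls the function eagerly just to show the shape of the result.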

@knassre-bodo knassre-bodo requested a review from njriasan October 30, 2024 19:17
Contributor

@njriasan njriasan left a comment

LGTM! See my comments and possible typos I found, but overall this looks good.

):
super().__init__(subcollection_property.other_collection)
self._parent: PyDoughAST = parent
self._subcollection_property: SubcollectionRelationshipMetadata = (

Contributor

To avoid this being too verbose, I think we could split this into a declaration and an assignment. Something like:

_subcollection_property: SubcollectionRelationshipMetadata

def __init__(
    self,
    parent: PyDoughCollectionAST,
    subcollection_property: SubcollectionRelationshipMetadata,
):
    self._subcollection_property = subcollection_property

Not urgent but might help with readability, especially since some of these assignments are split onto multiple lines.

Contributor Author

@knassre-bodo knassre-bodo Nov 4, 2024

That would make it a class attribute rather than an instance-level one (unless we use @dataclass). I think the best I can do is:

def __init__(
    self,
    parent: PyDoughCollectionAST,
    subcollection_property: SubcollectionRelationshipMetadata,
):
    self._subcollection_property: SubcollectionRelationshipMetadata
    self._subcollection_property = subcollection_property
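
For reference, a small sketch of how plain Python treats the forms discussed in this thread. The class names Declared, Shared, and Split are made up for illustration and do not come from PyDough. At runtime, a bare annotation in a class body only records the annotation; a class attribute is created only when a value is assigned there. How a particular type checker propagates these annotations is a separate question that the sketch does not cover.

```python
class Declared:
    # Bare annotation: stored in __annotations__, but no attribute is created.
    value: int

    def __init__(self, value: int):
        self.value = value


class Shared:
    # Annotation with a value: this one really is a class attribute.
    values: list = []


class Split:
    def __init__(self, value: int):
        # Declaration and assignment on separate lines, both at instance level.
        self.value: int
        self.value = value


print("value" in Declared.__dict__)  # False
print("values" in Shared.__dict__)   # True
print(Declared(1).value, Split(3).value)
```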

Contributor Author

UGH... I just realized that doing it the way I suggested causes problems with the type annotation propagation if I apply it to other things, like this:

        self._collection: CollectionMetadata = collection
        self._ancestor: PyDoughCollectionAST = ancestor
        self._predecessor: PyDoughCollectionAST | None = predecessor
        self._properties: Dict[str, Tuple[int | None, PyDoughAST]] | None = None

Contributor Author

I think the best we can do maybe is just split it up sometimes :(

Contributor

Okay this is a very minor issue. No need to waste time on this.

the subcollection chain, as well as the name it had within that
regular collection.
"""
self.properties

Contributor

This looks like a typo.

Contributor Author

@knassre-bodo knassre-bodo Nov 4, 2024

This was a hacky trick to ensure that the lazy evaluation of self.properties was completed before this property was accessed, since the first time self.properties is evaluated it populates this property.

Do you have a suggestion of an alternative way to do this that isn't so funky?
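
A minimal sketch of the pattern being described, assuming a simplified class: LazyNode, its attribute names, and the evaluations counter are illustrative only, not the PR's actual code. The bare self.properties expression is there purely for its side effect of running the lazy evaluation once.

```python
class LazyNode:
    def __init__(self):
        self._properties = None
        self._inheritance_sources = None
        self.evaluations = 0  # counts how often the expensive pass runs

    @property
    def properties(self):
        if self._properties is None:
            # First access: one pass fills in both data structures.
            self.evaluations += 1
            self._properties = {"a": 1}
            self._inheritance_sources = {"a": (0, "orig_a")}
        return self._properties

    @property
    def inheritance_sources(self):
        self.properties  # bare access, used only to trigger the lazy pass
        return self._inheritance_sources


node = LazyNode()
print(node.inheritance_sources)  # {'a': (0, 'orig_a')}
print(node.evaluations)          # 1
```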

Contributor

Feel free to push back on this, but I think I would remove _inheritance_sources altogether. If you have a common function that "computes" all the properties, I would just have that return a map/JSON and do something like return self.compute_properties["inheritance_sources"]

Then I would change @property to @cached_property, since what you are doing is basically just caching the output, so it is better for readability that it happens behind the scenes.
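
One possible shape for this suggestion, sketched with functools.cached_property from the standard library. CompoundNode, computed, and the passes counter are hypothetical names, and the dictionary contents are placeholders. A single cached pass returns both results, so inheritance_sources would not need a separate recursive computation.

```python
from functools import cached_property


class CompoundNode:
    def __init__(self):
        self.passes = 0  # counts how often the expensive pass runs

    @cached_property
    def computed(self):
        # Stand-in for the recursive derivation; runs once, then is cached.
        self.passes += 1
        properties = {"a": 1}
        inheritance_sources = {"a": (0, "orig_a")}
        return {
            "properties": properties,
            "inheritance_sources": inheritance_sources,
        }

    @property
    def properties(self):
        return self.computed["properties"]

    @property
    def inheritance_sources(self):
        return self.computed["inheritance_sources"]


node = CompoundNode()
print(node.inheritance_sources)  # {'a': (0, 'orig_a')}
print(node.properties)           # {'a': 1}
print(node.passes)               # 1
```

The by-product still comes out of the same pass as the properties; the difference from the bare-access trick is only that the caching is handled by the decorator.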

Contributor Author

@knassre-bodo knassre-bodo Nov 4, 2024

The problem is that it is only derivable as part of the data structures that are calculated while deriving the entirety of self.properties; I don't think it can be calculated standalone without repeating all of the recursive computations.

The list of subcollection accesses used to define the compound
relationship.
"""
self.properties

Contributor

This looks like a typo.

Base automatically changed from kian/init_collections to kian/ast_expr_collection_part_1 November 4, 2024 21:27
@knassre-bodo knassre-bodo merged commit 616e1cf into kian/ast_expr_collection_part_1 Nov 4, 2024
2 checks passed
@knassre-bodo knassre-bodo deleted the kian/setup_collections branch November 4, 2024 21:30
knassre-bodo added a commit that referenced this pull request Nov 5, 2024
Combination of the following PRs:
- #17
- #20
- #19
- #21
- #22
- #24