Set up PyDough collection nodes in AST module #20

Merged
merged 71 commits into from
Nov 4, 2024
Commits
4c861a9
Setting up AST module, simple verifier, and basic test [RUN CI]
knassre-bodo Oct 24, 2024
3144e94
Added deducer initial impl and tests
knassre-bodo Oct 24, 2024
5b25976
Fixing AST invocation
knassre-bodo Oct 24, 2024
4b5b0b0
Merge branch 'kian/setup_type_verifiers' into kian/setup_type_deducers
knassre-bodo Oct 24, 2024
8c8eff0
Merge branch 'main' into kian/setup_type_verifiers
knassre-bodo Oct 25, 2024
bd4e5a2
Added messages to type test assertions
knassre-bodo Oct 25, 2024
be5bb3a
Updating after pulling from main [RUN CI]
knassre-bodo Oct 25, 2024
c4aa476
Merge branch 'kian/setup_type_verifiers' into kian/setup_type_deducers
knassre-bodo Oct 25, 2024
0a2a3a8
Revisions [RUN CI]
knassre-bodo Oct 25, 2024
2fd66f1
Added expression operators, binary operators, simple binop fixture-ba…
knassre-bodo Oct 25, 2024
9fb5101
Fixing type inference
knassre-bodo Oct 25, 2024
80c980f
Revising verifiers
knassre-bodo Oct 25, 2024
d70ae7d
Updating type deducers
knassre-bodo Oct 25, 2024
2626fe9
Removing merge conflicts [RUN CI]
knassre-bodo Oct 25, 2024
9cf3512
Adding operator registry import magic [RUN CI]
knassre-bodo Oct 28, 2024
a64349f
Adding operator registries, expression function calls
knassre-bodo Oct 28, 2024
657a2d4
[RUN CI]
knassre-bodo Oct 28, 2024
6331886
Initial impl WIP
knassre-bodo Oct 28, 2024
d35a934
Renaming expression function operators accordingly
knassre-bodo Oct 28, 2024
5c7a018
Merge branch 'kian/setup_expression_operators' into kian/setup_expres…
knassre-bodo Oct 28, 2024
1167458
Adding valid typing test
knassre-bodo Oct 28, 2024
5273f04
Adding node builder, info class for tests, and refactoring imports
knassre-bodo Oct 28, 2024
d12058a
Fixing import bug
knassre-bodo Oct 28, 2024
dd4c3bc
Compressing test imports
knassre-bodo Oct 28, 2024
1d501de
Refactoring metadata imports
knassre-bodo Oct 28, 2024
5eff4ac
Updating metadata imports
knassre-bodo Oct 28, 2024
c2b7d7c
Minor revisions
knassre-bodo Oct 28, 2024
8b92a7f
[RUN CI]
knassre-bodo Oct 28, 2024
1337863
[RUN CI]
knassre-bodo Oct 28, 2024
51b99d8
Updating types
knassre-bodo Oct 28, 2024
624e277
[RUN CI]
knassre-bodo Oct 28, 2024
5221fb0
Merge branch 'kian/update_metadata_imports' into kian/setup_type_dedu…
knassre-bodo Oct 28, 2024
6262a60
Merge branch 'kian/update_metadata_imports' into kian/setup_type_veri…
knassre-bodo Oct 28, 2024
01dad04
Adding module-level docstrings
knassre-bodo Oct 28, 2024
b4b13ff
Updating module level docstrings and inits [RUN CI]
knassre-bodo Oct 28, 2024
5a8a7a0
Resolving upstream conflicts
knassre-bodo Oct 28, 2024
275470d
Resolving conflicts
knassre-bodo Oct 28, 2024
3b8e8d9
Adding class docstring
knassre-bodo Oct 28, 2024
b3877b7
Adding class docstring
knassre-bodo Oct 28, 2024
aac48f3
Merge branch 'kian/setup_type_verifiers' into kian/setup_type_deducers
knassre-bodo Oct 28, 2024
0388683
Updating import paths [RUN CI]
knassre-bodo Oct 28, 2024
f9214cf
Resolving conflicts and imports
knassre-bodo Oct 28, 2024
fedc9ea
Adding TODO strings
knassre-bodo Oct 28, 2024
7f7fae4
Squishing imports
knassre-bodo Oct 28, 2024
e605f7b
Removing duplicate
knassre-bodo Oct 28, 2024
39b6aae
Resolving conflicts
knassre-bodo Oct 28, 2024
92bbb48
Adjusting imports
knassre-bodo Oct 28, 2024
691e6ca
Adjusting testing setup with tpch_node_builder
knassre-bodo Oct 28, 2024
66dfb07
Added more info/builder-based tests [RUN CI]
knassre-bodo Oct 28, 2024
671a399
Adding/implementing collections semantics WIP
knassre-bodo Oct 29, 2024
c8438b9
Resolving conflicts [RUN CI]
knassre-bodo Oct 29, 2024
d1b788d
Resolving conflicts
knassre-bodo Oct 29, 2024
8095ee6
Removing collection logic
knassre-bodo Oct 29, 2024
2cc521b
Pushing subcollection components into next PR
knassre-bodo Oct 29, 2024
7a86de3
Set up collection info pipelining for tests
knassre-bodo Oct 29, 2024
c7b8733
Minor update
knassre-bodo Oct 29, 2024
f293f5f
Minor update
knassre-bodo Oct 29, 2024
d2022bf
Renaming files [RUN CI]
knassre-bodo Oct 29, 2024
d6dd999
Experimenting with tree string
knassre-bodo Oct 29, 2024
2d17449
Experimenting with tree string
knassre-bodo Oct 29, 2024
e2f013d
Cleaning up TestInfo classes
knassre-bodo Oct 30, 2024
65cb2b7
Fixing test info [RUN CI]
knassre-bodo Oct 30, 2024
2bcffe9
Pulling up downstream changes to expressions [RUN CI]
knassre-bodo Oct 31, 2024
52ede47
Pulling downstream changes [RUN CI]
knassre-bodo Oct 31, 2024
f3e4ce7
Pulling up minor binary operator change
knassre-bodo Nov 1, 2024
13aa372
Test additions WIP
knassre-bodo Nov 1, 2024
9cdd534
Adding more nested binop string tests [RUN CI]
knassre-bodo Nov 1, 2024
852813c
Merge branch 'kian/setup_expression_nodes' into kian/init_collections
knassre-bodo Nov 1, 2024
dddc592
Refactoring drastically to have global context that table collections…
knassre-bodo Nov 3, 2024
9f0fef6
Revisions [RUN CI]
knassre-bodo Nov 4, 2024
25adaae
Resolving conflicts
knassre-bodo Nov 4, 2024
5 changes: 5 additions & 0 deletions pydough/pydough_ast/__init__.py
@@ -10,6 +10,10 @@
    "Literal",
    "ExpressionFunctionCall",
    "PyDoughASTException",
    "PyDoughCollectionAST",
    "TableCollection",
    # "SubCollection" is not imported below, so listing it in `__all__` would break `from pydough.pydough_ast import *`
    "Calc",
]

from .abstract_pydough_ast import PyDoughAST
@@ -20,4 +24,5 @@
    Literal,
    ExpressionFunctionCall,
)
from .collections import PyDoughCollectionAST, TableCollection, Calc
from .node_builder import AstNodeBuilder
1 change: 1 addition & 0 deletions pydough/pydough_ast/collections/README.md
@@ -0,0 +1 @@
TODO: COMPLETE THIS README FOR THE PYDOUGH AST COLLECTIONS MODULE
17 changes: 17 additions & 0 deletions pydough/pydough_ast/collections/__init__.py
@@ -0,0 +1,17 @@
"""
TODO: add module-level docstring
"""

__all__ = [
    "PyDoughCollectionAST",
    "TableCollection",
    "Calc",
    "GlobalContext",
    "CollectionAccess",
]

from .collection_ast import PyDoughCollectionAST
from .table_collection import TableCollection
from .calc import Calc
from .global_context import GlobalContext
from .collection_access import CollectionAccess
126 changes: 126 additions & 0 deletions pydough/pydough_ast/collections/calc.py
@@ -0,0 +1,126 @@
"""
TODO: add file-level docstring
"""

__all__ = ["Calc"]


from typing import Dict, List, Tuple, Set

from pydough.pydough_ast.abstract_pydough_ast import PyDoughAST
from pydough.pydough_ast.errors import PyDoughASTException
from pydough.pydough_ast.expressions import PyDoughExpressionAST
from .collection_ast import PyDoughCollectionAST


class Calc(PyDoughCollectionAST):
    """
    The AST node implementation class representing a CALC expression.
    """

    def __init__(
        self,
        predecessor: PyDoughCollectionAST,
    ):
        self._predecessor: PyDoughCollectionAST = predecessor
        # Not initialized until with_terms is called. Maps each CALC term
        # name to its ordinal position (see `with_terms`).
        self._calc_term_indices: Dict[str, int] | None = None
        self._all_terms: Dict[str, PyDoughExpressionAST] = {}

    def with_terms(self, terms: List[Tuple[str, PyDoughExpressionAST]]) -> "Calc":
        """
        Specifies the terms that are calculated inside of a CALC node,
        returning the mutated CALC node afterwards. This is called after the
        CALC node is created so that the terms can be expressions that
        reference child nodes of the CALC. However, this must be called
        on the CALC node before any of its properties are accessed via
        `calc_terms`, `all_terms`, `to_string`, etc.

        Args:
            `terms`: the list of terms calculated in the CALC node as a list of
            tuples in the form `(name, expression)`. Each `expression` can
            contain `ChildReference` instances that refer to a property of one
            of the children of the CALC node.

        Returns:
            The mutated CALC node (which has also been modified in-place).

        Raises:
            `PyDoughASTException` if the terms have already been added to the
            CALC node.
        """
        if self._calc_term_indices is not None:
            raise PyDoughASTException(
                "Cannot call `with_terms` on a CALC node more than once"
            )
        self._calc_term_indices = {name: idx for idx, (name, _) in enumerate(terms)}
        # Include terms from the predecessor, with the terms from this CALC
        # added in (overwriting any preceding properties with the same name)
        self._all_terms = {}
        for name in self.preceding_context.all_terms:
            self._all_terms[name] = self.preceding_context.get_term(name)
        # Named `expr` to avoid shadowing the builtin `property`
        for name, expr in terms:
            self._all_terms[name] = expr
        return self

    @property
    def calc_term_indices(self) -> Dict[str, int]:
        """
        Mapping of each named expression of the CALC to the ordinal position
        of that expression when included in the CALC. The expression itself
        can be retrieved with `get_term`.
        """
        if self._calc_term_indices is None:
            raise PyDoughASTException(
                "Cannot access `calc_term_indices` of a Calc node before adding calc terms with `with_terms`"
            )
        return self._calc_term_indices

    @property
    def ancestor_context(self) -> PyDoughCollectionAST | None:
        return self._predecessor.ancestor_context

    @property
    def preceding_context(self) -> PyDoughCollectionAST | None:
        return self._predecessor

    @property
    def calc_terms(self) -> Set[str]:
        return set(self.calc_term_indices)

    @property
    def all_terms(self) -> Set[str]:
        if self._calc_term_indices is None:
            raise PyDoughASTException(
                "Cannot access `all_terms` of a Calc node before adding calc terms with `with_terms`"
            )
        return set(self._all_terms)

    def get_expression_position(self, expr_name: str) -> int:
        if expr_name not in self.calc_term_indices:
            raise PyDoughASTException(f"Unrecognized CALC term: {expr_name!r}")
        return self.calc_term_indices[expr_name]

    def get_term(self, term_name: str) -> PyDoughAST:
        if term_name not in self.all_terms:
            raise PyDoughASTException(f"Unrecognized term: {term_name!r}")
        return self._all_terms[term_name]

    def to_string(self) -> str:
        kwarg_strings: List[str] = []
        # Go through the `calc_term_indices` property (not the private field)
        # so a clear error is raised if `with_terms` has not been called yet.
        for name in self.calc_term_indices:
            expr: PyDoughAST = self.get_term(name)
            kwarg_strings.append(f"{name}={expr.to_string()}")
        return f"{self.preceding_context.to_string()}({', '.join(kwarg_strings)})"

    def to_tree_form(self) -> None:
        raise NotImplementedError

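    def equals(self, other: "Calc") -> bool:
        # Compare the CALC terms through `calc_term_indices` and `get_term`,
        # since this class does not define a `terms` attribute.
        return (
            super().equals(other)
            and self.preceding_context == other.preceding_context
            and self.calc_term_indices == other.calc_term_indices
            and all(
                self.get_term(name) == other.get_term(name)
                for name in self.calc_term_indices
            )
        )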
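
The two-phase construction described in the `with_terms` docstring (create the node first, attach its terms exactly once afterwards, and overlay them on the predecessor's terms) can be illustrated with a minimal standalone sketch. The names `SketchError`, `SketchTable`, and `SketchCalc` are hypothetical stand-ins and are not part of PyDough, and plain strings are used in place of expression AST nodes.

```python
from typing import Dict, List, Optional, Set, Tuple


class SketchError(Exception):
    """Hypothetical stand-in for PyDoughASTException."""


class SketchTable:
    """Hypothetical predecessor exposing `all_terms` and `get_term`."""

    def __init__(self, columns: Dict[str, str]):
        self._columns = dict(columns)

    @property
    def all_terms(self) -> Set[str]:
        return set(self._columns)

    def get_term(self, name: str) -> str:
        return self._columns[name]


class SketchCalc:
    """Two-phase node: constructed first, populated once by `with_terms`."""

    def __init__(self, predecessor: SketchTable):
        self._predecessor = predecessor
        self._indices: Optional[Dict[str, int]] = None
        self._all_terms: Dict[str, str] = {}

    def with_terms(self, terms: List[Tuple[str, str]]) -> "SketchCalc":
        if self._indices is not None:
            raise SketchError("with_terms may only be called once")
        # Ordinal position of each term, in the order the terms were given.
        self._indices = {name: idx for idx, (name, _) in enumerate(terms)}
        # Start from the predecessor's terms, then overlay the new ones.
        self._all_terms = {
            name: self._predecessor.get_term(name)
            for name in self._predecessor.all_terms
        }
        for name, expr in terms:
            self._all_terms[name] = expr
        return self

    def get_expression_position(self, name: str) -> int:
        if self._indices is None:
            raise SketchError("with_terms has not been called yet")
        if name not in self._indices:
            raise SketchError(f"Unrecognized CALC term: {name!r}")
        return self._indices[name]

    def get_term(self, name: str) -> str:
        if self._indices is None:
            raise SketchError("with_terms has not been called yet")
        if name not in self._all_terms:
            raise SketchError(f"Unrecognized term: {name!r}")
        return self._all_terms[name]


table = SketchTable({"name": "Column(name)", "key": "Column(key)"})
calc = SketchCalc(table).with_terms(
    [("total", "SUM(x)"), ("name", "UPPER(name)")]
)
```

In this sketch `name` is redefined by the CALC and so shadows the predecessor's column, while `key` is still reachable through `get_term` even though it has no ordinal position in the CALC.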
119 changes: 119 additions & 0 deletions pydough/pydough_ast/collections/collection_access.py
@@ -0,0 +1,119 @@
"""
TODO: add file-level docstring
"""

__all__ = ["CollectionAccess"]


from typing import Dict, List, Tuple, Set

from pydough.metadata import (
CollectionMetadata,
PropertyMetadata,
)
from pydough.pydough_ast.abstract_pydough_ast import PyDoughAST
from pydough.pydough_ast.errors import PyDoughASTException
from pydough.pydough_ast.expressions import ColumnProperty
from .collection_ast import PyDoughCollectionAST


class CollectionAccess(PyDoughCollectionAST):
    """
    The AST node implementation class representing a table collection accessed
    either directly or as a subcollection of another collection.
    """

    def __init__(
        self,
        collection: CollectionMetadata,
        ancestor: PyDoughCollectionAST,
        predecessor: PyDoughCollectionAST | None = None,
    ):
        self._collection: CollectionMetadata = collection
        self._ancestor: PyDoughCollectionAST = ancestor
        self._predecessor: PyDoughCollectionAST | None = predecessor
        self._properties: Dict[str, Tuple[int | None, PyDoughAST]] | None = None
        self._calc_counter: int = 0

    @property
    def collection(self) -> CollectionMetadata:
        """
        The metadata for the table that is being referenced by the collection
        node.
        """
        return self._collection

    @property
    def properties(self) -> Dict[str, Tuple[int | None, PyDoughAST]]:
        """
        Mapping of each property of the table to a tuple (idx, property)
        where idx is the ordinal position of the property when included
        in a CALC (None for subcollections), and property is the AST node
        representing the property.

        The properties are evaluated lazily & cached to prevent ping-ponging
        between two tables that consider each other subcollections.
        """

        if self._properties is None:
            self._properties = {}
            # Ensure the properties are added in the same order they were
            # defined in the metadata, so that dependencies are handled.
            ordered_properties: List[str] = sorted(
                self.collection.get_property_names(),
                key=lambda p: self.collection.definition_order[p],
            )
            for property_name in ordered_properties:
                # Named `property_metadata` to avoid shadowing the builtin
                # `property`.
                property_metadata: PropertyMetadata = self.collection.get_property(
                    property_name
                )
                calc_idx: int | None
                expression: PyDoughAST
                if property_metadata.is_subcollection:
                    # TODO: implement subcollections properly
                    continue
                else:
                    calc_idx = self._calc_counter
                    expression = ColumnProperty(property_metadata)
                    self._calc_counter += 1
                self._properties[property_name] = (calc_idx, expression)
        return self._properties

    @property
    def ancestor_context(self) -> PyDoughCollectionAST | None:
        return self._ancestor

    @property
    def preceding_context(self) -> PyDoughCollectionAST | None:
        return self._predecessor

    @property
    def calc_terms(self) -> Set[str]:
        # The calc terms are just all of the column properties (the ones
        # that have an index)
        return {name for name, (idx, _) in self.properties.items() if idx is not None}

    @property
    def all_terms(self) -> Set[str]:
        return set(self.properties)

    def get_expression_position(self, expr_name: str) -> int:
        if expr_name not in self.properties:
            raise PyDoughASTException(
                f"Unrecognized term of {self.collection.error_name}: {expr_name!r}"
            )
        idx, _ = self.properties[expr_name]
        if idx is None:
            raise PyDoughASTException(
                f"Cannot call get_expression_position on non-CALC term: {expr_name!r}"
            )
        return idx

    def get_term(self, term_name: str) -> PyDoughAST:
        if term_name not in self.properties:
            raise PyDoughASTException(
                f"Unrecognized term of {self.collection.error_name}: {term_name!r}"
            )
        _, term = self.properties[term_name]
        return term

    def equals(self, other: "CollectionAccess") -> bool:
        return super().equals(other) and self.collection == other.collection
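
The lazy, cached `properties` mapping above can be illustrated in isolation. The following is a minimal sketch under stated assumptions: `SketchCollection` and `SketchAccess` are hypothetical stand-ins for the metadata and AST classes, plain strings replace `ColumnProperty` nodes, and the index counter is a local variable rather than an instance field. It shows the mapping being built once on first access, in definition order, with subcollections skipped as in the current TODO.

```python
from typing import Dict, Optional, Set, Tuple


class SketchCollection:
    """Hypothetical stand-in for the metadata of one collection."""

    def __init__(self, definition_order: Dict[str, int], subcollections: Set[str]):
        # Maps each property name to the position where it was defined.
        self.definition_order = definition_order
        # Names of the properties that are subcollections rather than columns.
        self.subcollections = subcollections


class SketchAccess:
    """Hypothetical node that builds its property mapping on first use."""

    def __init__(self, collection: SketchCollection):
        self._collection = collection
        self._properties: Optional[Dict[str, Tuple[int, str]]] = None
        # Counts how many times the mapping was actually built.
        self.build_count = 0

    @property
    def properties(self) -> Dict[str, Tuple[int, str]]:
        if self._properties is None:
            self.build_count += 1
            self._properties = {}
            order = self._collection.definition_order
            next_idx = 0
            # Visit the property names in the order they were defined.
            for name in sorted(order, key=order.__getitem__):
                if name in self._collection.subcollections:
                    # Subcollections are skipped, so they get no index.
                    continue
                self._properties[name] = (next_idx, f"Column({name})")
                next_idx += 1
        return self._properties


collection = SketchCollection(
    definition_order={"comment": 2, "key": 0, "orders": 1},
    subcollections={"orders"},
)
access = SketchAccess(collection)
first = access.properties
second = access.properties
```

Because the mapping is cached after the first access, `first` and `second` are the same dictionary object, and the indices stay contiguous even though a subcollection sits between the two columns in definition order.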