-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BSE-4155] Add support for the Scan relational node #36
Conversation
pydough/relational/abstract.py
Outdated
@abstractmethod | ||
def equals(self, other: "Relational") -> bool: | ||
""" | ||
Determine if two relational nodes are exactly identical, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've determine that trying to do recursive merging could get really nasty really quickly, so I added a concept of equality and will currently only allow merging nodes with at least identical inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some big hesitations on how this is implemented.
tests/test_relational.py
Outdated
def make_column(name: str) -> Column: | ||
""" | ||
Make an Int64 column with the given name. This is used | ||
for generating various relational nodes. | ||
|
||
Note: This doesn't handle renaming a column. | ||
|
||
Args: | ||
name (str): The name of the column in both the input and the | ||
current node. | ||
|
||
Returns: | ||
Column: The output column. | ||
""" | ||
return Column(name, make_simple_column_reference(name)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, we should handle this as one function in the following form:
def make_column(name: str, typ: PyDoughType | None) -> Column:
pydough_type = typ if typ is not None else UnknownType()
return Column(name, SimpleColumnReference(name, typ))
We also probably want to rename this to something that makes it clearer that this is for columns that are names, as opposed to general columns (I also vote for renaming Column
to something else bc its very confusing; perhaps Expr
?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to call this a RelationalColumn because this is the column for any "node"/step of a query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay actually we opted to replace this with a dictionary so I will remove this.
This implements the base changes, but I need to add testing directly for the columns themselves (and any future expressions). I'll update this in the morning and then start to propagate all of the changes to the followup PRs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lot of minor stuff I'd like to see tinkered with, but overall LGTM.
@abstractmethod | ||
def to_sqlglot(self) -> SQLGlotExpression: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this take in a dialect value also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting removed in a future PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to ignore feedback on this API for right now.
@property | ||
def is_aggregation(self) -> bool: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need this, since it only exists in the AST form since there is no other distinction besides this between scalar vs aggregate functions. I think what we just need, when we get to aggregates, is a special implementation class to denote agg function calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's probably simpler to have this be a functional call property in general and have one class (at least at this time). I agree it doesn't need to be in the base class though.
Convert the relational expression to a string. | ||
|
||
Returns: | ||
str: A string representation of the this expression including converting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: 80 line char
after any alterations, for example commuting the inputs. | ||
|
||
Args: | ||
other (RelationalExpression): The other relational expression to compare against. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: 80 line char
def __eq__(self, other: Any) -> bool: | ||
return isinstance(other, RelationalExpression) and self.equals(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also do __hash__
, so that individual subclasses can have cached properties? If so, hash(repr(self))
would work in a pinch if we can ensure that both equals
and to_string
are coherent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this when add the caching so that its tested. I see no harm in delaying this. If the node is slightly more fleshed out we will need less code change.
pydough/relational/abstract.py
Outdated
@@ -1,17 +1,17 @@ | |||
""" | |||
This module contains the abstract base classes for the relational | |||
This file contains the abstract base classes for the relational |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: confusing to have 2 files both named abstract.py
in the relational
module, and also a bit clunky to have the expressions as a subdirectory of the same folder containing the relational nodes. Instead, can we have it structured like this?
relational
relational_expressions
abstract_expression.py
column_reference.py
relational_nodes
abstract_node.py
scan.py
(This is more-or-less how we structure similar things in the rest of PyDough, so it also adds structural consistency in the codebase).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can do this rename for file searching purposes.
tests/test_relational_expressions.py
Outdated
id="different_type", | ||
), | ||
], | ||
# TODO: Add a test for different types when we add literals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that already here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added literal expressions in the followup PR.
], | ||
# TODO: Add a test for different types when we add literals | ||
) | ||
def test_column_reference_equals( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: add tiny docstrings to both of these test functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add it, but in general it feels like a bit of overkill since the function name gives a clear description.
), | ||
], | ||
) | ||
def test_column_reference_to_string(column_ref: ColumnReference, output: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: 80 line hcar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is 79 characters :)
from pydough.relational.relational_expressions import RelationalExpression | ||
from pydough.relational.relational_expressions.column_reference import ColumnReference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: all of this stuff makes sense to include in the __init__.py
files in such a manner that you can import all of them directly from pydough.relational
.
Adds support for the Scan relational node definition.