[BSE-4155] Add support for the Scan relational node #36

Merged
merged 22 commits into from
Nov 14, 2024

Conversation

njriasan
Contributor

@njriasan njriasan commented Nov 8, 2024

Adds support for the Scan relational node definition.

@njriasan njriasan requested review from knassre-bodo and removed request for knassre-bodo November 8, 2024 21:05
@njriasan njriasan changed the base branch from main to nick/relational_abstract November 8, 2024 21:06
    @abstractmethod
    def equals(self, other: "Relational") -> bool:
        """
        Determine if two relational nodes are exactly identical,
Contributor Author

I've determined that trying to do recursive merging could get really nasty really quickly, so I added a concept of equality and will currently only allow merging nodes with at least identical inputs.
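A minimal sketch of what equality-gated merging could look like. Only the `equals` signature comes from the diff above; the `inputs` attribute, the `has_identical_inputs` helper, and the `Scan`/`Project` constructors are hypothetical names chosen for illustration, not the PR's actual implementation.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Relational(ABC):
    """Hypothetical stand-in for the relational node base class."""

    def __init__(self, inputs: list[Relational] | None = None) -> None:
        self.inputs: list[Relational] = inputs or []

    @abstractmethod
    def equals(self, other: Relational) -> bool:
        """Return True only if the two nodes are exactly identical."""

    def has_identical_inputs(self, other: Relational) -> bool:
        # Hypothetical helper: merging is only considered when every input
        # matches pairwise; differing subtrees are never merged recursively.
        return len(self.inputs) == len(other.inputs) and all(
            mine.equals(theirs) for mine, theirs in zip(self.inputs, other.inputs)
        )


class Scan(Relational):
    """Leaf node reading a table (constructor is an assumption)."""

    def __init__(self, table_name: str) -> None:
        super().__init__()
        self.table_name = table_name

    def equals(self, other: Relational) -> bool:
        return isinstance(other, Scan) and self.table_name == other.table_name


class Project(Relational):
    """Hypothetical single-input node used only to exercise the check."""

    def __init__(self, input_node: Relational, columns: tuple[str, ...]) -> None:
        super().__init__([input_node])
        self.columns = columns

    def equals(self, other: Relational) -> bool:
        return (
            isinstance(other, Project)
            and self.columns == other.columns
            and self.has_identical_inputs(other)
        )


proj_a = Project(Scan("orders"), ("a",))
proj_b = Project(Scan("orders"), ("b",))
proj_c = Project(Scan("customers"), ("a",))

mergeable = proj_a.has_identical_inputs(proj_b)      # same input scan
not_mergeable = proj_a.has_identical_inputs(proj_c)  # different input scan
```

Under these assumptions, `proj_a` and `proj_b` are candidates for merging even though they are not equal to each other, while `proj_a` and `proj_c` are not, because their inputs differ.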

Base automatically changed from nick/relational_abstract to main November 11, 2024 16:56
Contributor

@knassre-bodo knassre-bodo left a comment

I have some big hesitations on how this is implemented.

Comment on lines 30 to 44
def make_column(name: str) -> Column:
    """
    Make an Int64 column with the given name. This is used
    for generating various relational nodes.

    Note: This doesn't handle renaming a column.

    Args:
        name (str): The name of the column in both the input and the
        current node.

    Returns:
        Column: The output column.
    """
    return Column(name, make_simple_column_reference(name))
Contributor

In my opinion, we should handle this as one function in the following form:

def make_column(name: str, typ: PyDoughType | None) -> Column:
    # Fall back to UnknownType when no type is supplied.
    pydough_type = typ if typ is not None else UnknownType()
    return Column(name, SimpleColumnReference(name, pydough_type))

We also probably want to rename this to something that makes it clearer that this is for columns that are names, as opposed to general columns (I also vote for renaming Column to something else because it's very confusing; perhaps Expr?).

Contributor Author

I'm going to call this a RelationalColumn because this is the column for any "node"/step of a query.

Contributor Author

Okay, actually we opted to replace this with a dictionary, so I will remove this.

@njriasan
Contributor Author

This implements the base changes, but I need to add testing directly for the columns themselves (and any future expressions). I'll update this in the morning and then start to propagate all of the changes to the followup PRs.

Contributor

@knassre-bodo knassre-bodo left a comment

Lot of minor stuff I'd like to see tinkered with, but overall LGTM.

Comment on lines 60 to 61
    @abstractmethod
    def to_sqlglot(self) -> SQLGlotExpression:
Contributor

Should this take in a dialect value also?

Contributor Author

This is getting removed in a future PR.

Contributor Author

I'm going to ignore feedback on this API for right now.

Comment on lines 26 to 28
    @property
    def is_aggregation(self) -> bool:
        return False
Contributor

We shouldn't need this: it exists in the AST form only because nothing else there distinguishes scalar functions from aggregate functions. I think all we need, when we get to aggregates, is a special implementation class to denote agg function calls.

Contributor Author

I think it's probably simpler to have this be a function call property in general and have one class (at least at this time). I agree it doesn't need to be in the base class though.
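A rough sketch of the option described in this reply: `is_aggregation` lives on a single function call class rather than on the base class. The `CallExpression` class, its fields, and the `to_string` output format are all hypothetical; the PR does not define them.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


class RelationalExpression(ABC):
    """Base class: note there is no is_aggregation property here."""

    @abstractmethod
    def to_string(self) -> str:
        """Convert the relational expression to a string."""


@dataclass(frozen=True)
class ColumnReference(RelationalExpression):
    name: str

    def to_string(self) -> str:
        # Output format is an assumption made for this sketch.
        return f"Column({self.name})"


@dataclass(frozen=True)
class CallExpression(RelationalExpression):
    """Hypothetical single class covering scalar and aggregate calls."""

    function_name: str
    args: tuple[RelationalExpression, ...]
    # The scalar/aggregate distinction is data on the call itself.
    is_aggregation: bool = False

    def to_string(self) -> str:
        rendered = ", ".join(arg.to_string() for arg in self.args)
        return f"{self.function_name}({rendered})"


total = CallExpression("SUM", (ColumnReference("a"),), is_aggregation=True)
lowered = CallExpression("LOWER", (ColumnReference("b"),))
```

The alternative raised in the review, a dedicated subclass for aggregate calls, would replace the boolean field with an `isinstance` check.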

Convert the relational expression to a string.

Returns:
str: A string representation of this expression including converting
Contributor

NIT: line exceeds 80 characters

after any alterations, for example commuting the inputs.

Args:
other (RelationalExpression): The other relational expression to compare against.
Contributor

NIT: line exceeds 80 characters

Comment on lines 44 to 45
    def __eq__(self, other: Any) -> bool:
        return isinstance(other, RelationalExpression) and self.equals(other)
Contributor

Should we also do __hash__, so that individual subclasses can have cached properties? If so, hash(repr(self)) would work in a pinch if we can ensure that both equals and to_string are coherent.

Contributor Author

I think we should do this when we add the caching so that it's tested. I see no harm in delaying this. If the node is slightly more fleshed out, we will need fewer code changes.
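For reference, the `hash(repr(self))` idea from this thread could look roughly like the sketch below. It assumes `__repr__` delegates to `to_string`, which the PR does not confirm, and the `ColumnReference` fields shown are hypothetical. One certain Python detail: a class that defines `__eq__` without `__hash__` has its `__hash__` set to `None`, so the base class must define both for instances to be hashable.

```python
from abc import ABC, abstractmethod
from typing import Any


class RelationalExpression(ABC):
    @abstractmethod
    def to_string(self) -> str:
        """Convert the relational expression to a string."""

    @abstractmethod
    def equals(self, other: "RelationalExpression") -> bool:
        """Determine if two relational expressions are exactly identical."""

    def __repr__(self) -> str:
        # Assumption: repr mirrors to_string so the hash follows it.
        return self.to_string()

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, RelationalExpression) and self.equals(other)

    def __hash__(self) -> int:
        # Only valid if equal expressions always render the same string.
        return hash(repr(self))


class ColumnReference(RelationalExpression):
    """Hypothetical minimal column reference used to exercise the hash."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_string(self) -> str:
        return f"Column({self.name})"

    def equals(self, other: RelationalExpression) -> bool:
        return isinstance(other, ColumnReference) and self.name == other.name


first = ColumnReference("a")
second = ColumnReference("a")
unique = {first, second, ColumnReference("b")}
```

The coherence requirement mentioned in the review is the important part: if `equals` ever returns True for two expressions whose `to_string` output differs, sets and dict keys built on this hash will misbehave.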

@@ -1,17 +1,17 @@
 """
-This module contains the abstract base classes for the relational
+This file contains the abstract base classes for the relational
Contributor

NIT: confusing to have 2 files both named abstract.py in the relational module, and also a bit clunky to have the expressions as a subdirectory of the same folder containing the relational nodes. Instead, can we have it structured like this?

  • relational
    • relational_expressions
      • abstract_expression.py
      • column_reference.py
    • relational_nodes
      • abstract_node.py
      • scan.py

(This is more-or-less how we structure similar things in the rest of PyDough, so it also adds structural consistency in the codebase).

Contributor Author

I can do this rename for file searching purposes.

id="different_type",
),
],
# TODO: Add a test for different types when we add literals
Contributor

Isn't that already here?

Contributor Author

I added literal expressions in the followup PR.

],
# TODO: Add a test for different types when we add literals
)
def test_column_reference_equals(
Contributor

NIT: add tiny docstrings to both of these test functions

Contributor Author

I will add it, but in general it feels like a bit of overkill since the function name gives a clear description.

),
],
)
def test_column_reference_to_string(column_ref: ColumnReference, output: str):
Contributor

NIT: line exceeds 80 characters

Contributor Author

This is 79 characters :)

Comment on lines +10 to +11
from pydough.relational.relational_expressions import RelationalExpression
from pydough.relational.relational_expressions.column_reference import ColumnReference
Contributor

NIT: all of this stuff makes sense to include in the __init__.py files in such a manner that you can import all of them directly from pydough.relational.

@njriasan njriasan merged commit 2e2150b into main Nov 14, 2024
4 checks passed
@njriasan njriasan deleted the nick/relational_scan branch November 14, 2024 18:04
2 participants