[BSE-4155] Add support for the Scan relational node #36

Merged

njriasan merged 22 commits into main from nick/relational_scan

Nov 14, 2024

Contributor

njriasan commented Nov 8, 2024

Adds support for the Scan relational node definition.

njriasan added 13 commits

November 6, 2024 13:52


          Added the new base files

cce5e4c


          Started added abstract classes

19460cc


          Added the file definitions

89f8e42


          Added the sqlglot import


          Added a basic test [run CI]

4ee70be


          Defined tests

66107ec


          Defined the tests cases we want to write

256ecef


          Updated scan

178ffd1


          wrote tests for the basic scan ops

2e7d1a5


          Fixed the first basic unit test:

9b88810


          Added equality tests

b844130


          Added error test

24b620e


          Added remaining tests [run CI]

63aa271

njriasan requested review from knassre-bodo and removed request for knassre-bodo

November 8, 2024 21:05

njriasan changed the base branch from main to nick/relational_abstract

November 8, 2024 21:06

njriasan commented


pydough/pydough_ast/expressions/simple_column_reference.py Outdated

pydough/relational/abstract.py Outdated

+                  @abstractmethod
+                  def equals(self, other: "Relational") -> bool:
+                      """
+                      Determine if two relational nodes are exactly identical,

Contributor Author

njriasan Nov 11, 2024

I've determined that trying to do recursive merging could get really nasty really quickly, so I added a concept of equality and will currently only allow merging nodes with at least identical inputs.
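
A minimal sketch of the idea in this comment, not the code from this PR: each node exposes an `equals` check, and a merge is only considered when two nodes have pairwise-identical inputs, so nothing has to be merged recursively. The `inputs` property, the `can_merge` helper, the `Limit` node, and the fields on `Scan` are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Relational(ABC):
    """Hypothetical base class for relational nodes."""

    @property
    @abstractmethod
    def inputs(self) -> list["Relational"]:
        """The child nodes feeding this node (assumed for this sketch)."""

    @abstractmethod
    def equals(self, other: "Relational") -> bool:
        """Return True if the two nodes are exactly identical."""

    def can_merge(self, other: "Relational") -> bool:
        # Hypothetical helper: only consider merging when both nodes have
        # pairwise-identical inputs, which avoids any recursive merging.
        return len(self.inputs) == len(other.inputs) and all(
            a.equals(b) for a, b in zip(self.inputs, other.inputs)
        )


class Scan(Relational):
    """Hypothetical leaf node that reads named columns from a table."""

    def __init__(self, table_name: str, columns: list[str]) -> None:
        self.table_name = table_name
        self.columns = columns

    @property
    def inputs(self) -> list[Relational]:
        return []  # a scan reads from a table, so it has no input nodes

    def equals(self, other: Relational) -> bool:
        return (
            isinstance(other, Scan)
            and self.table_name == other.table_name
            and self.columns == other.columns
        )


class Limit(Relational):
    """Hypothetical single-input node, used only to exercise can_merge."""

    def __init__(self, child: Relational, limit: int) -> None:
        self.child = child
        self.limit = limit

    @property
    def inputs(self) -> list[Relational]:
        return [self.child]

    def equals(self, other: Relational) -> bool:
        return (
            isinstance(other, Limit)
            and self.limit == other.limit
            and self.child.equals(other.child)
        )
```

In this sketch, two `Limit` nodes over equal scans are merge candidates even when their limits differ, while `Limit` nodes over different tables are rejected up front.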

njriasan requested a review from knassre-bodo

November 11, 2024 16:26

Base automatically changed from nick/relational_abstract to main

November 11, 2024 16:56


          Merged with prior [run CI]

ac2a6e4

knassre-bodo reviewed


Contributor

knassre-bodo left a comment

I have some big hesitations on how this is implemented.

pydough/pydough_ast/expressions/simple_column_reference.py Outdated

pydough/pydough_ast/expressions/simple_column_reference.py Outdated

tests/test_relational.py Outdated

Comment on lines 30 to 44

+              def make_column(name: str) -> Column:
+                  """
+                  Make an Int64 column with the given name. This is used
+                  for generating various relational nodes.
+                  Note: This doesn't handle renaming a column.
+                  Args:
+                      name (str): The name of the column in both the input and the
+                      current node.
+                  Returns:
+                      Column: The output column.
+                  """
+                  return Column(name, make_simple_column_reference(name))

Contributor

knassre-bodo Nov 12, 2024

In my opinion, we should handle this as one function in the following form:

def make_column(name: str, typ: PyDoughType | None = None) -> Column:
    # Fall back to an unknown type when no type is supplied.
    pydough_type = typ if typ is not None else UnknownType()
    return Column(name, SimpleColumnReference(name, pydough_type))

We also probably want to rename this to something that makes it clearer that this is for columns that are names, as opposed to general columns (I also vote for renaming Column to something else because it's very confusing; perhaps Expr?).

Contributor Author

njriasan Nov 13, 2024

I'm going to call this a RelationalColumn because this is the column for any "node"/step of a query.

Contributor Author

njriasan Nov 13, 2024

Okay actually we opted to replace this with a dictionary so I will remove this.


          applied most of Kian's changes, need to test column expressions

ecc1600

Contributor Author

njriasan commented Nov 13, 2024

This implements the base changes, but I need to add testing directly for the columns themselves (and any future expressions). I'll update this in the morning and then start to propagate all of the changes to the followup PRs.

njriasan added 2 commits

November 13, 2024 10:44


          Removed unnecessary column class

25a7791


          Added remaining tests [run CI]

ea9255c

njriasan requested a review from knassre-bodo

November 13, 2024 15:52

njriasan added 2 commits

November 13, 2024 11:59


          Back-ported changes [run CI]

591b67d


          Fix a typo [run CI]

ce7b9fa

knassre-bodo approved these changes


Contributor

knassre-bodo left a comment

Lot of minor stuff I'd like to see tinkered with, but overall LGTM.

pydough/relational/relational_expressions/abstract.py Outdated

Comment on lines 60 to 61

		@abstractmethod
		def to_sqlglot(self) -> SQLGlotExpression:

Contributor

knassre-bodo Nov 14, 2024

Should this take in a dialect value also?

Contributor Author

njriasan Nov 14, 2024

This is getting removed in a future PR.

Contributor Author

njriasan Nov 14, 2024

I'm going to ignore feedback on this API for right now.

pydough/relational/relational_expressions/abstract.py Outdated

Comment on lines 26 to 28

+                  @property
+                  def is_aggregation(self) -> bool:
+                      return False

Contributor

knassre-bodo Nov 14, 2024

We shouldn't need this. It exists in the AST form only because nothing else there distinguishes scalar functions from aggregate functions. I think all we need, when we get to aggregates, is a special implementation class to denote agg function calls.

Contributor Author

njriasan Nov 14, 2024

I think it's probably simpler to have this be a function call property in general and have one class (at least at this time). I agree it doesn't need to be in the base class though.
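
A rough sketch of the arrangement described in this reply, under stated assumptions: the base expression class carries no `is_aggregation` property, and a single call class exposes it for both scalar and aggregate functions. The `CallExpression` name, its constructor arguments, and the string formats are hypothetical and not taken from this PR.

```python
from abc import ABC, abstractmethod


class RelationalExpression(ABC):
    """Hypothetical base class; note it has no is_aggregation property."""

    @abstractmethod
    def to_string(self) -> str:
        """Convert the expression to a string."""


class ColumnReference(RelationalExpression):
    """Reference to a column by name."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_string(self) -> str:
        return f"Column({self.name})"


class CallExpression(RelationalExpression):
    """Hypothetical single class covering scalar and aggregate calls."""

    def __init__(
        self,
        op: str,
        args: list[RelationalExpression],
        is_aggregation: bool = False,
    ) -> None:
        self.op = op
        self.args = args
        self._is_aggregation = is_aggregation

    @property
    def is_aggregation(self) -> bool:
        # Only function calls can be aggregations, so the property lives
        # here rather than on the base class.
        return self._is_aggregation

    def to_string(self) -> str:
        args = ", ".join(arg.to_string() for arg in self.args)
        return f"Call(op={self.op}, args=[{args}])"
```

The trade-off is that callers must check for a call expression before asking whether it is an aggregation, in exchange for keeping the base class smaller.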

pydough/relational/relational_expressions/abstract.py Outdated

+                      Convert the relational expression to a string.
+                      Returns:
+                          str: A string representation of the this expression including converting

Contributor

knassre-bodo Nov 14, 2024

NIT: 80-character line limit

pydough/relational/relational_expressions/abstract.py Outdated

+                      after any alterations, for example commuting the inputs.
+                      Args:
+                          other (RelationalExpression): The other relational expression to compare against.

Contributor

knassre-bodo Nov 14, 2024

NIT: 80-character line limit

pydough/relational/relational_expressions/abstract.py Outdated

Comment on lines 44 to 45

		def __eq__(self, other: Any) -> bool:
		    return isinstance(other, RelationalExpression) and self.equals(other)

Contributor

knassre-bodo Nov 14, 2024

Should we also do __hash__, so that individual subclasses can have cached properties? If so, hash(repr(self)) would work in a pinch if we can ensure that both equals and to_string are coherent.

Contributor Author

njriasan Nov 14, 2024

I think we should do this when we add the caching so that it's tested. I see no harm in delaying this. If the node is slightly more fleshed out we will need less code change.
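
For reference, a minimal sketch of the `hash(repr(self))` suggestion, deferred here until caching is added. It assumes `__repr__` delegates to `to_string` and that equal expressions always produce the same string. The `ColumnReference` fields and string format below are illustrative assumptions, not the PR's definitions.

```python
from abc import ABC, abstractmethod
from typing import Any


class RelationalExpression(ABC):
    """Hypothetical base class with coherent equality and hashing."""

    @abstractmethod
    def to_string(self) -> str:
        """Convert the expression to a string."""

    @abstractmethod
    def equals(self, other: "RelationalExpression") -> bool:
        """Return True if the two expressions are exactly identical."""

    def __repr__(self) -> str:
        return self.to_string()

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, RelationalExpression) and self.equals(other)

    def __hash__(self) -> int:
        # Valid only if expressions that compare equal always produce the
        # same string; otherwise equal objects could hash differently.
        return hash(repr(self))


class ColumnReference(RelationalExpression):
    """Column reference with an assumed name and type label."""

    def __init__(self, name: str, data_type: str) -> None:
        self.name = name
        self.data_type = data_type

    def to_string(self) -> str:
        return f"Column(name={self.name}, type={self.data_type})"

    def equals(self, other: RelationalExpression) -> bool:
        return (
            isinstance(other, ColumnReference)
            and self.name == other.name
            and self.data_type == other.data_type
        )
```

Because `__eq__` and `__hash__` are defined together on the base class, subclasses stay hashable and can be used as dictionary keys or set members, which is what a cache would rely on.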

pydough/relational/abstract.py Outdated

@@ -1,17 +1,17 @@
               """
-              This module contains the abstract base classes for the relational
+              This file contains the abstract base classes for the relational

Contributor

knassre-bodo Nov 14, 2024

NIT: confusing to have 2 files both named abstract.py in the relational module, and also a bit clunky to have the expressions as a subdirectory of the same folder containing the relational nodes. Instead, can we have it structured like this?

relational
- relational_expressions
  - abstract_expression.py
  - column_reference.py
- relational_nodes
  - abstract_node.py
  - scan.py

(This is more-or-less how we structure similar things in the rest of PyDough, so it also adds structural consistency in the codebase).

Contributor Author

njriasan Nov 14, 2024

I can do this rename for file searching purposes.

tests/test_relational_expressions.py Outdated

+                          id="different_type",
+                      ),
+                  ],
+                  # TODO: Add a test for different types when we add literals

Contributor

knassre-bodo Nov 14, 2024

Isn't that already here?

Contributor Author

njriasan Nov 14, 2024

I added literal expressions in the followup PR.

tests/test_relational_expressions.py

+                  ],
+                  # TODO: Add a test for different types when we add literals
+              )
+              def test_column_reference_equals(

Contributor

knassre-bodo Nov 14, 2024

NIT: add tiny docstrings to both of these test functions

Contributor Author

njriasan Nov 14, 2024

I will add it, but in general it feels like a bit of overkill since the function name gives a clear description.

tests/test_relational_expressions.py

+                      ),
+                  ],
+              )
+              def test_column_reference_to_string(column_ref: ColumnReference, output: str):

Contributor

knassre-bodo Nov 14, 2024

NIT: 80-character line limit

Contributor Author

njriasan Nov 14, 2024

This is 79 characters :)

tests/test_relational_expressions.py

Comment on lines +10 to +11

		from pydough.relational.relational_expressions import RelationalExpression
		from pydough.relational.relational_expressions.column_reference import ColumnReference

Contributor

knassre-bodo Nov 14, 2024

NIT: all of this stuff makes sense to include in the __init__.py files in such a manner that you can import all of them directly from pydough.relational.


          Renamed files

90b4e7a

njriasan added 2 commits

November 14, 2024 12:57


          Applied actual refactoring, need to update test info

cc1a2d1


          Added test changes [run CI]

4a01824

njriasan merged commit 2e2150b into main

4 checks passed

njriasan deleted the nick/relational_scan branch

November 14, 2024 18:04
