[BSE-4155] Add support for the Join relational node #42

njriasan · 2024-11-10T21:17:19Z

Adds support for the basic definition of the Join relational node.

njriasan · 2024-11-14T15:51:33Z

pydough/relational/relational_expressions/column_reference.py

        super().__init__(data_type)
        self._name: str = name
+        self._input_name: str | None = input_name


This should allow Join to generate input-specific column references during conversion, using "left" or "right".

Since we are not doing full qualification (just "does it come from the LHS or RHS of the join?"), does that mean we must insert naming/pruning projections after every join so that the left/right sides are unified afterwards? Otherwise, if there are subsequent joins, how will we know what "left" refers to, if "left" also has its own "left" and "right"?

No, these are purely ephemeral. They exist only for translating your input and wouldn't be represented in the output. My thought was that we just need to be able to resolve columns relative to the node, and then lowering could optionally translate this into SQL.

So for example, imagine we opted to have join always use left and right. Then we could execute the following steps:

1. Assign the inputs to the join the aliases left and right.

2. Generate expressions using these aliases.

3. Generate the output, which won't depend on any table alias.
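Here is a sketch of those three steps as SQL string generation. The representation is an assumption made for illustration. Tables are plain names. The join condition is a list of equality pairs, where each side is an `(input_name, column)` tuple. Output columns are `(output_name, input_name, column)` triples. `lower_join` and `quote` are hypothetical helpers, not pydough's lowering. The aliases are double-quoted because LEFT and RIGHT are reserved words in SQL.

```python
def quote(identifier: str) -> str:
    """Double-quote a SQL identifier; LEFT and RIGHT are reserved words."""
    return '"' + identifier.replace('"', '""') + '"'


def lower_join(
    left_table: str,
    right_table: str,
    join_type: str,
    condition: list[tuple[tuple[str, str], tuple[str, str]]],
    outputs: list[tuple[str, str, str]],
) -> str:
    """Render one join as SQL; references are (input_name, column) pairs."""
    # Step 1: assign the join inputs the aliases "left" and "right".
    aliases = {"left": quote("left"), "right": quote("right")}

    def col(input_name: str, column: str) -> str:
        return f"{aliases[input_name]}.{quote(column)}"

    # Step 2: generate the equi-join condition using those aliases.
    on_clause = " AND ".join(f"{col(*lhs)} = {col(*rhs)}" for lhs, rhs in condition)
    # Step 3: give every output column a plain name so that nothing
    # downstream depends on the "left"/"right" table aliases.
    select_list = ", ".join(
        f"{col(inp, column)} AS {quote(out)}" for out, inp, column in outputs
    )
    return (
        f"SELECT {select_list} "
        f"FROM {quote(left_table)} AS {aliases['left']} "
        f"{join_type.upper()} JOIN {quote(right_table)} AS {aliases['right']} "
        f"ON {on_clause}"
    )
```

Step 3 renames every output column. As a result, a consumer of this join only sees `"order_id"` and `"name"`, and the `"left"`/`"right"` aliases never escape the generated statement.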

Alternatively, the lowering could generate aliases only when they're necessary. Or, rather than using left/right every time, it could generate unique aliases to map left/right to, possibly via a counter. The exact procedure isn't important, so long as we can tell the origin of every column wherever it may be ambiguous.
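The counter variant might look roughly like this. It is a sketch under assumptions. A hypothetical `AliasGenerator` gives each join fresh aliases, and a hypothetical `qualify` rewrites a join-local `(input_name, column)` pair into `alias.column`. The `_t0`, `_t1` alias format is also an assumption. Every join draws from the same counter, so a nested join's own "left"/"right" can never collide with the outer join's. This is the nested-join concern raised above.

```python
import itertools


class AliasGenerator:
    """Hypothetical helper: hands out unique table aliases from a counter."""

    def __init__(self, prefix: str = "_t") -> None:
        self._prefix = prefix
        self._counter = itertools.count()

    def join_aliases(self) -> dict[str, str]:
        """Map one join's local "left"/"right" onto fresh, unique aliases."""
        return {
            "left": f"{self._prefix}{next(self._counter)}",
            "right": f"{self._prefix}{next(self._counter)}",
        }


def qualify(aliases: dict[str, str], input_name: str, column: str) -> str:
    """Rewrite a join-local (input_name, column) pair into alias.column."""
    return f"{aliases[input_name]}.{column}"


gen = AliasGenerator()
outer = gen.join_aliases()  # aliases for the outer join
inner = gen.join_aliases()  # aliases for a join nested inside it
```

Here `qualify(outer, "left", "key")` and `qualify(inner, "left", "key")` produce different qualified names, `_t0.key` and `_t2.key`. The two joins' "left" inputs therefore stay distinguishable without any extra projection.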

knassre-bodo

Few comments, but overall LGTM.

pydough/relational/join.py

knassre-bodo · 2024-11-14T18:11:20Z

pydough/relational/join.py

+class JoinType(Enum):
+    INNER = "inner"
+    LEFT = "left"
+    RIGHT = "right"
+    FULL_OUTER = "full outer"


We should include semi and anti, because PyDough will have convenience syntax for HAS and HASNOT. We can just fast-track them to that path and deal with the specifics of the translation in the SQLGlot step.

SQLGlot does have join kinds for semi/anti, and it does the EXISTS translation step for us if the dialect doesn't support them (https://github.com/tobymao/sqlglot/blob/79f67830d7d3ba92bff91eeb95b4dc8bdfa6c44e/sqlglot/dialects/sqlite.py#L199).
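A sketch of the proposed enum extension, assuming the new members are simply appended to the existing `JoinType`. The `SEMI`/`ANTI` member names and values, along with the `keeps_only_left_columns` property, are hypothetical. The property records a standard property of semi and anti joins: they filter the left input, and no right-side columns appear in their output.

```python
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    LEFT = "left"
    RIGHT = "right"
    FULL_OUTER = "full outer"
    # Proposed (hypothetical) additions backing HAS / HASNOT.
    SEMI = "semi"  # keep left rows with at least one match
    ANTI = "anti"  # keep left rows with no match

    @property
    def keeps_only_left_columns(self) -> bool:
        """Semi/anti joins only filter the left input; no right columns are output."""
        return self in (JoinType.SEMI, JoinType.ANTI)
```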

Example of what I see this looking like:

Suppose we have S.WHERE(HAS(T) & p1 & p2). That becomes JOIN(semi, S, T).WHERE(p1 & p2)

Suppose we have S.WHERE(HASNOT(T) & p1 & p2). That becomes JOIN(anti, S, T).WHERE(p1 & p2)

If HAS is used in any other pattern besides a conjunct of a WHERE, it is expanded to COUNT(T) > 0 (and HASNOT to COUNT(T) == 0), then translated accordingly.

We could even make the expansion happen in the AST phase, so all HAS/HASNOT present during AST->Relational translation must be semi/anti joins.

Let's add this when we have the syntax.

knassre-bodo · 2024-11-14T18:22:29Z

pydough/relational/relational_expressions/column_reference.py


njriasan added 30 commits November 6, 2024 13:52

Added the new base files

cce5e4c

Started added abstract classes

19460cc

Added the file definitions

89f8e42

Added the sqlglot import

2786774

Added a basic test [run CI]

4ee70be

Defined tests

66107ec

Defined the tests cases we want to write

256ecef

Added the project file docstring

d267275

Added the class defintion

cb32b89

Started adding implementation parts

b11a50f

Added the projection defintion

856e20e

Added the limit file

513abe3

Added a single relational base class

5ed09e0

Merge branch 'nick/projection' into nick/limit

24dceb2

Added limit support

4cfd06e

Added the aggregate node

7081a2c

Cleaned up the class structure

d318e64

Merge branch 'nick/limit' into nick/aggregate

0764621

Added the filter definition

8eee9a4

Added the root definition

b51f85c

Added the join implementation

de2cce8

Updated scan

178ffd1

wrote tests for the basic scan ops

2e7d1a5

Fixed the first basic unit test:

9b88810

Added equality tests

b844130

Added error test

24b620e

Added remaining tests [run CI]

63aa271

Merged with prior PR

d32f73e

Updated orderings check

c2aef51

added the projection changes

c832181

njriasan added 4 commits November 14, 2024 10:22

Fixed root node [run CI]

f033835

Fixed the string outputs [run CI]

92643c9

Fixed the join node

0369d0b

Added tests for expresion [run CI]

7c7a67a

njriasan commented Nov 14, 2024


njriasan requested a review from knassre-bodo November 14, 2024 15:51

njriasan added 6 commits November 14, 2024 12:50

Renamed files

90b4e7a

Applied actual refactoring, need to update test info

cc1a2d1

Added test changes [run CI]

4a01824

merged with prior PR

d05aa76

Added test docstrings

11b5d27

applied remaining feedback [run CI]

0f9329d

knassre-bodo approved these changes Nov 14, 2024


njriasan added 12 commits November 14, 2024 14:21

Merged with prior PR, need to clean up

07f4e32

Merged with prior PR

d66dcee

merged updates

393057b

Updated testing [run CI]

8a78231

Added aggregate

e00ee40

Propagated changes

196a812

Merged with parent PR [run CI]

16954e2

merged with prior PR

f6da86b

Merged with prior PR [run CI]

876eb12

Merged with prior PR

e6049ef

Updated tests [run CI]

304054c

Merged with prior PR

ccfcffd

Base automatically changed from nick/root to main November 14, 2024 20:16

njriasan added 2 commits November 14, 2024 15:17

Updated the tests

9d094cb

Updated final comments [run CI]

945c902

njriasan merged commit 18617d9 into main Nov 14, 2024
4 checks passed

njriasan deleted the nick/join branch November 14, 2024 20:36
