[BSE-4155] Add support for the Project relational node #37

Merged on Nov 14, 2024 (45 commits)
Changes from 39 commits
Commits (45)
cce5e4c
Added the new base files
njriasan Nov 6, 2024
19460cc
Started added abstract classes
njriasan Nov 6, 2024
89f8e42
Added the file definitions
njriasan Nov 7, 2024
2786774
Added the sqlglot import
njriasan Nov 7, 2024
4ee70be
Added a basic test [run CI]
njriasan Nov 7, 2024
66107ec
Defined tests
njriasan Nov 8, 2024
256ecef
Defined the tests cases we want to write
njriasan Nov 8, 2024
d267275
Added the project file docstring
njriasan Nov 8, 2024
cb32b89
Added the class defintion
njriasan Nov 8, 2024
b11a50f
Started adding implementation parts
njriasan Nov 8, 2024
856e20e
Added the projection defintion
njriasan Nov 8, 2024
5ed09e0
Added a single relational base class
njriasan Nov 8, 2024
178ffd1
Updated scan
njriasan Nov 8, 2024
2e7d1a5
wrote tests for the basic scan ops
njriasan Nov 8, 2024
9b88810
Fixed the first basic unit test:
njriasan Nov 8, 2024
b844130
Added equality tests
njriasan Nov 8, 2024
24b620e
Added error test
njriasan Nov 8, 2024
63aa271
Added remaining tests [run CI]
njriasan Nov 8, 2024
d32f73e
Merged with prior PR
njriasan Nov 8, 2024
c2aef51
Updated orderings check
njriasan Nov 8, 2024
c832181
added the projection changes
njriasan Nov 8, 2024
135ab5f
added a test for project equality
njriasan Nov 9, 2024
ff3b25f
Added function definitions for the remaining unit tests
njriasan Nov 9, 2024
90cd1e5
Fixed the to_string() test
njriasan Nov 9, 2024
8bd7cfc
added a can merge test
njriasan Nov 9, 2024
0737744
added a merge test
njriasan Nov 9, 2024
95c281e
Finished adding project tests [run CI]
njriasan Nov 9, 2024
ac2a6e4
Merged with prior [run CI]
njriasan Nov 11, 2024
44f709f
Merge branch 'nick/relational_scan' into nick/projection
njriasan Nov 11, 2024
ecc1600
applied most of Kian's changes, need to test column expressions
njriasan Nov 13, 2024
25a7791
Removed unnecessary column class
njriasan Nov 13, 2024
ea9255c
Added remaining tests [run CI]
njriasan Nov 13, 2024
bf20591
Added remaining tests [run CI]
njriasan Nov 13, 2024
ab2f5db
added the literal
njriasan Nov 13, 2024
32cc1ea
added the remaining tests [run CI]
njriasan Nov 13, 2024
591b67d
Back-ported changes [run CI]
njriasan Nov 13, 2024
d96e9f6
Merged with prior PR
njriasan Nov 13, 2024
ce7b9fa
Fix a typo [run CI]
njriasan Nov 13, 2024
5376da1
Merge branch 'nick/relational_scan' into nick/projection [run CI]
njriasan Nov 13, 2024
90b4e7a
Renamed files
njriasan Nov 14, 2024
cc1a2d1
Applied actual refactoring, need to update test info
njriasan Nov 14, 2024
4a01824
Added test changes [run CI]
njriasan Nov 14, 2024
d05aa76
merged with prior PR
njriasan Nov 14, 2024
11b5d27
Added test docstrings
njriasan Nov 14, 2024
0f9329d
applied remaining feedback [run CI]
njriasan Nov 14, 2024
2 changes: 0 additions & 2 deletions pydough/relational/__init__.py
@@ -3,10 +3,8 @@
 """
 
 __all__ = [
-    "Column",
     "Relational",
 ]
 from .abstract import (
-    Column,
     Relational,
 )
92 changes: 26 additions & 66 deletions pydough/relational/abstract.py
@@ -1,17 +1,17 @@
 """
-This module contains the abstract base classes for the relational
+This file contains the abstract base classes for the relational
 representation. This roughly maps to a Relational Algebra representation
 but is not exact because it needs to maintain PyDough traits that define
 ordering and other properties of the relational expression.
 """
 
 from abc import ABC, abstractmethod
 from collections.abc import MutableMapping, MutableSequence
-from typing import Any, NamedTuple
+from typing import Any
 
-from sqlglot.expressions import Expression
+from sqlglot.expressions import Expression as SQLGlotExpression
 
-from pydough.pydough_ast.expressions import PyDoughExpressionAST
+from .relational_expressions.abstract import RelationalExpression
 
 
 class Relational(ABC):
@@ -32,55 +32,48 @@ def inputs(self) -> MutableSequence["Relational"]:
         """
 
     @property
-    def traits(self) -> MutableMapping[str, Any]:
+    @abstractmethod
+    def columns(self) -> MutableMapping[str, RelationalExpression]:
         """
-        Return the traits of the relational expression.
-        The traits in general may have a variable schema,
-        but each entry should be strongly defined. Here are
-        traits that should always be available:
+        Returns the columns of the relational expression.
 
-        - orderings: MutableSequence[PyDoughExpressionAST]
+        TODO: Associate an ordering in the future to avoid unnecessary SQL with the
+        final ordering of the root nodes.
 
         Returns:
-            MutableMapping[str, Any]: The traits of the relational expression.
+            MutableMapping[str, RelationalExpression]: The columns of the relational expression.
+            This does not have a defined ordering.
         """
-        return {"orderings": self.orderings}
 
-    @property
     @abstractmethod
-    def orderings(self) -> MutableSequence["PyDoughExpressionAST"]:
-        """
-        Returns the PyDoughExpressionAST that the relational expression is ordered by.
-        Each PyDoughExpressionAST is a result computed relative to the given set of columns.
-
-        Returns:
-            MutableSequence[PyDoughExpressionAST]: The PyDoughExpressionAST that the relational expression is ordered by,
-            possibly empty.
+    def equals(self, other: "Relational") -> bool:
         """
+        Determine if two relational nodes are exactly identical,
+        including column ordering.
 
-    @property
-    @abstractmethod
-    def columns(self) -> MutableSequence["Column"]:
-        """
-        Returns the columns of the relational expression.
+        Args:
+            other (Relational): The other relational node to compare against.
 
         Returns:
-            MutableSequence[Column]: The columns of the relational expression.
+            bool: Are the two relational nodes equal.
         """
 
+    def __eq__(self, other: Any) -> bool:
+        return isinstance(other, Relational) and self.equals(other)
+
     @abstractmethod
-    def to_sqlglot(self) -> "Expression":
-        """Translate the given relational expression
+    def to_sqlglot(self) -> SQLGlotExpression:
+        """Translate the given relational node
         and its children to a SQLGlot expression.
 
         Returns:
-            Expression: A SqlGlot expression representing the relational expression.
+            Expression: A SqlGlot expression representing the relational node.
         """
 
     @abstractmethod
     def to_string(self) -> str:
         """
-        Convert the relational expression to a string.
+        Convert the relational node to a string.
 
         TODO: Refactor this API to include some form of string
         builder so we can draw lines between children properly.
@@ -90,38 +83,5 @@ def to_string(self) -> str:
         with this node at the root.
         """
 
-    @abstractmethod
-    def can_merge(self, other: "Relational") -> bool:
-        """
-        Determine if two relational nodes can be merged together.
-
-        Args:
-            other (Relational): The other relational node to merge with.
-
-        Returns:
-            bool: Can the two relational nodes be merged together.
-        """
-
-    @abstractmethod
-    def merge(self, other: "Relational") -> "Relational":
-        """
-        Merge two relational nodes together to produce one output
-        relational node. This requires can_merge to return True.
-
-        Args:
-            other (Relational): The other relational node to merge with.
-
-        Returns:
-            Relational: A new relational node that is the result of merging
-            the two input relational nodes together and removing any redundant
-            components.
-        """
-
-
-class Column(NamedTuple):
-    """
-    An column expression consisting of a name and an expression.
-    """
-
-    name: str
-    expr: "PyDoughExpressionAST"
+    def __repr__(self) -> str:
+        return self.to_string()
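After this change, the `Relational` base routes `==` through the abstract `equals` method and `repr()` through `to_string`. A minimal sketch of that pattern follows; `Node` and `Leaf` are hypothetical stand-ins for illustration, not pydough classes.

```python
from abc import ABC, abstractmethod
from typing import Any


class Node(ABC):
    """Hypothetical stand-in for the Relational base class."""

    @abstractmethod
    def equals(self, other: "Node") -> bool:
        """Structural equality, implemented by each concrete node."""

    def __eq__(self, other: Any) -> bool:
        # Only other nodes can be equal; everything else compares unequal.
        return isinstance(other, Node) and self.equals(other)

    @abstractmethod
    def to_string(self) -> str:
        """Human-readable form of the node."""

    def __repr__(self) -> str:
        return self.to_string()


class Leaf(Node):
    """Hypothetical concrete node that just wraps a table name."""

    def __init__(self, table: str) -> None:
        self.table = table

    def equals(self, other: Node) -> bool:
        return isinstance(other, Leaf) and self.table == other.table

    def to_string(self) -> str:
        return f"LEAF(table={self.table})"


a, b = Leaf("orders"), Leaf("orders")
print(a == b)         # True: structural equality, not identity
print(a == "orders")  # False: non-nodes never compare equal
print(repr(a))        # LEAF(table=orders)
```

Because `__eq__` returns `False` rather than `NotImplemented` for non-node operands, Python never falls back to the other operand's reflected `__eq__`. That is a reasonable choice for a closed hierarchy of plan nodes.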
52 changes: 52 additions & 0 deletions pydough/relational/project.py
@@ -0,0 +1,52 @@
"""
This file contains the relational implementation for a "project". This is our
relational representation for a "calc" that involves any compute steps and can include
adding or removing columns (as well as technically reordering). In general, we seek to
avoid introducing extra nodes just to reorder or prune columns, so ideally their use
should be sparse.
"""

from collections.abc import MutableMapping

from sqlglot.expressions import Expression

from .abstract import Relational
from .relational_expressions import RelationalExpression
from .single_relational import SingleRelational


class Project(SingleRelational):
    """
    The Project node in the relational tree. This node represents a "calc" in
    relational algebra, which should involve some "compute" functions and may
    involve adding, removing, or reordering columns.
    """

    def __init__(
        self,
        input: Relational,
        columns: MutableMapping[str, RelationalExpression],
    ) -> None:
        super().__init__(input)
        self._columns: MutableMapping[str, RelationalExpression] = columns

    @property
    def columns(self) -> MutableMapping[str, RelationalExpression]:
        return self._columns

    def to_sqlglot(self) -> Expression:
        raise NotImplementedError(
            "Conversion to SQLGlot Expressions is not yet implemented."
        )

    def equals(self, other: Relational) -> bool:
        return (
            isinstance(other, Project)
            # TODO: Do we need a fast path for caching the inputs?
            and self.input == other.input
            and self.columns == other.columns
        )

    def to_string(self) -> str:
        # TODO: Should we visit the input?
        return f"PROJECT(columns={self.columns})"
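To illustrate how `Project.equals` behaves, here is a sketch that mirrors its logic with hypothetical frozen dataclasses: `ColRef` stands in for `ColumnReference`, and `FakeScan` stands in for a scan input. It also shows one consequence of storing columns in a mapping: Python dict equality ignores insertion order, so two projections that list the same columns in a different order compare equal.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ColRef:
    """Hypothetical stand-in for ColumnReference (type kept as a plain string)."""

    name: str
    data_type: str


@dataclass(frozen=True)
class FakeScan:
    """Hypothetical stand-in for a scan input node."""

    table: str


class Project:
    """Mirrors the PR's Project: an input node plus an output-name -> expression mapping."""

    def __init__(self, input: object, columns: Mapping[str, ColRef]) -> None:
        self.input = input
        self.columns = columns

    def equals(self, other: object) -> bool:
        return (
            isinstance(other, Project)
            and self.input == other.input
            and self.columns == other.columns  # dict equality: order-insensitive
        )


scan = FakeScan("orders")
# Rename input column "a" to "id" and keep "b" as "total".
p1 = Project(scan, {"id": ColRef("a", "int64"), "total": ColRef("b", "float64")})
p2 = Project(scan, {"total": ColRef("b", "float64"), "id": ColRef("a", "int64")})
print(p1.equals(p2))  # True, even though the insertion order differs
```

This is consistent with the `columns` docstring ("does not have a defined ordering") but not with the `Relational.equals` docstring, which promises equality "including column ordering". If order should matter, comparing `list(self.columns.items())` would enforce it.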
10 changes: 10 additions & 0 deletions pydough/relational/relational_expressions/__init__.py
@@ -0,0 +1,10 @@
"""
TODO: add module-level docstring
"""

__all__ = [
    "RelationalExpression",
]
from .abstract import (
    RelationalExpression,
)
66 changes: 66 additions & 0 deletions pydough/relational/relational_expressions/abstract.py
@@ -0,0 +1,66 @@
"""
This file contains the abstract base classes for the relational
expression representation. Relational expressions are representations
of literals, column accesses, or other functions that are used in the
relational tree to build the final SQL query.
"""

from abc import ABC, abstractmethod
from typing import Any

from sqlglot.expressions import Expression as SQLGlotExpression

__all__ = ["RelationalExpression"]

from pydough.types import PyDoughType


class RelationalExpression(ABC):
    def __init__(self, data_type: PyDoughType) -> None:
        self._data_type: PyDoughType = data_type

    @property
    def data_type(self) -> PyDoughType:
        return self._data_type

    @property
    def is_aggregation(self) -> bool:
        return False

    @abstractmethod
    def equals(self, other: "RelationalExpression") -> bool:
        """
        Determine if two RelationalExpression nodes are exactly identical,
        including ordering. This does not check if two expressions are equal
        after any alterations, for example commuting the inputs.

        Args:
            other (RelationalExpression): The other relational expression to compare against.

        Returns:
            bool: Are the two relational expressions equal.
        """

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, RelationalExpression) and self.equals(other)

    @abstractmethod
    def to_string(self) -> str:
        """
        Convert the relational expression to a string.

        Returns:
            str: A string representation of this expression including converting
            any of its inputs to strings.
        """

    def __repr__(self) -> str:
        return self.to_string()

    @abstractmethod
    def to_sqlglot(self) -> SQLGlotExpression:
        """Translate the given relational expression to a SQLGlot expression.

        Returns:
            Expression: A SqlGlot expression representing the relational expression.
        """
49 changes: 49 additions & 0 deletions pydough/relational/relational_expressions/column_reference.py
@@ -0,0 +1,49 @@
"""
The representation of a column access for use in a relational tree.
The provided name of the column should match the name that can be used
for the column in the input node.
"""

__all__ = ["ColumnReference"]

from sqlglot.expressions import Expression as SQLGlotExpression

from pydough.types import PyDoughType

from .abstract import RelationalExpression


class ColumnReference(RelationalExpression):
    """
    The Expression implementation for accessing a column
    in a relational node.
    """

    def __init__(self, name: str, data_type: PyDoughType):
        super().__init__(data_type)
        self._name: str = name

    def __hash__(self) -> int:
        return hash((self.name, self.data_type))

    @property
    def name(self) -> str:
        """
        The name of the column.
        """
        return self._name

    def to_sqlglot(self) -> SQLGlotExpression:
        raise NotImplementedError(
            "Conversion to SQLGlot Expressions is not yet implemented."
        )

    def to_string(self) -> str:
        return f"Column(name={self.name}, type={self.data_type})"

    def equals(self, other: object) -> bool:
        return (
            isinstance(other, ColumnReference)
            and (self.data_type == other.data_type)
            and (self.name == other.name)
        )
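The module docstring requires a column reference's name to match a column name in the input node. The PR does not include a check for this. The following hypothetical helper, `unresolved_references`, is only a sketch of how that contract could be validated. It assumes the input's schema is available as a plain name-to-type mapping.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRef:
    """Hypothetical stand-in for ColumnReference: a name plus a type string."""

    name: str
    data_type: str


def unresolved_references(
    input_columns: Mapping[str, str], projected: Mapping[str, ColumnRef]
) -> list[str]:
    """Return the output names whose reference does not resolve against the input.

    input_columns maps the input node's column names to type strings
    (an assumed shape, not a pydough API).
    """
    bad = []
    for out_name, ref in projected.items():
        # The reference must use the input's column name and agree on its type.
        if input_columns.get(ref.name) != ref.data_type:
            bad.append(out_name)
    return bad


scan_cols = {"o_orderkey": "int64", "o_totalprice": "float64"}
proj = {
    "key": ColumnRef("o_orderkey", "int64"),
    "price": ColumnRef("totalprice", "float64"),  # wrong input name
}
print(unresolved_references(scan_cols, proj))  # ['price']
```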
56 changes: 56 additions & 0 deletions pydough/relational/relational_expressions/literal_expression.py
@@ -0,0 +1,56 @@
"""
The representation of a literal value for use in a relational
expression.
"""

__all__ = ["LiteralExpression"]

from typing import Any

from sqlglot.expressions import Expression as SQLGlotExpression

from pydough.types import PyDoughType

from .abstract import RelationalExpression


class LiteralExpression(RelationalExpression):
    """
    The Expression implementation for a literal value
    in a relational node. There are no restrictions on the
    relationship between the value and the type, so we can
    represent arbitrary Python classes as any type. Lowering
    to SQL is then responsible for determining how this can be
    achieved (e.g. casting), or translation must prevent this
    from being generated.
    """

    def __init__(self, value: Any, data_type: PyDoughType):
        super().__init__(data_type)
        self._value: Any = value

    def __hash__(self) -> int:
        # Note: This will break if the value isn't hashable.
        return hash((self.value, self.data_type))

    @property
    def value(self) -> object:
        """
        The literal's Python value.
        """
        return self._value

    def to_sqlglot(self) -> SQLGlotExpression:
        raise NotImplementedError(
            "Conversion to SQLGlot Expressions is not yet implemented."
        )

    def to_string(self) -> str:
        return f"Literal(value={self.value}, type={self.data_type})"

    def equals(self, other: object) -> bool:
        return (
            isinstance(other, LiteralExpression)
            and (self.data_type == other.data_type)
            and (self.value == other.value)
        )

Review comments on this file:

Contributor, on def __init__(self, value: Any, data_type: PyDoughType):
Should this be Any or object?

Contributor Author:
These are equivalent. Everything inherits from object in Python. Personally I prefer Any.

Contributor, comment on lines +30 to +32 (the __hash__ method):
See my point in the Scan PR about hashing; would the string representation be sufficient?

Contributor, on and (self.data_type == other.data_type):
If we have super().equals, then we can skip line 54.
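The two reviewer suggestions on `LiteralExpression` can be combined in one sketch. A shared `equals` on the base class checks the data type, so subclasses call `super().equals` instead of repeating that line. `__hash__` falls back to the value's `repr` when the value itself is unhashable. `Expr` and `Literal` are hypothetical stand-ins, and the repr fallback is one possible approach, not what the PR merged.

```python
from typing import Any


class Expr:
    """Hypothetical base: the part of equality shared by every expression is the type."""

    def __init__(self, data_type: str) -> None:
        self.data_type = data_type

    def equals(self, other: object) -> bool:
        return isinstance(other, Expr) and self.data_type == other.data_type

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Expr) and self.equals(other)


class Literal(Expr):
    def __init__(self, value: Any, data_type: str) -> None:
        super().__init__(data_type)
        self.value = value

    def equals(self, other: object) -> bool:
        # super().equals() covers the data_type check; only the value remains.
        return (
            isinstance(other, Literal)
            and super().equals(other)
            and self.value == other.value
        )

    def __hash__(self) -> int:
        try:
            return hash((self.value, self.data_type))
        except TypeError:
            # Unhashable values (e.g. lists) fall back to their string form.
            return hash((repr(self.value), self.data_type))


print(Literal(1, "int64") == Literal(1, "int64"))    # True
print(Literal(1, "int64") == Literal(1, "float64"))  # False
hash(Literal([1, 2], "array"))  # no TypeError thanks to the repr fallback
```

The repr fallback is only consistent with `==` when equal values also have identical reprs. For example, `[1] == [1.0]` is true but their reprs differ, so those two literals would hash differently despite comparing equal.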