Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support DATETIME function in PyDough #267

Merged
merged 15 commits into from
Feb 19, 2025
Merged

Support DATETIME function in PyDough #267

merged 15 commits into from
Feb 19, 2025

Conversation

knassre-bodo
Copy link
Contributor

@knassre-bodo knassre-bodo commented Feb 14, 2025

Resolves #208. See issue for more details. Full list of additional changes:

  • Ensured the DATETIME operator works for both sqlite & ANSI with very permissive spacing/capitalization/alias rules for the various string literals.
    • now (& similar aliases): for ANSI generates CURRENT_TIMESTAMP(), for sqlite generates DATETIME('now')
    • start of <unit> (year/month/day): for ANSI generates DATE_TRUNC(<unit>, dt), for sqlite generates DATETIME(dt, 'start of <unit>')
    • start of <unit> (hour/minute/second): for ANSI generates DATE_TRUNC(<unit>, dt), for sqlite generates STRFTIME(fmt_str, dt) where the fmt_str truncates to that unit.
    • ±<amt> <unit>: for ANSI generates DATEADD(<unit>, ±<amt>dt), for sqlite generates DATETIME(dt, '±<amt> <unit>)
  • Moves a lot of the datetime unit logic in the transform bindings into a DateTimeUnit enum to handle common logic, such as converting a unit alias into a cannonical form or identifying the format string used to truncate to that unit.
  • Added new fixture to create a DatabaseContext without a connection for each supported dialect
  • Changed the tests in test_pydough_to_sql.py to be parameterized with the afformentioned new fixture so sqlite & ANSI are both tested
  • Changed the refsol .sql files to contain the dialect (e.g. xxx_ansi.sql vs xxx_sqlite.sql)
  • Fixed miscellaneous typos found in the repo
  • Adjusted the SQLGlot generation of the empty singleton to avoid a malformed VALUES clause

Comment on lines 4 to 5
FROM (VALUES
())
(NULL))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just having VALUES () is invalid, so using VALUES (NULL) helps

@@ -0,0 +1,47 @@
SELECT
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are now automatically generated along with the XXX_ansi.sql file for all the tests in test_pydough_to_sql.py, so we can verify that each supported dialect works on the expected functions.

@@ -153,14 +154,14 @@ def impl(file_name: str) -> str:


@pytest.fixture(scope="session")
def get_sql_test_filename() -> Callable[[str], str]:
def get_sql_test_filename() -> Callable[[str, DatabaseDialect], str]:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change allows the SQL files to be dialect-specific.

pytest.param(DatabaseDialect.SQLITE, id="sqlite"),
]
)
def empty_context_database(request) -> DatabaseContext:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a new fixture with a DatabaseContext w/o an actual connection for every dialect, so we can test SQL generation for all dialects.

)


def datetime_sampler():
Copy link
Contributor Author

@knassre-bodo knassre-bodo Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't cover every edge case for the string parsing of DATETIME, but covers a large number of them that were generated to account for various units, aliases, signs, spacings, capitalizations, etc.

"""


class DateTimeUnit(Enum):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved logic into this enum so that DATETIME wouldn't duplicate its unit-handling logic with DATEDIFF.

# Set of names of of fields that can be included in the JSON
# Set of names of fields that can be included in the JSON
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a typo I mass-fixed

Comment on lines +200 to +202
if isinstance(base, sqlglot_expressions.Datetime):
base.this.append(trunc_expr)
return base
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This edge case allows us to flatten DATETIME(DATETIME('now', '-1 day'), 'start of year') into DATETIME('now', '-1 day', 'start of year').

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarity purposes: Does the visitor revisit after appending? Maybe we can keep a small comment for further programmers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The visitor doesn't visit SQLGlot nodes, it returns them. In this case, base is a SQLGlot expression node returned earlier by the SQLGlot relational expression visitor (which created base from a relational expression). All we're doing is modifying base after the fact, instead of returning a new object containing it.

Comment on lines +243 to +245
if isinstance(base, sqlglot_expressions.Datetime):
base.this.append(offset_expr)
return base
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This edge case allows us to flatten DATETIME(DATETIME('now', 'start of year'), '-1 day') into DATETIME('now', 'start of year', '-1 day').

@knassre-bodo knassre-bodo marked this pull request as ready for review February 17, 2025 19:49
Returns:
The SQLGlot expression to truncate `base`.
"""
if dialect == DatabaseDialect.SQLITE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with other SQLite bindings, we should either create a new function here or refactor the other bindings to align with the current conditional approach.

Copy link
Contributor Author

@knassre-bodo knassre-bodo Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the core behavior is the same, but the helper function changes. What I probably want, in future, is to overhaul how the bindings work so there is a measure of inheritance. That way, there can be general superclasses that deal with most of the behavior, and subclasses can override specific helpers, and there can be inheritance as fallbacks. Whether this is literally via classes, or just a stack of objects, needs to be designed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, I agree with your sentiment, but I want fixing this to be part of a larger overhaul/refactor of the file.

" \n \t312 \nD \n\r ",
),
DATETIME(" Current_Date ", "+ 45 MM", "-135 s"),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Might be far-fetched but how about adding tests for camel-cases such as DATETIME("cUrReNt_Timestamp").

Copy link
Contributor Author

@knassre-bodo knassre-bodo Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 might as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added a few weird ones.

@knassre-bodo knassre-bodo merged commit 5fc012c into main Feb 19, 2025
5 checks passed
@knassre-bodo knassre-bodo deleted the kian/datetime branch February 19, 2025 03:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support a timestamp construction/alteration API in PyDough similar to the sqlite DATETIME function
2 participants