-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Open
Labels
A-sqlArea: Polars SQL functionalityArea: Polars SQL functionalityacceptedReady for implementationReady for implementationenhancementNew feature or an improvement of an existing featureNew feature or an improvement of an existing feature
Description
Description
Currently, Polars SQL does not support the QUALIFY clause, a SQL feature/syntax available in several database backends such as BigQuery, Snowflake, Databricks, and Teradata.
There a number of use cases for the QUALIFY statement:
- Deduplication (keep latest/first record per group)
- Top-N queries per partition
- Filtering on percentile ranks
- Removing outliers based on window functions
Some additional benefits:
- It allows a user to write unittests for queries that use these statements that are executed on these platforms
- It enhances quality of life, as the user does not need to write separate CTEs to filter the results of a windowing function
Describe the solution you'd like
Add native support for the QUALIFY clause in Polars SQL Context. The QUALIFY clause should filter rows based on window function results, similar to how WHERE filters based on column values and HAVING filters based on aggregate results.
Example usage:
import polars as pl
from datetime import datetime
df = pl.DataFrame({
"id": [1, 1, 1, 2, 2, 3],
"name": ["Alice_v1", "Alice_v2", "Alice_v3", "Bob_v1", "Bob_v2", "Charlie"],
"updated_at": [
datetime(2024, 1, 1),
datetime(2024, 1, 2),
datetime(2024, 1, 3),
datetime(2024, 1, 1),
datetime(2024, 1, 2),
datetime(2024, 1, 1),
],
})
ctx = pl.SQLContext(test_table=df, eager=True)
result = ctx.execute("""
SELECT *
FROM test_table
QUALIFY ROW_NUMBER() OVER (
PARTITION BY id
ORDER BY updated_at DESC
) = 1
""")The current workaround is the following:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY id
ORDER BY updated_at DESC
) AS rn
FROM test_table
)
WHERE rn = 1Expected Output
| id | name | updated_at | _persist_time |
|---|---|---|---|
| 1 | Alice_v3 | 2024-01-03 00:00:00 | 2024-01-03 12:00:00 |
| 2 | Bob_v2 | 2024-01-02 00:00:00 | 2024-01-02 12:00:00 |
| 3 | Charlie | 2024-01-01 00:00:00 | 2024-01-01 12:00:00 |
Metadata
Metadata
Assignees
Labels
A-sqlArea: Polars SQL functionalityArea: Polars SQL functionalityacceptedReady for implementationReady for implementationenhancementNew feature or an improvement of an existing featureNew feature or an improvement of an existing feature