Skip to content

Add support for QUALIFY clause in Polars SQL #25286

@robinvanschaik

Description

@robinvanschaik

Description

Currently, Polars SQL does not support the QUALIFY clause, a SQL feature/syntax available in several database backends such as BigQuery, Snowflake, Databricks, and Teradata.

There a number of use cases for the QUALIFY statement:

  • Deduplication (keep latest/first record per group)
  • Top-N queries per partition
  • Filtering on percentile ranks
  • Removing outliers based on window functions

Some additional benefits:

  • It allows a user to write unittests for queries that use these statements that are executed on these platforms
  • It enhances quality of life, as the user does not need to write separate CTEs to filter the results of a windowing function

Describe the solution you'd like

Add native support for the QUALIFY clause in Polars SQL Context. The QUALIFY clause should filter rows based on window function results, similar to how WHERE filters based on column values and HAVING filters based on aggregate results.

Example usage:

import polars as pl
from datetime import datetime

df = pl.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "name": ["Alice_v1", "Alice_v2", "Alice_v3", "Bob_v1", "Bob_v2", "Charlie"],
    "updated_at": [
        datetime(2024, 1, 1),
        datetime(2024, 1, 2),
        datetime(2024, 1, 3),
        datetime(2024, 1, 1),
        datetime(2024, 1, 2),
        datetime(2024, 1, 1),
    ],
})

ctx = pl.SQLContext(test_table=df, eager=True)
result = ctx.execute("""
    SELECT *
    FROM test_table
    QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY updated_at DESC
    ) = 1
""")

The current workaround is the following:

SELECT * FROM (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY id
            ORDER BY updated_at DESC
        ) AS rn
    FROM test_table
)
WHERE rn = 1

Expected Output

id name updated_at _persist_time
1 Alice_v3 2024-01-03 00:00:00 2024-01-03 12:00:00
2 Bob_v2 2024-01-02 00:00:00 2024-01-02 12:00:00
3 Charlie 2024-01-01 00:00:00 2024-01-01 12:00:00

Metadata

Metadata

Labels

A-sqlArea: Polars SQL functionalityacceptedReady for implementationenhancementNew feature or an improvement of an existing feature

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions