
@alexander-beedie (Collaborator) commented Dec 6, 2025

Closes #25286.

The "QUALIFY" clause is to window functions what "HAVING" is to GROUP BY. It lets you write cleaner SQL that filters rows on the results of one or more window functions, where you would otherwise need a nested subquery.
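The plain-Python sketch below makes the HAVING analogy concrete. It models only the semantics of the two clauses, not how Polars evaluates them, and uses the `(user_id, event)` pairs from the example data.

- HAVING filters the aggregated groups, so one row per group survives.
- QUALIFY filters the original rows on a per-row window value, so nothing is collapsed.

```python
from collections import Counter

# (user_id, event) pairs from the example DataFrame in this PR
events = [
    (1, "login"), (1, "purchase"), (1, "logout"),
    (2, "login"), (2, "purchase"),
    (3, "login"),
]

# Rows per user: serves both as the GROUP BY aggregate and as the window value
counts = Counter(user_id for user_id, _ in events)

# SELECT user_id, COUNT(*) ... GROUP BY user_id HAVING COUNT(*) > 1
# -> filters the aggregated groups: one output row per surviving group
having = sorted((uid, n) for uid, n in counts.items() if n > 1)

# SELECT user_id, event ... QUALIFY COUNT(*) OVER (PARTITION BY user_id) > 1
# -> attaches the partition count to every row, then filters the rows themselves
qualify = [(uid, ev) for uid, ev in events if counts[uid] > 1]

print(having)   # [(1, 3), (2, 2)]
print(qualify)  # [(1, 'login'), (1, 'purchase'), (1, 'logout'), (2, 'login'), (2, 'purchase')]
```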

Example

```python
from datetime import datetime
import polars as pl

df = pl.DataFrame({
  "user_id": [1, 1, 1, 2, 2, 3],
  "event": ["login", "purchase", "logout", "login", "purchase", "login"],
  "timestamp": [
      datetime(2024, 1, 15, 10, 0),
      datetime(2024, 1, 15, 14, 30),
      datetime(2024, 1, 15, 18, 0),
      datetime(2024, 1, 15, 9, 0),
      datetime(2024, 1, 15, 16, 45),
      datetime(2024, 1, 14, 8, 0),
  ],
})
```

Without QUALIFY, a typical "latest record per id" query needs a nested subquery to compute the row number. An outer projection then drops that helper column again:

```python
df.sql("""
  SELECT user_id, event, timestamp
  FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) as rn
    FROM self
  ) events
  WHERE rn = 1
  ORDER BY user_id
""")
```
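For comparison, the nested pattern can be run against the same data with Python's built-in `sqlite3` module. This assumes a bundled SQLite of version 3.25 or later, the first release with window functions.

SQLite only allows window functions in the select list and ORDER BY. That restriction is exactly why `rn` has to be materialised in a subquery before it can be filtered. The table name `events` and the text timestamps are choices made for this sketch only.

```python
import sqlite3

# Same sample data, with timestamps as ISO-8601 text (sorts chronologically)
rows = [
    (1, "login", "2024-01-15 10:00"),
    (1, "purchase", "2024-01-15 14:30"),
    (1, "logout", "2024-01-15 18:00"),
    (2, "login", "2024-01-15 09:00"),
    (2, "purchase", "2024-01-15 16:45"),
    (3, "login", "2024-01-14 08:00"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id INTEGER, event TEXT, timestamp TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

# A window function directly in WHERE is rejected by SQLite...
try:
    con.execute("""
        SELECT user_id, event, timestamp FROM events
        WHERE ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) = 1
    """)
    window_in_where_rejected = False
except sqlite3.Error:
    window_in_where_rejected = True

# ...so the row number is computed in a subquery, filtered outside,
# and dropped again by the outer projection
latest = con.execute("""
    SELECT user_id, event, timestamp
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY user_id ORDER BY timestamp DESC
        ) AS rn
        FROM events
    ) AS ranked
    WHERE rn = 1
    ORDER BY user_id
""").fetchall()

print(window_in_where_rejected)  # True
print(latest)
```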

The same query is much cleaner using QUALIFY:

```python
df.sql("""
  SELECT user_id, event, timestamp
  FROM self
  QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) = 1
  ORDER BY user_id
""")
```
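Conceptually, QUALIFY does three things: it evaluates the window function for every row, filters on the result, and never projects the helper value. Below is a minimal plain-Python sketch of that evaluation for `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) = 1`. The `row_number` helper is hypothetical and says nothing about how Polars implements the clause internally.

```python
from collections import defaultdict
from datetime import datetime

rows = [
    {"user_id": 1, "event": "login", "timestamp": datetime(2024, 1, 15, 10, 0)},
    {"user_id": 1, "event": "purchase", "timestamp": datetime(2024, 1, 15, 14, 30)},
    {"user_id": 1, "event": "logout", "timestamp": datetime(2024, 1, 15, 18, 0)},
    {"user_id": 2, "event": "login", "timestamp": datetime(2024, 1, 15, 9, 0)},
    {"user_id": 2, "event": "purchase", "timestamp": datetime(2024, 1, 15, 16, 45)},
    {"user_id": 3, "event": "login", "timestamp": datetime(2024, 1, 14, 8, 0)},
]


def row_number(rows, partition_by, order_by, descending=False):
    """Hypothetical helper: ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...),
    returned as a list aligned with the input rows."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[partition_by]].append(i)
    numbers = [0] * len(rows)
    for indices in partitions.values():
        # order the rows within each partition, then number them from 1
        indices.sort(key=lambda idx: rows[idx][order_by], reverse=descending)
        for n, idx in enumerate(indices, start=1):
            numbers[idx] = n
    return numbers


# 1. evaluate the window function for every row
rn = row_number(rows, "user_id", "timestamp", descending=True)
# 2. QUALIFY ... = 1 keeps the matching rows; rn is never added to the output
latest = [row for row, n in zip(rows, rn) if n == 1]
# 3. ORDER BY user_id
latest.sort(key=lambda row: row["user_id"])

for row in latest:
    print(row["user_id"], row["event"], row["timestamp"])
```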

QUALIFY also works with the named WINDOW clause¹, which keeps the query readable once it becomes non-trivial:

```python
df.sql("""
  SELECT user_id, event, timestamp
  FROM self
  WINDOW most_recent_by_user AS (PARTITION BY user_id ORDER BY timestamp DESC)
  QUALIFY ROW_NUMBER() OVER most_recent_by_user = 1
  ORDER BY user_id
""")

# all three queries above return the same result:
# shape: (3, 3)
# ┌─────────┬──────────┬─────────────────────┐
# │ user_id ┆ event    ┆ timestamp           │
# │ ---     ┆ ---      ┆ ---                 │
# │ i64     ┆ str      ┆ datetime[μs]        │
# ╞═════════╪══════════╪═════════════════════╡
# │ 1       ┆ logout   ┆ 2024-01-15 18:00:00 │
# │ 2       ┆ purchase ┆ 2024-01-15 16:45:00 │
# │ 3       ┆ login    ┆ 2024-01-14 08:00:00 │
# └─────────┴──────────┴─────────────────────┘
```
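Naming a window pays off when several window functions share it. The plain-Python sketch below shows this; its names are illustrative, not Polars APIs:

- The hypothetical `WindowSpec` dataclass stands in for `most_recent_by_user`.
- The `row_number` and `lead` helpers are illustrative only.

The same spec drives both the QUALIFY-style filter and an extra column holding each user's previous event. Because the window is ordered DESC, LEAD picks the chronologically earlier row.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

rows = [
    {"user_id": 1, "event": "login", "timestamp": datetime(2024, 1, 15, 10, 0)},
    {"user_id": 1, "event": "purchase", "timestamp": datetime(2024, 1, 15, 14, 30)},
    {"user_id": 1, "event": "logout", "timestamp": datetime(2024, 1, 15, 18, 0)},
    {"user_id": 2, "event": "login", "timestamp": datetime(2024, 1, 15, 9, 0)},
    {"user_id": 2, "event": "purchase", "timestamp": datetime(2024, 1, 15, 16, 45)},
    {"user_id": 3, "event": "login", "timestamp": datetime(2024, 1, 14, 8, 0)},
]


@dataclass(frozen=True)
class WindowSpec:
    """Hypothetical stand-in for a named window such as
    WINDOW most_recent_by_user AS (PARTITION BY user_id ORDER BY timestamp DESC)."""
    partition_by: str
    order_by: str
    descending: bool = False


def ordered_partitions(rows, spec):
    """Yield each partition as a list of row indices, in window order."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[spec.partition_by]].append(i)
    for indices in partitions.values():
        indices.sort(key=lambda idx: rows[idx][spec.order_by], reverse=spec.descending)
        yield indices


def row_number(rows, spec):
    """ROW_NUMBER() OVER spec, aligned with the input rows."""
    numbers = [0] * len(rows)
    for indices in ordered_partitions(rows, spec):
        for n, idx in enumerate(indices, start=1):
            numbers[idx] = n
    return numbers


def lead(rows, column, spec):
    """LEAD(column) OVER spec: value from the next row in window order, else None."""
    values = [None] * len(rows)
    for indices in ordered_partitions(rows, spec):
        for current, following in zip(indices, indices[1:]):
            values[current] = rows[following][column]
    return values


# Defined once, reused by both window functions
most_recent_by_user = WindowSpec("user_id", "timestamp", descending=True)
rn = row_number(rows, most_recent_by_user)
previous_event = lead(rows, "event", most_recent_by_user)  # DESC order: LEAD = earlier row

result = [
    (row["user_id"], row["event"], prev)
    for row, n, prev in zip(rows, rn, previous_event)
    if n == 1  # the QUALIFY predicate
]
print(result)  # [(1, 'logout', 'purchase'), (2, 'purchase', 'login'), (3, 'login', None)]
```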

Footnotes

  1. Support for WINDOW was added here:
    https://github.com/pola-rs/polars/pull/25400

@github-actions bot added the A-sql (Area: Polars SQL functionality), enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars) and rust (Related to Rust Polars) labels Dec 6, 2025

codecov bot commented Dec 6, 2025

Codecov Report

❌ Patch coverage is 90.00000% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.36%. Comparing base (8ab1657) to head (757ceb0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/polars-sql/src/context.rs | 90.00% | 7 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #25652      +/-   ##
==========================================
- Coverage   79.58%   79.36%   -0.22%     
==========================================
  Files        1743     1743              
  Lines      240439   240507      +68     
  Branches     3038     3038              
==========================================
- Hits       191347   190886     -461     
- Misses      48310    48839     +529     
  Partials      782      782              
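The headline percentages follow from the table above. Coverage here is hits divided by all tracked lines, with partials counted as not covered. The reported values match truncation to two decimals, which is consistent with Codecov's default `round: down`. Round-to-nearest would instead give 79.37% for the head commit.

The sketch below recomputes the figures from the table's counts, using integer arithmetic to avoid float rounding:

```python
def coverage_pct(hits: int, lines: int) -> str:
    """hits / lines as a percentage, truncated (rounded down) to two decimals."""
    basis_points = hits * 10_000 // lines  # integer maths: no float rounding
    return f"{basis_points // 100}.{basis_points % 100:02d}"


reports = {
    "main (8ab1657)": {"hits": 191347, "misses": 48310, "partials": 782, "lines": 240439},
    "#25652 (757ceb0)": {"hits": 190886, "misses": 48839, "partials": 782, "lines": 240507},
}

for name, r in reports.items():
    # every tracked line is counted as exactly one of hit, miss, or partial
    assert r["hits"] + r["misses"] + r["partials"] == r["lines"]
    print(name, coverage_pct(r["hits"], r["lines"]))
# main (8ab1657) 79.58
# #25652 (757ceb0) 79.36
```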




Development

Successfully merging this pull request may close these issues.

Add support for QUALIFY clause in Polars SQL
