fix(redshift): use boundary-aware segment stitching for query reconstruction #16253

Draft
kyungsoo-datahub wants to merge 3 commits into master from fix/redshift-query-stitching

@kyungsoo-datahub
Contributor

Redshift stores query text in fixed-width segments of type character(200) or character(4000). Neither a per-segment RTRIM nor RTRIM(LISTAGG(text)) reconstructs the query correctly: the character(n) padding is stripped before LISTAGG receives the values, so a space that falls at a segment boundary is lost and the adjacent keywords merge (e.g. GROUP BY -> GROUPBY).

Fix: append a space to a segment whenever its trimmed length is less than the segment size. This is applied at all 5 LISTAGG locations. The provisioned scan lineage query also no longer reads stl_query.querytxt (truncated to 4000 chars); it now uses a CTE built from STL_QUERYTEXT.
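
To make the boundary problem concrete, here is a minimal Python simulation of the stitching rule described above. It is a sketch and not the SQL from this PR: the function names are hypothetical, and a segment size of 5 stands in for 200/4000 so the boundary is easy to see.

```python
# Illustrative simulation only; names and the tiny segment size are assumptions.

def split_into_segments(query: str, size: int) -> list[str]:
    """Chop the query into fixed-width chunks and strip trailing spaces,
    mimicking how character(n) padding (and any real trailing space) is lost."""
    return [query[i:i + size].rstrip(" ") for i in range(0, len(query), size)]


def stitch_naive(segments: list[str]) -> str:
    """Concatenate the trimmed segments as-is (the failing approach)."""
    return "".join(segments)


def stitch_boundary_aware(segments: list[str], size: int) -> str:
    """Re-add one space after any segment shorter than the segment size,
    then trim the single trailing space left after the final segment."""
    parts = [seg + " " if len(seg) < size else seg for seg in segments]
    return "".join(parts).rstrip(" ")


query = "SELECT a GROUP BY b"
segments = split_into_segments(query, 5)

print(stitch_naive(segments))              # SELECT a GROUPBY b
print(stitch_boundary_aware(segments, 5))  # SELECT a GROUP BY b
```

A full-width segment gets no extra space, so a word split mid-token across two segments is still rejoined correctly. Only segments that lost trailing whitespace are padded back.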

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Feb 18, 2026
@github-actions
Contributor

Linear: ING-1654

@codecov
codecov bot commented Feb 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ All tests successful. No failed tests found.

@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label Feb 18, 2026
@kyungsoo-datahub kyungsoo-datahub marked this pull request as draft February 18, 2026 23:25
- Add relevant_queries CTE to scope STL_QUERYTEXT scan to time-filtered
  query IDs from stl_insert (fixes unbounded full-table scan)
- Extract segment size constants (_PROVISIONED_SEGMENT_SIZE=200,
  _SERVERLESS_SEGMENT_SIZE=4000) instead of inline magic numbers
- Add TODO for pre-existing unscoped CTE in list_insert_create_queries_sql
- Add crossed-wire tests (provisioned doesn't use 4000, serverless doesn't use 200)
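
As a rough illustration of the constants and crossed-wire checks listed above, the following sketch builds the stitching expression from a named segment size. Only the two constant names and their values come from the commit notes; the helper `stitched_text_expr`, its parameters, and the exact SQL shape are assumptions and may differ from the PR's actual queries.

```python
# Constant names and values are from the commit notes; everything else is hypothetical.
_PROVISIONED_SEGMENT_SIZE = 200
_SERVERLESS_SEGMENT_SIZE = 4000


def stitched_text_expr(
    segment_size: int, text_col: str = "text", order_col: str = "sequence"
) -> str:
    """Return a SQL expression that re-adds one space after any trimmed
    segment shorter than segment_size, then aggregates segments in order."""
    return (
        "RTRIM(LISTAGG("
        f"CASE WHEN LEN(RTRIM({text_col})) < {segment_size} "
        f"THEN RTRIM({text_col}) || ' ' ELSE RTRIM({text_col}) END, '') "
        f"WITHIN GROUP (ORDER BY {order_col}))"
    )


provisioned_sql = stitched_text_expr(_PROVISIONED_SEGMENT_SIZE)
serverless_sql = stitched_text_expr(_SERVERLESS_SEGMENT_SIZE)

# Crossed-wire checks: each variant must use its own segment size only.
assert "< 200 " in provisioned_sql and "4000" not in provisioned_sql
assert "< 4000 " in serverless_sql and "< 200 " not in serverless_sql
```

Passing the size as a parameter keeps the provisioned and serverless query builders from sharing an inline literal, which is the mistake the crossed-wire tests are meant to catch.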