Skip to content

fix: drop a deep BinaryExpr chain iteratively to avoid a stack overflow#23198

Open
mdashti wants to merge 1 commit into
apache:mainfrom
paradedb:moe/sql-stack-overflow-guard
Open

fix: drop a deep BinaryExpr chain iteratively to avoid a stack overflow#23198
mdashti wants to merge 1 commit into
apache:mainfrom
paradedb:moe/sql-stack-overflow-guard

Conversation

@mdashti

@mdashti mdashti commented Jun 25, 2026

Copy link
Copy Markdown

Which issue does this close?

None filed.

Rationale for this change

This PR makes a deep BinaryExpr chain drop iteratively instead of recursively.

A left-associative chain like a OR b OR ... parses with shallow recursion (the left side accumulates in a loop), so recursion_limit never bounds its length. sql_expr_to_logical_expr builds it iteratively (the stack machine from #1444), so planning is fine. But the resulting Expr is as deep as the chain, and its recursive Drop through the boxed children overflows the stack. The tree-walk passes are #[recursive]-protected, so the Drop is the one recursion the protection can't reach.

impl Drop for Expr isn't viable: it forbids moving fields out of Expr, which map_children and the rewriters do by value (E0509).

What changes are included in this PR?

Wrap the recursive BinaryExpr.left/right edges in a BoxedExpr newtype whose Drop detaches each node's children into a heap work-list before the node drops. Expr and BinaryExpr keep no Drop, so the by-value destructuring elsewhere is untouched, and BinaryExpr::new keeps its Box<Expr> signature. The unbounded-depth edges are only these chains; other nesting is capped by recursion_limit.

The rest of the diff is mechanical: a few consumers move BinaryExpr operands by value, so each *left becomes *left.into_inner().

Are these changes tested?

Yes. deep_binary_expr_chain_drops_without_overflowing builds a 200000-deep OR chain and drops it on a normal stack. The existing test_stack_overflow cases cover the SQL planning path, and this fix is what lets them pass without a large-stack workaround.

Are there any user-facing changes?

No.

@github-actions github-actions Bot added the sql SQL Planner label Jun 25, 2026
@mdashti mdashti force-pushed the moe/sql-stack-overflow-guard branch from 100b14c to f7ed788 Compare June 26, 2026 00:53
@mdashti mdashti changed the title fix: guard test_stack_overflow against a deep-recursion stack overflow fix: run the deep test_stack_overflow cases on a large explicit stack Jun 26, 2026
@mdashti mdashti force-pushed the moe/sql-stack-overflow-guard branch from f7ed788 to a46cc36 Compare June 26, 2026 05:08
@mdashti mdashti changed the title fix: run the deep test_stack_overflow cases on a large explicit stack fix: cap binary-operator chain depth at the recursion limit Jun 26, 2026
@mdashti mdashti marked this pull request as draft June 26, 2026 07:21
@mdashti mdashti force-pushed the moe/sql-stack-overflow-guard branch from a46cc36 to ff9369c Compare June 26, 2026 07:31
@mdashti mdashti changed the title fix: cap binary-operator chain depth at the recursion limit fix: run the deep test_stack_overflow cases on a large stack Jun 26, 2026

@adriangb adriangb left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm curious if this is a test-only concern or also a real production issue. should we implement an iterative Drop?

Comment thread datafusion/sql/src/expr/mod.rs Outdated
Comment on lines +1561 to +1562
std::thread::Builder::new()
.stack_size(64 * 1024 * 1024)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt of this reply: "doesn't this defeat the purpose of the test? can we keep the smaller stack size for the planning (which is what we are trying to ensure is iterative) but move the value into a large stack thread to run drop()?"

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You're right. It's a real production issue, so I went with the iterative Drop. The BinaryExpr children hold a BoxedExpr whose Drop tears a deep chain down off the call stack (39e458c5c), so the original test_stack_overflow cases pass with no stack trick. impl Drop for Expr wasn't viable (E0509 from the by-value destructuring in map_children), so the newtype wraps just the recursive edges. Repurposed the PR for it.

@mdashti mdashti force-pushed the moe/sql-stack-overflow-guard branch from ff9369c to 5a710be Compare June 26, 2026 16:32
A long `OR`/`AND` chain builds an `Expr` as deep as the chain, so its
recursive `Drop` overflows the stack. The `BinaryExpr` children hold a
smart pointer whose `Drop` tears the chain down iteratively.
@mdashti mdashti force-pushed the moe/sql-stack-overflow-guard branch from 5a710be to 39e458c Compare June 26, 2026 19:22
@mdashti mdashti changed the title fix: run the deep test_stack_overflow cases on a large stack fix: drop a deep BinaryExpr chain iteratively to avoid a stack overflow Jun 26, 2026
@github-actions github-actions Bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate substrait Changes to the substrait crate labels Jun 26, 2026

@mdashti mdashti left a comment

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangb Thanks for the comment.

Comment thread datafusion/sql/src/expr/mod.rs Outdated
Comment on lines +1561 to +1562
std::thread::Builder::new()
.stack_size(64 * 1024 * 1024)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. You're right. It's a real production issue, so I went with the iterative Drop. The BinaryExpr children hold a BoxedExpr whose Drop tears a deep chain down off the call stack (39e458c5c), so the original test_stack_overflow cases pass with no stack trick. impl Drop for Expr wasn't viable (E0509 from the by-value destructuring in map_children), so the newtype wraps just the recursive edges. Repurposed the PR for it.

@mdashti mdashti requested a review from adriangb June 26, 2026 19:37
@mdashti mdashti marked this pull request as ready for review June 26, 2026 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates sql SQL Planner substrait Changes to the substrait crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants