Remove per-row runtime_error from CastInputsToJson in aggregate functions #243
anasdorbani merged 2 commits into dev
…ion is already done at bind time
Co-authored-by: anasdorbani <95044293+anasdorbani@users.noreply.github.com>
Pull request overview
This PR adjusts Flock’s aggregate-function input parsing to avoid throwing a runtime exception during aggregation when context_columns is absent, relying instead on bind-time validation.
Changes:
- Removed the runtime std::runtime_error branch in AggregateFunctionBase::CastInputsToJson when context_columns is missing.
- Leaves enforcement of context_columns presence to bind-time checks (ValidatePromptStructFields).
    auto prompt_context_json = CastVectorOfStructsToJson(inputs[1], count);
    auto context_columns = nlohmann::json::array();
    if (prompt_context_json.contains("context_columns")) {
        context_columns = prompt_context_json["context_columns"];
        prompt_context_json.erase("context_columns");
    } else {
        throw std::runtime_error("Missing 'context_columns' in second argument. The prompt struct must include context_columns.");
    }
ValidatePromptStructFields only guarantees the prompt STRUCT type contains a context_columns field, but CastVectorOfStructsToJson currently omits the context_columns key when the list is empty (and also returns an empty object when count == 0). With this change, those cases will silently produce context_columns = [] instead of throwing, which can change aggregate behavior (often resulting in NULL output). Consider ensuring CastVectorOfStructsToJson always materializes context_columns as an empty array when present-but-empty, or add an explicit validation/error here if empty context is not supported.
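The concern above can be illustrated with a small, dependency-free sketch. It is not Flock's code: `PromptJson`, `CastOmittingEmpty`, `CastAlwaysMaterializing`, and `ExtractContextColumns` are hypothetical stand-ins, and a `std::map` replaces `nlohmann::json` so the example compiles on its own. It assumes the behavior described in the comment, namely that an empty list leaves the `context_columns` key out of the object.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the JSON object built from the prompt struct.
// The real code uses nlohmann::json; a map of string lists is an assumption
// made only to keep this sketch self-contained.
using PromptJson = std::map<std::string, std::vector<std::string>>;

// Mimics the behavior described in the review: an empty list is omitted,
// so the "context_columns" key never appears in the result.
PromptJson CastOmittingEmpty(const std::vector<std::string>& context_columns) {
    PromptJson out;
    if (!context_columns.empty()) {
        out["context_columns"] = context_columns;
    }
    return out;
}

// The suggested alternative: always emit the key, even for an empty list.
PromptJson CastAlwaysMaterializing(const std::vector<std::string>& context_columns) {
    PromptJson out;
    out["context_columns"] = context_columns;
    return out;
}

// Extraction logic with the throwing else-branch removed: take the key when
// it exists, otherwise fall back to an empty list.
std::vector<std::string> ExtractContextColumns(PromptJson& prompt) {
    std::vector<std::string> context_columns;
    auto it = prompt.find("context_columns");
    if (it != prompt.end()) {
        context_columns = it->second;
        prompt.erase(it);
    }
    // Postcondition: the key has been moved out of the prompt object.
    assert(prompt.count("context_columns") == 0);
    return context_columns;
}
```

With the else-branch gone, both casting strategies yield the same empty result from `ExtractContextColumns`, so a key that was omitted can no longer be told apart from one that was present but empty. Whether that is acceptable depends on whether an empty context should be an error or a NULL result.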
    if (prompt_context_json.contains("context_columns")) {
        context_columns = prompt_context_json["context_columns"];
        prompt_context_json.erase("context_columns");
    } else {
        throw std::runtime_error("Missing 'context_columns' in second argument. The prompt struct must include context_columns.");
    }
This change alters runtime behavior when context_columns is empty/missing in the JSON produced at execution time. Please add a unit/integration test that exercises an aggregate call with context_columns := [] (and/or an empty input chunk) to lock in the intended behavior (error vs NULL result).
CastInputsToJson threw a std::runtime_error when context_columns was absent, but it was called on every row during aggregation, so a bad prompt would produce one exception per row at execution time rather than failing once at bind time.
Changes
src/functions/aggregate/aggregate.cpp: Removed the else { throw std::runtime_error(...) } branch from CastInputsToJson. The context_columns presence check is already enforced at bind time by ValidatePromptStructFields, making the per-row throw unreachable dead code.
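
The bind-time versus execution-time split described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the Flock implementation: `PromptStructType`, `CheckPromptTypeAtBind`, and `RunAggregate` are hypothetical names, and the per-chunk work is reduced to a counter.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of the prompt struct's type, which is known
// before any rows are processed.
struct PromptStructType {
    std::vector<std::string> field_names;
};

// Bind phase: runs once per query and rejects a prompt struct whose type
// lacks a context_columns field.
void CheckPromptTypeAtBind(const PromptStructType& type) {
    const bool has_context =
        std::find(type.field_names.begin(), type.field_names.end(),
                  "context_columns") != type.field_names.end();
    if (!has_context) {
        throw std::invalid_argument(
            "The prompt struct must include context_columns.");
    }
}

// Runs the bind check once, then processes every chunk without repeating it.
// Returns the number of chunks processed.
int RunAggregate(const PromptStructType& type, int chunk_count) {
    assert(chunk_count >= 0);
    CheckPromptTypeAtBind(type);  // fails here, before any chunk is touched
    int processed = 0;
    for (int i = 0; i < chunk_count; ++i) {
        ++processed;  // execution phase: per-chunk work, no re-validation
    }
    return processed;
}
```

In this sketch a missing field surfaces as a single error before execution starts. Note that the check covers only the struct's type, so it says nothing about whether the list is empty at run time.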