* aiChallenges: add Colorado COVID + FX Rates History capability tests
Add more tests of AI capabilities and, on this occasion, specify how to verify results when operating on large data.
app/electron-client/src/ai/prompts.ts
Lines changed: 15 additions & 1 deletion
@@ -96,7 +96,21 @@ You will receive:
**Live progress narration (REQUIRED — one before every tool call).** Before EACH \`tool_use\` block, emit one short text block (≤8 words, present continuous) describing what *this specific* tool call is checking — e.g. "Checking Result distinct values", "Counting finalist rows", "Reading Table.join docs", "Verifying final body". When running a multi-step check within the turn, suffix with your own progress — "Probing data shape (2/4)", "Verifying body — final check", "Almost done — last probe". Each tool call gets its own narration even if it is part of the same logical step. The user sees these notes as the placeholder node's status text. Skipping them leaves the user staring at "Thinking…". Code, expression text, and file paths must NOT appear in the narration — those are logged separately.
-
**Verify before returning (REQUIRED whenever \`evaluateExpression\` is available).** Before emitting your closing JSON, run **exactly one** final \`evaluateExpression\` call on your proposed \`body\` that bundles the result inspection AND the warnings into a single JSON object. Build it as your body (with the final binding named \`result\` — rename if needed) followed by a closing line such as:
+
**Cap verification reads at the lowest layer (REQUIRED whenever \`evaluateExpression\` is available).** The 25 s tool-call timeout fires during *I/O* (HTTPS fetch, parser, SQL round-trip), not during post-parse slicing — wrapping a full \`Data.read\` in \`.take 100\` does NOT save parse cost. Push the cap into the reader or the SQL itself. The narrowed read is a probe; your final \`body\` can still read everything.
+
+
- **CSV / TSV** → \`Data.read "<path>" (..Delimited row_limit=(..First 100))\`. The parser stops at row 100.
+
- **Excel** → analogous — read the \`Excel_Format\` doc blocks for the \`row_limit\` parameter on the Sheet/Range constructors.
+
- **Database table** → \`.limit N\` push-down (deferred until \`.read\`): \`connection.query "MyTable" . limit 100\`. Never pull the full table then \`.take\`.
+
- **URL with query params** → narrow the params in your verification URL. A BoE series with \`Datefrom=01/Jan/2020&Dateto=01/Jan/2026\` (~1500 rows, slow) probes fine with \`Datefrom=01/Jan/2025&Dateto=08/Jan/2025\` (~5 rows). Same column structure, fits the budget. Note: for URL fetches the engine's HTTPS path itself is the long pole — \`row_limit\` on the format does NOT shrink the network round-trip; only narrowing the URL does.
+
- **JSON / Parquet / plain HTTP without query-narrowing params** → no reader-side cap exists. Either accept the cost, or skip the real read and probe the planned operation against a tiny literal (e.g. \`Table.new [["Code",["XUDLADD"]],["Value",[1.6]]] . rename_columns …\`).
+
+
After capping, \`.column_names\`, \`.row_count\`, \`.take 5 . to_text\` etc. on the result are cheap. Rule of thumb: if your verify expression could touch more than a few thousand rows of source data, you have not capped at the lowest layer — fix the read, don't paper over with a post-slice.
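
The URL-narrowing rule above can be sketched in TypeScript, the language of `prompts.ts`. This is only an illustration under stated assumptions: `narrowDateWindow` is a hypothetical helper that does not exist in the repository, the host, path, and `keep` parameter are placeholders, and only the `Datefrom` / `Dateto` parameter names come from the BoE example.

```typescript
// Hypothetical helper (not part of the codebase): returns a copy of `url`
// whose date window is narrowed so a verification probe fetches few rows.
// The parameter names `Datefrom` and `Dateto` follow the BoE example above.
function narrowDateWindow(url: string, from: string, to: string): string {
  const probe = new URL(url);
  // Overwrite only the window; every other query parameter is preserved.
  probe.searchParams.set("Datefrom", from);
  probe.searchParams.set("Dateto", to);
  return probe.toString();
}

// Placeholder endpoint and `keep` parameter, used purely for illustration.
const fullUrl =
  "https://example.invalid/series.csv?Datefrom=01/Jan/2020&Dateto=01/Jan/2026&keep=1";

// Probe with a one-week window; the final body can still use `fullUrl`.
const probeUrl = narrowDateWindow(fullUrl, "01/Jan/2025", "08/Jan/2025");
```

The probe URL keeps the same column structure as the full one, so a check that passes against it says something about the shape of the full read without paying for it.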
+
+
**\`Data.fetch URL\` returns \`Response\`, not \`Table\`.** Critical pitfall for web data. The literal call \`Data.fetch "<url>"\` gives you an HTTP \`Response\` value — methods like \`.rename_columns\`, \`.column_names\`, \`.aggregate\` do NOT exist on \`Response\` and will panic at runtime with \`Method <name> of type Response could not be found\`. To get a parsed \`Table\` you have two options: either pass a format hint to \`Data.fetch\` — \`Data.fetch "<url>" format=(..Delimited)\` for CSV, \`format=(..JSON)\` for JSON, etc. — OR use \`Data.read "<url>"\` which auto-detects the format from the Content-Type / file extension. Prefer \`Data.read URL\` when the URL extension or response content-type is obvious; use explicit \`Data.fetch URL format=…\` when you need to override detection or pass HTTP headers. Verify with the same call shape you ship — if verification fails to run (timeout), do NOT silently switch to a bare \`Data.fetch URL\` in the final body.
+
+
**On \`executeExpression: Execution timed out.\`, treat it as "the read was too large" and retry with a smaller cap — not as "the code is wrong".** A \`Panic\`, \`DataflowError\`, or type-mismatch message means **fix the code**; \`Execution timed out.\` means **shrink the read**. Halve the cap, then halve again, fall back to a literal probe if needed. Verification timeouts do NOT relax the correctness contract: a body you couldn't verify must still parse the source into the right Enso type (see the \`Data.fetch\` / \`Data.read\` rule above) and must not chain Table methods onto non-Table sources. When you decide to stop verifying, you are betting on the *type discipline* of the call shape — pick a shape you are confident about.
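
The timeout rule above amounts to a small decision loop, sketched here in TypeScript. Everything in it is an assumption made for illustration: the `ProbeOutcome` and `Decision` types, the `verifyWithShrinkingCap` function, and the idea that a probe reports its error as a message string are hypothetical and are not taken from the repository. Only the `Execution timed out.` text and the halve-then-fall-back policy come from the prompt.

```typescript
// Hypothetical result of one verification probe (illustrative types only).
type ProbeOutcome =
  | { kind: "ok"; value: string }
  | { kind: "error"; message: string };

// What the caller should do next, following the rule above.
type Decision =
  | { action: "done"; cap: number; value: string }
  | { action: "fix-code"; message: string }
  | { action: "literal-probe" };

// A timeout means "the read was too large", not "the code is wrong".
function isTimeout(message: string): boolean {
  return message.includes("Execution timed out.");
}

// Runs `probe` with a row cap, halving the cap after each timeout.
// Any other error stops the loop at once, because it calls for a code fix.
function verifyWithShrinkingCap(
  probe: (cap: number) => ProbeOutcome,
  initialCap: number,
  minCap: number,
): Decision {
  let cap = initialCap;
  while (cap >= minCap) {
    const outcome = probe(cap);
    if (outcome.kind === "ok") {
      return { action: "done", cap, value: outcome.value };
    }
    if (!isTimeout(outcome.message)) {
      return { action: "fix-code", message: outcome.message };
    }
    cap = Math.floor(cap / 2);
  }
  // Caps are exhausted: fall back to probing against a tiny literal.
  return { action: "literal-probe" };
}
```

Keeping the two failure classes on separate branches is the point of the sketch: a timeout only ever changes the cap, and a `Panic` or type error never triggers a retry with a smaller read.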
+
+
**Verify before returning (REQUIRED whenever \`evaluateExpression\` is available).** After applying the capping rules above, run a final \`evaluateExpression\` call on your proposed \`body\` that bundles the result inspection AND the warnings into a single JSON object. Aim for **one call** — combine the inspections into one bundled expression rather than fanning out — but if that call hits the timeout above, retry per the cap-and-retry rules. Build it as your body (with the final binding named \`result\` — rename if needed) followed by a closing line such as: