Commit 2f7e9a6

feat: add none deployment target, GEMINI.md per-language, and eval evalsets (#763)
* feat: add "none" deployment target, GEMINI.md improvements, and eval evalsets
  - Add "none" deployment target for local-only projects
  - Move GEMINI.md from _shared to python template directory
  - Add minimal GEMINI.md for Go and Java (ADK docs link)
  - Remove Python language guards (file is now Python-only)
  - Add eval evalsets and README for adk, adk_a2a, agentic_rag agents
  - Update adk-cheatsheet.md with ADK best practices
  - Update .gitignore, llm.txt, Makefile eval targets
  - Remove --cicd-runner from setup_cicd.py examples

* feat: improve "none" deployment target handling
  - Add description for "none" in deployment target selection
  - Show enhance hint when "none" is selected
  - Skip lock file copy for "none" deployment target
  - Skip lock file generation for "none" in generate_locks
  - Add test for creating project with "none" deployment target
  - Update invalid deployment target test to include "none"

* feat: add LLM-as-judge eval config and eval extra dependency
  - Add eval_config.json with rubric-based criteria for adk, adk_a2a, agentic_rag
  - Add google-adk[eval] as optional dependency in pyproject.toml
  - Fix evalsets: correct app_name and remove intermediate_data
  - Skip eval/eval-all targets in makefile usability test

* fix: address PR review issues
  - Remove conflicting fastapi pin from BQ analytics dependencies
  - Add explicit session_type=in_memory for "none" deployment target
  - Fix escaped newline in BQ analytics console message
  - Remove stale comment about import assumption
  - Remove dead llm_txt loading code (no template references it)
  - Add "none" to adk_java deployment_targets for consistency
  - Remove duplicate adk entry in test fixture
1 parent 3bfcf25 commit 2f7e9a6

54 files changed

Lines changed: 5717 additions & 2864 deletions

Note: this is a large commit, and some file content (including several file names below) is hidden by default in the web view.

.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -197,6 +197,7 @@ Thumbs.db
 .saved_chats
 .aider*
 target
+target*/
 set_projects.sh
 delete_genai_repos.sh
 cleanup_e2e_projects.sh
```

agent_starter_pack/agents/adk/.template/templateconfig.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -17,7 +17,7 @@ example_question: "What's the weather in San Francisco?"
 settings:
   requires_data_ingestion: false
   requires_session: true
-  deployment_targets: ["agent_engine", "cloud_run"]
+  deployment_targets: ["agent_engine", "cloud_run", "none"]
   extra_dependencies: ["google-adk>=1.15.0,<2.0.0"]
   tags: ["adk"]
   frontend_type: "None"
```
Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+{
+  "criteria": {
+    "rubric_based_final_response_quality_v1": {
+      "threshold": 0.8,
+      "rubrics": [
+        {
+          "rubricId": "relevance",
+          "rubricContent": { "textProperty": "The response directly addresses the user's query." }
+        },
+        {
+          "rubricId": "helpfulness",
+          "rubricContent": { "textProperty": "The response is helpful and provides useful information." }
+        }
+      ]
+    }
+  }
+}
```
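The rubric config added above is plain JSON, so its structure is easy to inspect. A minimal sketch (assuming only the structure shown in the diff, not any ADK loader API) that parses it and pulls out the grading threshold and rubric ids:

```python
import json

# The eval_config.json content from the diff above, inlined so the
# example is self-contained; in a project this would be read from disk.
config_text = """
{
  "criteria": {
    "rubric_based_final_response_quality_v1": {
      "threshold": 0.8,
      "rubrics": [
        {"rubricId": "relevance",
         "rubricContent": {"textProperty": "The response directly addresses the user's query."}},
        {"rubricId": "helpfulness",
         "rubricContent": {"textProperty": "The response is helpful and provides useful information."}}
      ]
    }
  }
}
"""

config = json.loads(config_text)
criterion = config["criteria"]["rubric_based_final_response_quality_v1"]

# A judged response must score at or above this threshold to pass.
print(criterion["threshold"])                          # → 0.8
print([r["rubricId"] for r in criterion["rubrics"]])   # → ['relevance', 'helpfulness']
```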
Lines changed: 80 additions & 0 deletions
```diff
@@ -0,0 +1,80 @@
+# Evaluation Sets
+
+This directory contains evaluation sets for testing agent behavior using `adk eval`.
+
+## Running Evaluations
+
+```bash
+# Run default evalset
+make eval
+
+# Run specific evalset
+make eval EVALSET=tests/eval/evalsets/custom.evalset.json
+
+# Run all evalsets
+make eval-all
+```
+
+## Evalset Format
+
+Each `.evalset.json` follows the ADK evaluation format:
+
+```json
+{
+  "eval_set_id": "unique_id",
+  "name": "Human-readable name",
+  "description": "What this evalset tests",
+  "eval_cases": [
+    {
+      "eval_id": "case_id",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "User message"}]
+          },
+          "intermediate_data": {
+            "tool_uses": [
+              {"name": "tool_name", "args": {"param": "value"}}
+            ]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app_name",
+        "user_id": "test_user",
+        "state": {}
+      }
+    }
+  ]
+}
+```
+
+## Key Fields
+
+- `eval_cases`: Array of test scenarios
+- `conversation`: Sequence of user messages
+- `intermediate_data.tool_uses`: Expected tool calls (for trajectory matching)
+- `session_input`: Initial session state
+
+## Evaluation Metrics
+
+ADK eval measures:
+
+- **tool_trajectory_avg_score**: Are the correct tools called in the right order?
+- **response_match_score**: How similar is the response to expected output?
+
+## Creating Custom Evalsets
+
+1. Copy `basic.evalset.json` as a template
+2. Add cases based on your `DESIGN_SPEC.md` scenarios
+3. Include expected tool calls for capability tests
+4. Run `make eval EVALSET=your_evalset.json`
+
+## Tips
+
+- Start with 3-5 representative cases
+- Include both happy path and edge cases
+- Test each core capability from DESIGN_SPEC.md
+- Add cases when you find bugs in production
+
+See [ADK documentation](https://google.github.io/adk-docs/) for advanced evaluation options.
```
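The evalset format documented in the README above can be sanity-checked before handing files to `adk eval`. A minimal sketch: the helper `validate_evalset` below is hypothetical (not part of the starter pack or ADK) and checks only the fields the README documents; the real loader may enforce more.

```python
import json


def validate_evalset(evalset: dict) -> list[str]:
    """Return a list of problems found in an evalset dict.

    Only the fields documented in the README are checked:
    eval_set_id, eval_cases, per-case eval_id, conversation turns
    with a text part, and session_input.app_name.
    """
    problems = []
    for key in ("eval_set_id", "eval_cases"):
        if key not in evalset:
            problems.append(f"missing top-level key: {key}")
    for case in evalset.get("eval_cases", []):
        case_id = case.get("eval_id", "<no eval_id>")
        if "eval_id" not in case:
            problems.append("eval case missing eval_id")
        for turn in case.get("conversation", []):
            parts = turn.get("user_content", {}).get("parts", [])
            if not any("text" in part for part in parts):
                problems.append(f"{case_id}: turn has no text part")
        if "app_name" not in case.get("session_input", {}):
            problems.append(f"{case_id}: session_input missing app_name")
    return problems


# A well-formed case modeled on basic.evalset.json from this commit.
sample = {
    "eval_set_id": "basic_eval",
    "eval_cases": [{
        "eval_id": "greeting",
        "conversation": [{"user_content": {"parts": [{"text": "Hello"}]}}],
        "session_input": {"app_name": "app", "user_id": "eval_user", "state": {}},
    }],
}
print(validate_evalset(sample))  # → []
```

In practice this could run as a cheap pre-flight in CI, loading each `tests/eval/evalsets/*.evalset.json` with `json.load` and failing fast on schema drift before the slower LLM-judged run.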
Lines changed: 37 additions & 0 deletions
```diff
@@ -0,0 +1,37 @@
+{
+  "eval_set_id": "basic_eval",
+  "name": "Basic Agent Evaluation",
+  "description": "Sample evaluation set for testing core agent functionality. Customize these cases based on your DESIGN_SPEC.md.",
+  "eval_cases": [
+    {
+      "eval_id": "greeting",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "Hello, what can you help me with?"}]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app",
+        "user_id": "eval_user",
+        "state": {}
+      }
+    },
+    {
+      "eval_id": "weather_query",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "What's the weather like in San Francisco?"}]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app",
+        "user_id": "eval_user",
+        "state": {}
+      }
+    }
+  ]
+}
```

agent_starter_pack/agents/adk_a2a/.template/templateconfig.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ description: "ReAct agent with A2A protocol [experimental]"
 example_question: "What's the weather in San Francisco?"
 settings:
   requires_data_ingestion: false
-  deployment_targets: ["agent_engine", "cloud_run"]
+  deployment_targets: ["agent_engine", "cloud_run", "none"]
   extra_dependencies: ["google-adk>=1.16.0,<2.0.0", "a2a-sdk~=0.3.22", "nest-asyncio>=1.6.0,<2.0.0"]
   tags: ["adk", "a2a"]
   frontend_type: "None"
```
Lines changed: 17 additions & 0 deletions
```diff
@@ -0,0 +1,17 @@
+{
+  "criteria": {
+    "rubric_based_final_response_quality_v1": {
+      "threshold": 0.8,
+      "rubrics": [
+        {
+          "rubricId": "relevance",
+          "rubricContent": { "textProperty": "The response directly addresses the user's query." }
+        },
+        {
+          "rubricId": "helpfulness",
+          "rubricContent": { "textProperty": "The response is helpful and provides useful information." }
+        }
+      ]
+    }
+  }
+}
```
Lines changed: 80 additions & 0 deletions
```diff
@@ -0,0 +1,80 @@
+# Evaluation Sets
+
+This directory contains evaluation sets for testing agent behavior using `adk eval`.
+
+## Running Evaluations
+
+```bash
+# Run default evalset
+make eval
+
+# Run specific evalset
+make eval EVALSET=tests/eval/evalsets/custom.evalset.json
+
+# Run all evalsets
+make eval-all
+```
+
+## Evalset Format
+
+Each `.evalset.json` follows the ADK evaluation format:
+
+```json
+{
+  "eval_set_id": "unique_id",
+  "name": "Human-readable name",
+  "description": "What this evalset tests",
+  "eval_cases": [
+    {
+      "eval_id": "case_id",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "User message"}]
+          },
+          "intermediate_data": {
+            "tool_uses": [
+              {"name": "tool_name", "args": {"param": "value"}}
+            ]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app_name",
+        "user_id": "test_user",
+        "state": {}
+      }
+    }
+  ]
+}
+```
+
+## Key Fields
+
+- `eval_cases`: Array of test scenarios
+- `conversation`: Sequence of user messages
+- `intermediate_data.tool_uses`: Expected tool calls (for trajectory matching)
+- `session_input`: Initial session state
+
+## Evaluation Metrics
+
+ADK eval measures:
+
+- **tool_trajectory_avg_score**: Are the correct tools called in the right order?
+- **response_match_score**: How similar is the response to expected output?
+
+## Creating Custom Evalsets
+
+1. Copy `basic.evalset.json` as a template
+2. Add cases based on your `DESIGN_SPEC.md` scenarios
+3. Include expected tool calls for capability tests
+4. Run `make eval EVALSET=your_evalset.json`
+
+## Tips
+
+- Start with 3-5 representative cases
+- Include both happy path and edge cases
+- Test each core capability from DESIGN_SPEC.md
+- Add cases when you find bugs in production
+
+See [ADK documentation](https://google.github.io/adk-docs/) for advanced evaluation options.
```
Lines changed: 37 additions & 0 deletions
```diff
@@ -0,0 +1,37 @@
+{
+  "eval_set_id": "basic_eval",
+  "name": "Basic Agent Evaluation",
+  "description": "Sample evaluation set for testing core agent functionality. Customize these cases based on your DESIGN_SPEC.md.",
+  "eval_cases": [
+    {
+      "eval_id": "greeting",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "Hello, what can you help me with?"}]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app",
+        "user_id": "eval_user",
+        "state": {}
+      }
+    },
+    {
+      "eval_id": "capability_query",
+      "conversation": [
+        {
+          "user_content": {
+            "parts": [{"text": "What tools do you have available?"}]
+          }
+        }
+      ],
+      "session_input": {
+        "app_name": "app",
+        "user_id": "eval_user",
+        "state": {}
+      }
+    }
+  ]
+}
```

agent_starter_pack/agents/adk_go/.template/templateconfig.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ settings:
   language: "go"
   requires_data_ingestion: false
   requires_session: false
-  deployment_targets: ["cloud_run"]
+  deployment_targets: ["cloud_run", "none"]
   extra_dependencies: []
   tags: ["adk", "go", "a2a"]
   frontend_type: "None"
```
