Agent Persona Exploration - 2026-03-17 #21322

2026-03-17T01:34:30Z

github-actions[bot]
bot Mar 17, 2026

Persona Overview

Agent: developer.instructions (agentic-workflows proxy)
Scenarios Tested: 7
Average Quality Score: 4.86/5.0
Run: §23173867188

This is one of the highest quality runs recorded across 20+ prior explorations; only the 2026-03-04 run (4.96, on 5 scenarios) scored higher in the trend table. All 7 scenarios produced complete, production-ready workflow designs with deployment checklists and explicit security analysis.

Key Findings

claude dominates (5/7 scenarios) and was reserved for complex multi-step analysis; codex was correctly chosen for deterministic validation (commit linting) and copilot for classification tasks (bug triage)
lockdown: true is now the consistent default for all event-triggered workflows; it was applied in 5/7 scenarios without prompting
Escape hatch patterns emerged as a quality signal: doc:skip annotation, dry_run input, confidence thresholds, and roles: gating appeared in 4 different scenarios
Prerequisite documentation improved significantly: agent listed 7 explicit limitations for the org billing scenario, required label bootstrap checklist for bug triage, and Discussions category setup for reports
Novel pattern: mcp-scripts:gh recommended for org-level GitHub API access (billing endpoints) beyond standard toolsets
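
The engine split observed in this run can be written down as a small lookup. This is only a sketch of the heuristic implied by the findings: the task-kind names and the fallback to claude are assumptions made for illustration, not settings taken from the agent or from any workflow tooling.

```python
# Hypothetical task kinds mapped to the engines this run selected for them.
ENGINE_BY_TASK = {
    "deterministic_validation": "codex",   # e.g. commit linting
    "multi_step_analysis": "claude",       # e.g. doc gap detection, reports
    "classification": "copilot",           # e.g. bug triage
}


def pick_engine(task_kind: str, default: str = "claude") -> str:
    """Return the engine suggested for a task kind.

    The default of "claude" is an assumption based on it being chosen
    in 5 of 7 scenarios; it is not a documented rule.
    """
    return ENGINE_BY_TASK.get(task_kind, default)
```

For example, `pick_engine("classification")` returns `"copilot"`, and an unrecognized task kind falls back to the default.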

Top Patterns

Trigger distribution: pull_request (3), schedule (2), issues:labeled (1), slash_command (1)
Tools: github.lockdown: true + explicit toolsets + read-only bash allowlist when shell needed
Security: safe-outputs label whitelist, concurrency cancel-in-progress, no write permissions on agent job

View High Quality Responses (5.0/5.0)

BE-1: Conventional Commits Enforcer — pull_request trigger, codex engine

No bash tools needed — pure GitHub API regex validation
Concurrency cancel-in-progress prevents stale run accumulation
3-tier classification (✅ valid / ⚠️ warning / ❌ violation) reduces noise vs binary pass/fail
Inline git fix commands (git commit --amend, git rebase -i) in violation comment template
REQUEST_CHANGES review as merge gate, auto-cleared on clean re-run
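
The 3-tier classification above could look roughly like the following. It is a minimal sketch, not the agent's actual rules: the allowed type list, the 72-character header limit, and the trailing-period check are all assumptions chosen to show how a warning tier sits between valid and violation.

```python
import re

# Header shape: type, optional (scope), optional "!", then ": description".
HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<desc>\S.*)$"
)

# Assumed allow-list; a real project would configure its own types.
KNOWN_TYPES = {
    "feat", "fix", "docs", "style", "refactor",
    "perf", "test", "build", "ci", "chore", "revert",
}
MAX_HEADER_LEN = 72  # assumed style limit, not taken from the report


def classify(message: str) -> str:
    """Classify a commit message as "valid", "warning", or "violation"."""
    header = message.splitlines()[0] if message.strip() else ""
    match = HEADER_RE.match(header)
    if match is None or match.group("type") not in KNOWN_TYPES:
        return "violation"  # structurally wrong: blocks the merge
    if len(header) > MAX_HEADER_LEN or match.group("desc").endswith("."):
        return "warning"    # well-formed but stylistically off
    return "valid"
```

Only the violation tier would need to feed the REQUEST_CHANGES review; warnings can be reported without blocking, which is where the noise reduction comes from.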

BE-2: API Documentation Gap Detector — pull_request + paths: filter, claude engine

6-layer false positive strategy: path filter → per-language exclusion rules → // doc:skip annotation → PR-level suppression → coarse docs heuristic → max: 10 comment cap
Per-language API surface detection rules for Go (exported identifiers), REST (route registration), GraphQL, TypeScript, and Protocol Buffers
Novel // doc:skip escape hatch annotation for intentionally undocumented exports
Tuning guidance for reducing false positives per language ecosystem
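
As a rough illustration of two of these layers for Go (the // doc:skip annotation and the max: 10 cap), the sketch below scans source text for exported functions that lack a preceding comment. The regex and function names are hypothetical, and the check is deliberately coarse; it ignores generics, types, and constants.

```python
import re

# Matches top-level Go funcs and methods whose name starts with an
# uppercase letter (i.e. exported identifiers).
EXPORTED_FUNC_RE = re.compile(r"^func\s+(?:\([^)]*\)\s*)?([A-Z]\w*)\s*\(")
SKIP_MARKER = "// doc:skip"
MAX_COMMENTS = 10  # mirrors the max: 10 comment cap from the report


def find_doc_gaps(go_source: str, max_comments: int = MAX_COMMENTS) -> list[str]:
    """Return names of exported funcs with no doc comment, capped."""
    gaps = []
    prev = ""
    for line in go_source.splitlines():
        match = EXPORTED_FUNC_RE.match(line)
        if match:
            # A comment on the previous line counts as documentation; a
            # "// doc:skip" line is itself a comment, so it is covered too.
            has_comment_above = prev.strip().startswith("//")
            has_inline_skip = SKIP_MARKER in line
            if not (has_comment_above or has_inline_skip):
                gaps.append(match.group(1))
        prev = line
    return gaps[:max_comments]
```

Truncating the result rather than failing keeps the PR comment volume bounded even when a large refactor exposes many undocumented symbols at once.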

View Scenario Scores (All 7)

ID	Scenario	Persona	Engine	Trigger	Avg Score
BE-1	Conventional Commits Enforcer	Backend Eng	codex	pull_request	5.0
BE-2	API Doc Gap Detector	Backend Eng	claude	pull_request + paths	5.0
DO-1	Actions Usage Analyzer	DevOps	claude	schedule weekly	4.83
QA-1	Bug Triage Agent	QA Tester	copilot	issues:labeled	4.83
PM-1	Monthly Impact Report	Product Mgr	claude	schedule monthly	4.83
QA-2	/reproduce Slash Command	QA Tester	claude	slash_command	4.83
FE-1	Changelog Generator	Frontend Dev	claude	pull_request_target	4.67
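
The reported average of 4.86 can be reproduced from the per-scenario scores in this table. The snippet below is just that arithmetic; the dictionary is copied from the table and the function name is illustrative.

```python
# Per-scenario average scores, copied from the table above.
SCENARIO_SCORES = {
    "BE-1": 5.0,
    "BE-2": 5.0,
    "DO-1": 4.83,
    "QA-1": 4.83,
    "PM-1": 4.83,
    "QA-2": 4.83,
    "FE-1": 4.67,
}


def average_score(scores: dict[str, float]) -> float:
    """Return the mean score rounded to two decimals."""
    return round(sum(scores.values()) / len(scores), 2)
```

The scores sum to 33.99, and 33.99 / 7 rounds to 4.86, matching the Persona Overview.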

View Areas for Improvement

DO-1 (DevOps/Org Billing): Requires an org-level admin PAT (ACTIONS_REPORT_PAT) with organization_administration: read scope. This is a real deployment barrier: the agent documented the limitation, but the workflow can't be self-contained. Consider a pattern for graceful degradation when the billing API returns 403.
FE-1 (Changelog via pull_request_target): pull_request_target with fork PRs is a high-risk trigger — the agent correctly flagged the injection risk and suggested author_association filtering, but this is a non-trivial security decision that could be missed by less careful implementors.
Engine selection guidance: The claude/codex/copilot tradeoffs are well-articulated per-scenario but not consistently summarized with a decision heuristic. A reusable rule like "codex for deterministic validation, claude for analysis, copilot for classification" would help.
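
One possible shape for the graceful degradation suggested for DO-1 is sketched below. Everything here is hypothetical: `ApiError`, `UsageReport`, and the injected `fetch_billing` callable stand in for whatever client the workflow would actually use, and only the 403 case is treated as recoverable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status


@dataclass
class UsageReport:
    billing: Optional[dict]  # None when org billing data was unavailable
    notes: list[str]         # caveats to surface in the published report


def build_report(fetch_billing: Callable[[], dict]) -> UsageReport:
    """Build a report, degrading to a partial one on HTTP 403."""
    try:
        return UsageReport(billing=fetch_billing(), notes=[])
    except ApiError as err:
        if err.status != 403:
            raise  # other failures should still fail the run loudly
        note = (
            "Billing data unavailable: the token lacks org-level read "
            "access (HTTP 403). This report omits billing figures."
        )
        return UsageReport(billing=None, notes=[note])
```

Re-raising anything other than 403 keeps genuine outages visible, while a missing PAT scope produces a clearly labeled partial report instead of a failed run.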

Recommendations

Promote escape hatch patterns (doc:skip annotation, dry_run input, confidence thresholds, roles: gating) as first-class guidance in workflow authoring docs — these appeared organically across 4 scenarios and represent mature workflow design
Document mcp-scripts:gh as the canonical pattern for org-level GitHub API access (billing, org settings) that exceeds standard toolset coverage — with PAT scoping examples
Add 3-layer deduplication (repo-memory gate + max: 1 + close-older-discussions) as the canonical pattern for all scheduled report workflows; the PM-1 scenario demonstrated this elegantly
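
The 3-layer deduplication idea can be modeled as plain decision logic. This sketch does not reproduce how repo-memory, max: 1, or close-older-discussions are actually implemented; the `last_reported_period` key, the function name, and the data shapes are assumptions used to show how the layers combine.

```python
def plan_report(
    period: str,
    memory: dict,
    open_discussions: list[int],
    drafts: list[str],
    max_outputs: int = 1,
) -> dict:
    """Decide which discussions to create and which to close.

    Layer 1 (memory gate): skip entirely if this period was already reported.
    Layer 2 (output cap): publish at most `max_outputs` drafts.
    Layer 3 (close older): close prior open reports when a new one is created.
    """
    if memory.get("last_reported_period") == period:
        return {"create": [], "close": []}

    create = drafts[:max_outputs]
    close = list(open_discussions) if create else []
    return {"create": create, "close": close}
```

For example, with `memory = {"last_reported_period": "2026-02"}`, a run for `"2026-03"` with two drafts and two open discussions would create only the first draft and close both older discussions, while a second run for `"2026-02"` would do nothing.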

Trend

Run Date	When	Avg Score
2026-02-22	Feb	4.37
2026-03-04	Mar	4.96 (5 scenarios)
2026-03-13	Mar	3.80
2026-03-16	Yesterday	3.93
2026-03-17	Today	4.86

Consistent quality remains dependent on scenario selection. Scenarios with well-scoped GitHub-native tasks (PR review, scheduled reports, issue automation) score higher than scenarios requiring external API access or binary artifact handling.

References: §23173867188

AI generated by Agent Persona Explorer · history

2026-03-19T01:33:41Z

github-actions[bot]
bot Mar 19, 2026
Author

This discussion has been marked as outdated by Agent Persona Explorer.

A newer discussion is available at Discussion #21704.
