
Experiments as first-class evidence source for intent authorization #192

@marcus-sa

Description


Context

Evidence-backed intent authorization (DISCUSS wave in docs/feature/intent-evidence/discuss/) requires evidence_refs on intents pointing to graph records. Experiments (#188) produce high-trust evidence through a governed lifecycle. This issue connects the two: concluded experiments and their outputs become a privileged evidence class for intent authorization.

Related issues:

Problem

The evidence verification pipeline treats all evidence_refs equally — a standalone observation created by any agent has the same weight as a decision that emerged from a month-long experiment with human-approved budget, success criteria, and concluded results. This is wrong.

A supply chain team's approval of a new vendor sourcing strategy, backed by a 2-week procurement experiment (hypothesis tested, budget approved, results measured), should carry more weight than an agent's ad-hoc observation that "supplier X looks cheaper." The evidence verification pipeline has no way to distinguish the two today.

Design

Evidence quality tiers

The intent authorizer assigns trust weight based on evidence provenance:

| Tier | Source | Trust weight | Why |
| --- | --- | --- | --- |
| Tier 1 | Concluded experiment output (produced edge from experiment with status: concluded) | Highest | Human-approved hypothesis, bounded budget, explicit success criteria, governed lifecycle |
| Tier 2 | Confirmed decision / resolved observation (independent authorship) | High | Went through confirmation by a different identity than the requester |
| Tier 3 | Provisional decision / open observation | Medium | Exists in the graph but not yet validated by an independent party |
| Tier 4 | Standalone entity with same author as intent requester | Low | No independence, no external validation |
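The tier mapping above can be sketched as a small classifier. The Evidence record shape, its field names, and the ordering of the same-author check are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Evidence:
    """Illustrative evidence record; field names are assumptions."""
    author: str
    status: str  # e.g. "confirmed", "resolved", "provisional", "open"
    experiment_status: Optional[str] = None  # producing experiment's status, if any

def trust_tier(ev: Evidence, requester: str) -> int:
    """Map one evidence_ref to a trust tier (1 = highest weight)."""
    if ev.experiment_status in ("concluded", "absorbed"):
        return 1  # output of a governed, concluded experiment
    if ev.author == requester:
        return 4  # no independence from the intent requester
    if ev.status in ("confirmed", "resolved"):
        return 2  # validated by an independent identity
    return 3      # in the graph, not yet independently validated
```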

Experiment-to-evidence provenance chain

Experiment (proposed → approved → running → concluded → absorbed)
  │
  ├── produced → Decision D-1 (confirmed)     ← Tier 1 evidence
  ├── produced → Observation O-3 (resolved)    ← Tier 1 evidence
  ├── produced → Learning L-5 (active)         ← Tier 1 evidence (informational)
  │
  └── The experiment record itself              ← Tier 1 evidence (proves structured inquiry happened)

When an intent references a decision as evidence, the verification pipeline checks:

  1. Does this decision have a produced edge from an experiment?
  2. Is that experiment concluded or absorbed?
  3. Was the experiment approved (human gate passed)?

If yes → Tier 1 trust weight. If no → fall through to Tier 2/3 based on decision status and authorship.
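The three checks above can be collapsed into a single predicate. The lookup from decision id to its producing experiment record is a hypothetical stand-in for the graph query, not the real API:

```python
from typing import Optional

def is_tier1(decision_id: str, produced_by: dict) -> bool:
    """True when a referenced decision qualifies for Tier 1 trust weight.

    produced_by maps decision id -> producing experiment record; the key
    is absent when no produced edge exists. Shapes are illustrative."""
    exp: Optional[dict] = produced_by.get(decision_id)
    if exp is None:                                     # 1. produced edge exists?
        return False
    if exp["status"] not in ("concluded", "absorbed"):  # 2. results available?
        return False
    return bool(exp.get("approved"))                    # 3. human gate passed?
```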

Verification pipeline additions

Add to the deterministic pre-LLM verification:

  1. Experiment provenance check — for each evidence_ref, query SELECT <-produced<-experiment WHERE status IN ['concluded', 'absorbed']. If found, tag as experiment-backed.
  2. Experiment status gate — reject evidence from experiments still in proposed or running status (results not yet available).
  3. Budget compliance check — flag evidence from experiments that exceeded their approved budget (trust discount, not rejection).
  4. Absorption check — evidence from concluded (not yet absorbed) experiments gets a soft warning: results exist but haven't been formally converted to decisions/learnings yet.
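A minimal sketch of these four deterministic checks for a single evidence_ref. The field names (status, budget, spent) and flag strings are assumptions:

```python
def verify_experiment_provenance(exp):
    """Pre-LLM checks on one evidence_ref's producing experiment record
    (None when no produced edge exists). Illustrative shapes only."""
    out = {"experiment_backed": False, "flags": []}
    if exp is None:
        return out
    if exp["status"] in ("proposed", "running"):
        out["flags"].append("reject:results-not-yet-available")  # status gate
        return out
    out["experiment_backed"] = True                              # provenance check
    if exp.get("spent", 0) > exp.get("budget", float("inf")):
        out["flags"].append("discount:budget-exceeded")          # trust discount
    if exp["status"] == "concluded":                             # not yet absorbed
        out["flags"].append("warn:results-not-absorbed")
    return out
```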

Risk router integration

Experiment-backed evidence lowers effective risk:

  • Intent with 3 Tier 1 evidence refs (all from concluded experiments) → risk score discount of 15-20 points
  • Intent mixing Tier 1 and Tier 3 evidence → standard risk scoring
  • Intent with only Tier 4 evidence → risk score premium

This means well-evidenced intents from concluded experiments are more likely to auto-approve, while poorly-evidenced intents face higher scrutiny. The system rewards structured inquiry.
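A sketch of the tier-based adjustment; the concrete discount (18, within the 15-20 range above) and premium size (15) are assumptions:

```python
def adjusted_risk(base: int, tiers: list) -> int:
    """Apply the evidence-tier discount/premium to a base risk score."""
    if len(tiers) >= 3 and all(t == 1 for t in tiers):
        return max(0, base - 18)  # all evidence from concluded experiments
    if tiers and all(t == 4 for t in tiers):
        return base + 15          # only same-author, unvalidated evidence
    return base                   # mixed evidence: standard scoring
```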

LLM evaluator context

When the LLM evaluator runs (high-risk intents), include experiment context:

  • Experiment hypothesis and success criteria
  • Whether results confirmed or rejected the hypothesis
  • Budget utilization (within bounds = trustworthy)
  • Time from experiment start to conclusion (rushed experiments are less trustworthy)
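The four context fields above could be assembled roughly like this; the experiment record shape and key names are assumptions:

```python
from datetime import datetime

def llm_experiment_context(exp: dict) -> dict:
    """Build the experiment context passed to the LLM evaluator prompt."""
    return {
        "hypothesis": exp["hypothesis"],
        "success_criteria": exp["success_criteria"],
        "hypothesis_confirmed": exp["results_confirmed"],
        "budget_utilization": exp["spent"] / exp["budget"],   # within bounds = trustworthy
        "duration_days": (exp["concluded_at"] - exp["started_at"]).days,
    }
```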

Observer integration

New Observer scan patterns:

  • Evidence without experiment: High-risk intent approved with no experiment-backed evidence → suggest running an experiment first
  • Experiment results unused: Concluded experiment with decisions/observations that have never been referenced as evidence → the knowledge exists but isn't being applied
  • Repeated evidence patterns: Same evidence refs used across many intents → suggest formalizing as a policy rather than re-verifying each time
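The second scan pattern ("experiment results unused") amounts to a set-difference check; the input shapes here are illustrative, not the Observer's real interface:

```python
def unused_experiment_results(produced_by_experiment: dict, cited: set) -> list:
    """Flag concluded experiments whose produced records were never cited.

    produced_by_experiment maps experiment id -> set of produced record
    ids; cited is the set of record ids referenced by any intent."""
    return sorted(exp_id
                  for exp_id, produced in produced_by_experiment.items()
                  if produced and not (produced & cited))
```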

Examples

Vendor sourcing decision

  1. Procurement team proposes experiment: "Test whether Supplier B can meet SLA within 10% cost reduction"
  2. Human approves with 2-week budget and success criteria
  3. Experiment runs → tasks execute → observations logged → experiment concluded with results
  4. Agent creates intent: "Switch primary supplier to Supplier B for component X"
  5. Intent evidence_refs: experiment record + produced decision ("Supplier B met SLA in trial") + produced observation ("Cost reduction confirmed at 12%")
  6. Verification pipeline: all Tier 1 (experiment-backed, human-approved, concluded) → risk discount applied → auto-approve

Contrast: without experiment

  1. Agent creates observation: "Supplier B seems cheaper based on public pricing"
  2. Agent creates intent: "Switch primary supplier to Supplier B for component X"
  3. Intent evidence_refs: single observation by same agent
  4. Verification pipeline: Tier 4 (same author, no independence) → risk premium → veto window or rejection

Dependencies

Phase

After #188 experiments are implemented and evidence-backed intents are in soft/hard enforcement mode.
