
Commit d426e97

docs: add comprehensive module documentation and ADRs
Improve codebase legibility for AI-assisted development without changing any logic. Add module-level documentation to all 43 modules explaining purpose, key types, and algorithms.

Changes:

- Add //! doc comments to all module files
- Create README.md files in src/network/, src/auth/, src/middleware/, and src/client_query/ for quick orientation
- Refactor main.rs into documented helper functions for initialization
- Add context to NoIndexers error variant (now includes query selector)
- Create Architecture Decision Records documenting:
  - ADR-001: Static allocations via Box::leak
  - ADR-002: Type-state pattern for indexer processing
  - ADR-003: PID controller for fee budget management
- Fix rustdoc bare URL warnings
1 parent b37acb4 commit d426e97

35 files changed (+1721, -104 lines)
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# ADR-001: Static Allocations via Box::leak

## Status

Accepted

## Context

The graph-gateway uses Axum as its HTTP framework. Axum's state management requires state types to implement `Clone` and to be `'static`. Several gateway components are heavyweight singletons that:

1. Are initialized once at startup
2. Never need to be deallocated (they live for the whole process)
3. Are expensive to clone (they contain channels, cryptographic keys, etc.)

These components include:

- `ReceiptSigner` - TAP receipt signing with private keys
- `Budgeter` - PID controller state for fee management
- `Chains` - Chain head tracking with per-chain state
- `Eip712Domain` (attestation domains) - EIP-712 signing domains

## Decision

Use `Box::leak()` to convert an owned `Box<T>` into a `&'static T` reference for singleton components (`Box::leak` returns a `&'static mut T`, which coerces to `&'static T`).

```rust
// Example from main.rs
let receipt_signer: &'static ReceiptSigner = Box::leak(Box::new(ReceiptSigner::new(...)));

let chains: &'static Chains = Box::leak(Box::new(Chains::new(...)));
```
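
The pattern can be tried outside the gateway with a small, self-contained sketch. `Signer`, `AppState`, `init_state`, and `fan_out` are hypothetical stand-ins (not gateway types), the "signature" is a toy XOR rather than real cryptography, and plain threads stand in for Axum handlers. It shows that a leaked reference lets the state struct be `Copy`, so every handler gets its own copy pointing at the same allocation without any reference counting.

```rust
use std::thread;

/// Hypothetical stand-in for a heavyweight singleton such as `ReceiptSigner`.
struct Signer {
    key: [u8; 32],
}

impl Signer {
    /// Toy "signature": XOR of all key and message bytes. NOT real cryptography.
    fn sign(&self, msg: &[u8]) -> u8 {
        self.key.iter().chain(msg).fold(0u8, |acc, b| acc ^ b)
    }
}

/// Hypothetical handler state. `&'static Signer` is `Copy`, so the whole
/// struct can derive `Copy`, and copying it never touches a reference count.
#[derive(Clone, Copy)]
struct AppState {
    signer: &'static Signer,
}

/// Leak the singleton once at startup; the allocation lives until process exit.
fn init_state() -> AppState {
    let signer: &'static Signer = Box::leak(Box::new(Signer { key: [7; 32] }));
    AppState { signer }
}

/// Simulate concurrent handlers: each spawned thread gets its own copy of
/// `state`, which `thread::spawn` accepts because every field is `'static`.
fn fan_out(state: AppState, requests: u8) -> Vec<u8> {
    let handles: Vec<_> = (0..requests)
        .map(|i| thread::spawn(move || state.signer.sign(&[i])))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    let state = init_state();
    println!("{:?}", fan_out(state, 4));
}
```

In an Axum service, a `Copy` state struct like this would typically be registered once with `Router::with_state`, and each handler's `State` extractor then receives a copy of the reference rather than a clone of the component.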

## Consequences

### Positive

1. **Zero-cost sharing**: `&'static T` is `Copy`, so passing it to handlers requires no cloning or reference counting
2. **No Arc overhead**: Avoids atomic reference counting on every request
3. **Simpler lifetimes**: No need to propagate lifetime parameters through handler types
4. **Explicit intent**: Makes it clear these are process-lifetime singletons

### Negative

1. **Memory never freed**: The leaked memory is never reclaimed. This is acceptable because:
   - Components live for the entire process lifetime anyway
   - Total leaked memory is small and bounded (< 1 KB)
   - Process termination reclaims all memory

2. **Not suitable for tests**: Tests that need fresh state must use different patterns. So far this has had little impact because test coverage of these components is limited.

## Alternatives Considered

### `Arc<T>` (Rejected)

```rust
let receipt_signer: Arc<ReceiptSigner> = Arc::new(ReceiptSigner::new(...));
```

Problems:

- Atomic operations on every clone (per-request overhead)
- More complex to share across Axum handlers
- Implies shared ownership when sole ownership is the intent

### `once_cell::sync::Lazy` (Rejected)

```rust
static RECEIPT_SIGNER: Lazy<ReceiptSigner> = Lazy::new(|| ...);
```

Problems:

- Requires initialization logic in static context
- Cannot use async initialization
- Configuration not available at static init time

## References

- [Axum State Documentation](https://docs.rs/axum/latest/axum/extract/struct.State.html)
- [Box::leak documentation](https://doc.rust-lang.org/std/boxed/struct.Box.html#method.leak)
Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# ADR-002: Type-State Pattern for Indexer Processing

## Status

Accepted

## Context

Indexer information flows through multiple processing stages, with each stage enriching the data:

1. **Raw** - Basic indexer info from network subgraph
2. **Version resolved** - After fetching indexer-service version
3. **Progress resolved** - After fetching indexing progress (block height)
4. **Cost resolved** - After fetching cost model/fee info

Processing order matters: we need version info before we can query for progress (different API versions), and we need progress before cost resolution makes sense (stale indexers are filtered out first).

A naive approach would use `Option<T>` fields that get populated:

```rust
struct IndexingInfo {
    indexer: IndexerId,
    deployment: DeploymentId,
    version: Option<Version>,      // Filled in stage 2
    progress: Option<BlockNumber>, // Filled in stage 3
    fee: Option<GRT>,              // Filled in stage 4
}
```

This forces `unwrap()` calls throughout the codebase and leads to runtime panics when a field is accessed before it has been populated.

## Decision

Use the type-state pattern with generic parameters to encode the processing stage at compile time.

```rust
// Type markers for processing stages
struct Unresolved;
struct VersionResolved(Version);
struct ProgressResolved { version: Version, block: BlockNumber }
struct FullyResolved { version: Version, block: BlockNumber, fee: GRT }

// Generic struct parameterized by stage
struct IndexingInfo<Stage> {
    indexer: IndexerId,
    deployment: DeploymentId,
    stage: Stage,
}

// Stage transitions are explicit methods
impl IndexingInfo<Unresolved> {
    fn resolve_version(self, version: Version) -> IndexingInfo<VersionResolved> {
        IndexingInfo {
            indexer: self.indexer,
            deployment: self.deployment,
            stage: VersionResolved(version),
        }
    }
}
```

See `src/network/indexer_processing.rs` for the actual implementation.
63+
64+
## Consequences
65+
66+
### Positive
67+
68+
1. **Compile-time safety**: Impossible to access version info before it's resolved
69+
2. **Self-documenting**: Function signatures show required processing stage
70+
3. **No runtime overhead**: Type parameters are erased at compile time
71+
4. **Explicit transitions**: Stage changes are visible method calls, not silent mutations
72+
73+
### Negative
74+
75+
1. **Verbose types**: `IndexingInfo<ProgressResolved>` is longer than `IndexingInfo`
76+
2. **Learning curve**: Pattern is less common, may confuse new contributors
77+
3. **More boilerplate**: Stage transition methods must be written explicitly
78+
79+
## Pattern Usage
80+
81+
```rust
82+
// Functions declare their required stage in the signature
83+
fn select_candidate(info: &IndexingInfo<FullyResolved>) -> Score {
84+
// Safe to access info.stage.fee - compiler guarantees it exists
85+
calculate_score(info.stage.fee, info.stage.block)
86+
}
87+
88+
// Processing pipeline
89+
async fn process_indexer(raw: IndexingInfo<Unresolved>) -> Result<IndexingInfo<FullyResolved>> {
90+
let with_version = raw.resolve_version(fetch_version(&raw.indexer).await?);
91+
let with_progress = with_version.resolve_progress(fetch_progress(&with_version).await?);
92+
let fully_resolved = with_progress.resolve_cost(fetch_cost(&with_progress).await?);
93+
Ok(fully_resolved)
94+
}
95+
```
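
A compilable, self-contained version of the pipeline, with all three transitions written out, might look like the following. It is a sketch under simplifying assumptions: IDs, versions, and fees are plain type aliases, the `deployment` field is omitted, the fetch steps are replaced by literal values, and `is_fresh` and `candidate_fee` are hypothetical helpers added for illustration. `is_fresh` shows how a stage-specific method can express the staleness filter that runs between progress and cost resolution.

```rust
// Simplified, hypothetical stand-ins for the gateway's real types.
type IndexerId = &'static str;
type Version = (u32, u32, u32);
type BlockNumber = u64;
type FeeWei = u128;

// Stage markers: each carries exactly the data resolved so far.
struct Unresolved;
struct VersionResolved(Version);
struct ProgressResolved { version: Version, block: BlockNumber }
struct FullyResolved { version: Version, block: BlockNumber, fee: FeeWei }

struct IndexingInfo<Stage> {
    indexer: IndexerId,
    stage: Stage,
}

impl IndexingInfo<Unresolved> {
    fn new(indexer: IndexerId) -> Self {
        IndexingInfo { indexer, stage: Unresolved }
    }
    fn resolve_version(self, version: Version) -> IndexingInfo<VersionResolved> {
        IndexingInfo { indexer: self.indexer, stage: VersionResolved(version) }
    }
}

impl IndexingInfo<VersionResolved> {
    fn resolve_progress(self, block: BlockNumber) -> IndexingInfo<ProgressResolved> {
        let VersionResolved(version) = self.stage;
        IndexingInfo { indexer: self.indexer, stage: ProgressResolved { version, block } }
    }
}

impl IndexingInfo<ProgressResolved> {
    /// Hypothetical staleness filter: only callable once progress is known.
    fn is_fresh(&self, chain_head: BlockNumber, max_lag: u64) -> bool {
        chain_head.saturating_sub(self.stage.block) <= max_lag
    }
    fn resolve_cost(self, fee: FeeWei) -> IndexingInfo<FullyResolved> {
        let ProgressResolved { version, block } = self.stage;
        IndexingInfo { indexer: self.indexer, stage: FullyResolved { version, block, fee } }
    }
}

/// Hypothetical helper that requires the final stage.
fn candidate_fee(info: &IndexingInfo<FullyResolved>) -> FeeWei {
    info.stage.fee
}

fn main() {
    let progress = IndexingInfo::new("indexer-a")
        .resolve_version((1, 0, 0))
        .resolve_progress(1_000);
    if progress.is_fresh(1_050, 100) {
        let info = progress.resolve_cost(42);
        println!(
            "{} v{:?} at block {}: fee {}",
            info.indexer, info.stage.version, info.stage.block, candidate_fee(&info)
        );
    }
    // candidate_fee(&IndexingInfo::new("indexer-b")); // rejected: mismatched types
}
```

Because `is_fresh` exists only on `IndexingInfo<ProgressResolved>`, calling it on an unresolved value is a compile error, just like the commented-out `candidate_fee` call in `main`.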

## Alternatives Considered

### Builder Pattern (Rejected)

```rust
IndexingInfoBuilder::new(indexer, deployment)
    .version(v)
    .progress(p)
    .fee(f)
    .build()
```

Problems:

- Runtime validation only
- `build()` must check all fields are set
- No compile-time guarantee of processing order

### Separate Structs (Rejected)

```rust
struct RawIndexingInfo { ... }
struct ResolvedIndexingInfo { ... }
```

Problems:

- Code duplication across struct definitions
- Harder to share common logic
- Type relationships not explicit

## References

- [Typestate Pattern in Rust](https://cliffle.com/blog/rust-typestate/)
- [Parse, don't validate](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# ADR-003: PID Controller for Fee Budget Management

## Status

Accepted

## Context

The gateway must manage query fee budgets to balance:

1. **Cost efficiency** - Minimize fees paid to indexers
2. **Query success rate** - Ensure queries succeed by offering competitive fees
3. **Responsiveness** - Adapt quickly to market conditions

Static fee budgets fail because:

- Too low: Indexers reject queries, degraded service
- Too high: Overpaying, wasted budget
- Market conditions change: Indexer fees fluctuate based on demand

We need a dynamic system that automatically adjusts fee budgets based on observed success rates.

## Decision

Implement a PID (Proportional-Integral-Derivative) controller to dynamically adjust fee budgets based on query success rate.

### PID Controller Overview

The PID controller continuously adjusts the fee budget using three terms:

```
adjustment = Kp * error + Ki * integral + Kd * derivative

where:
  error      = target_success_rate - actual_success_rate
  integral   = sum of past errors
  derivative = rate of error change
```

- **P (Proportional)**: Immediate response to current error
- **I (Integral)**: Corrects persistent bias over time
- **D (Derivative)**: Dampens oscillations, smooths response
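
For readers new to PID control, the formula above corresponds to the following textbook discrete-time controller, updated once per feedback sample. This is an illustrative sketch, not the code in `src/budgets.rs`: it assumes a unit time step, takes its gains and target from this ADR's tuning table, and omits the output clamping and integral anti-windup that production controllers usually need.

```rust
/// Discrete-time PID controller, one update per feedback sample.
struct PidController {
    kp: f64,
    ki: f64,
    kd: f64,
    target: f64,
    integral: f64,           // running sum of past errors
    prev_error: Option<f64>, // None until the first sample
}

impl PidController {
    fn new(kp: f64, ki: f64, kd: f64, target: f64) -> Self {
        Self { kp, ki, kd, target, integral: 0.0, prev_error: None }
    }

    /// Feed the measured success rate and return the control output.
    fn update(&mut self, measured: f64) -> f64 {
        let error = self.target - measured;
        self.integral += error;
        // Change in error since the last sample; 0 on the first sample.
        let derivative = self.prev_error.map_or(0.0, |prev| error - prev);
        self.prev_error = Some(error);
        self.kp * error + self.ki * self.integral + self.kd * derivative
    }
}

fn main() {
    // Kp, Ki, Kd, and target taken from the tuning table in this ADR.
    let mut pid = PidController::new(0.1, 0.01, 0.05, 0.95);
    for measured in [0.80, 0.85, 0.90, 0.95, 0.97] {
        let adjustment = pid.update(measured);
        println!("success rate {measured:.2} -> adjustment {adjustment:+.4}");
    }
}
```

A positive output means the measured success rate is below target (the budget should rise); a negative output means it is above target (the budget can fall).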

### Implementation

See `src/budgets.rs` for the implementation. A simplified outline:

```rust
pub struct Budgeter {
    controller: PidController,
    decay_buffer: DecayBuffer,
    budget_per_query: f64,
}

impl Budgeter {
    // `&mut self`: feedback updates the buffer, the controller state, and the budget.
    pub fn feedback(&mut self, success: bool) {
        self.decay_buffer.record(success);
        let success_rate = self.decay_buffer.success_rate();
        let adjustment = self.controller.update(success_rate);
        // The PID output is a small signed correction around zero, so apply
        // it relative to 1.0 instead of multiplying the budget by it directly.
        self.budget_per_query *= 1.0 + adjustment;
    }
}
```

### Decay Buffer

Success rate is calculated using exponential decay to weight recent observations more heavily:

```
weighted_sum   = sum(success_i * decay^i)
weighted_count = sum(decay^i)
success_rate   = weighted_sum / weighted_count

where:
  i     = age of the observation (0 = most recent)
  decay = factor with 0 < decay < 1
```

This provides:

- Fast response to changing conditions
- Natural forgetting of stale data
- Bounded memory usage
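
The bounded-memory property follows because both weighted sums can be updated incrementally: multiplying them by `decay` before adding each new observation ages every earlier term by one step. The sketch below illustrates this under the assumption of one observation per query; it is not the gateway's own `DecayBuffer` type, whose interface may differ (here `success_rate` returns `None` before any data arrives).

```rust
/// Exponentially decayed success rate kept as two running sums (O(1) memory).
/// Equivalent to sum(success_i * decay^i) / sum(decay^i), where i is an
/// observation's age in samples (0 = newest).
struct DecayBuffer {
    decay: f64,
    weighted_sum: f64,
    weighted_count: f64,
}

impl DecayBuffer {
    fn new(decay: f64) -> Self {
        assert!(decay > 0.0 && decay < 1.0, "decay must be in (0, 1)");
        Self { decay, weighted_sum: 0.0, weighted_count: 0.0 }
    }

    fn record(&mut self, success: bool) {
        let value = if success { 1.0 } else { 0.0 };
        // Age every previous observation by one step (multiply by decay),
        // then add the new one with weight decay^0 = 1.
        self.weighted_sum = self.weighted_sum * self.decay + value;
        self.weighted_count = self.weighted_count * self.decay + 1.0;
    }

    /// `None` until at least one observation has been recorded.
    fn success_rate(&self) -> Option<f64> {
        (self.weighted_count > 0.0).then(|| self.weighted_sum / self.weighted_count)
    }
}

fn main() {
    let mut buf = DecayBuffer::new(0.99); // decay factor from the tuning table
    for i in 0..200 {
        buf.record(i % 10 != 0); // every tenth query fails
    }
    println!("{:?}", buf.success_rate());
}
```

With `decay = 0.99`, `weighted_count` approaches 1 / (1 - 0.99) = 100, so the rate behaves roughly like an average over the most recent hundred observations.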
80+
81+
## Consequences
82+
83+
### Positive
84+
85+
1. **Self-tuning**: Budget automatically converges to optimal level
86+
2. **Adaptive**: Responds to market changes without manual intervention
87+
3. **Stable**: PID controllers are well-understood and tuneable
88+
4. **Observable**: Budget changes can be monitored via metrics
89+
90+
### Negative
91+
92+
1. **Tuning required**: PID gains (Kp, Ki, Kd) must be tuned for the system
93+
2. **Oscillation risk**: Poorly tuned controller can oscillate
94+
3. **Complexity**: More complex than static budgets
95+
4. **Cold start**: Initial budget must be set heuristically
96+
97+
## Tuning Parameters
98+
99+
Current parameters (may need adjustment based on production data):
100+
101+
| Parameter | Value | Purpose |
102+
| --------- | ----- | ----------------------------------------- |
103+
| Kp | 0.1 | Proportional gain - immediate response |
104+
| Ki | 0.01 | Integral gain - bias correction |
105+
| Kd | 0.05 | Derivative gain - oscillation damping |
106+
| Target | 0.95 | Target success rate (95%) |
107+
| Decay | 0.99 | Decay factor for success rate calculation |

## Alternatives Considered

### Static Budget (Rejected)

```rust
const BUDGET_PER_QUERY: GRT = GRT::from_wei(1_000_000);
```

Problems:

- Cannot adapt to market conditions
- Requires manual intervention to change
- Either overpays or fails queries

### Threshold-based Adjustment (Rejected)

```rust
if success_rate < 0.9 { budget *= 1.1; }
if success_rate > 0.95 { budget *= 0.9; }
```

Problems:

- Oscillates around thresholds
- Step changes cause instability
- No derivative term to dampen oscillations

### Machine Learning Model (Rejected)

Train a model to predict optimal budget based on features.

Problems:

- Requires training data
- Black box behavior
- Overkill for this use case

## References

- [PID Controller (Wikipedia)](https://en.wikipedia.org/wiki/PID_controller)
- [Control Theory for Software Engineers](https://blog.acolyer.org/2015/05/01/feedback-control-for-computer-systems/)

src/auth.rs

Lines changed: 29 additions & 0 deletions
@@ -1,3 +1,32 @@
//! API Key Authentication
//!
//! Handles API key validation, payment status checks, and domain authorization.
//!
//! # Authentication Flow
//!
//! 1. Extract API key from `Authorization: Bearer <key>` header
//! 2. Parse and validate key format (32-char hex string → 16 bytes)
//! 3. Look up key in `api_keys` map (from Studio API or Kafka)
//! 4. Check payment status (`QueryStatus::Active`, `ServiceShutoff`, `MonthlyCapReached`)
//! 5. Verify origin domain against authorized domains list
//! 6. Return [`AuthSettings`] with user address and authorized subgraphs
//!
//! # Special API Keys
//!
//! Keys in `special_api_keys` bypass payment checks. Used for admin/monitoring.
//!
//! # Domain Authorization
//!
//! The `domains` field supports wildcards:
//! - `"example.com"` → exact match only
//! - `"*.example.com"` → matches `foo.example.com`, `bar.example.com`
//! - Empty list → all domains authorized
//!
//! # API Key Sources
//!
//! - [`studio_api`]: Poll HTTP endpoint periodically
//! - [`kafka`]: Stream updates from Kafka topic

pub mod kafka;
pub mod studio_api;
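
Steps 1, 2, and 5 of the authentication flow above are mostly string handling. The following std-only sketch shows one way to implement them; the function names are hypothetical and not taken from `src/auth.rs`. The wildcard semantics are an assumption: here `*.example.com` matches any host ending in `.example.com` (including deeper subdomains such as `a.b.example.com`) but not the bare `example.com`, the origin is assumed to be a bare host without scheme or port, and comparisons are case-sensitive.

```rust
/// Step 1 (sketch): pull the key out of an `Authorization: Bearer <key>` value.
fn bearer_token(header_value: &str) -> Option<&str> {
    header_value.strip_prefix("Bearer ").map(str::trim)
}

/// Step 2 (sketch): a valid key is exactly 32 hex characters, decoded to 16 bytes.
fn parse_api_key(key: &str) -> Option<[u8; 16]> {
    if key.len() != 32 {
        return None;
    }
    let mut bytes = [0u8; 16];
    for (i, pair) in key.as_bytes().chunks_exact(2).enumerate() {
        // `to_digit(16)` rejects anything that is not 0-9, a-f, or A-F.
        let hi = char::from(pair[0]).to_digit(16)?;
        let lo = char::from(pair[1]).to_digit(16)?;
        bytes[i] = (hi * 16 + lo) as u8;
    }
    Some(bytes)
}

/// Step 5 (sketch): match an origin host against the key's authorized domains.
fn is_domain_authorized(authorized: &[&str], origin: &str) -> bool {
    if authorized.is_empty() {
        return true; // empty list: all domains authorized
    }
    authorized.iter().any(|pattern| match pattern.strip_prefix("*.") {
        // ASSUMPTION: wildcard requires at least one label before the suffix.
        Some(suffix) => origin
            .strip_suffix(suffix)
            .map_or(false, |rest| rest.len() > 1 && rest.ends_with('.')),
        None => *pattern == origin,
    })
}

fn main() {
    let header = "Bearer 000102030405060708090a0b0c0d0e0f";
    let key = bearer_token(header).and_then(parse_api_key);
    println!("parsed key: {:?}", key);
    println!("{}", is_domain_authorized(&["*.example.com"], "app.example.com"));
}
```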
