Epistemic status: This appendix is a procedures document. It specifies estimation recipes for core TFT quantities. End-to-end worked examples that instantiate the full chain are provided in Appendix C (Kalman, exact) and Appendix D (RL, approximate).
This appendix addresses the measurement gap between TFT's formal objects and practical deployment: how to estimate the quantities listed in the table below.
| Quantity | Role in TFT | Typical unit | Minimum data needed |
|---|---|---|---|
| $U_M$ | Model uncertainty (TF-06) | domain-specific variance/entropy | Predictive posterior or ensemble spread |
| $U_o$ | Observation uncertainty (TF-04/TF-06) | sensor variance/noise scale | Channel calibration or residual variance |
| $\rho(t)$ | Mismatch injection rate (TF-11) | surprise per time | Time series of mismatch magnitudes |
| $\rho_{\text{delib}}$ | Local mismatch drift during pauses (TF-09) | surprise per time | Deliberation windows with no corrective action |
| $\alpha$ | Lower correction efficiency bound (App. A) | inverse time | Vector mismatch trajectories + correction term |
| $R$ | Radius where local sector condition holds (App. A) | surprise magnitude | Same as $\alpha$ |
| $\lvert\delta_{\text{critical}}\rvert$ | Functional adequacy threshold (TF-11) | surprise magnitude | Task-level performance curve vs. mismatch |
Use the most native uncertainty representation available in the domain:
| Domain | $U_M$ | $U_o$ |
|---|---|---|
| Kalman / linear Gaussian | Prior predictive variance | Measurement-noise variance |
| Conjugate Bayes | Posterior variance / inverse effective sample size | Likelihood variance (or precision inverse) |
| RL with ensembles | Across-head predictive variance | TD-target noise variance over replay batches |
| Neural net regression | Ensemble or Laplace posterior variance | Aleatoric head output |
| PID / classical control | State-estimation covariance from observer | Sensor noise from calibration + residual PSD |
For TFT's scalar gain heuristic (TF-06), normalize to common units and compute:
[Operational Definition] $$\hat{\eta}^*_t = \frac{\hat{U}_{M,t}}{\hat{U}_{M,t} + \hat{U}_{o,t}}$$
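As a minimal sketch of the gain heuristic, assuming both uncertainties have already been normalized to common units: the function name `scalar_gain` and the example values are illustrative and do not come from TFT itself.

```python
def scalar_gain(u_model: float, u_obs: float) -> float:
    """Return eta* = U_M / (U_M + U_o) for a single time step.

    Both inputs are assumed to be non-negative and in the same units.
    """
    if u_model < 0 or u_obs < 0:
        raise ValueError("uncertainties must be non-negative")
    total = u_model + u_obs
    if total == 0:
        raise ValueError("gain is undefined when both uncertainties are zero")
    return u_model / total


# Hypothetical values: model variance 4.0, observation variance 1.0.
print(scalar_gain(4.0, 1.0))  # 0.8
```

A gain near 1 means the observation is weighted heavily relative to the model; a gain near 0 means the model is trusted over the observation.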
Let
Global mismatch injection rate:
[Operational Definition] $$\hat{\rho}(t) = \left[\frac{s_{t+\Delta t} - s_t}{\Delta t} + \hat{\mathcal{T}}_t \, s_t\right]_+$$
where
Note on estimation sequencing. This estimator requires
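The global injection-rate estimator can be sketched as follows. This assumes that `s` is a uniformly sampled scalar series of mismatch magnitudes and that `tempo` holds a per-step estimate of $\hat{\mathcal{T}}_t$ supplied by the caller; both interpretations, and the function name `injection_rate`, are assumptions made for illustration.

```python
def injection_rate(s, tempo, dt):
    """Return rho_hat[i] = max(0, (s[i+1] - s[i]) / dt + tempo[i] * s[i]).

    s     : mismatch magnitudes sampled every dt (assumed interpretation)
    tempo : per-step estimate of T_hat, at least len(s) - 1 entries
    dt    : sampling interval, must be positive
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(tempo) < len(s) - 1:
        raise ValueError("need one tempo estimate per finite difference")
    return [
        max(0.0, (s[i + 1] - s[i]) / dt + tempo[i] * s[i])
        for i in range(len(s) - 1)
    ]


# Hypothetical trace: the last step would be negative and is clipped to zero.
print(injection_rate([1.0, 1.5, 1.0, 0.0], [0.5, 0.5, 0.5], dt=1.0))
# [1.0, 0.25, 0.0]
```

The positive-part clipping mirrors the $[\cdot]_+$ operator in the definition, so apparent negative injection from noisy differences is reported as zero.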
Local pause-window drift for TF-09:
[Operational Definition] $$\hat{\rho}_{\text{delib}} = \operatorname*{median}_{w \in \mathcal{W}_{\text{pause}}} \frac{s_{w,\text{end}} - s_{w,\text{start}}}{\Delta\tau_w}$$
using windows where corrective action is suspended or effectively delayed.
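A minimal sketch of the pause-window estimator, assuming each window is summarized as a `(s_start, s_end, duration)` tuple; that layout and the name `pause_drift` are hypothetical.

```python
from statistics import median


def pause_drift(windows):
    """Median of (s_end - s_start) / duration over pause windows.

    windows: iterable of (s_start, s_end, duration) tuples.
    Windows with non-positive duration are skipped.
    """
    rates = [
        (s_end - s_start) / duration
        for s_start, s_end, duration in windows
        if duration > 0
    ]
    if not rates:
        raise ValueError("no usable pause windows")
    return median(rates)


# Hypothetical windows; the third one contains a large transient.
print(pause_drift([(1.0, 2.0, 2.0), (0.5, 0.75, 1.0), (2.0, 5.0, 1.0)]))  # 0.5
```

The median keeps a single anomalous window from dominating the estimate, which a mean would not.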
Appendix A uses:
[Assumption A2']
Operationally:
- Estimate $\dot{\delta}_t$ (finite differences or filtered derivative).
- Compute $\widehat{F}_t = -\dot{\delta}_t + w_t$, where the disturbance proxy $w_t$ is estimated from exogenous perturbation channels or residual balancing.
- Form ratios $r_t = (\delta_t^T \widehat{F}_t) / |\delta_t|^2$ on bins of $|\delta_t|$.
- Set the conservative lower bound $\hat{\alpha}$ as a low quantile (for example, the 10th percentile) of $r_t$ in the valid region.
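The four steps above can be sketched as below. This is a simplified illustration: it uses forward finite differences, replaces binning on $|\delta_t|$ with a single radius filter, and uses a simple floor-index quantile. The function name `sector_lower_bound` and its parameters are assumptions.

```python
import math


def sector_lower_bound(deltas, disturbances, dt, radius, q=0.10):
    """Low-quantile estimate of alpha from vector mismatch trajectories.

    deltas       : list of mismatch vectors sampled every dt
    disturbances : list of disturbance-proxy vectors w_t (same shape)
    radius       : only samples with |delta| <= radius are used
    q            : quantile level for the conservative bound
    """
    ratios = []
    for i in range(len(deltas) - 1):
        d = deltas[i]
        norm_sq = sum(x * x for x in d)
        if norm_sq == 0.0 or math.sqrt(norm_sq) > radius:
            continue  # outside the region treated as valid
        d_dot = [(b - a) / dt for a, b in zip(d, deltas[i + 1])]
        f_hat = [-dd + w for dd, w in zip(d_dot, disturbances[i])]
        ratios.append(sum(x * f for x, f in zip(d, f_hat)) / norm_sq)
    if not ratios:
        raise ValueError("no samples inside the valid region")
    ratios.sort()
    return ratios[int(q * (len(ratios) - 1))]


# Synthetic check: delta decays at rate a = 0.5 with no disturbance,
# so every ratio should be close to 0.5.
a, dt = 0.5, 0.1
traj = [[1.0, -2.0]]
for _ in range(20):
    traj.append([x * (1 - a * dt) for x in traj[-1]])
zero_w = [[0.0, 0.0]] * len(traj)
alpha_hat = sector_lower_bound(traj, zero_w, dt, radius=3.0)
print(round(alpha_hat, 6))  # approximately 0.5
```

On real traces a filtered derivative would usually replace the raw finite difference, since differencing amplifies measurement noise.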
Estimate
[Operational Criterion]
with a chosen violation tolerance $\epsilon$.
Define a mission-level performance metric $J$ with a minimum acceptable level $J_{\min}$. Then:
[Operational Definition] $$|\hat{\delta}_{\text{critical}}| = \inf \left\{ d : \mathbb{E}[J \mid |\delta| = d] < J_{\min} \right\}$$
This anchors TF-11's normalized persistence condition to real task outcomes.
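One way to approximate this infimum from logged data is to bin mismatch magnitudes and find the first bin whose mean performance falls below the threshold. The sketch below assumes samples arrive as `(mismatch_magnitude, performance)` pairs; the binning scheme and the name `critical_mismatch` are illustrative choices, not part of TFT.

```python
from collections import defaultdict


def critical_mismatch(samples, j_min, bin_width):
    """Lower edge of the first mismatch bin whose mean J is below j_min.

    samples   : iterable of (mismatch_magnitude, performance) pairs
    j_min     : minimum acceptable performance
    bin_width : width of each mismatch bin
    Returns None if no bin falls below j_min.
    """
    acc = defaultdict(lambda: [0.0, 0])  # bin index -> [sum of J, count]
    for d, j in samples:
        k = int(d // bin_width)
        acc[k][0] += j
        acc[k][1] += 1
    for k in sorted(acc):
        total, n = acc[k]
        if total / n < j_min:
            return k * bin_width
    return None


# Hypothetical performance log.
log = [(0.1, 0.9), (0.3, 0.95), (1.2, 0.8), (1.7, 0.7),
       (2.1, 0.4), (2.6, 0.5), (3.2, 0.1)]
print(critical_mismatch(log, j_min=0.6, bin_width=1.0))  # 2.0
```

Returning the lower bin edge errs on the conservative side: the reported threshold is never larger than the smallest mismatch observed in the failing bin.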
- Fix mismatch representation $\delta$ in one consistent unit system (prefer surprise-scale).
- Estimate $U_o$ from channel physics/calibration; estimate $U_M$ from model uncertainty.
- Validate gain behavior against TF-06 ($\hat{\eta}^*$ trend checks).
- Estimate $\rho_{\text{delib}}$ from pause windows (TF-09) and $\rho(t)$ from full traces (TF-11).
- Estimate $\alpha$ and $R$ from local correction dynamics (Appendix A).
- Estimate $|\delta_{\text{critical}}|$ from task-performance degradation.
- Compute derived diagnostics: tempo margin $\hat{\mathcal{T}} - \hat{\rho}/|\hat{\delta}_{\text{critical}}|$, reserve $\widehat{\Delta \rho^*} = \hat{\alpha}\hat{R} - \hat{\rho}$, and deliberation feasibility $\Delta\eta^*(\Delta\tau)|\delta_{\text{post}}| - \hat{\rho}_{\text{delib}}\Delta\tau$.
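The derived diagnostics in the last step can be computed directly once the upstream estimates exist. The sketch below assumes those estimates are already available as scalars; the `Estimates` container, the `diagnostics` function, and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimates:
    tempo: float           # T_hat
    rho: float             # rho_hat
    rho_delib: float       # rho_hat_delib
    alpha: float           # alpha_hat
    radius: float          # R_hat
    delta_critical: float  # |delta_hat_critical|, must be positive


def diagnostics(e, gain_improvement, delta_post, dtau):
    """Compute the three derived diagnostics from the workflow.

    gain_improvement : Delta eta*(dtau) for the planned deliberation
    delta_post       : |delta_post|
    dtau             : deliberation duration
    """
    return {
        "tempo_margin": e.tempo - e.rho / e.delta_critical,
        "reserve": e.alpha * e.radius - e.rho,
        "deliberation_feasibility":
            gain_improvement * delta_post - e.rho_delib * dtau,
    }


est = Estimates(tempo=0.5, rho=0.5, rho_delib=0.125,
                alpha=0.5, radius=4.0, delta_critical=2.0)
print(diagnostics(est, gain_improvement=0.25, delta_post=2.0, dtau=2.0))
# {'tempo_margin': 0.25, 'reserve': 1.5, 'deliberation_feasibility': 0.25}
```

Each entry is evaluated exactly as the corresponding expression is written in the workflow, so sign conventions carry over unchanged.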
End-to-end worked examples demonstrating the full TFT chain are provided in Appendix C (Kalman/linear-Gaussian domain, exact mapping) and Appendix D (nonstationary bandit/RL domain, approximate mapping).
For any domain report claiming TFT validation, include:
- Mismatch definition and units.
- Estimation method for $U_M$, $U_o$, and uncertainty calibration diagnostics.
- Estimation method for $\rho_{\text{delib}}$ and $\rho(t)$ with window definitions.
- Sector-bound estimation method ($\alpha$, $R$) and violation tolerance $\epsilon$.
- Task-level definition of $|\delta_{\text{critical}}|$.
- At least one ablation where $\eta$ is intentionally miscalibrated to test TF-06 predictions.
- At least one induced-shock test to evaluate reserve prediction $\Delta\rho^*$.
The estimators in B.2 are point recipes. This section provides guidance on their reliability.
[Operational Guidance] $$\text{Var}(\hat{\eta}^*) \approx \hat{\eta}^{*2}(1-\hat{\eta}^*)^2 \left[\frac{\text{Var}(\hat{U}_M)}{\hat{U}_M^2} + \frac{\text{Var}(\hat{U}_o)}{\hat{U}_o^2}\right]$$
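The approximation above is consistent with a first-order (delta-method) expansion of $\hat{\eta}^* = \hat{U}_M / (\hat{U}_M + \hat{U}_o)$. The sketch below assumes the estimation errors in $\hat{U}_M$ and $\hat{U}_o$ are uncorrelated, which is an added assumption; a covariance term would appear otherwise.

$$
\begin{aligned}
\frac{\partial \hat{\eta}^*}{\partial \hat{U}_M} &= \frac{\hat{U}_o}{(\hat{U}_M + \hat{U}_o)^2} = \frac{\hat{\eta}^*(1-\hat{\eta}^*)}{\hat{U}_M}, \\
\frac{\partial \hat{\eta}^*}{\partial \hat{U}_o} &= -\frac{\hat{U}_M}{(\hat{U}_M + \hat{U}_o)^2} = -\frac{\hat{\eta}^*(1-\hat{\eta}^*)}{\hat{U}_o}, \\
\text{Var}(\hat{\eta}^*) &\approx \left(\frac{\partial \hat{\eta}^*}{\partial \hat{U}_M}\right)^2 \text{Var}(\hat{U}_M) + \left(\frac{\partial \hat{\eta}^*}{\partial \hat{U}_o}\right)^2 \text{Var}(\hat{U}_o).
\end{aligned}
$$

Substituting the two partial derivatives and factoring out $\hat{\eta}^{*2}(1-\hat{\eta}^*)^2$ gives the bracketed sum of squared relative errors. The factor $\hat{\eta}^*(1-\hat{\eta}^*)$ peaks at $\hat{\eta}^* = 1/2$, so the gain estimate is most sensitive to uncertainty errors when model and observation uncertainty are comparable.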
When
General nonstationarity caveat. All estimators assume approximate stationarity over the estimation window. If the environment is changing during estimation, the estimates characterize the average dynamics over that window, not instantaneous values. Use sliding windows matched to the expected stationarity timescale. When environment regime changes are suspected (TF-10), re-estimate from post-change data only.
The estimators above support three operational decision procedures.
The exploration weight
| Context | Exploration weight | Source |
|---|---|---|
| Finite bandits | Gittins index from dynamic programming | Exact (Gittins 1979) |
| Linear-Gaussian | Probing cost in quadratic objective | Exact (dual control) |
| Discrete MDP | Information-directed sampling (Russo & Van Roy) | |
| General | Heuristic: scale CIY weight by relative uncertainty | |
For the heuristic: when
From Proposition 9.1 (TF-09), deliberation of duration
- Estimate $\rho_{\text{delib}}$ from prior pause windows (B.2.2).
- Before each deliberation episode, estimate $|\delta_{\text{post}}|$ as current mismatch + $\rho_{\text{delib}} \cdot \Delta\tau_{\text{planned}}$.
- Estimate $\Delta\eta^*(\Delta\tau)$ from the diminishing-returns profile of past deliberation episodes (or from the marginal improvement of the first few candidate actions evaluated).
- Stop deliberating when the marginal improvement rate $\partial \Delta\eta^* / \partial \Delta\tau$ drops below $\rho_{\text{delib}} / |\delta_{\text{post}}|$ (the first-order optimality condition from TF-09).
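The stopping rule above can be sketched with a finite-difference estimate of the marginal improvement rate. The sketch assumes the diminishing-returns profile is available as sampled `(time, cumulative improvement)` values; the function name `stop_time` and the sample profile are hypothetical.

```python
def stop_time(times, gain_improvement, rho_delib, delta_now, dtau_planned):
    """Return the deliberation time at which to stop.

    times            : increasing deliberation durations
    gain_improvement : cumulative Delta eta* at each duration
    rho_delib        : pause-window drift estimate
    delta_now        : current mismatch magnitude
    dtau_planned     : planned deliberation duration
    """
    # Projected mismatch at the end of the planned deliberation.
    delta_post = delta_now + rho_delib * dtau_planned
    if delta_post <= 0:
        raise ValueError("projected mismatch must be positive")
    threshold = rho_delib / delta_post
    for i in range(1, len(times)):
        rate = (gain_improvement[i] - gain_improvement[i - 1]) / (
            times[i] - times[i - 1]
        )
        if rate < threshold:
            return times[i - 1]  # the next step is no longer worth it
    return times[-1]


# Hypothetical diminishing-returns profile; threshold is 0.5 / 2.0 = 0.25.
print(stop_time([0.0, 1.0, 2.0, 3.0, 4.0],
                [0.0, 0.5, 0.75, 0.875, 0.9375],
                rho_delib=0.5, delta_now=1.0, dtau_planned=2.0))  # 2.0
```

In this example the marginal rates are 0.5, 0.25, and 0.125; the rule stops at the first interval whose rate falls strictly below the threshold.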
From Proposition 10.1 (TF-10), structural adaptation is indicated when parametric convergence leaves a mismatch floor. Operationally, the switching decision balances expected mismatch reduction against transition cost:
- Estimate the current mismatch floor $|\delta|_{\text{floor}}$ from the converged residual statistics.
- Estimate the post-switch expected mismatch as $|\delta|_{\text{new}} \approx \rho / \alpha'$, where $\alpha'$ is the sector bound under the candidate new model class (may need pilot estimation).
- Estimate transition cost $C_{\text{switch}}$: knowledge loss (parameters that don't transfer), retraining time ($\Delta\tau_{\text{switch}}$), and accumulated mismatch during transition ($\rho \cdot \Delta\tau_{\text{switch}}$).
- Switch when $(|\delta|_{\text{floor}} - |\delta|_{\text{new}}) \cdot T_{\text{horizon}} > C_{\text{switch}}$, where $T_{\text{horizon}}$ is the expected time the new model class will remain adequate.
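The switching rule can be sketched as a single comparison. The sketch assumes the transition cost has already been aggregated into one scalar in the same units as mismatch times time; how the three cost components are combined is left to the caller. The function name `should_switch` and the example values are hypothetical.

```python
def should_switch(delta_floor, rho, alpha_new, horizon, c_switch):
    """Return True when expected mismatch savings exceed the switch cost.

    delta_floor : current converged mismatch floor
    rho         : mismatch injection rate
    alpha_new   : sector bound under the candidate model class (alpha')
    horizon     : expected time the new class remains adequate
    c_switch    : aggregated transition cost (caller-defined units)
    """
    if alpha_new <= 0:
        raise ValueError("alpha_new must be positive")
    delta_new = rho / alpha_new  # post-switch expected mismatch
    return (delta_floor - delta_new) * horizon > c_switch


# Hypothetical numbers: savings of (2.0 - 0.5) * 10 = 15 against a cost of 8.
print(should_switch(delta_floor=2.0, rho=0.5, alpha_new=1.0,
                    horizon=10.0, c_switch=8.0))  # True
```

Because $\alpha'$ often comes from a short pilot run, it may be worth evaluating the rule at a pessimistic value of `alpha_new` before committing to a switch.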