Any persisting agent maintains a model — a compressed representation of its interaction history that supports prediction and action selection.
Interaction History ($\mathcal{C}_t$)
[Definition] $$\mathcal{C}_t = (o_1, a_1, o_2, a_2, \ldots, a_{t-1}, o_t)$$
This is the agent's only raw material. Everything the agent "knows" must be constructed from this.
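As a concrete (and purely illustrative) data structure, the history is an alternating sequence that starts and ends with an observation. The `Obs`/`Act` wrappers below are placeholders, not part of the formulation:

```python
from typing import List, NamedTuple, Union

class Obs(NamedTuple):
    value: float  # placeholder observation payload

class Act(NamedTuple):
    value: int    # placeholder action payload

# C_t = (o_1, a_1, o_2, a_2, ..., a_{t-1}, o_t):
# starts and ends with an observation, alternating with actions.
History = List[Union[Obs, Act]]

def is_valid_history(h: History) -> bool:
    """Check the alternation pattern o, a, o, a, ..., a, o."""
    if not h or not isinstance(h[0], Obs) or not isinstance(h[-1], Obs):
        return False
    return all(isinstance(x, Obs if i % 2 == 0 else Act) for i, x in enumerate(h))

c3 = [Obs(0.1), Act(1), Obs(0.4), Act(0), Obs(0.2)]  # t = 3: three observations
print(is_valid_history(c3))  # True
```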
Model ($M_t$)
[Definition] $$M_t = \phi(\mathcal{C}_t)$$
where $\phi$ is a compression map from histories into the model space.
Model Space ($\mathcal{M}$)

| Model Space | Instance |
|---|---|
| Gaussian state vectors (mean and covariance) | Kalman filter |
| Posterior distributions over hypotheses | Bayesian agent |
| Value functions and policies | RL agent |
| Neural network weight space | Deep learning agent |
| Three numbers (error, integral, derivative) | PID controller |
| (Procedures, beliefs, culture) | Organization |
| Antibody repertoire + memory cells | Immune system |
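To make the implicit end of the table concrete, here is a minimal PID controller whose entire model of its interaction history is three numbers: the current error, its running integral, and its derivative. The gains, setpoint, and toy plant below are arbitrary choices for illustration:

```python
class PID:
    """The persistent state is just the error integral and the previous error;
    together with the current error, these are the controller's three numbers."""
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0    # compressed summary of all past errors
        self.prev_error = 0.0  # last error, for the derivative term

    def update(self, measurement, dt=1.0):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.5, ki=0.1, kd=0.05, setpoint=1.0)
x = 0.0
for _ in range(200):
    x += 0.1 * pid.update(x)  # toy first-order plant
print(round(x, 2))  # close to the setpoint 1.0
```

Despite retaining almost nothing of its history, the controller's compressed state is enough to drive the plant to the setpoint.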
The model updates recursively as a consequence of the causal structure (TF-02):

$$M_t = u(M_{t-1}, a_{t-1}, o_t)$$

The arrow of time, partial observability, and state completeness jointly determine this as the unique causal-respecting update form. See TF-02 for the derivation and TF-04 for the multi-channel event framework. The update function $u$ incorporates the latest action and observation into the model without revisiting the raw history.
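The recursive update can be illustrated with a toy update function: a running mean-and-count model of the observation stream. This is a sketch, not the theory's update rule; note that each step touches only the previous model state, never the raw history (the action argument is carried but unused here):

```python
def u(model, action, obs):
    """Recursive update M_t = u(M_{t-1}, a_{t-1}, o_t).
    Model = (count, mean): a running estimate of the observation average."""
    count, mean = model
    count += 1
    mean += (obs - mean) / count  # Welford-style incremental mean
    return (count, mean)

model = (0, 0.0)                          # M_0: empty model
history = [(0, 2.0), (1, 4.0), (0, 6.0)]  # (a_{t-1}, o_t) pairs
for a, o in history:
    model = u(model, a, o)

print(model)  # (3, 4.0): the model, not the history, carries the summary
```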
The model faces a fundamental trade-off, formalized by the information bottleneck¹:
[Formulation (IB-objective)] $$\phi^* = \arg\min_{\phi} \left[ I(M_t; \mathcal{C}_t) - \beta \cdot I(M_t; o_{t+1:\infty} \mid a_{t:\infty}) \right]$$
- $I(M_t; \mathcal{C}_t)$ measures compression cost — how much of the history the model retains
- $I(M_t; o_{t+1:\infty} \mid a_{t:\infty})$ measures predictive power — how much of the future the model can predict
- $\beta$ controls the trade-off: higher $\beta$ favors prediction; lower $\beta$ favors compression
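The two information terms can be computed exactly in a toy discrete setting. In the sketch below (an illustration of the trade-off, not the formalism itself), the "history" X is uniform on {0,1,2,3} and the "future" Y depends only on X's high bit. Three candidate compressions show that discarding predictively irrelevant bits lowers I(M;X) without touching I(M;Y):

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits from a list of (a, b) samples (empirical distribution)."""
    n = len(pairs)
    pab = Counter(pairs)
    pa = Counter(a for a, _ in pairs)
    pb = Counter(b for _, b in pairs)
    return sum((c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
               for (a, b), c in pab.items())

# Toy setup: "history" X uniform on {0,1,2,3}; the "future" Y = X // 2.
xs = [0, 1, 2, 3]
ys = [x // 2 for x in xs]

for name, f in [("identity ", lambda x: x),
                ("keep x//2", lambda x: x // 2),
                ("keep x%2 ", lambda x: x % 2)]:
    ms = [f(x) for x in xs]
    comp = mutual_information(list(zip(ms, xs)))  # I(M; X): compression cost
    pred = mutual_information(list(zip(ms, ys)))  # I(M; Y): predictive power
    print(f"{name}: I(M;X)={comp:.1f} bits, I(M;Y)={pred:.1f} bits")
```

The `keep x//2` compression halves the retained information while preserving all predictive power; `keep x%2` pays the same retention cost for zero prediction.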
Varying $\beta$ traces out a spectrum of models, from heavy compression to full retention of the history. A PID controller's model (three numbers) sits at the heavily compressed end of this spectrum; a Bayesian agent's full posterior sits near the full-retention end.
Connection to environmental volatility. The optimal trade-off depends on how quickly the environment changes: when the world shifts rapidly, old history loses predictive value and heavy compression costs little; when the world is stable, retained history continues to pay off in prediction.
A model is adequate to the degree that it is a sufficient statistic² for the history with respect to future prediction. The formal measure of adequacy — model sufficiency — quantifies how close the model comes to this ideal.
The key intuition: when the model captures everything predictively relevant in the history, knowing the full history adds nothing beyond knowing the model:

$$P(o_{t+1:\infty} \mid \mathcal{C}_t, a_{t:\infty}) = P(o_{t+1:\infty} \mid M_t, a_{t:\infty})$$
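When a model is sufficient, conditioning on it makes the history carry zero further information about the future. This can be verified exactly in a toy process (an illustrative construction, not the document's formal measure): the next observation is the XOR of the last two, so a model keeping both bits is sufficient, while a model keeping only the last bit leaves a full bit of predictive information in the history:

```python
from collections import Counter
from itertools import product
from math import log2

def cond_mutual_info(samples):
    """Empirical conditional mutual information I(H; Y | M) in bits
    from a list of (h, y, m) samples."""
    n = len(samples)
    phym = Counter(samples)
    pm = Counter(m for _, _, m in samples)
    phm = Counter((h, m) for h, _, m in samples)
    pym = Counter((y, m) for _, y, m in samples)
    return sum((c / n) * log2((c / n) * (pm[m] / n)
                              / ((phm[h, m] / n) * (pym[y, m] / n)))
               for (h, y, m), c in phym.items())

# Process: next observation y = XOR of the last two; all length-2 histories equally likely.
histories = list(product([0, 1], repeat=2))
full = [(h, h[0] ^ h[1], h) for h in histories]     # model keeps both bits
last = [(h, h[0] ^ h[1], h[1]) for h in histories]  # model keeps the last bit only
print(cond_mutual_info(full))  # 0.0 — sufficient: history adds nothing given the model
print(cond_mutual_info(last))  # 1.0 — insufficient: history still predicts beyond the model
```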
This document is labeled "Formulation" rather than "Axiom" because it makes a specific modeling choice: we analyze adaptive systems as maintaining compressed predictive representations. This is a representational definition, not a psychological or metaphysical claim — "model" includes any internal state that induces non-random action-history dependence, from a thermostat's bimetallic strip to a Bayesian posterior.
The framing is deliberately broad. An agent whose actions are non-random with respect to outcomes is, by definition, using some function of its history — i.e., a model, however implicit. This makes the formulation nearly tautological within the theory's scope, which is by design: TFT does not claim that all systems "have models" in any deep sense, but rather that analyzing them as maintaining compressed representations is productive when the formal apparatus (sufficiency, information bottleneck, model class fitness) can be meaningfully applied.
The model may be explicit (a Kalman state vector, a Bayesian posterior) or implicit (a subsumption architecture's wiring, a PID controller's three-number state, an organization's culture). The formulation claims only that it exists, not that it takes any particular form. The theory's content comes not from this existence claim but from the specific mathematical framework applied to it.