rougier · GuanchunLi · Apr 23, 2026
diff --git a/braincraft/env3_player_bio.md b/braincraft/env3_player_bio.md
@@ -0,0 +1,155 @@
+# Bio Player — Environment 3
+
+## 1. Overview
+
+`env3_player_bio.py` is a pointwise-activation Echo State Network
+controller for Environment 3:
+
+```text
+X(t+1) = f(Win @ I(t) + W @ X(t))
+O(t+1) = Wout @ g(X(t+1))        (g = identity)
+```
+
+Every hidden activation is a scalar function of its own preactivation;
+all cross-neuron logic lives in the connectivity matrices. The model
+is produced by a single `yield` in `bio_player()`, so the matrices are
+fixed at build time (no iterative training).
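
A minimal NumPy sketch of one update step. It assumes column-vector state and `leak = 1.0`, so the new state fully replaces the old one. The function name `esn_step` and the toy matrices are illustrative and are not part of the controller.

```python
import numpy as np


def esn_step(Win, W, Wout, f, g, x, i):
    """One step of X(t+1) = f(Win @ I(t) + W @ X(t)), O(t+1) = Wout @ g(X(t+1)).

    Assumes leak = 1.0 (no blending with the previous state).
    x has shape (n, 1); i has shape (n_inputs, 1).
    """
    x_next = f(Win @ i + W @ x)
    o_next = Wout @ g(x_next)
    return x_next, o_next


def relu_tanh(z):
    return np.maximum(0.0, np.tanh(z))


def identity(z):
    return z


# Toy network: 3 hidden units, 2 inputs; unit 2 has no incoming weight.
Win = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
W = np.zeros((3, 3))
Wout = np.array([[2.0, -2.0, 0.0]])
x0 = np.zeros((3, 1))
i0 = np.array([[0.5], [0.2]])
x1, o1 = esn_step(Win, W, Wout, relu_tanh, identity, x0, i0)
```

Unit 2 never leaves zero, which is the same mechanism that keeps the unused slots of the real controller inactive.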
+
+Env3 exposes colour (the sources are coloured), but this controller
+does not read colour or energy. The bot runs around the outer
+corridor with a reflex wall-follower and picks up whatever source it
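+happens to cross. Only 7 of the 1000 hidden slots receive any
+incoming weight (slots 0..4 and 6 through `Win`, slot 5 through `W`).
+Slots 7..999 have all-zero rows and zero readout weights, so they stay
+at `relu_tanh(0) = 0` and never affect the output.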
+
+## 2. Network shape
+
+| Parameter     | Value                    |
+| ------------- | ------------------------ |
+| `n`           | `1000`                   |
+| `p`           | `64` (camera rays)       |
+| `n_inputs`    | `2*p + 3 = 131`          |
+| `warmup`      | `0`                      |
+| `leak` (λ)    | `1.0`                    |
+| `g`           | identity                 |
+| Actuator clip | `step_a = 5°`            |
+
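+Module constants: `front_gain_mag = 20°`, `step_a = 5°`. Angles are
+quoted in degrees throughout this document; the code stores them in
+radians via `np.radians`.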
+
+## 3. Activations
+
+| Name        | Formula                     | Used by                                            |
+| ----------- | --------------------------- | -------------------------------------------------- |
+| `relu_tanh` | `max(0, tanh(z))`           | reflex channels (slots 0..4), front block (slot 6) |
+| `clip_a`    | `clip(z, -step_a, +step_a)` | `dtheta` (slot 5)                                  |
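
A minimal NumPy sketch of the two activations applied per neuron. It assumes a column-vector state of shape `(n, 1)` as in the controller; the names `activation`, `STEP_A` and `DTHETA` are illustrative.

```python
import numpy as np

STEP_A = np.radians(5.0)  # step_a, stored in radians
DTHETA = 5                # slot index of dtheta


def relu_tanh(z):
    return np.maximum(0.0, np.tanh(z))


def clip_a(z, a=STEP_A):
    return np.clip(z, -a, a)


def activation(x, a=STEP_A, dtheta=DTHETA):
    """relu_tanh on every slot, then overwrite the dtheta slot with clip_a."""
    out = relu_tanh(x)
    out[dtheta] = clip_a(x[dtheta], a)
    return out


x = np.zeros((7, 1))
x[0, 0] = 1.0               # hit-like preactivation
x[1, 0] = -0.3              # negative preactivation is rectified to 0
x[5, 0] = np.radians(12.0)  # steering command beyond the clip
y = activation(x)
```

Using `clip_a` on slot 5 keeps the sign of negative commands, which `relu_tanh` would zero out.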
+
+## 4. Inputs and slot layout
+
+```text
+I(t) = [prox[0..63](t), colour[0..63](t), hit(t), energy(t), 1]
+```
+
+Taps used by the controller (colour and energy columns are unread):
+
+```text
+L_idx          = 20      (left reflex proximity tap)
+R_idx          = 43      (right reflex proximity tap)
+left_side_idx  = 11      (left safety tap)
+right_side_idx = 52      (right safety tap)
+C1_idx, C2_idx = 31, 32  (centre-front proximity taps)
+hit_idx        = 128     (= 2*p)
+bias_idx       = 130     (= 2*p + 2)
+```
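
The offsets follow directly from the layout of `I(t)`. The sketch below assumes `prox` and `colour` arrive as length-`p` arrays. `build_input` and `energy_idx` are hypothetical names, since in the challenge Env3 assembles `I(t)` itself.

```python
import numpy as np

p = 64                   # camera rays
n_inputs = 2 * p + 3     # 131
hit_idx = 2 * p          # 128
energy_idx = 2 * p + 1   # 129 (unread by this controller)
bias_idx = 2 * p + 2     # 130


def build_input(prox, colour, hit, energy):
    """Stack [prox, colour, hit, energy, 1] into an (n_inputs, 1) column."""
    vec = np.concatenate([prox, colour, [hit, energy, 1.0]])
    return vec.reshape(-1, 1)


I = build_input(np.zeros(p), np.zeros(p), hit=1.0, energy=0.8)
```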
+
+Hidden slots (`n = 1000`; slots `7..999` are dead):
+
+| Slot | Name          | Activation  | Role                              |
+| ---- | ------------- | ----------- | --------------------------------- |
+| 0    | `hit_feat`    | `relu_tanh` | hit reflex                        |
+| 1    | `prox_left`   | `relu_tanh` | left proximity reflex             |
+| 2    | `prox_right`  | `relu_tanh` | right proximity reflex            |
+| 3    | `safe_left`   | `relu_tanh` | left safety feature               |
+| 4    | `safe_right`  | `relu_tanh` | right safety feature              |
+| 5    | `dtheta`      | `clip_a`    | one-step-lagged steering command  |
+| 6    | `front_block` | `relu_tanh` | unsigned front-block detector     |
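
The claim that slots `7..999` are dead can be checked from the sparsity pattern alone: a slot is live if its row in `Win` or `W` has a nonzero entry. This sketch rebuilds only that pattern from the taps listed above. The weight values are placeholders, not the real gains.

```python
import numpy as np

n, p = 1000, 64
n_inputs = 2 * p + 3
hit_idx, bias_idx = 2 * p, 2 * p + 2

Win = np.zeros((n, n_inputs))
W = np.zeros((n, n))
# Feed-forward taps: (slot, input column). Only the pattern matters here.
for slot, col in [(0, hit_idx), (1, 20), (2, 43), (3, 11), (3, bias_idx),
                  (4, 52), (4, bias_idx), (6, 31), (6, 32), (6, bias_idx)]:
    Win[slot, col] = 1.0
W[5, [0, 1, 2, 3, 4, 6]] = 1.0  # dtheta's recurrent mirror row

has_input = np.any(Win != 0, axis=1) | np.any(W != 0, axis=1)
live = np.flatnonzero(has_input)
```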
+
+## 5. Circuits
+
+### 5.1 Reflex features and readout
+
+Five feed-forward proximity/hit detectors:
+
+```text
+hit_feat(t+1)   = relu_tanh(hit(t))
+prox_left(t+1)  = relu_tanh(prox[L_idx](t))
+prox_right(t+1) = relu_tanh(prox[R_idx](t))
+safe_left(t+1)  = relu_tanh(-prox[left_side_idx](t)  + 0.75)
+safe_right(t+1) = relu_tanh(-prox[right_side_idx](t) + 0.75)
+```
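
A worked example of the five features on a hypothetical proximity vector (the readings are made up for illustration). It shows that each safety channel fires only when its side tap drops below `0.75`.

```python
import numpy as np

L_idx, R_idx = 20, 43
left_side_idx, right_side_idx = 11, 52
safety_target = 0.75


def relu_tanh(z):
    return np.maximum(0.0, np.tanh(z))


def reflex_features(prox, hit):
    """[hit_feat, prox_left, prox_right, safe_left, safe_right]."""
    return np.array([
        relu_tanh(hit),
        relu_tanh(prox[L_idx]),
        relu_tanh(prox[R_idx]),
        relu_tanh(-prox[left_side_idx] + safety_target),
        relu_tanh(-prox[right_side_idx] + safety_target),
    ])


prox = np.zeros(64)
prox[left_side_idx] = 0.9   # above 0.75: safe_left stays silent
prox[right_side_idx] = 0.5  # below 0.75: safe_right fires
feats = reflex_features(prox, hit=0.0)
```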
+
+Steering readout:
+
+```text
+O(t+1) = hit_turn          * hit_feat(t+1)
+       + heading_gain      * prox_left(t+1)
+       - heading_gain      * prox_right(t+1)
+       + safety_gain_left  * safe_left(t+1)
+       + safety_gain_right * safe_right(t+1)
+       + front_gain_mag    * front_block(t+1)
+```
+
+with
+
+```text
+hit_turn          = -10° / tanh(1)
+heading_gain      = -40°
+safety_gain_left  = -20°
+safety_gain_right = +20°
+front_gain_mag    = +20°
+```
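
The `1 / tanh(1)` factor normalises the hit reflex. Assuming `hit(t)` is `1` on a collision (an assumption; the range of `hit` is not stated here), `hit_feat = relu_tanh(1) = tanh(1)`, so a collision contributes exactly `-10°`. A small sketch of the readout sum, kept in degrees for readability:

```python
import numpy as np

# Readout gains from the list above, in degrees
# (the controller stores the same values in radians).
GAINS_DEG = {
    "hit_feat": -10.0 / np.tanh(1.0),
    "prox_left": -40.0,
    "prox_right": +40.0,
    "safe_left": -20.0,
    "safe_right": +20.0,
    "front_block": +20.0,
}


def readout_deg(features):
    """Weighted sum of feature activations; missing features count as 0."""
    return sum(GAINS_DEG[name] * value for name, value in features.items())


# Assumed unit hit: hit_feat = tanh(1), so the reflex gives -10 degrees.
o_hit = readout_deg({"hit_feat": np.tanh(1.0)})

# Equal left/right reflex readings cancel because the gains are opposite.
o_sym = readout_deg({"prox_left": 0.3, "prox_right": 0.3})
```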
+
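+`dtheta` holds the clipped one-step-lagged command,
+`dtheta(t+1) = clip(O(t), ±step_a)`. It is implemented by mirroring the
+`Wout` row into `W[dtheta, :]`: because `g` is the identity,
+`W[dtheta, :] @ X(t) = Wout @ X(t) = O(t)`, and `dtheta` has no `Win`
+input. Its own readout weight is zero, so it does not feed back into
+`O`; it only records the lagged command.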
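
A quick numerical check of the mirroring argument on a 7-slot toy network. The readout weights and the state `X(t)` are random and are not the controller's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n, DTHETA = 7, 5
a = np.radians(5.0)

Wout = np.zeros((1, n))
Wout[0, [0, 1, 2, 3, 4, 6]] = rng.normal(scale=0.5, size=6)  # dtheta weight stays 0
W = np.zeros((n, n))
W[DTHETA, :] = Wout[0, :]  # the mirror

x_t = rng.uniform(0.0, 1.0, size=(n, 1))  # arbitrary state X(t)
o_t = (Wout @ x_t)[0, 0]                  # O(t), with g = identity
pre_dtheta = (W @ x_t)[DTHETA, 0]         # dtheta preactivation (its Win row is zero)
dtheta_next = float(np.clip(pre_dtheta, -a, a))
```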
+
+### 5.2 Front block
+
+Thresholded sum of the two centre proximity taps. It is unsigned: it
+does not tell which side the obstacle is on.
+
+```text
+front_block(t+1) = relu_tanh(prox[C1_idx](t) + prox[C2_idx](t) - 1.4)
+```
+
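+A positive reading adds `front_gain_mag * front_block(t+1)` to the
+command, a CCW turn of at most `+20°` (strictly less, since
+`relu_tanh < 1`). The escape always turns the same way, which keeps
+the bot on the outer corridor.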
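
An illustration with made-up centre-tap readings, showing the `1.4` threshold and the resulting turn contribution in degrees:

```python
import numpy as np

C1_idx, C2_idx = 31, 32
front_thr = 1.4
front_gain_mag = np.radians(20.0)


def front_block(prox):
    """relu_tanh(prox[C1] + prox[C2] - front_thr)."""
    return max(0.0, np.tanh(prox[C1_idx] + prox[C2_idx] - front_thr))


prox = np.zeros(64)
prox[[C1_idx, C2_idx]] = 0.6  # sum 1.2 < 1.4: silent
quiet = front_block(prox)
prox[[C1_idx, C2_idx]] = 0.9  # sum 1.8 > 1.4: fires
fired = front_block(prox)
turn_deg = np.degrees(front_gain_mag * fired)  # about 7.6 degrees CCW
```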
+
+## 6. Nonzero readout weights
+
+```text
+Wout[hit_feat]    = -10° / tanh(1)
+Wout[prox_left]   = -40°
+Wout[prox_right]  = +40°
+Wout[safe_left]   = -20°
+Wout[safe_right]  = +20°
+Wout[front_block] = +20°
+```
+
+Six nonzero entries total; the `dtheta` slot has a zero readout
+weight. The same row is mirrored into `W[dtheta, :]`.
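
A sketch that rebuilds the readout row from the values above (converted to radians, as in the code) and checks the two structural claims: six nonzero entries, and an identical `W[dtheta, :]` row.

```python
import numpy as np

n = 1000
slots = {"hit_feat": 0, "prox_left": 1, "prox_right": 2,
         "safe_left": 3, "safe_right": 4, "dtheta": 5, "front_block": 6}
weights_deg = {"hit_feat": -10.0 / np.tanh(1.0), "prox_left": -40.0,
               "prox_right": 40.0, "safe_left": -20.0,
               "safe_right": 20.0, "front_block": 20.0}

Wout = np.zeros((1, n))
for name, deg in weights_deg.items():
    Wout[0, slots[name]] = np.radians(deg)

W = np.zeros((n, n))
W[slots["dtheta"], :] = Wout[0, :]  # mirror row
```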
+
+## 7. Verification
+
+```bash
+python braincraft/env3_player_bio.py
+```
+
+Runs `train(bio_player, timeout=100)` then
+`evaluate(model, Bot, Environment, debug=False, seed=12345)` over 10
+episodes:
+
+```text
+Final score (distance): 14.40 +/- 0.49
+```
+
+500-seed sweep (`validate_env3_player_bio.py`, seeds 0..499):
+across-seed mean `14.50 ± 0.17`, min `14.00`, `0/500` seeds below
+`13.50`.
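
The sweep statistics can be summarised from a list of per-seed scores with a helper like the one below. It is hypothetical: `validate_env3_player_bio.py` is not shown here, so this is not its implementation, and whether it reports population or sample standard deviation is unknown (population, `ddof=0`, is assumed).

```python
import numpy as np


def summarize_sweep(scores, threshold=13.5):
    """Mean, std (ddof=0, assumed), min, and count below threshold."""
    s = np.asarray(scores, dtype=float)
    return {
        "mean": float(s.mean()),
        "std": float(s.std()),
        "min": float(s.min()),
        "below": int((s < threshold).sum()),
        "n": int(s.size),
    }


# Toy scores for illustration only, not real results.
toy = summarize_sweep([14.0, 14.5, 15.0, 13.0])
```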
diff --git a/braincraft/env3_player_bio.py b/braincraft/env3_player_bio.py
@@ -0,0 +1,156 @@
+# Braincraft challenge — Bio Player for Environment 3
+# Copyright (C) 2026 Guanchun Li
+# Released under the GNU General Public License 3
+
+"""
+Bio Player for Environment 3.
+
+Pointwise-activation Echo State Network controller:
+
+    X(t+1) = f(Win @ I(t) + W @ X(t))
+    O(t+1) = Wout @ g(X(t+1))        (g = identity)
+
+Every hidden activation depends only on its own preactivation; all
+cross-neuron logic is carried by the connectivity matrices.
+
+Input (131 cols): I(t) = [prox[0..63](t), colour[0..63](t),
+                          hit(t), energy(t), 1].
+Env3 exposes colour, but this controller does not read it — the bot
+runs around the outer corridor with a reflex wall-follower and picks
+up whatever source it happens to cross.
+
+Seven hidden slots:
+
+    0..4    reflex features (hit, proximity, safety)
+    5       dtheta (clipped one-step-lagged steering command)
+    6       unsigned front-block escape channel
+"""
+
+import numpy as np
+
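+# `np.atan2` is only an alias in NumPy >= 2.0; add it on older versions
+# for modules that call it.
+if not hasattr(np, "atan2"):
+    np.atan2 = np.arctan2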
+
+from bot import Bot
+from environment_3 import Environment
+
+
+front_gain_mag = np.radians(20.0)
+step_a         = np.radians(5.0)      # actuator clip (±5°)
+
+
+def _bio_indices():
+    idx = {
+        "hit_feat":    0,
+        "prox_left":   1,
+        "prox_right":  2,
+        "safe_left":   3,
+        "safe_right":  4,
+        "dtheta":      5,
+        "front_block": 6,
+    }
+    idx["bio_end"] = 7
+    return idx
+
+
+def make_activation(a, idx):
+    """Per-neuron pointwise activation: clip for dtheta, relu_tanh elsewhere."""
+    def f(x):
+        out = np.maximum(0.0, np.tanh(x))
+        out[idx["dtheta"], 0] = float(np.clip(x[idx["dtheta"], 0], -a, a))
+        return out
+
+    return f
+
+
+def bio_player():
+    """Build the env3 bio controller and yield a single frozen model."""
+
+    bot = Bot()
+    n = 1000
+    p = bot.camera.resolution          # 64
+    warmup = 0
+    leak = 1.0
+    g = lambda x: x
+
+    # Env3 feeds I = [depths, colours, hit, energy, 1]. Colour and energy
+    # columns are unread, but the input is still 2p+3 wide, so hit and
+    # bias sit at offsets 2p and 2p+2.
+    n_inputs = 2 * p + 3               # 131
+    Win  = np.zeros((n, n_inputs))
+    W    = np.zeros((n, n))
+    Wout = np.zeros((1, n))
+
+    hit_idx  = 2 * p                   # 128
+    bias_idx = 2 * p + 2               # 130
+
+    idx = _bio_indices()
+    a   = step_a
+
+    HIT_FEAT   = idx["hit_feat"]
+    PROX_LEFT  = idx["prox_left"]
+    PROX_RIGHT = idx["prox_right"]
+    SAFE_LEFT  = idx["safe_left"]
+    SAFE_RIGHT = idx["safe_right"]
+    DTHETA     = idx["dtheta"]
+    FB         = idx["front_block"]
+
+    L_idx, R_idx                  = 20, 43     # reflex proximity taps
+    left_side_idx, right_side_idx = 11, 52     # safety taps
+    C1_idx, C2_idx                = 31, 32     # centre-front proximity taps
+    front_thr                     = 1.4
+
+    TANH1 = np.tanh(1.0)
+    hit_turn          = np.radians(-10.0) / TANH1
+    heading_gain      = np.radians(-40.0)
+    safety_gain_left  = np.radians(-20.0)
+    safety_gain_right = -safety_gain_left
+    safety_target     = 0.75
+
+    # Reflex features and steering readout.
+    Win[HIT_FEAT,    hit_idx]        = 1.0
+    Win[PROX_LEFT,   L_idx]          = 1.0
+    Win[PROX_RIGHT,  R_idx]          = 1.0
+    Win[SAFE_LEFT,   left_side_idx]  = -1.0
+    Win[SAFE_RIGHT,  right_side_idx] = -1.0
+    Win[SAFE_LEFT,   bias_idx]       = safety_target
+    Win[SAFE_RIGHT,  bias_idx]       = safety_target
+
+    Wout[0, HIT_FEAT]   = hit_turn
+    Wout[0, PROX_LEFT]  = heading_gain
+    Wout[0, PROX_RIGHT] = -heading_gain
+    Wout[0, SAFE_LEFT]  = safety_gain_left
+    Wout[0, SAFE_RIGHT] = safety_gain_right
+
+    # Unsigned front-block: fires when the two centre proximity taps
+    # exceed front_thr and drives a fixed-direction (CCW) escape turn.
+    Win[FB, C1_idx]   = 1.0
+    Win[FB, C2_idx]   = 1.0
+    Win[FB, bias_idx] = -front_thr
+
+    Wout[0, FB] = front_gain_mag
+
+    # Mirror Wout into W[DTHETA, :] so dtheta(t+1) = clip(O(t), ±step_a).
+    # Zero entries copy as zero, so a full-row copy matches a masked copy.
+    W[DTHETA, :] = Wout[0, :]
+
+    f = make_activation(a, idx)
+    model = Win, W, Wout, warmup, leak, f, g
+    yield model
+
+
+if __name__ == "__main__":
+    import time
+    from challenge_3 import evaluate, train
+
+    seed = 12345
+    np.random.seed(seed)
+    print("Training bio player for env3...")
+    model = train(bio_player, timeout=100)
+
+    start_time = time.time()
+    score, std = evaluate(model, Bot, Environment, debug=False, seed=seed)
+    elapsed = time.time() - start_time
+    print(f"Evaluation completed after {elapsed:.2f} seconds")
+    print(f"Final score (distance): {score:.2f} +/- {std:.2f}")