155 changes: 155 additions & 0 deletions braincraft/env3_player_bio.md
@@ -0,0 +1,155 @@
# Bio Player — Environment 3

## 1. Overview

`env3_player_bio.py` is a pointwise-activation Echo State Network
controller for Environment 3:

```text
X(t+1) = f(Win @ I(t) + W @ X(t))
O(t+1) = Wout @ g(X(t+1)) (g = identity)
```

Every hidden activation is a scalar function of its own preactivation;
all cross-neuron logic lives in the connectivity matrices. The model
is produced by a single `yield` in `bio_player()`, so the matrices are
fixed at build time (no iterative training).
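The update rule above can be sketched as a single step function. This is a minimal illustration, assuming column-vector states of shape `(n, 1)`; the helper name `esn_step` and the toy sizes are hypothetical and are not part of the repository.

```python
import numpy as np

def esn_step(Win, W, Wout, f, g, I, X):
    """One update: X(t+1) = f(Win @ I(t) + W @ X(t)), O(t+1) = Wout @ g(X(t+1))."""
    X_next = f(Win @ I + W @ X)
    O_next = Wout @ g(X_next)
    return X_next, O_next

# Toy sizes for illustration only (the real controller uses n = 1000, 131 inputs).
n, n_inputs = 3, 2
Win = np.zeros((n, n_inputs))
W = np.zeros((n, n))
Wout = np.zeros((1, n))
Win[0, 0] = 1.0    # slot 0 reads input 0
Wout[0, 0] = 2.0   # the readout reads slot 0

relu_tanh = lambda z: np.maximum(0.0, np.tanh(z))
identity = lambda x: x

X = np.zeros((n, 1))
I = np.array([[1.0], [0.0]])
X, O = esn_step(Win, W, Wout, relu_tanh, identity, I, X)
print(X.ravel(), O.ravel())
```

Because `Win`, `W`, and `Wout` are the only places where one slot can influence another, fixing them at build time fixes the whole controller.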

Env3 exposes colour (the sources are coloured), but this controller
does not read colour or energy. The bot runs around the outer
corridor with a reflex wall-follower and picks up whatever source it
happens to cross. Only 7 of the 1000 hidden slots receive any
incoming weight; the rest are dead.

## 2. Network shape

| Parameter | Value |
| ------------- | ------------------------ |
| `n` | `1000` |
| `p` | `64` (camera rays) |
| `n_inputs` | `2*p + 3 = 131` |
| `warmup` | `0` |
| `leak` (λ) | `1.0` |
| `g` | identity |
| Actuator clip | `step_a = 5°` |

Module constants: `front_gain_mag = 20°`, `step_a = 5°`.

## 3. Activations

| Name        | Formula                     | Used by                                                  |
| ----------- | --------------------------- | -------------------------------------------------------- |
| `relu_tanh` | `max(0, tanh(z))`           | reflex channels (slots 0..4), `front_block` (slot 6)     |
| `clip_a`    | `clip(z, -step_a, +step_a)` | `dtheta` (slot 5)                                        |
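The two activations in the table can be written as short NumPy functions. This is a sketch for illustration; the standalone function names and the constant `STEP_A` are assumptions, and the sample inputs are made up.

```python
import numpy as np

STEP_A = np.radians(5.0)  # actuator clip from the network-shape table

def relu_tanh(z):
    """max(0, tanh(z)): zero for non-positive inputs, saturating below 1."""
    return np.maximum(0.0, np.tanh(z))

def clip_a(z, a=STEP_A):
    """clip(z, -a, +a): linear inside the band, flat outside it."""
    return np.clip(z, -a, a)

z = np.array([-1.0, 0.0, 1.0])
print(relu_tanh(z))  # negative inputs map to 0

angles = np.radians([-20.0, 2.0, 20.0])
print(np.degrees(clip_a(angles)))  # large commands are limited to the clip band
```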

## 4. Inputs and slot layout

```text
I(t) = [prox[0..63](t), colour[0..63](t), hit(t), energy(t), 1]
```

Taps used by the controller (colour and energy columns are unread):

```text
L_idx = 20 (left reflex proximity tap)
R_idx = 43 (right reflex proximity tap)
left_side_idx = 11 (left safety tap)
right_side_idx = 52 (right safety tap)
C1_idx, C2_idx = 31, 32 (centre-front proximity taps)
hit_idx = 128 (= 2*p)
bias_idx = 130 (= 2*p + 2)
```
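The index arithmetic above can be checked by assembling a dummy input vector. This is a sketch under assumptions: `build_input` and `energy_idx` are hypothetical names, the sensor values are made up, and how the environment actually builds `I(t)` is not shown in this document.

```python
import numpy as np

p = 64                   # camera rays
n_inputs = 2 * p + 3     # 131
hit_idx = 2 * p          # 128
energy_idx = 2 * p + 1   # 129 (hypothetical name; unread by the controller)
bias_idx = 2 * p + 2     # 130

def build_input(prox, colour, hit, energy):
    """Assemble I(t) = [prox, colour, hit, energy, 1] as a column vector."""
    I = np.concatenate([prox, colour, [hit, energy, 1.0]])
    return I.reshape(-1, 1)

# Made-up sensor values, only to show where each tap lands.
prox = np.linspace(0.0, 1.0, p)
colour = np.zeros(p)
I = build_input(prox, colour, hit=1.0, energy=0.5)

assert I.shape == (n_inputs, 1)
print(I[hit_idx, 0], I[energy_idx, 0], I[bias_idx, 0])
print(I[20, 0], I[43, 0])  # L_idx and R_idx proximity taps
```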

Hidden slots (`n = 1000`; slots `7..999` are dead):

| Slot | Name | Activation | Role |
| ---- | ------------- | ----------- | --------------------------------- |
| 0 | `hit_feat` | `relu_tanh` | hit reflex |
| 1 | `prox_left` | `relu_tanh` | left proximity reflex |
| 2 | `prox_right` | `relu_tanh` | right proximity reflex |
| 3 | `safe_left` | `relu_tanh` | left safety feature |
| 4 | `safe_right` | `relu_tanh` | right safety feature |
| 5 | `dtheta` | `clip_a` | one-step-lagged steering command |
| 6 | `front_block` | `relu_tanh` | unsigned front-block detector |

## 5. Circuits

### 5.1 Reflex features and readout

Five feed-forward proximity/hit detectors:

```text
hit_feat(t+1) = relu_tanh(hit(t))
prox_left(t+1) = relu_tanh(prox[L_idx](t))
prox_right(t+1) = relu_tanh(prox[R_idx](t))
safe_left(t+1) = relu_tanh(-prox[left_side_idx](t) + 0.75)
safe_right(t+1) = relu_tanh(-prox[right_side_idx](t) + 0.75)
```

Steering readout:

```text
O(t+1) = hit_turn * hit_feat(t+1)
+ heading_gain * prox_left(t+1)
- heading_gain * prox_right(t+1)
+ safety_gain_left * safe_left(t+1)
+ safety_gain_right * safe_right(t+1)
+ front_gain_mag * front_block(t+1)
```

with

```text
hit_turn = -10° / tanh(1)
heading_gain = -40°
safety_gain_left = -20°
safety_gain_right = +20°
front_gain_mag = +20°
```

`dtheta` holds the clipped one-step-lagged command,
`dtheta(t+1) = clip(O(t), ±step_a)`, implemented by mirroring the
`Wout` row into `W[dtheta, :]`. The `dtheta` slot itself has zero
readout weight, so it does not feed back into `O`.
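A scalar walk-through of the readout may help when checking signs. This is a minimal sketch using only the gains and formulas listed above; the function `readout` and the sensor readings passed to it are hypothetical.

```python
import math

def relu_tanh(z):
    return max(0.0, math.tanh(z))

# Gains from the list above, converted to radians.
hit_turn = math.radians(-10.0) / math.tanh(1.0)
heading_gain = math.radians(-40.0)
safety_gain_left = math.radians(-20.0)
safety_gain_right = math.radians(20.0)
front_gain_mag = math.radians(20.0)
step_a = math.radians(5.0)

def readout(hit, prox_l, prox_r, side_l, side_r, front_block=0.0):
    """Steering readout O for one set of (made-up) sensor readings."""
    hit_feat = relu_tanh(hit)
    prox_left = relu_tanh(prox_l)
    prox_right = relu_tanh(prox_r)
    safe_left = relu_tanh(-side_l + 0.75)
    safe_right = relu_tanh(-side_r + 0.75)
    return (hit_turn * hit_feat
            + heading_gain * prox_left
            - heading_gain * prox_right
            + safety_gain_left * safe_left
            + safety_gain_right * safe_right
            + front_gain_mag * front_block)

# Symmetric readings cancel: the left and right terms have opposite signs.
o_sym = readout(hit=0.0, prox_l=0.5, prox_r=0.5, side_l=0.5, side_r=0.5)
# A hit alone contributes hit_turn * tanh(1), i.e. -10 degrees.
o_hit = readout(hit=1.0, prox_l=0.0, prox_r=0.0, side_l=0.75, side_r=0.75)
# dtheta stores the previous command clipped to +/- step_a.
dtheta = max(-step_a, min(step_a, o_hit))
print(math.degrees(o_sym), math.degrees(o_hit), math.degrees(dtheta))
```

The division by `tanh(1)` in `hit_turn` normalises the hit reflex so that a unit `hit` input yields exactly `-10°` after `relu_tanh`.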

### 5.2 Front block

Unsigned sum of the two centre proximity taps:

```text
front_block(t+1) = relu_tanh(prox[C1_idx](t) + prox[C2_idx](t) - 1.4)
```

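A positive reading adds `front_gain_mag * front_block(t+1)` to the
steering command: a counter-clockwise (CCW) contribution of less than
`+20°`, since `relu_tanh` stays below 1. This fixed-direction escape
keeps the bot on the outer corridor.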
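The threshold behaviour can be illustrated with a few sample readings. This is a sketch only: the proximity values are made up, and the assumption that larger `prox` means a closer obstacle is not stated in this document.

```python
import math

FRONT_THR = 1.4        # threshold from the formula above
FRONT_GAIN_DEG = 20.0  # front_gain_mag in degrees

def front_block(c1, c2, thr=FRONT_THR):
    """relu_tanh of the summed centre taps minus the threshold."""
    return max(0.0, math.tanh(c1 + c2 - thr))

for c1, c2 in [(0.5, 0.5), (0.7, 0.7), (0.9, 0.9)]:
    fb = front_block(c1, c2)
    # Columns: taps, detector value, steering contribution in degrees.
    print(c1, c2, round(fb, 3), round(FRONT_GAIN_DEG * fb, 2))
```

The detector stays at zero until the two taps sum to more than `1.4`, and its steering contribution grows smoothly above that point.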

## 6. Nonzero readout weights

```text
Wout[hit_feat] = -10° / tanh(1)
Wout[prox_left] = -40°
Wout[prox_right] = +40°
Wout[safe_left] = -20°
Wout[safe_right] = +20°
Wout[front_block] = +20°
```

Six nonzero entries total. The same row is mirrored into
`W[dtheta, :]`.
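The mirroring can be checked numerically on the seven live slots. This is a sketch with a made-up hidden state; the slot constants follow the table in section 4, and restricting `n` to 7 is purely for illustration.

```python
import numpy as np

n = 7  # only the seven live slots, for illustration
HIT, PL, PR, SL, SR, DTHETA, FB = range(7)
step_a = np.radians(5.0)

Wout = np.zeros((1, n))
Wout[0, HIT] = np.radians(-10.0) / np.tanh(1.0)
Wout[0, PL] = np.radians(-40.0)
Wout[0, PR] = np.radians(40.0)
Wout[0, SL] = np.radians(-20.0)
Wout[0, SR] = np.radians(20.0)
Wout[0, FB] = np.radians(20.0)

W = np.zeros((n, n))
W[DTHETA, :] = Wout[0, :]  # mirror the readout row into the dtheta row

# Arbitrary hidden state X(t); the values are made up.
X = np.array([[0.0], [0.3], [0.1], [0.0], [0.2], [0.0], [0.0]])
O_t = (Wout @ X).item()    # O(t) with g = identity
pre = (W @ X)[DTHETA, 0]   # dtheta preactivation at t+1 (dtheta has no Win term)
dtheta_next = float(np.clip(pre, -step_a, step_a))

print(np.count_nonzero(Wout), np.degrees(O_t), np.degrees(dtheta_next))
```

Since the `dtheta` row of `W` equals `Wout`, its preactivation at `t+1` is `O(t)` itself, and `clip_a` then applies the actuator limit.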

## 7. Verification

```bash
python braincraft/env3_player_bio.py
```

Runs `train(bio_player, timeout=100)` then
`evaluate(model, Bot, Environment, debug=False, seed=12345)` over 10
episodes:

```text
Final score (distance): 14.40 +/- 0.49
```

500-seed sweep (`validate_env3_player_bio.py`, seeds 0..499):
across-seed mean `14.50 ± 0.17`, min `14.00`, `0/500` seeds below
`13.50`.
156 changes: 156 additions & 0 deletions braincraft/env3_player_bio.py
@@ -0,0 +1,156 @@
# Braincraft challenge — Bio Player for Environment 3
# Copyright (C) 2026 Guanchun Li
# Released under the GNU General Public License 3

"""
Bio Player for Environment 3.

Pointwise-activation Echo State Network controller:

X(t+1) = f(Win @ I(t) + W @ X(t))
O(t+1) = Wout @ g(X(t+1)) (g = identity)

Every hidden activation depends only on its own preactivation; all
cross-neuron logic is carried by the connectivity matrices.

Input (131 cols): I(t) = [prox[0..63](t), colour[0..63](t),
hit(t), energy(t), 1].
Env3 exposes colour, but this controller does not read it — the bot
runs around the outer corridor with a reflex wall-follower and picks
up whatever source it happens to cross.

Seven hidden slots:

0..4 reflex features (hit, proximity, safety)
5 dtheta (clipped one-step-lagged steering command)
6 unsigned front-block escape channel
"""

import numpy as np

# Compatibility shim: older NumPy releases lack the np.atan2 alias.
if not hasattr(np, "atan2"):
    np.atan2 = np.arctan2

from bot import Bot
from environment_3 import Environment


front_gain_mag = np.radians(20.0)
step_a = np.radians(5.0) # actuator clip (±5°)


def _bio_indices():
    idx = {
        "hit_feat": 0,
        "prox_left": 1,
        "prox_right": 2,
        "safe_left": 3,
        "safe_right": 4,
        "dtheta": 5,
        "front_block": 6,
    }
    idx["bio_end"] = 7
    return idx


def make_activation(a, idx):
    """Per-neuron pointwise activation: clip for dtheta, relu_tanh elsewhere."""
    def f(x):
        # x is indexed as a column vector of shape (n, 1).
        out = np.maximum(0.0, np.tanh(x))
        # Overwrite the dtheta slot with its own clipped preactivation.
        out[idx["dtheta"], 0] = float(np.clip(x[idx["dtheta"], 0], -a, a))
        return out

    return f


def bio_player():
    """Build the env3 bio controller and yield a single frozen model."""

    bot = Bot()
    n = 1000
    p = bot.camera.resolution  # 64
    warmup = 0
    leak = 1.0
    g = lambda x: x

    # Env3 feeds I = [depths, colours, hit, energy, 1]. Colour and energy
    # columns are unread but the hit/bias indices sit at the full 2p+3
    # offsets.
    n_inputs = 2 * p + 3  # 131
    Win = np.zeros((n, n_inputs))
    W = np.zeros((n, n))
    Wout = np.zeros((1, n))

    hit_idx = 2 * p  # 128
    bias_idx = 2 * p + 2  # 130

    idx = _bio_indices()
    a = step_a

    HIT_FEAT = idx["hit_feat"]
    PROX_LEFT = idx["prox_left"]
    PROX_RIGHT = idx["prox_right"]
    SAFE_LEFT = idx["safe_left"]
    SAFE_RIGHT = idx["safe_right"]
    DTHETA = idx["dtheta"]
    FB = idx["front_block"]

    L_idx, R_idx = 20, 43  # reflex proximity taps
    left_side_idx, right_side_idx = 11, 52  # safety taps
    C1_idx, C2_idx = 31, 32  # centre-front proximity taps
    front_thr = 1.4

    TANH1 = np.tanh(1.0)
    hit_turn = np.radians(-10.0) / TANH1
    heading_gain = np.radians(-40.0)
    safety_gain_left = np.radians(-20.0)
    safety_gain_right = -safety_gain_left
    safety_target = 0.75

    # Reflex features and steering readout.
    Win[HIT_FEAT, hit_idx] = 1.0
    Win[PROX_LEFT, L_idx] = 1.0
    Win[PROX_RIGHT, R_idx] = 1.0
    Win[SAFE_LEFT, left_side_idx] = -1.0
    Win[SAFE_RIGHT, right_side_idx] = -1.0
    Win[SAFE_LEFT, bias_idx] = safety_target
    Win[SAFE_RIGHT, bias_idx] = safety_target

    Wout[0, HIT_FEAT] = hit_turn
    Wout[0, PROX_LEFT] = heading_gain
    Wout[0, PROX_RIGHT] = -heading_gain
    Wout[0, SAFE_LEFT] = safety_gain_left
    Wout[0, SAFE_RIGHT] = safety_gain_right

    # Unsigned front-block: fires when the sum of the two centre proximity
    # taps exceeds front_thr and drives a fixed-direction (CCW) escape turn.
    Win[FB, C1_idx] = 1.0
    Win[FB, C2_idx] = 1.0
    Win[FB, bias_idx] = -front_thr

    Wout[0, FB] = front_gain_mag

    # Mirror Wout into W[DTHETA, :] so dtheta(t+1) = clip(O(t), ±step_a).
    for j in range(n):
        if Wout[0, j] != 0.0:
            W[DTHETA, j] = Wout[0, j]

    f = make_activation(a, idx)
    model = Win, W, Wout, warmup, leak, f, g
    yield model


if __name__ == "__main__":
    import time
    from challenge_3 import evaluate, train

    seed = 12345
    np.random.seed(seed)
    print("Training bio player for env3...")
    model = train(bio_player, timeout=100)

    start_time = time.time()
    score, std = evaluate(model, Bot, Environment, debug=False, seed=seed)
    elapsed = time.time() - start_time
    print(f"Evaluation completed after {elapsed:.2f} seconds")
    print(f"Final score (distance): {score:.2f} +/- {std:.2f}")