Low-cost eye tracking on a standard webcam with simple calibration, monitor & camera selection, and a transparent on-screen overlay. Built on Google MediaPipe Face Mesh with a compact, mathematically grounded pipeline (eye-contour PCA → per-axis normalization → ridge regression) for real-time inference.
Purpose: make eye-based communication more accessible for people with limited mobility, using only a low-cost home webcam.
- Google MediaPipe (Face Mesh with iris) — robust, cross-platform landmarking.
- Dead-simple calibration — pick rows/cols, per-point dwell, and delay; targets auto-advance.
- Device selection in UI — choose the display used for overlay/targets and the webcam.
- Mathematical pipeline — eye-contour PCA, anisotropic normalization (separate scales for ± directions), optional eye patches (CLAHE + z-norm), fast ridge regression (dual form, un-regularized intercept), and OneEuro + EMA smoothing.
- Clean artifacts — models saved as `YYYYMMDD_HHMMSS_Grid{R}x{C}_Patch{W}x{H}.pkl`; datasets as `.npz`.
- OS: Ubuntu 22.04
- Camera tested: Logitech C920 Pro (others should work)
```bash
python -m venv .venv
source .venv/bin/activate  # (Windows: .venv\Scripts\activate)
pip install --upgrade pip
pip install -r requirements.txt
python main.py
```

- A Control Panel window appears (always on top).
- Pick Target/Overlay monitor and Webcam, set grid & timing, then press Start Calibration.
- After calibration, the red dot shows your live gaze on the chosen monitor.
- Target/Overlay monitor — which display shows the black calibration background & orange targets (and the final red gaze dot).
- Webcam — active camera device (switches live).
- Rows / Columns — grid for target positions (serpentine order).
- Per-point (sec) — dwell time per target.
- Delay (sec) — time to wait after the target moves before sampling starts (prevents early, off-target frames).
- Start Calibration — begins the target sequence and data capture.
- Stop Calibration — aborts the sequence (no model save).
- Load Model (.npz/.pkl) — load a previously saved model.
- Hide/Show Overlay — toggles the transparent overlay.
- Quit — exits the app.
- Iris centers / Iris 4-edges — yellow markers for each iris.
- Eye axes (fixed length; û, v̂) — principal axes from PCA, constant length for reference.
- Eye axes (eye scaled length; s_u, s_v) — axes scaled by eye geometry.
- u, v vectors / u, v vectors (bigger) — shows current normalized offsets; the latter uses a gain.
- Eye contour points / edges — raw eye polygon points and wireframe.
- Eye patch ROI boxes / Eye patch thumbnails — oriented crop boxes and zoomed mini-patches (L/R) for debugging.
- Height = Width × — vertical half-size as a ratio of horizontal half-size (keeps aspect).
- Width scale (û RMS → half_w) — scales ROI width from the eye’s horizontal spread.
- Patch width (px) / Patch height (px) — resolution of the extracted patches (affects feature dimension).
- OneEuro mincutoff / beta / dcutoff — jitter vs. responsiveness trade-off.
- EMA α — exponential moving average weight (higher = smoother, slower).
1. Pick devices — in the Control Panel, choose the Target/Overlay monitor and Webcam.
2. Set grid & timing
   - Rows / Columns: target layout (serpentine order).
   - Per-point (sec): how long to dwell on each target.
   - Delay (sec): wait time after the target moves before sampling starts.
3. Start — click Start Calibration. An orange ring appears on a black screen. After the delay, it turns into a filled dot; that is when data is collected. Keep your eyes on the dot until it jumps to the next location.
4. Finish — after the last point, the model is trained and saved automatically (e.g., `YYYYMMDD_HHMMSS_Grid{R}x{C}_Patch{W}x{H}.pkl`). The overlay switches to a red dot showing live gaze.
5. Controls
   - Stop Calibration: aborts the sequence.
   - Keyboard (preview window): `c` start, `s` stop, `o` overlay toggle, `q`/`ESC` quit.
Tips: keep Delay < Per-point, hold head steady, ensure even lighting, and avoid moving windows between monitors during calibration.
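The serpentine target order used by the calibration grid can be sketched as below. The function name and the `margin` parameter are illustrative, not the app's actual API:

```python
def serpentine_targets(rows, cols, width, height, margin=0.08):
    """Return (x, y) pixel centers for a rows x cols grid, traversed
    left-to-right on even rows and right-to-left on odd rows."""
    xs = [margin + (1 - 2 * margin) * (c / (cols - 1) if cols > 1 else 0.5)
          for c in range(cols)]
    ys = [margin + (1 - 2 * margin) * (r / (rows - 1) if rows > 1 else 0.5)
          for r in range(rows)]
    points = []
    for r, y in enumerate(ys):
        row = [(round(x * width), round(y * height)) for x in xs]
        if r % 2 == 1:          # odd rows run right-to-left
            row.reverse()
        points.extend(row)
    return points
```

Serpentine order keeps consecutive targets adjacent, so the eyes travel short distances between dwell points.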
Using MediaPipe Face Mesh, we read dense facial landmarks, including iris points.
For each eye we gather contour points $\{p_i\}_{i=1}^{N}$ and compute the eye centroid

$$\bar p = \frac{1}{N}\sum_{i=1}^{N} p_i .$$

PCA on the centered points $d_i = p_i - \bar p$ yields two principal directions:

- û (ax1): major principal direction
- v̂ (ax2): minor principal direction
To avoid visual flips when head pitch changes, we fix a sign convention per frame:
- force v̂ to point downward (image $+y$)
- enforce a right-handed frame (if $\det[\hat u,\hat v]<0$, flip $\hat u$)
This makes patch warping and thumbnails temporally stable.
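A minimal NumPy sketch of the PCA step and the sign convention above (the function name is illustrative):

```python
import numpy as np

def eye_axes(contour):
    """PCA axes of an (N, 2) eye-contour array in image coordinates:
    v-hat is forced toward image +y, and [u-hat, v-hat] is kept right-handed."""
    centroid = contour.mean(axis=0)
    d = contour - centroid
    cov = d.T @ d / len(contour)
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    u_hat = evecs[:, 1]                     # major principal direction
    v_hat = evecs[:, 0]                     # minor principal direction
    if v_hat[1] < 0:                        # force v-hat downward (image +y)
        v_hat = -v_hat
    if np.linalg.det(np.column_stack([u_hat, v_hat])) < 0:
        u_hat = -u_hat                      # restore right-handedness
    return centroid, u_hat, v_hat
```

Because both signs are pinned each frame, the axes cannot flip between frames, which is what keeps the patch warps and thumbnails stable.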
Let the iris center be $c$, with offset $d = c - \bar p$ from the eye centroid. From the eye contour we estimate separate RMS scales for the positive and negative sides along each axis:

$$s_u^{\pm} = \sqrt{\operatorname{mean}_{i \in I_u^{\pm}} \big( (d_i \cdot \hat u)^2 \big)}, \qquad s_v^{\pm} = \sqrt{\operatorname{mean}_{i \in I_v^{\pm}} \big( (d_i \cdot \hat v)^2 \big)},$$

where $I_u^{\pm}$ (resp. $I_v^{\pm}$) index the contour points whose projection on $\hat u$ (resp. $\hat v$) is positive or negative. We then normalize piecewise:

$$u = \begin{cases} (d \cdot \hat u)/s_u^{+} & d \cdot \hat u \ge 0 \\ (d \cdot \hat u)/s_u^{-} & \text{otherwise,} \end{cases} \qquad v = \begin{cases} (d \cdot \hat v)/s_v^{+} & d \cdot \hat v \ge 0 \\ (d \cdot \hat v)/s_v^{-} & \text{otherwise.} \end{cases}$$

This captures eyelid asymmetry and improves vertical sensitivity.
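The anisotropic normalization can be sketched as follows (function names are illustrative; `eps` guards empty sides):

```python
import numpy as np

def side_scales(proj, eps=1e-6):
    """RMS of projections on the positive and negative side of one axis."""
    pos, neg = proj[proj > 0], proj[proj < 0]
    s_pos = np.sqrt(np.mean(pos ** 2)) if len(pos) else eps
    s_neg = np.sqrt(np.mean(neg ** 2)) if len(neg) else eps
    return s_pos, s_neg

def normalize_piecewise(x, s_pos, s_neg):
    """Divide by the positive-side scale when x >= 0, else the negative-side scale."""
    return x / s_pos if x >= 0 else x / s_neg

def iris_uv(iris, centroid, u_hat, v_hat, contour):
    """Piecewise-normalized (u, v) offsets of the iris center."""
    d = contour - centroid
    su_p, su_n = side_scales(d @ u_hat)
    sv_p, sv_n = side_scales(d @ v_hat)
    off = iris - centroid
    u = normalize_piecewise(off @ u_hat, su_p, su_n)
    v = normalize_piecewise(off @ v_hat, sv_p, sv_n)
    return u, v
```

Using a smaller scale on the lid side means the same pixel displacement toward a tighter eyelid maps to a larger normalized value, which is where the extra vertical sensitivity comes from.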
We crop an oriented ROI around each eye using the axes:
- Horizontal half-size: $\text{half}_w = \max(s_u^+, s_u^-) \times \text{scale}_w$
- Vertical half-size: $\text{half}_h = \text{half}_w \times \text{ratio}_{h\leftarrow w}$
We then warp the oriented rectangle to a `patch_w × patch_h` image. Preprocessing: grayscale → CLAHE → flatten → z-normalize to a 1-D vector.
Concatenate:

- 12-D geometric: $[u_L, v_L, u_R, v_R]$ plus quadratic/cross terms
- Left patch vector + Right patch vector
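One plausible 12-D expansion is shown below (linear terms, squares, within-eye and matched cross-eye products); the project's exact quadratic/cross terms may differ, so treat this as illustrative:

```python
import numpy as np

def geometric_features(uL, vL, uR, vR):
    """Illustrative 12-D geometric vector built from the normalized offsets."""
    lin = [uL, vL, uR, vR]                       # 4 linear terms
    quad = [uL**2, vL**2, uR**2, vR**2]          # 4 squares
    cross = [uL * vL, uR * vR, uL * uR, vL * vR] # 4 cross terms
    return np.array(lin + quad + cross)

def feature_vector(uL, vL, uR, vR, left_patch, right_patch):
    """Concatenate geometry with the two flattened eye-patch vectors."""
    return np.concatenate([geometric_features(uL, vL, uR, vR),
                           left_patch, right_patch])
```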
We solve ridge in dual form on centered variables and recover the intercept:

$$\alpha = (X_c X_c^\top + \lambda I)^{-1} Y_c, \qquad W = X_c^\top \alpha, \qquad b = \bar y - \bar x^\top W,$$

where $X_c = X - \bar x$ and $Y_c = Y - \bar y$. This matches a primal ridge with no penalty on the intercept $b$.
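The dual-form fit is a few lines of NumPy; this sketch assumes `X` is `(n, d)` and `Y` is `(n, 2)`, and the default `lam` is illustrative:

```python
import numpy as np

def fit_ridge_dual(X, Y, lam=1e-3):
    """Dual-form ridge on centered data. Because centering absorbs the means,
    the recovered intercept b is not penalized."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    K = Xc @ Xc.T                                    # n x n Gram matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), Yc)
    W = Xc.T @ alpha                                 # primal weights, d x 2
    b = y_mean - x_mean @ W                          # un-penalized intercept
    return W, b
```

The dual solve is an `n × n` system rather than `d × d`, which is the cheaper direction here: calibration collects a few hundred samples `n`, while the patch features make `d` much larger.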
Final 2-D gaze is filtered with OneEuro and EMA to reduce jitter while remaining responsive.
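A minimal one-dimensional sketch of the smoothing stage (One Euro filter per Casiez et al., followed by an EMA whose α weights the previous value, so higher α is smoother and slower); class and function names are illustrative:

```python
import math

class OneEuro:
    """Minimal One Euro filter: a low-pass whose cutoff grows with signal
    speed, trading jitter suppression at rest for responsiveness in motion."""
    def __init__(self, freq, mincutoff=1.0, beta=0.0, dcutoff=1.0):
        self.freq, self.mincutoff = freq, mincutoff
        self.beta, self.dcutoff = beta, dcutoff
        self.x_prev = self.dx_prev = None

    @staticmethod
    def _alpha(cutoff, freq):
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * freq)

    def __call__(self, x):
        if self.x_prev is None:                      # first sample passes through
            self.x_prev, self.dx_prev = x, 0.0
            return x
        dx = (x - self.x_prev) * self.freq           # estimated speed
        a_d = self._alpha(self.dcutoff, self.freq)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        cutoff = self.mincutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, self.freq)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat

def ema(prev, new, alpha):
    """EMA stage applied after OneEuro; alpha weights the previous value."""
    return alpha * prev + (1 - alpha) * new if prev is not None else new
```

Apply one filter instance per coordinate (x and y) so their state stays independent.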
- models → `models/YYYYMMDD_HHMMSS_Grid{R}x{C}_Patch{W}x{H}.pkl` — includes `W`, `b`, target screen size, and feature names.
- data → `data/gaze_samples_YYYYMMDD_HHMMSS.npz` — feature matrix `X`, labels `Y`, per-target index, timestamps, and calibration meta.
- Keep lighting even and frontal for stable iris/contours.
- Start with moderate patch sizes (e.g., `40×40`) and adjust Width scale + Height ratio for your camera/face distance.
- If you switch monitors during calibration, the sequence restarts to keep coordinates consistent.
- Preview mirroring affects display only (not the learned model).
Apache License 2.0