Autoresearch Ideas — 1-Bit Inference Engine

COMPLETED — 135 experiments across 8 sessions

All optimization paths are exhausted. The engine runs at 74.5% of M3 Pro hardware peak; further improvement would require different hardware or a different algorithm.

Final numbers (Apple M3 Pro, CPU only)

  • Matmul: 44 µs = 95 GOPS, 84x faster than baseline
  • Transformer: ~125 tok/sec on 472M params
  • Weight compression: 16x
  • vs Apple BLAS: 1.7x faster
  • Code: 1,596 lines of C
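
These notes do not include the kernel itself. For readers unfamiliar with 1-bit inference, the following is a minimal sketch of the standard XOR-and-popcount dot product, assuming both operands are ±1 vectors packed one value per bit (the actual engine may keep activations at higher precision and may use NEON rather than scalar code). The names `pack_signs` and `dot_1bit` are hypothetical, and `__builtin_popcountll` is a GCC/Clang builtin. Packing one weight per bit is also what a 16x compression figure would look like if the baseline were 16-bit weights, though the baseline format is not stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack n values (+1 or -1) into 64-bit words.
 * A set bit means +1, a clear bit means -1. Padding bits in the last
 * word are left at 0 so they cancel out in dot_1bit below. */
static void pack_signs(const int8_t *x, size_t n, uint64_t *out) {
    size_t words = (n + 63) / 64;
    for (size_t w = 0; w < words; w++) {
        out[w] = 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (x[i] > 0) {
            out[i / 64] |= (uint64_t)1 << (i % 64);
        }
    }
}

/* Hypothetical helper: dot product of two packed +1/-1 vectors of length n.
 * XOR marks positions where the signs differ; each mismatch contributes -1
 * and each match +1, so dot = n - 2 * mismatches. Both inputs must have been
 * packed with zeroed padding bits (as pack_signs does). */
static int32_t dot_1bit(const uint64_t *a, const uint64_t *b, size_t n) {
    size_t words = (n + 63) / 64;
    int32_t mismatches = 0;
    for (size_t w = 0; w < words; w++) {
        mismatches += __builtin_popcountll(a[w] ^ b[w]);
    }
    return (int32_t)n - 2 * mismatches;
}
```

A matmul built this way calls `dot_1bit` once per output element, with each weight row packed ahead of time, so 64 multiply-accumulates collapse into one XOR and one popcount per word.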