// WORK

OMNI / PERSPECTIVE v2

SPEC / IN PROGRESS

RESEARCH IN PROGRESS — projections, not measurements

Architecture research targeting a 1.05T-parameter sparse mixture-of-experts (MoE) model on consumer hardware.

// Problem

A 1.05T-parameter model does not fit in 4 GB of VRAM and 32 GB of RAM. Even at ternary precision the expert weights alone come to roughly 208 GB, so dense deployment fails that envelope outright. OMNI treats parameter count as a delivery-and-scheduling problem rather than a VRAM-residency problem. It is architecture research with an in-progress implementation, not a running model, and the repo says so in its own maturity table.
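The envelope claim is plain arithmetic. This sketch computes weight storage for 1.05T parameters at a few precisions and compares each result with the combined 36 GB of VRAM and RAM. The formats listed are illustrative choices and do not describe what the repo supports.

```rust
/// Storage for `params` weights at `bits_per_weight`, in decimal gigabytes (1 GB = 1e9 bytes).
fn footprint_gb(params: f64, bits_per_weight: f64) -> f64 {
    params * bits_per_weight / 8.0 / 1e9
}

fn main() {
    let params = 1.05e12; // design-target parameter count from the text
    let envelope_gb = 4.0 + 32.0; // 4 GB VRAM + 32 GB system RAM
    let formats = [
        ("fp16", 16.0),
        ("int4", 4.0),
        ("ternary, 2-bit packed", 2.0),
        ("ternary, log2(3)-bit bound", 3.0_f64.log2()),
    ];
    for &(name, bits) in formats.iter() {
        let gb = footprint_gb(params, bits);
        println!("{:<28} {:>8.1} GB  ({:.1}x the envelope)", name, gb, gb / envelope_gb);
    }
}
```

At fp16 the weights need about 2100 GB, roughly 58 times the envelope. Even the log2(3) ≈ 1.585-bit bound for ternary weights gives about 208 GB. That is close to the figure quoted under Constraints and still almost six times what fits in memory.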

// Constraints

The target envelope is consumer hardware: 4 GB VRAM plus 32 GB RAM, with roughly 208 GB of ternary expert weights streamed from NVMe (~253 GB with deltas). Decode throughput of ~10–11 tok/s is projected by a bandwidth model, not measured. The inference pipeline is intentionally fail-fast: process_token returns an error by design until the real execution path exists.
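Ternary weights can be stored in well under 2 bits each. This sketch assumes one common scheme, which is not taken from the repo: it packs five trits per byte, because 3^5 = 243 fits in 256. The sketch also shows why ternary layers are cheap to execute. A dot product with {-1, 0, +1} weights needs only additions and subtractions. The function names are hypothetical.

```rust
/// Pack ternary weights {-1, 0, +1} five per byte. 3^5 = 243 <= 256, so each
/// byte holds five base-3 digits (1.6 bits per weight).
fn pack_trits(weights: &[i8]) -> Vec<u8> {
    weights
        .chunks(5)
        .map(|chunk| {
            // Fold in reverse so the first weight lands in the least significant digit.
            chunk.iter().rev().fold(0u8, |acc, &w| {
                debug_assert!((-1..=1).contains(&w), "weight must be ternary");
                acc * 3 + (w + 1) as u8
            })
        })
        .collect()
}

/// Inverse of `pack_trits`; `n` is the original number of weights.
fn unpack_trits(packed: &[u8], n: usize) -> Vec<i8> {
    let mut out = Vec::with_capacity(n);
    for &byte in packed {
        let mut b = byte;
        for _ in 0..5 {
            if out.len() == n {
                return out;
            }
            out.push((b % 3) as i8 - 1);
            b /= 3;
        }
    }
    out
}

/// Dot product with ternary weights: only additions and subtractions, no multiplies.
fn ternary_dot(weights: &[i8], x: &[f32]) -> f32 {
    weights.iter().zip(x).fold(0.0, |acc, (&w, &xi)| match w {
        1 => acc + xi,
        -1 => acc - xi,
        _ => acc,
    })
}

fn main() {
    let w: Vec<i8> = vec![1, 0, -1, 1, 1, -1, 0];
    let packed = pack_trits(&w);
    println!("{} weights -> {} bytes", w.len(), packed.len());
    assert_eq!(unpack_trits(&packed, w.len()), w);
    let x = [0.5f32, 2.0, 1.0, -1.0, 3.0, 0.25, 4.0];
    println!("dot = {}", ternary_dot(&w, &x));
}
```

Five trits per byte works out to 1.6 bits per weight, so 1.05T weights would take about 210 GB. The ~208 GB quoted above sits closer to the log2(3) ≈ 1.585-bit bound, so the repo's actual packing may differ.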
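The fail-fast choice can be expressed directly in a type signature. The call returns a Result, and until the execution path exists it always returns the error variant instead of a placeholder token. This is a minimal sketch. InferenceError, Pipeline, and this exact process_token signature are hypothetical stand-ins, not the repo's API.

```rust
use std::fmt;

/// Hypothetical error type; the repo's real type may differ.
#[derive(Debug, PartialEq)]
enum InferenceError {
    /// The real execution path does not exist yet.
    NotImplemented(&'static str),
}

impl fmt::Display for InferenceError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InferenceError::NotImplemented(what) => write!(f, "not implemented: {}", what),
        }
    }
}

impl std::error::Error for InferenceError {}

/// Hypothetical pipeline handle with no real weights wired in.
struct Pipeline;

impl Pipeline {
    /// Fails fast rather than returning a placeholder token that looks like real output.
    fn process_token(&mut self, _token: u32) -> Result<u32, InferenceError> {
        Err(InferenceError::NotImplemented("expert execution path"))
    }
}

fn main() {
    let mut pipeline = Pipeline;
    match pipeline.process_token(42) {
        Ok(next) => println!("next token: {}", next),
        Err(e) => eprintln!("inference unavailable: {}", e),
    }
}
```

Returning an error instead of a dummy token means any caller that tries to decode today fails loudly. It cannot silently produce output that looks real.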

// Architecture

OMNI uses 128 ternary {-1, 0, +1} experts arranged on an 8×4×4 lattice over a 3D torus, with top-1 routing: 14.95B parameters are active per token out of 1.05T total. Of its 80 layers, 60 are O(1) perspective-decay recurrence layers and 20 are windowed grouped-query attention layers. Execution is layer-streamed, with double-buffered load/compute overlap. The design also includes holographic distributed memory, forward-mode adaptation that needs no stored backprop graph, and a safety polytope that hard-projects outputs into a convex safe region. 243 tests pass, and the runtime is honest about what doesn't run yet.
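The lattice and routing numbers fit together as 8 × 4 × 4 = 128 experts. This sketch maps expert ids to torus coordinates and lists each expert's six wrap-around neighbors. It then picks one expert per token by taking the argmax over router logits. The row-major layout and the neighbor function are assumptions for illustration, and the logits in main are synthetic. The text does not say how OMNI uses torus adjacency.

```rust
/// Lattice shape from the text: 8 x 4 x 4 = 128 experts.
const DIMS: [usize; 3] = [8, 4, 4];

/// Flat expert id -> (x, y, z), row-major. The layout is an assumption.
fn coords(id: usize) -> [usize; 3] {
    [id / (DIMS[1] * DIMS[2]), (id / DIMS[2]) % DIMS[1], id % DIMS[2]]
}

/// (x, y, z) -> flat expert id; inverse of `coords`.
fn id_of(c: [usize; 3]) -> usize {
    (c[0] * DIMS[1] + c[1]) * DIMS[2] + c[2]
}

/// The six face neighbors of an expert on a 3D torus: every axis wraps around.
fn torus_neighbors(id: usize) -> Vec<usize> {
    let c = coords(id);
    let mut out = Vec::with_capacity(6);
    for axis in 0..3 {
        // Adding len - 1 modulo len is the same as stepping -1 with wrap-around.
        for &step in [1, DIMS[axis] - 1].iter() {
            let mut n = c;
            n[axis] = (c[axis] + step) % DIMS[axis];
            out.push(id_of(n));
        }
    }
    out
}

/// Top-1 routing: the token goes to the single expert with the highest router logit.
fn route_top1(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(i, _)| i)
        .expect("router needs at least one expert")
}

fn main() {
    let n_experts = DIMS.iter().product::<usize>();
    // Hypothetical router logits; a real router would compute these from the hidden state.
    let logits: Vec<f32> = (0..n_experts).map(|i| ((i * 37) % 101) as f32).collect();
    let chosen = route_top1(&logits);
    println!("{} experts; token routed to expert {} at {:?}", n_experts, chosen, coords(chosen));
    println!("torus neighbors: {:?}", torus_neighbors(chosen));
}
```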
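The recurrence layers are described as O(1) because their state has a fixed size, unlike a KV cache. This sketch uses a generic per-channel exponential decay, s_t = λ·s_{t-1} + (1 - λ)·x_t, as a stand-in. The text does not specify the actual perspective-decay update, and DecayState is a hypothetical name.

```rust
/// Constant-size state for a per-channel exponential-decay recurrence:
/// s_t = lambda * s_{t-1} + (1 - lambda) * x_t. A generic stand-in, not OMNI's exact rule.
struct DecayState {
    lambda: Vec<f32>, // per-channel decay in [0, 1); larger = longer memory
    s: Vec<f32>,      // running state, same length as lambda
}

impl DecayState {
    fn new(lambda: Vec<f32>) -> Self {
        let s = vec![0.0; lambda.len()];
        DecayState { lambda, s }
    }

    /// Fold one token's features into the state and return it.
    /// Cost and memory are O(channels) per token, independent of sequence length.
    fn step(&mut self, x: &[f32]) -> &[f32] {
        assert_eq!(x.len(), self.s.len(), "feature width must match state width");
        for ((s, &l), &xi) in self.s.iter_mut().zip(&self.lambda).zip(x) {
            *s = l * *s + (1.0 - l) * xi;
        }
        &self.s
    }
}

fn main() {
    let mut state = DecayState::new(vec![0.5, 0.9]);
    for t in 0..4 {
        let out = state.step(&[1.0, 1.0]);
        println!("t={} state={:?}", t, out);
    }
}
```

Memory stays at one value per channel no matter how many tokens have been processed. Windowed attention, by contrast, still keeps keys and values for every position in its window.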
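Double-buffered layer streaming can be sketched with a loader thread and a rendezvous channel. While layer i computes, the loader reads layer i + 1. It cannot run further ahead, which caps resident weights at two layers. load_layer and compute_layer are hypothetical stand-ins for NVMe reads and GPU execution. A real pipeline would also manage pinned host buffers and device transfers, which this sketch omits.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical stand-in for reading one layer's weights from NVMe.
fn load_layer(layer: usize) -> Vec<u8> {
    vec![(layer % 256) as u8; 1024]
}

/// Hypothetical stand-in for executing one layer; here it just folds the bytes into a checksum.
fn compute_layer(weights: &[u8], acc: u64) -> u64 {
    acc + weights.iter().map(|&w| u64::from(w)).sum::<u64>()
}

/// Run `n_layers` with load/compute overlap. The rendezvous channel (capacity 0)
/// lets the loader prepare layer i + 1 while layer i computes, but not run further
/// ahead, so at most two layer buffers are alive at once.
fn run_streamed(n_layers: usize) -> u64 {
    let (tx, rx) = sync_channel::<(usize, Vec<u8>)>(0);
    let loader = thread::spawn(move || {
        for layer in 0..n_layers {
            let weights = load_layer(layer); // overlaps with compute on the previous layer
            if tx.send((layer, weights)).is_err() {
                break; // consumer hung up early
            }
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    let mut acc = 0;
    for (expected, (layer, weights)) in rx.iter().enumerate() {
        assert_eq!(layer, expected, "layers must arrive in order");
        acc = compute_layer(&weights, acc);
    }
    loader.join().expect("loader thread panicked");
    acc
}

fn main() {
    println!("checksum over 80 layers: {}", run_streamed(80));
}
```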
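The text does not define its holographic memory, so this sketch uses one well-known scheme by that name as a stand-in: Plate's holographic reduced representations. Key/value pairs are bound by circular convolution and superposed into one fixed-width trace. A value is then recalled approximately by circular correlation with its key. The naive O(n²) loops keep the example dependency-free. The PRNG and all names are illustrative.

```rust
/// Tiny xorshift64 PRNG so the example needs no external crates. Seed must be nonzero.
struct XorShift(u64);

impl XorShift {
    /// Uniform sample in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Random vector with i.i.d. zero-mean entries of variance 1/n (so its norm is about 1).
fn random_vec(rng: &mut XorShift, n: usize) -> Vec<f64> {
    let scale = (12.0 / n as f64).sqrt(); // uniform(-0.5, 0.5) has variance 1/12
    (0..n).map(|_| (rng.next_f64() - 0.5) * scale).collect()
}

/// Bind by circular convolution: c[j] = sum_k a[k] * b[(j - k) mod n].
fn bind(a: &[f64], b: &[f64]) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|j| (0..n).map(|k| a[k] * b[(j + n - k) % n]).sum::<f64>()).collect()
}

/// Unbind by circular correlation: y[j] = sum_k a[k] * c[(j + k) mod n].
fn unbind(a: &[f64], c: &[f64]) -> Vec<f64> {
    let n = a.len();
    (0..n).map(|j| (0..n).map(|k| a[k] * c[(j + k) % n]).sum::<f64>()).collect()
}

fn cosine(x: &[f64], y: &[f64]) -> f64 {
    let dot: f64 = x.iter().zip(y).map(|(a, b)| a * b).sum();
    let nx = x.iter().map(|a| a * a).sum::<f64>().sqrt();
    let ny = y.iter().map(|b| b * b).sum::<f64>().sqrt();
    dot / (nx * ny)
}

fn main() {
    let n = 512;
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let (k1, v1, k2, v2) = (
        random_vec(&mut rng, n),
        random_vec(&mut rng, n),
        random_vec(&mut rng, n),
        random_vec(&mut rng, n),
    );
    // One fixed-width trace holds both associations, superposed by addition.
    let memory: Vec<f64> = bind(&k1, &v1).iter().zip(bind(&k2, &v2)).map(|(a, b)| a + b).collect();
    let recalled = unbind(&k1, &memory);
    println!("cos(recalled, v1) = {:.3}", cosine(&recalled, &v1));
    println!("cos(recalled, v2) = {:.3}", cosine(&recalled, &v2));
}
```

Recall is noisy by design. The recalled vector should resemble v1 far more than v2, and the match degrades gradually as more pairs are superposed into the same trace.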
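Forward-mode differentiation carries a derivative alongside each value, so nothing has to be stored for a backward pass. This sketch applies dual numbers to a one-weight squared-error loss. It illustrates the mechanism only. It is not OMNI's adaptation rule, which the text does not describe.

```rust
use std::ops::{Add, Mul, Sub};

/// Dual number: value `v` plus tangent `d`, the derivative along one seeded direction.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Dual {
    v: f64,
    d: f64,
}

impl Dual {
    /// A constant has zero tangent.
    fn constant(v: f64) -> Self {
        Dual { v, d: 0.0 }
    }
}

impl Add for Dual {
    type Output = Dual;
    fn add(self, o: Dual) -> Dual {
        Dual { v: self.v + o.v, d: self.d + o.d }
    }
}

impl Sub for Dual {
    type Output = Dual;
    fn sub(self, o: Dual) -> Dual {
        Dual { v: self.v - o.v, d: self.d - o.d }
    }
}

impl Mul for Dual {
    type Output = Dual;
    // Product rule: (uv)' = u'v + uv'.
    fn mul(self, o: Dual) -> Dual {
        Dual { v: self.v * o.v, d: self.d * o.v + self.v * o.d }
    }
}

/// Squared error of a one-weight model y_hat = w * x.
fn loss(w: Dual, x: f64, y: f64) -> Dual {
    let err = w * Dual::constant(x) - Dual::constant(y);
    err * err
}

fn main() {
    let mut w = 0.5;
    let (x, y, lr) = (2.0, 3.0, 0.1);
    for step in 0..3 {
        // Seed d = 1 on the weight: the tangent of the output is then dL/dw.
        let l = loss(Dual { v: w, d: 1.0 }, x, y);
        println!("step {}: loss = {:.4}, dL/dw = {:.4}", step, l.v, l.d);
        w -= lr * l.d; // gradient step; no intermediate values were stored for a backward pass
    }
}
```

One forward pass yields the derivative along a single seeded direction. With many parameters, forward-mode schemes either repeat the pass once per direction or sample random tangent directions to estimate the gradient.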
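A hard projection onto a convex safe region replaces an output with the nearest point that satisfies the constraints. This sketch covers the simplest case: a single halfspace a·z ≤ b, which has a closed-form projection. The constraint in main is hypothetical, since the text does not say what OMNI's polytope constrains.

```rust
/// Euclidean projection of `y` onto the halfspace { z : a . z <= b }.
/// Points already inside are returned unchanged; points outside move
/// along the normal `a` just far enough to land on the boundary.
fn project_halfspace(y: &[f64], a: &[f64], b: f64) -> Vec<f64> {
    assert_eq!(y.len(), a.len(), "dimension mismatch");
    let dot: f64 = y.iter().zip(a).map(|(yi, ai)| yi * ai).sum();
    let norm_sq: f64 = a.iter().map(|ai| ai * ai).sum();
    assert!(norm_sq > 0.0, "constraint normal must be nonzero");
    let scale = (dot - b).max(0.0) / norm_sq; // zero when the constraint already holds
    y.iter().zip(a).map(|(yi, ai)| yi - scale * ai).collect()
}

fn main() {
    // Hypothetical constraint: the two output scores may sum to at most 1.
    let (a, b) = ([1.0, 1.0], 1.0);
    for y in [[0.2, 0.3], [1.0, 1.0]].iter() {
        println!("{:?} -> {:?}", y, project_halfspace(y, &a, b));
    }
}
```

A polytope is an intersection of several halfspaces, and its exact Euclidean projection is a small quadratic program. Dykstra's algorithm is one way to compute it: it cycles through single-halfspace projections like this one and applies correction terms.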

// WAR STORIES

  • Everything about it is hard, including the math. That's the cost of a path others haven't fully taken. It's still an idea despite the engineering around it, and every step is a challenge because nobody has walked this exact route to completion.
  • Designing for 4 GB before proving that anything can train was deliberate: 4 GB of VRAM is what the machine I own has. Ternary weights, streaming, the whole architecture: every aspect is shaped around my current hardware and current capability. The next machine will be better, and the design scales with it. It only goes up from here.
  • What would I do differently? Nothing yet. OMNI doesn't run yet (the repo says so), and I'm still learning; every step is something new. Ask again when process_token stops returning an error by design.

// RESULTS

Metric                       Value                           Status
Tests passing                243
Parameters                   1.05T (design target)           projected
Active parameters per token  14.95B (top-1 routing)          projected
Expert lattice               8×4×4 on 3D torus
Decode throughput            ~10–11 tok/s (bandwidth model)  projected

// STACK

  • Rust