// WORK

OMNI / PERSPECTIVE v2

SPEC / IN PROGRESS

RESEARCH IN PROGRESS — projections, not measurements

Architecture research targeting a 1.05T-parameter sparse MoE on consumer hardware.

// Problem

A 1.05T-parameter model does not fit in 4 GB of VRAM and 32 GB of RAM; dense deployment fails that envelope outright. OMNI treats parameter count as a delivery-and-scheduling problem rather than a VRAM-residency problem. It is architecture research with an in-progress implementation — not a running model, and the repo says so in its own maturity table.

// Constraints

The target envelope is consumer hardware: 4 GB VRAM plus 32 GB RAM, with roughly 208 GB of ternary expert weights streamed from NVMe (~253 GB with deltas). Decode throughput of ~10–11 tok/s is projected by a bandwidth model, not measured. The inference pipeline is intentionally fail-fast: process_token returns an error by design until the real execution path exists.

// Architecture

128 ternary {-1,0,+1} experts arranged on an 8×4×4 lattice over a 3D torus, top-1 routed — 14.95B active parameters per token out of 1.05T total. 80 layers: 60 of O(1) perspective-decay recurrence plus 20 of windowed grouped-query attention. Layer-streamed execution with double-buffered load/compute overlap, holographic distributed memory, forward-mode adaptation without backprop graph storage, and a safety polytope that hard-projects outputs into a convex safe region. 243 tests pass; the runtime is honest about what doesn't run yet.

// WAR STORIES

Everything about it is hard, including the math. That's the cost of a path others haven't fully taken — it's still an idea, despite the engineering around it, and every step is a challenge because nobody has walked this exact route to completion.
Designing for 4 GB before proving anything can train was deliberate: 4 GB is the machine I own. Ternary weights, streaming, the whole architecture — every aspect is shaped around my current hardware and current capability. The next machine will be better, and the design scales with it. It only goes up from here.
What would I do differently? Nothing yet. OMNI isn't real right now — the repo says so — and I'm still learning; every step is something new. Ask again when process_token stops returning an error by design.

// RESULTS

Metric	Value
Tests passing	243
Parameters	1.05T (design target)projected
Active parameters per token	14.95B (top-1 routing)projected
Expert lattice	8×4×4 on 3D torus
Decode throughput	~10–11 tok/s (bandwidth model)projected

// STACK

Rust