// WORK
OMNI / PERSPECTIVE v2
SPEC / IN PROGRESSRESEARCH IN PROGRESS — projections, not measurements
Architecture research targeting a 1.05T-parameter sparse MoE on consumer hardware.
// Problem
A 1.05T-parameter model does not fit in 4 GB of VRAM and 32 GB of RAM; dense deployment fails that envelope outright. OMNI treats parameter count as a delivery-and-scheduling problem rather than a VRAM-residency problem. It is architecture research with an in-progress implementation — not a running model, and the repo says so in its own maturity table.
// Constraints
The target envelope is consumer hardware: 4 GB VRAM plus 32 GB RAM, with roughly 208 GB of ternary expert weights streamed from NVMe (~253 GB with deltas). Decode throughput of ~10–11 tok/s is projected by a bandwidth model, not measured. The inference pipeline is intentionally fail-fast: process_token returns an error by design until the real execution path exists.
// Architecture
128 ternary {-1,0,+1} experts arranged on an 8×4×4 lattice over a 3D torus, top-1 routed — 14.95B active parameters per token out of 1.05T total. 80 layers: 60 of O(1) perspective-decay recurrence plus 20 of windowed grouped-query attention. Layer-streamed execution with double-buffered load/compute overlap, holographic distributed memory, forward-mode adaptation without backprop graph storage, and a safety polytope that hard-projects outputs into a convex safe region. 243 tests pass; the runtime is honest about what doesn't run yet.
// WAR STORIES
- Everything about it is hard, including the math. That's the cost of a path others haven't fully taken — it's still an idea, despite the engineering around it, and every step is a challenge because nobody has walked this exact route to completion.
- Designing for 4 GB before proving anything can train was deliberate: 4 GB is the machine I own. Ternary weights, streaming, the whole architecture — every aspect is shaped around my current hardware and current capability. The next machine will be better, and the design scales with it. It only goes up from here.
- What would I do differently? Nothing yet. OMNI isn't real right now — the repo says so — and I'm still learning; every step is something new. Ask again when process_token stops returning an error by design.
// RESULTS
| Metric | Value |
|---|---|
| Tests passing | 243 |
| Parameters | 1.05T (design target)projected |
| Active parameters per token | 14.95B (top-1 routing)projected |
| Expert lattice | 8×4×4 on 3D torus |
| Decode throughput | ~10–11 tok/s (bandwidth model)projected |
// STACK
- Rust