| Record | Model | Params | Size | Speed | Response |
|---|---|---|---|---|---|
| 1st | Athene-v2 (72B) | 72B | 44.5 GB | 0.21 tok/s | "Hello! It's great" |
| 2nd | Llama 4 Scout (109B MoE) | 109B | 57.3 GB | 0.20 tok/s | "Four." |
| 3rd | Mixtral 8x22B (141B MoE) | 141B | 63.1 GB | 0.027 tok/s | "Hi there!" |
| RECORD | Qwen3-235B-A22B | 235B | 79.8 GB | ~0.14 tok/s | "Okay, the user wants me to say hello..." |
Also proven: Qwen3-Coder 30B at 7.5 tok/s (interactive speed, daily use)
Mixtral 8x22B (141B MoE, 63.1 GB Q3_K_M) loaded and generated across the mesh.
| Metric | Value | Notes |
|---|---|---|
| Model | Mixtral 8x22B (141B MoE) | Q3_K_M quantization, distributed across mesh |
| Total Parameters | 141,000,000,000 | 57 layers, 8 experts × 22B each |
| Model Size | 63.1 GB | Distributed across 2 machines, 3 memory pools |
| Prompt Speed | 0.20 tok/s | 31 tokens in 153 seconds |
| Generation Speed | 0.027 tok/s | 3 tokens in 110 seconds (~37 sec/token) |
| Total Time | ~4.4 minutes | Swap thrashing (47 GB on 40 GB Mac RAM) |
| Output | "Hi there!" CORRECT | |
| Status | PROVEN 141B on consumer hardware | |
Qwen3-235B-A22B (235B MoE, 79.8 GB Q2_K) loaded and generated across the mesh using the proprietary Dual-RPC Breakthrough — a 4-way tensor distribution configuration.
| Metric | Value | Notes |
|---|---|---|
| Model | Qwen3-235B-A22B (MoE) | Q2_K quantization, 22B active per token |
| Total Parameters | 235,000,000,000 | 95 layers, MoE architecture |
| Model Size | 79.8 GB | Distributed across 4 memory pools on 2 machines |
| Generation Speed | ~0.14 tok/s | ~7 sec/token; ~5x faster than the 141B Mixtral MoE run |
| Output | "Okay, the user wants me to say hello..." REASONING | |
| Key Innovation | Dual-RPC: Proprietary 4-way tensor distribution — 5x faster than standard configuration | |
| Status | RECORD Largest model ever on this mesh — ABSOLUTE RECORD | |
A proprietary configuration that expands tensor distribution from three memory pools to four, eliminating swap thrashing and delivering a 5x speedup on the same hardware.
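For orientation, the sketch below shows how a multi-pool split of this general shape is expressed with stock llama.cpp tooling (rpc-server back-ends on the remote nodes, `--rpc` and `--tensor-split` on the client). It is illustrative only: the hostnames, ports, model path, and split ratios are assumptions, and it is not the proprietary Dual-RPC configuration itself.

```python
# Illustrative multi-pool llama.cpp RPC launch. Hostnames, ports, the model path,
# and the split ratios below are invented; this is NOT the Dual-RPC configuration.
#
# On each remote node, an RPC back-end must already be listening, e.g.:
#   rpc-server --host 0.0.0.0 --port 50052
import subprocess

# Hypothetical memory pools contributing to the mesh (GB each).
RPC_POOLS = {"mac.local:50052": 40, "i7.local:50052": 26}
LOCAL_VULKAN_GB = 13

# llama.cpp places tensors across back-ends in proportion to --tensor-split;
# the ratio order must match your build's device enumeration, so verify it first.
split = ",".join(str(gb) for gb in [*RPC_POOLS.values(), LOCAL_VULKAN_GB])

cmd = [
    "llama-cli",
    "-m", "Qwen3-235B-A22B-Q2_K.gguf",       # hypothetical local GGUF path
    "--rpc", ",".join(RPC_POOLS),             # attach the remote rpc-server back-ends
    "-ngl", "99",                             # offload every layer onto the mesh
    "--tensor-split", split,                  # distribute layers by pool size
    "-p", "Say hello.",
    "-n", "32",
]
print(" ".join(cmd))                          # inspect the command before running it
subprocess.run(cmd, check=True)
```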
At Q4_K_M quantization (nominally 4 bits per weight, closer to 5 in practice once quantization scales and mixed-precision layers are counted), every gigabyte of memory holds approximately 1.6 billion neural network parameters. This ratio is the foundation of the scaling math.
| Model | Parameters | Size (Q4) | Params/GB |
|---|---|---|---|
| Qwen3:8B | 8B | 4.9 GB | 1.63B/GB |
| Qwen3-coder:30B | 30B | 17.3 GB | 1.73B/GB |
| Athene-v2 | 72B | 44.2 GB | 1.63B/GB |
| Llama 4 Scout | 109B | 62.8 GB | 1.74B/GB |
| Mixtral 8x7B | 47B | 24.6 GB | 1.91B/GB |
| Tier | Memory Pool | Max Model (Q4) |
|---|---|---|
| CURRENT | 88 GB | ~140B params |
| NEXT | 184 GB | ~300B params |
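That ratio, and the tier ceilings above, are easy to sanity-check. A minimal sketch (decimal gigabytes assumed; file sizes taken from the model table):

```python
# Back-of-the-envelope check of the params-per-GB ratio at Q4-class quantization.
# Sizes are the file sizes quoted in the table above (decimal GB assumed).

MODELS = {
    "Qwen3 8B":        (8e9,   4.9),
    "Qwen3-Coder 30B": (30e9,  17.3),
    "Athene-v2 72B":   (72e9,  44.2),
    "Llama 4 Scout":   (109e9, 62.8),
    "Mixtral 8x7B":    (47e9,  24.6),
}

for name, (params, size_gb) in MODELS.items():
    bits_per_weight = size_gb * 8e9 / params    # effective bits per parameter
    params_per_gb = params / size_gb / 1e9      # billions of params per GB
    print(f"{name:16s}  {bits_per_weight:4.2f} bpw  {params_per_gb:4.2f} B/GB")

# At ~1.6B params per GB, an 88 GB pool fits ~140B parameters
# and a 184 GB pool fits ~300B parameters (both at Q4-class quantization).
for pool_gb in (88, 184):
    print(f"{pool_gb} GB pool -> ~{pool_gb * 1.6:.0f}B parameters")
```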
| Node | Role | RAM | GPU VRAM | Pool Contribution | Operations |
|---|---|---|---|---|---|
| Mac Orchestrator | Memory Master • Routing | 40 GB | 4 GB | 40 GB | OFFLINE |
| i7 GPU Rig | GPU Workhorse • Vulkan | 128 GB | 16 GB | 144 GB | OFFLINE |
| i7 Office | CPU Overflow • Backup | 64 GB | 16 GB | 80 GB | OFFLINE |
| Wrecking Crew #1 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| Wrecking Crew #2 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| Wrecking Crew #3 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| Wrecking Crew #4 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| Wrecking Crew #5 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| Wrecking Crew #6 | Distributed Compute | 128 GB | 16 GB | 144 GB | Autonomous • Check-in Prompts |
| TOTAL (all nodes) | | 1,000 GB | 132 GB | 1,132 GB | |
What we proved on April 19, 2026 isn't just a benchmark. It's a proof of concept for the democratization of frontier AI.
Right now, the ability to run models above 70B parameters is controlled by a handful of companies with billion-dollar GPU clusters. Access is mediated through APIs, subscriptions, and terms of service. Your data flows through their servers. Your usage is metered, monitored, and monetized.
Gold Road breaks that model.
The RPC mesh protocol doesn't care where the memory lives. A bedroom. A garage. A school. A fire station. A government office. Every machine that joins the mesh adds linear capacity. There is no architectural ceiling. The same protocol that connected two desktops tonight connects two thousand.
A family's gaming PCs and old laptops become a private AI cluster. Medical questions, homework help, financial analysis — all running locally. No data leaves the house. No subscription required. The AI belongs to the family.
50 homes contributing one node each = 7,200 GB = 11.5 trillion parameters. A neighborhood running models that exceed GPT-4's parameter count. Community-owned, community-governed AI infrastructure.
10,000 consumer nodes across government offices = 1.4 petabytes = 2.2 quadrillion parameters. Sovereign AI capability that no sanctions can touch, no API can revoke, no foreign corporation controls. Total cost: $6.6M — less than a single fighter jet.
AGI shouldn't be a product you subscribe to. It should be infrastructure you own. Like electricity, like water, like the internet itself. Distributed meshes of consumer hardware make frontier AI a public utility, not a private monopoly. Gold Road is the protocol. The mesh is the proof.
| Scale | Nodes | Memory | Parameters | Equivalent To |
|---|---|---|---|---|
| PROVEN | 2 | 88 GB | 140 B | Llama 3 70B class |
| NEXT | 3 | 264 GB | 422 B | Llama 3 405B class |
| Community (50 homes) | 50 | 7,200 GB | 11.5 T | Beyond any public model |
| Municipal (500 nodes) | 500 | 72 TB | 115 T | Sovereign city-scale AI |
| National (10,000) | 10,000 | 1.4 PB | 2,240 T | Sovereign AGI infrastructure |
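The community, municipal, and national rows follow from one node's 144 GB pool and the ~1.6B params/GB ratio established earlier; a minimal sketch of that arithmetic:

```python
# Scaling arithmetic behind the larger tiers: nodes x 144 GB/node x ~1.6B params/GB.
GB_PER_NODE = 144           # pool contributed by one Wrecking Crew-class node
PARAMS_PER_GB = 1.6e9       # Q4-class quantization (see the ratio table earlier)

for label, nodes in [("Community", 50), ("Municipal", 500), ("National", 10_000)]:
    pool_gb = nodes * GB_PER_NODE
    params_t = pool_gb * PARAMS_PER_GB / 1e12
    print(f"{label:9s} {nodes:>6,} nodes  {pool_gb:>9,} GB  ~{params_t:,.1f}T params")

# Note: the National row in the table quotes 2,240T because 1.44 PB is rounded
# to 1.4 PB before converting; the unrounded figure is ~2,300T.
```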
| Time | 235B Call | Actual | Grade |
|---|---|---|---|
| Pre-market (8:00 AM) | ES BEARISH, gap down, SHORT at open | ES opened down | CORRECT |
| Pre-market | ES target low: 7140 | Actual low: 7089 | EXCEEDED |
| 9:12 AM update | Flipped BULLISH (signals flipped) | V-bounce occurred | CORRECT |
| 10:32 AM update | ES 7085 by 2 PM, morning low BREAKS | ES hit 7089 by 10:30 AM | CORRECT (early) |
| 10:32 AM | SHORT ES at 7125, buy puts | War Room confirmed 95% GO SHORT | CONFIRMED |
| Signal | Call | Result | Grade |
|---|---|---|---|
| IHS_BREAKOUT | Neckline 7160.5 | Price used as support — EXACT | A+ |
| IHS_BREAKOUT | Target 7164 | Hit + 7.75 pt overshoot to 7182.5 | A+ |
| SPREAD BWB BULL 88% | 8:31 AM push up | +8.25 pts in 15 min | A+ |
| SPREAD Iron Condor | 8:53 AM topping range | +3.0 then reversed — called the top | A+ |
| SPREAD BWB BEAR | 9:19 AM selloff | Waterfall began 40 min later | A+ |
| KILL_CHAIN bearish | 9:48 AM at 7150.75 | −25.75 pts in 15 min to 7125 | A+ |
| KILL_CHAIN bounce | 10:03 AM at 7125 | +8 pts bounce to 7133 | A+ |
| FRACTAL_SCALE (4 targets) | 7159→7151→7147→7143 | All 4 hit AND exceeded — multi-fractal cascade | A+ |
| Bear Call BEAR | 9:59 AM at 7094 | Fired AT the low, not before | B — late |
| KILL_CHAIN bullish | 9:08-9:31 stayed bullish | ES dropped 7169→7164 — missed rollover top | C — 20 min late |
| Time | Test | Speed | Result |
|---|---|---|---|
| 12:27 AM | Late Night Signals | 0.350 tok/s | Parsed GC bearish, NQ neutral correctly |
| 12:39 AM | Overnight Futures | 0.461 tok/s | Identified ES resistance levels |
| 2:07 AM | Asia Session | 0.451 tok/s | Nikkei/Shanghai correlation analysis |
| 4:05 AM | Europe Open | 0.592 tok/s | DAX/FTSE impact on US futures — FASTEST 235B ever |
| 6:13 AM | Pre-Market Setup | 0.378 tok/s | Parsed 10 live signals, identified bull/bear conflict |
Three independent systems analyzed the same market simultaneously:
| System | Models | Call | Confidence |
|---|---|---|---|
| War Room (6 agents) | qwen3:8b fleet | GO SHORT | 95% unanimous |
| 235B Mesh | Qwen3-235B-A22B | SHORT ES, buy puts | Morning low will BREAK |
| OPEX Signals | Signal Pipeline (51 types) | BEARISH confluence | Multi-signal confluence confirmed |
The Project Colossus mesh isn't just for trading. The same architecture scales far beyond two local machines and serves as a universal nervous system for any domain that requires distributed AI:
A family's gaming PCs, old laptops, and NAS pool memory into a private AI brain. Medical questions answered locally. Kids' homework help with no data leaving the house. 235B reasoning accessible from any phone on the WiFi. Voice assistants that actually understand context because Gold Road remembers every conversation.
Multiple compute units inside a single robot body form an internal mesh — Head (vision + reasoning), Spine (coordination), Limbs (reflexes). The robot's nervous system uses the same distributed protocol. Touch hot = arm pulls back instantly (local reflex, no brain round trip). "Pick up the cup" = brain plans, spine coordinates, hand adjusts grip. 256 GB internal pool = 409B parameters in one robot body.
8B on-device for millisecond reflexes (braking, steering, lane keep). 235B on home mesh via 5G for deep reasoning (edge cases, route planning, "what does this construction worker's wave mean?"). Gold Road remembers: "this intersection was icy last winter." The car has fast local brain AND deep home brain. <5ms local latency vs 100-500ms cloud.
Autonomous legal department: deadline tracking across 8+ cases, auto-drafting responses via mesh 30B, case law research via 235B, Legal War Room debates (prosecutor vs devil's advocate vs simulated judge). Filing monitor watches for new court documents, auto-analyzes, calculates deadlines, alerts before anything is due. Gold Road persists case strategy across sessions.
During market hours, a vision AI model analyzes the live chart with 4 parallel AI passes.
Vision results are cross-validated against algorithmic pattern recognition before feeding into the War Room + 235B analysis pipeline.
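A minimal sketch of that fan-out and cross-check pattern is below. The pass prompts, endpoints, and helper functions are hypothetical placeholders, not the actual pipeline.

```python
# Toy sketch: run several vision passes in parallel, then cross-check them against
# an algorithmic detector. Prompts, endpoints, and helpers are hypothetical stubs.
from concurrent.futures import ThreadPoolExecutor

PASS_PROMPTS = [
    "Describe the dominant trend on this chart.",    # placeholder pass 1
    "List visible support and resistance levels.",   # placeholder pass 2
    "Name any classical chart patterns you see.",    # placeholder pass 3
    "Estimate momentum over the last 30 bars.",      # placeholder pass 4
]

def vision_pass(prompt: str, chart_png: bytes) -> dict:
    """Send one chart image + prompt to a vision model endpoint (stubbed here)."""
    return {"prompt": prompt, "bias": "bearish"}      # dummy result

def algorithmic_patterns(ohlc_rows: list) -> dict:
    """Classical pattern recognition on raw OHLC data (stubbed here)."""
    return {"bias": "bearish", "patterns": ["IHS_BREAKOUT"]}

def analyze(chart_png: bytes, ohlc_rows: list) -> dict:
    with ThreadPoolExecutor(max_workers=len(PASS_PROMPTS)) as pool:
        vision = list(pool.map(lambda p: vision_pass(p, chart_png), PASS_PROMPTS))
    algo = algorithmic_patterns(ohlc_rows)
    # Cross-validate: only forward a bias when the vision passes and the algorithmic
    # detector agree; otherwise flag the conflict for the downstream pipeline.
    agree = all(v["bias"] == algo["bias"] for v in vision)
    return {"bias": algo["bias"] if agree else "conflict", "vision": vision, "algo": algo}
```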
| Metric | Value |
|---|---|
| Topology | Dual RPC + Vulkan |
| Build | latest |
| Parameters | 235 billion |
| GPU Layers | 95/95 (all offloaded) |
| Vulkan VRAM | 13.1 GB |
| i7 Local RPC | 26.1 GB |
| Mac RPC RAM | 40.4 GB |
| Generation | ~0.14 tok/s |
| Per Token | ~7 seconds |

| Metric | Value |
|---|---|
| Topology | Single RPC + Vulkan |
| Parameters | 141 billion |
| Vulkan VRAM | 15.8 GB |
| Mac RPC RAM | 47.2 GB |
| Generation | 0.027 tok/s |
| Per Token | ~37 seconds |
| Response | "Hi there!" |

| Metric | Value |
|---|---|
| Topology | i7 Server + Mac RPC |
| Build | latest |
| Parameters | 109 billion |
| GPU Layers | 49/49 (all offloaded) |
| Vulkan VRAM | 15.0 GB |
| Mac RPC RAM | 43.1 GB |
| i7 CPU RAM | 0.5 GB |
| Prompt | 0.31 tok/s |
| Generation | 0.20 tok/s |
| Total Time | 86 seconds |

| Metric | Value |
|---|---|
| Topology | i7 Server + Mac RPC |
| Build | latest |
| GPU Layers | 81/81 (all offloaded) |
| Vulkan VRAM | 12.7 GB |
| Mac RPC RAM | 31.8 GB |
| Prompt | 0.28 tok/s |
| Generation | 0.21 tok/s |
| Per Token | ~4.8 seconds |
| Load Time | ~8 minutes |

| Metric | Value |
|---|---|
| Topology | i7 Server + Mac RPC |
| Build | initial (old) |
| GPU Layers | 81/81 (all offloaded) |
| Vulkan VRAM | 11.6 GB |
| Mac RPC RAM | 32.9 GB |
| Prompt | 0.4 tok/s |
| Generation | 0.014 tok/s |
| Per Token | ~73 seconds |
| Load Time | ~8 minutes |

| Metric | Value |
|---|---|
| Topology | i7 solo, partial GPU offload |
| Build | initial |
| GPU Layers | 28/81 |
| Vulkan VRAM | 15.9 GB |
| i7 CPU RAM | 29.3 GB |
| Prompt | 0.9 tok/s |
| Generation | 0.06 tok/s |
| Per Token | ~16.7 seconds |
| Load Time | ~25 seconds |

Just as Bitcoin mining pools let anyone contribute hash power to collectively solve blocks no single machine could, memory pooling lets anyone contribute RAM to collectively run AI models no single machine could hold.
The current AI industry runs a brutal cycle:
Raise $10B → Build data center → Train model 6 months → Stop → Deploy static → Model goes stale → Raise more money → Build bigger center → Train new model → Kill old one → Repeat forever
Each generation costs MORE. GPT-4: $100M. GPT-5: $500M+. Only 3-4 companies on Earth can play.
Gold Road breaks the cycle: the model stays fixed while its knowledge keeps growing. This is how humans work. Your neural architecture doesn't change much after age 25, yet you get smarter every year because you accumulate knowledge and experience on top of a fixed reasoning substrate. Gold Road gives AI the same capability.
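As a toy illustration of that idea (the storage format and names here are invented, and this is not the Gold Road protocol itself): a frozen model can still accumulate knowledge by retrieving stored notes and prepending them to each prompt.

```python
# Toy illustration of a fixed model plus an append-only memory layer.
# Not the Gold Road protocol; file format and function names are invented here.
import json, pathlib

MEMORY = pathlib.Path("mesh_memory.jsonl")   # hypothetical shared memory file

def remember(topic: str, note: str) -> None:
    """Append a fact to persistent memory; the model's weights never change."""
    with MEMORY.open("a") as f:
        f.write(json.dumps({"topic": topic, "note": note}) + "\n")

def recall(topic: str, limit: int = 5) -> list[str]:
    """Fetch the most recent notes about a topic to prepend to the next prompt."""
    if not MEMORY.exists():
        return []
    notes = [json.loads(line) for line in MEMORY.read_text().splitlines()]
    return [n["note"] for n in notes if n["topic"] == topic][-limit:]

def build_prompt(topic: str, question: str) -> str:
    context = "\n".join(recall(topic))
    return f"Known facts:\n{context}\n\nQuestion: {question}"

remember("ES futures", "The 7160.5 neckline acted as support on April 19.")
print(build_prompt("ES futures", "Where is the nearest support?"))
```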
| Innovation | Status | Description |
|---|---|---|
| Gold Road Memory Mesh | NOVEL | Persistent distributed AI memory layer. Multi-node knowledge sharing with automatic replication. Proprietary protocol. |
| Unified Memory Pool | NOVEL | All node memory (CPU RAM + GPU VRAM) presented as one addressable space. Single query searches the entire mesh. |
| GPU Mesh Compute | ENGINEERING | Consumer GPUs serving model layers through the distributed mesh. Cross-platform, cross-vendor. |
| Full Orchestration Stack | INTEGRATION | Remote node control, automated deployment, 617-agent command center, Colossus dashboard — all self-built. |
| llama.cpp RPC Protocol | UPSTREAM | Tensor splitting protocol by ggml-org. We build on it; we didn't invent it. The pipe is theirs — what flows through it is ours. |