INTERNAL ENGINEERING DOCUMENT

Jeremy — The Evolution

A one-page history of how the stack changed, and why.

SCOPE: v1.0 (Jan 2026) → v1.2 (Apr 2026)
AUTHOR: Pro Se Network Engineering
CURRENT: v1.2 (free-first API stack, tiered TTS, no silent fallbacks)
LINKS: Tech Architecture v1.2 · Investor Deck

Jeremy did not start where he is now. The stack live on prosenetwork.org today is the third iteration of a deliberate cost-down, and every swap followed one rule: save money first, without going brain-dead.

Three Versions, One Thesis

v1.0 · January 2026
Claude-Heavy, Paid-First
The first production build was a 3-tier stack anchored on Anthropic: Gemini Flash Lite handled cheap greetings at Tier 0, Claude Sonnet did the standard guidance at Tier 1, and Claude Opus caught the complex legal analysis at Tier 2. Voice came from a single provider: ElevenLabs Carter on the eleven_flash_v2_5 model, under the $5/month Starter plan (30K characters). It worked. It was also premium-API-first, which meant every user interaction carried a direct marginal cost against a tiny budget.
v1.1 · March 2026
Gemini Rebalance
Google released Gemini 3.1 Pro Preview at competitive pricing, and the free Gemini 2.5 tier got strong enough to carry real load. The router was rewritten into a 4-tier stack: Gemini 2.5 Flash Lite, Gemini 2.5 Flash, Gemini 3.1 Pro Preview, and Claude Sonnet as a premium top-tier fallback. Claude Opus was dropped from production. The median request moved from a ~$0.003 Sonnet call to a $0.00 Gemini call. Voice was unchanged — ElevenLabs Carter was still doing the talking — but its ~$5/mo spend was now the only line item.
v1.2 · April 2026 · CURRENT
Free-First, Tiered TTS, No Silent Fallbacks
Two shifts landed at once. First, the TTS pipeline was rebuilt after the ElevenLabs quota ran dry mid-sprint. A 3-tier TTS fallback was introduced: Edge-TTS Steffan as primary (Microsoft's free neural endpoint, sub-1s), Kokoro ONNX am_michael as a fully local fallback, and ElevenLabs Carter held in reserve as a premium tertiary tier. Second, the browser speechSynthesis degradation was killed. v1.1 silently fell back to a generic browser voice when ElevenLabs failed, which meant users were regularly hearing "Jeremy" in the wrong voice without knowing it. v1.2 returns a clean 503 if every server tier fails. Voice integrity over voice presence. Total operating cost: $0.00/month.

What Changed, Line by Line

Component        | Before (v1.0 / v1.1)                  | After (v1.2)
-----------------+---------------------------------------+------------------------------------------------------------
AI Tier 1        | Gemini 2.5 Flash Lite                 | Gemini 2.5 Flash Lite (unchanged)
AI Tier 2        | Claude Sonnet 4.6                     | Gemini 2.5 Flash (free, default for tool sessions)
AI Tier 3        | Claude Opus 4.6                       | Gemini 3.1 Pro Preview (~$0.01, long inputs only)
AI Tier 4        | (did not exist)                       | Claude Sonnet 4 (premium fallback)
TTS primary      | ElevenLabs Carter / eleven_flash_v2_5 | Edge-TTS en-US-SteffanNeural
TTS fallback     | Browser speechSynthesis (wrong voice) | Kokoro ONNX am_michael (local) → ElevenLabs eleven_v3 → 503
Voice on failure | Silent downgrade to browser voice     | Clean 503, UI fails visibly
Python runtime   | 3.11                                  | 3.12 (VM default)
Monthly cost     | ~$5.00 (ElevenLabs Starter)           | $0.00
Premium APIs     | Required (Anthropic on most requests) | Optional (gated by env key, fallback-only)

Three Lessons Baked Into the Stack

1 · FREE FIRST, PREMIUM ON DEMAND

Every tier is gated by one question: can the free path do this? The premium tiers (Gemini 3.1 Pro, Claude Sonnet, ElevenLabs) only run when the free tiers demonstrably fail. This is the Prime Directive encoded directly into the router's fallback walk.
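
The fallback walk described here can be sketched roughly as follows. This is a minimal illustration and not the production router: the Tier class, the walk_tiers function, and the env-key names are all hypothetical, and the real router also selects tiers by request type (for example, long inputs), which this sketch leaves out. It shows only the two properties named above: premium tiers are skipped unless their env key is set, and they are reached only after the tiers ahead of them have failed.

```python
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class Tier:
    name: str
    call: Callable[[str], str]     # hypothetical client call; raises on failure
    env_key: Optional[str] = None  # premium tiers only: env var that must be set


def walk_tiers(prompt: str, tiers: list[Tier],
               env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Try tiers in order; return (tier_name, reply) from the first success."""
    errors = []
    for tier in tiers:
        # A premium tier without its key configured is skipped, never attempted.
        if tier.env_key and not env.get(tier.env_key):
            continue
        try:
            return tier.name, tier.call(prompt)
        except Exception as exc:
            # Record the failure and fall through to the next tier.
            errors.append(f"{tier.name}: {exc}")
    # Nothing answered: surface every failure instead of hiding it.
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Listing the free tiers first and the keyed tiers last is what makes the walk free-first: ordering alone decides when a paid call can happen.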

2 · HARDWARE-AWARE TIERING

Kokoro ONNX has the best voice of the three TTS engines, but it benchmarked at ~1 second of inference per character of input on the production Oracle ARM A1 Micro. At that rate a 300-char response needs roughly 300 seconds, about 5 minutes of CPU. The tier order was therefore inverted for the live VM: a remote free API (Edge-TTS) beats a local free model (Kokoro) when the local CPU is the bottleneck. On an x86 VM Kokoro would likely retake primary.
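
The arithmetic behind that decision is easy to make explicit. The sketch below is illustrative only: the function names and the 2-second latency budget are assumptions, and the only figure taken from this document is the ~1 s/char benchmark. It puts the local engine first when its estimated synthesis time fits the budget and otherwise falls back to the remote-first order.

```python
def kokoro_estimate_seconds(chars: int, seconds_per_char: float = 1.0) -> float:
    """Estimated local synthesis time; 1.0 s/char is the ARM benchmark figure."""
    return chars * seconds_per_char


def tts_order(chars: int, seconds_per_char: float,
              budget_seconds: float = 2.0) -> list[str]:
    """Prefer the local engine only if it fits the (assumed) latency budget."""
    if kokoro_estimate_seconds(chars, seconds_per_char) <= budget_seconds:
        return ["kokoro", "edge-tts", "elevenlabs"]
    # Local CPU is the bottleneck: the remote free API goes first.
    return ["edge-tts", "kokoro", "elevenlabs"]


# 300 characters at 1 s/char is 300 s, i.e. 5 minutes of CPU.
arm_minutes = kokoro_estimate_seconds(300) / 60
```

With a faster per-character rate (any value that keeps a typical response inside the budget), the same function returns Kokoro first, which is the x86 scenario the paragraph anticipates.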

3 · FAIL LOUD, NOT IN DISGUISE

v1.1 had a hidden failure mode: when ElevenLabs 503'd, the frontend quietly fell back to window.speechSynthesis — a generic robot voice impersonating Jeremy. Users never knew. v1.2 killed that fallback entirely. If every server tier fails, the voice stops. Voice integrity over voice presence. The same principle is being pushed out to privacy claims, capability claims, and anywhere the system might be tempted to bluff.
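
A framework-free sketch of the fail-loud behavior, under stated assumptions: synthesize_or_503 and the response shape are hypothetical, and each engine is modeled as a plain callable that returns audio bytes or raises. The point it illustrates is that the chain ends in an explicit 503 with the recorded failures, and there is no final branch that substitutes a different voice.

```python
from typing import Callable, Sequence

# Hypothetical engine signature: returns audio bytes, raises on any failure.
Synth = Callable[[str], bytes]


def synthesize_or_503(text: str,
                      engines: Sequence[tuple[str, Synth]]) -> tuple[int, dict]:
    """Walk server-side engines in order; return (status, body)."""
    failures = {}
    for name, synth in engines:
        try:
            audio = synth(text)
        except Exception as exc:
            failures[name] = str(exc)  # keep the reason for the error body
            continue
        return 200, {"engine": name, "audio": audio}
    # Every tier failed: report it. No stand-in voice is attempted.
    return 503, {"error": "tts_unavailable", "failures": failures}
```

Returning the engine name on success also lets the frontend confirm which voice actually spoke, which is the same integrity check applied in the other direction.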