A one-page history of how the stack changed, and why.
Jeremy did not start where he is now. The stack that's live on prosenetwork.org today is the third iteration of a deliberate cost-down: every swap was driven by one rule — save money first, without going brain-dead.
v1.0 ran TTS on ElevenLabs: the eleven_flash_v2_5 model, $5/month Starter plan, 30K characters. It worked. It was also premium-API-first, which meant every user interaction had a direct marginal cost against a tiny budget.
The speechSynthesis degradation was killed: v1.1 silently fell back to a generic browser voice when ElevenLabs failed, which meant users were regularly hearing "Jeremy" in the wrong voice without knowing it. v1.2 returns a clean 503 if every server tier fails. Voice integrity over voice presence. Total operating cost: $0.00/month.
| Component | Before (v1.0 / v1.1) | After (v1.2) |
|---|---|---|
| AI Tier 1 | Gemini 2.5 Flash Lite | Gemini 2.5 Flash Lite (unchanged) |
| AI Tier 2 | Claude Sonnet 4.6 | Gemini 2.5 Flash (free, default for tool sessions) |
| AI Tier 3 | Claude Opus 4.6 | Gemini 3.1 Pro Preview (~$0.01, long inputs only) |
| AI Tier 4 | (did not exist) | Claude Sonnet 4 (premium fallback) |
| TTS primary | ElevenLabs Carter / eleven_flash_v2_5 | Edge-TTS en-US-SteffanNeural |
| TTS fallback | Browser speechSynthesis (wrong voice) | Kokoro ONNX am_michael (local) → ElevenLabs eleven_v3 → 503 |
| Voice on failure | Silent downgrade to browser voice | Clean 503, UI fails visibly |
| Python runtime | 3.11 | 3.12 (VM default) |
| Monthly cost | ~$5.00 (ElevenLabs Starter) | $0.00 |
| Premium APIs | Required (Anthropic on most requests) | Optional (gated by env key, fallback-only) |
Every tier is gated by one question: *can the free path do this?* The premium tiers (Gemini 3.1 Pro, Claude Sonnet, ElevenLabs) only run when the free tiers demonstrably fail. This is the Prime Directive encoded directly into the router's fallback walk.
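The fallback walk can be sketched in a few lines. This is an illustrative reconstruction, not the production router: the `Tier` class, `route()`, and the lambda stand-ins for real API calls are all assumptions; only the tier names and ordering come from the table above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tier:
    name: str
    free: bool                             # Prime Directive: free tiers come first
    call: Callable[[str], Optional[str]]   # returns a response, or None on failure

def route(prompt: str, tiers: list[Tier]) -> str:
    """Walk the tiers in order. A paid tier only runs after every earlier
    (free) tier has demonstrably failed on this request."""
    for tier in tiers:
        result = tier.call(prompt)
        if result is not None:
            return result
    # No silent degradation: if every tier fails, fail visibly.
    raise RuntimeError("503: every tier failed")

# Hypothetical chain mirroring the table; the first free tier "fails" here
# to show the walk advancing without ever touching the premium tier.
tiers = [
    Tier("gemini-2.5-flash-lite", free=True,  call=lambda p: None),
    Tier("gemini-2.5-flash",      free=True,  call=lambda p: f"answer to: {p}"),
    Tier("claude-sonnet-4",       free=False, call=lambda p: f"premium: {p}"),
]
print(route("hello", tiers))  # -> answer to: hello  (Claude never runs)
```

The design point is that the gate is positional, not conditional: a premium tier cannot be reached except by exhausting everything cheaper above it.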
Kokoro ONNX has the best voice of the three TTS engines, but benchmarked at ~1 second of inference per character of input on the production Oracle ARM A1 Micro. At that rate a 300-char response costs roughly five minutes of CPU. The tier order was therefore inverted for the live VM: a remote free API (Edge-TTS) beats a local free model (Kokoro) when the local CPU is the bottleneck. On an x86 VM Kokoro would likely retake primary.
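The arithmetic behind that inversion fits in a back-of-envelope check. The ~1 s/char figure is the Kokoro benchmark quoted above for the ARM A1 Micro; the helper name and the 300-char example reply are illustrative.

```python
KOKORO_SEC_PER_CHAR = 1.0  # measured: ~1 s of Kokoro inference per input char on the A1 Micro

def kokoro_cpu_minutes(text: str) -> float:
    """Estimated local CPU time (minutes) for Kokoro to synthesize `text`."""
    return len(text) * KOKORO_SEC_PER_CHAR / 60

reply = "x" * 300                # a typical 300-character response
print(kokoro_cpu_minutes(reply)) # -> 5.0
```

Five minutes of CPU per reply is unusable for interactive TTS, which is why a slower-sounding but instant remote API wins on this particular VM.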
v1.1 had a hidden failure mode: when ElevenLabs 503'd, the frontend quietly fell back to window.speechSynthesis — a generic robot voice impersonating Jeremy. Users never knew. v1.2 killed that fallback entirely. If every server tier fails, the voice stops. Voice integrity over voice presence. The same principle is being pushed out to privacy claims, capability claims, and anywhere the system might be tempted to bluff.