1. System Architecture
Jeremy runs as a monolithic Python HTTP server behind an nginx reverse proxy on a single Oracle ARM A1 Micro instance. No containers, no orchestration, no cloud functions. One process, one port, one database.
Full-Stack Layer Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ chat.html │ │ index.html │ │ divorce.html │ │
│ │ (Main UI) │ │ (Landing) │ │ (Workflow) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────┬────┴─────────────────┘ │
│ │ HTTPS :443 │
├──────────────────────┼──────────────────────────────────────────────┤
│ REVERSE PROXY │ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ nginx │ │
│ │ SSL termination (Let's Encrypt) │ │
│ │ proxy_pass → http://127.0.0.1:7860 │ │
│ │ SPA fallback: try_files $uri /index.html │ │
│ └───────────────────┬─────────────────────────────────┘ │
│ │ HTTP :7860 │
├──────────────────────┼──────────────────────────────────────────────┤
│ APPLICATION SERVER │ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ server.py (2,389 lines) │ │
│ │ HTTPServer + ThreadingMixIn │ │
│ │ In-memory session dict (12-char UUID keys) │ │
│ │ 207+ route handlers (do_GET / do_POST) │ │
│ └───┬──────────┬──────────┬──────────┬────────────────┘ │
│ │ │ │ │ │
├──────┼──────────┼──────────┼──────────┼─────────────────────────────┤
│ PROCESSING ENGINES │
│ │ │ │ │ │
│ ┌───▼──────┐ ┌─▼────────┐ ┌▼────────┐ ┌▼─────────────┐ │
│ │ state │ │ risk │ │ legal │ │ jeremy │ │
│ │ machine │ │ engine │ │ rails │ │ client │ │
│ │ (FSM) │ │ (5 sig) │ │ (UPL) │ │ (AI tiers) │ │
│ └──────────┘ └──────────┘ └─────────┘ └──────────────┘ │
│ │ │ │ │
│ ┌───▼──────┐ ┌─▼────────┐ ┌───▼────────────┐ │
│ │ rule │ │ gate │ │ conversation │ │
│ │ engine │ │ bridge │ │ log │ │
│ └──────────┘ └──────────┘ └────────────────┘ │
│ │
├────────────────────────────────────────────────────────────────────┤
│ DATA / PERSISTENCE │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ memories.db │ │ conversations │ │ persistent │ │
│ │ SQLite WAL mode │ │ .jsonl │ │ memory (FTS5) │ │
│ │ 4 tables │ │ retraining log │ │ ~/.jeremy/ │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ │
├────────────────────────────────────────────────────────────────────┤
│ EXTERNAL SERVICES │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐│
│ │ Gemini API │ │ Anthropic │ │ Edge-TTS │ │ Court ││
│ │ 2.5 / 3.1 │ │ Claude │ │ (primary) │ │ Listener ││
│ │ (Tiers 1-3) │ │ Sonnet (T4) │ │ + Kokoro │ │ (lookup) ││
│ └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘│
└────────────────────────────────────────────────────────────────────┘
Key Architectural Decisions
- ThreadingMixIn — each request gets its own thread. No async, no event loop. Simple concurrency model for low-traffic legal guidance.
- No Docker — bare process on the VM. Eliminates container overhead on a resource-constrained 1-CPU ARM instance.
- SQLite WAL mode — concurrent readers, single writer. Sufficient for expected throughput. No Postgres dependency.
- In-memory sessions — session dict lives in process memory. Lost on restart. Acceptable tradeoff for a stateless-enough legal advisor.
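The server shape described above can be sketched in a few lines. This is a minimal illustration, not the actual server.py code — the handler class, route, and session fields here are assumed for the example.

```python
import threading
import uuid
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn

# In-memory session store: 12-char UUID keys, lost on restart (by design)
sessions = {}
_sessions_lock = threading.Lock()

def get_or_create_session(session_id=None):
    """Return an existing session dict, or create one keyed by a 12-char UUID."""
    with _sessions_lock:
        if session_id and session_id in sessions:
            return session_id, sessions[session_id]
        sid = uuid.uuid4().hex[:12]
        sessions[sid] = {"state": "GREETING", "history": [], "facts": {}}
        return sid, sessions[sid]

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # one thread per request, no event loop

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

# server = ThreadedHTTPServer(("127.0.0.1", 7860), Handler)
# server.serve_forever()
```

Because session state lives in a plain dict behind a lock, a restart wipes all sessions — the tradeoff the last bullet accepts.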
2. Request Lifecycle
A single user message traverses 17 processing steps from HTTP POST to rendered response. Every step is synchronous within the request thread.
USER MESSAGE LIFECYCLE — 17 STEPS
Browser (chat.html)
│
▼
① fetch('/api/chat', POST)
│ Body: { message, session_id }
│
▼
② nginx SSL termination
│ :443 → proxy_pass → :7860
│
▼
③ server.py do_POST('/api/chat')
│ Parse JSON body, extract message + session_id
│
▼
④ Session lookup
│ sessions[session_id] or create new (12-char UUID)
│ Load: state, history[], area_of_law, jurisdiction, facts{}
│
▼
⑤ Custody keyword check
│ Scan message against 23 CUSTODY_KEYWORDS
│ + 1 contextual ("children" near custody context)
│ − 14 negators that suppress false positives
│ → If matched: inject custody disclaimer, set risk_override
│
▼
⑥ State machine transition
│ current_state → validate against VALID_TRANSITIONS
│ If invalid: raise InvalidTransition (400 response)
│ If valid: update session.state
│
▼
⑦ Fact extraction (LLM)
│ Send message to AI tier for structured fact extraction
│ Returns: {entities, dates, amounts, relationships}
│ Merge into session.facts{}
│
▼
⑧ Area-of-law classification
│ 14 areas scored: contract_review(5) → criminal_defense(50)
│ Keyword matching + LLM confirmation
│ Set session.area_of_law
│
▼
⑨ Jurisdiction detection
│ State/city extraction from facts
│ Map to 8 supported jurisdictions
│ Affects: filing deadlines, court procedures, fee waivers
│
▼
⑩ Risk scoring (5 signals)
│ Signal 1: Custody keywords → 0-100
│ Signal 2: Area-of-law severity → 5-50
│ Signal 3: Deadline urgency → 0-30
│ Signal 4: Prerequisite gaps → 0-40
│ Signal 5: Aggravating factors → 0-25
│ Composite: min(sum, 100) OR _max_risk()
│
▼
⑪ enforce_rails()
│ Check UPL prohibitions (8 rules)
│ Apply safe harbor permissions (9 rules)
│ Inject disclaimers by risk level:
│ LOW (0-25): green banner, proceed
│ MEDIUM (26-50): yellow banner, add caution
│ HIGH (51-75): orange banner, recommend attorney
│ CRITICAL (76+): red banner, hard stop + referral
│
▼
⑫ Guidance generation (LLM)
│ Select AI tier based on complexity/input length
│ Inject: system prompt + legal rails + facts + history
│ Inject: personality directives (Carter voice)
│ Generate response with risk-appropriate guardrails
│
▼
⑬ Memory persistence
│ Write to memories.db: conversation entry, entity updates
│ Write to conversations.jsonl: retraining archive
│ Update personality_evolution if trait triggers detected
│
▼
⑭ JSON response assembly
│ { response, risk_level, risk_score, disclaimers[],
│ area_of_law, state, suggested_actions[] }
│
▼
⑮ formatText() — client-side
│ Markdown → HTML rendering
│ Legal citation linking
│ Risk banner injection into DOM
│
▼
⑯ speak() — client-side TTS
│ POST /api/tts with response text
│ Client: cleanForTTS(); Server: tiered TTS (Edge-TTS → Kokoro → ElevenLabs) → audio blob
│ Client: play audio, activate KITT bars
│
▼
⑰ KITT equalizer animation
32-bar Web Audio FFT-128 visualization
Center-out gold gradient, glow at val>100
Idle breathing animation on silence
3. State Machine (FSM)
Jeremy operates on a 14-state finite state machine. Each session tracks its current state, and transitions are validated against a whitelist. Invalid transitions raise InvalidTransition and return a 400.
State Inventory
| State | Module Owner | Entry Trigger | Exit Trigger |
| GREETING | server.py | New session created | User sends first message |
| INTAKE | server.py | First user message | Area of law classified |
| FACT_GATHERING | state_machine.py | Area classified | Minimum facts threshold met |
| AREA_CLASSIFICATION | state_machine.py | Facts sufficient | Area confirmed by LLM |
| JURISDICTION_CHECK | state_machine.py | Area confirmed | Jurisdiction resolved |
| RISK_ASSESSMENT | risk_engine.py | Jurisdiction set | Risk score computed |
| PREREQUISITE_CHECK | rule_engine.py | Risk assessed | All prerequisites evaluated |
| GUIDANCE | jeremy_client.py | Prerequisites clear | User asks follow-up or exits |
| DOCUMENT_PREP | server.py | User requests document | Document generated |
| FILING_GUIDANCE | rule_engine.py | Document ready | Filing instructions delivered |
| REFERRAL | legal_rails.py | Risk critical OR UPL trigger | Referral link provided |
| FOLLOW_UP | server.py | Post-guidance question | New topic or session end |
| ESCALATION | gate_bridge.py | Guard vote DENY | Owner review complete |
| CLOSED | server.py | User ends session | Terminal state |
VALID_TRANSITIONS Map
VALID_TRANSITIONS = {
GREETING → [INTAKE]
INTAKE → [FACT_GATHERING, REFERRAL]
FACT_GATHERING → [AREA_CLASSIFICATION, REFERRAL]
AREA_CLASSIFICATION→ [JURISDICTION_CHECK, FACT_GATHERING]
JURISDICTION_CHECK → [RISK_ASSESSMENT]
RISK_ASSESSMENT → [PREREQUISITE_CHECK, REFERRAL, ESCALATION]
PREREQUISITE_CHECK → [GUIDANCE, REFERRAL]
GUIDANCE → [DOCUMENT_PREP, FILING_GUIDANCE, FOLLOW_UP, CLOSED]
DOCUMENT_PREP → [FILING_GUIDANCE, GUIDANCE]
FILING_GUIDANCE → [FOLLOW_UP, CLOSED]
REFERRAL → [CLOSED]
FOLLOW_UP → [FACT_GATHERING, GUIDANCE, CLOSED]
ESCALATION → [GUIDANCE, REFERRAL, CLOSED]
CLOSED → [] // terminal
}
FSM Flow Diagram
┌──────────┐
│ GREETING │
└────┬─────┘
│
┌────▼─────┐
│ INTAKE │
└────┬─────┘
│
┌────────▼─────────┐
│ FACT_GATHERING │◄──────────────────┐
└────────┬─────────┘ │
│ │
┌───────────▼────────────┐ │
│ AREA_CLASSIFICATION │────────────────┘
└───────────┬────────────┘ (need more facts)
│
┌───────────▼────────────┐
│ JURISDICTION_CHECK │
└───────────┬────────────┘
│
┌───────────▼────────────┐
│ RISK_ASSESSMENT │──────────┐
└───────────┬────────────┘ │
│ ┌────▼──────┐
┌───────────▼────────────┐ │ ESCALATION│
│ PREREQUISITE_CHECK │ └────┬──────┘
└───────────┬────────────┘ │
│ ┌────────┘
┌────▼─────┐ │
┌───────►│ GUIDANCE │◄───────┘
│ └──┬───┬───┘
│ │ │
┌──────▼───┐ ┌────▼───▼──────┐ ┌──────────┐
│ FOLLOW_UP│ │ DOCUMENT_PREP │ │ REFERRAL │◄─── (any high-risk)
└──────┬───┘ └────┬──────────┘ └────┬──────┘
│ ┌────▼──────────┐ │
│ │FILING_GUIDANCE│ │
│ └────┬──────────┘ │
│ │ │
└───────────┼───────────────────┘
│
┌────▼───┐
│ CLOSED │
└────────┘
ENFORCEMENT
Any transition not in VALID_TRANSITIONS[current_state] raises InvalidTransition(current, attempted). The server catches this and returns HTTP 400 with the invalid transition pair logged. This prevents impossible state jumps — you cannot go from GREETING to DOCUMENT_PREP.
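The enforcement rule above can be sketched as a small validator. The dict is an abbreviated mirror of the VALID_TRANSITIONS map in this section; the InvalidTransition constructor signature is assumed from the text.

```python
class InvalidTransition(Exception):
    def __init__(self, current, attempted):
        super().__init__(f"{current} -> {attempted} is not allowed")
        self.current, self.attempted = current, attempted

# Abbreviated mirror of VALID_TRANSITIONS (full map above)
VALID_TRANSITIONS = {
    "GREETING": ["INTAKE"],
    "INTAKE": ["FACT_GATHERING", "REFERRAL"],
    "GUIDANCE": ["DOCUMENT_PREP", "FILING_GUIDANCE", "FOLLOW_UP", "CLOSED"],
    "CLOSED": [],  # terminal
}

def transition(session, new_state):
    """Validate and apply a state change; raise on impossible jumps."""
    current = session["state"]
    if new_state not in VALID_TRANSITIONS.get(current, []):
        # the server catches this and returns HTTP 400
        raise InvalidTransition(current, new_state)
    session["state"] = new_state
    return session
```

A GREETING → DOCUMENT_PREP jump fails exactly as the text describes, with both states preserved on the exception for logging.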
4. AI Tier Routing
Jeremy uses a 4-tier AI model hierarchy governed by the Prime Directive: save money first. The routing decision is deterministic based on task complexity, tool requirements, and input length. Free Gemini tiers handle the majority of traffic; paid Claude Sonnet is a premium fallback reserved for cases the free tiers cannot handle.
Tier Decision Tree
┌──────────────┐
│ Incoming Req │
└──────┬───────┘
│
┌───────────▼───────────┐
│ task in COMPLEX_TASKS? │
└───┬───────────────┬───┘
YES NO
│ │
┌────▼─────┐ ┌──────▼──────┐
│ TIER 2 │ │ tools on? │
│ Flash │ │ │
└──────────┘ └──┬───────┬──┘
YES NO
│ │
┌────▼────┐ ┌▼────────┐
│ TIER 2 │ │ TIER 1 │
│ Flash │ │ Flash │
│ │ │ Lite │
└─────────┘ └─────────┘
Fallback chain (step up on empty / error / short response):
TIER 1 → TIER 2 → TIER 3 → TIER 4 → then walk back down.
Tier Specifications
| Tier | Model | Max Tokens | Cost/Call | Use Case |
| 1 | Gemini 2.5 Flash Lite | 1,000 | $0.00 | Simple greetings, clarifications, no-tool chat |
| 2 | Gemini 2.5 Flash | 1,500 | $0.00 | Default for tool-enabled sessions and complex tasks |
| 3 | Gemini 3.1 Pro Preview | 2,000 | ~$0.01 | Long inputs (>2,000 chars) and tool-result follow-ups |
| 4 | Claude Sonnet 4 | 2,000 | ~$0.01 | Premium fallback when all Gemini tiers fail or degrade |
Implementation Details
- Singleton pattern — jeremy_client.py holds one JeremyClient instance. All four tiers share the same client and session state.
- Rate limiting — 100ms minimum between API calls (global, not per-tier). Prevents burst billing and API abuse.
- Minimum tier floor — Any session with tools enabled starts at Tier 2 minimum. Tier 1 (Flash Lite) cannot reliably emit tool-call JSON, so the router clamps it out whenever the tool system is loaded.
- Tiered fallback walk — If the chosen tier returns None or a response shorter than 50 chars, the router steps up one tier and retries. Once it reaches the top, it walks back down to lower tiers. All four tiers must fail before Jeremy returns an error.
- Anthropic availability check — Tier 4 is skipped entirely if ANTHROPIC_API_KEY is not set. The free Gemini stack is fully self-sufficient.
- COMPLEX_TASKS set — narrative_to_facts, contract_to_clauses, structured_to_guidance. These bypass Tier 1 and start at Tier 2.
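The decision tree and fallback walk above can be sketched as two small functions. Function names and the exact long-input routing are assumptions for illustration; the thresholds (2,000 chars, 50-char minimum) come from this section.

```python
COMPLEX_TASKS = {"narrative_to_facts", "contract_to_clauses", "structured_to_guidance"}
LONG_INPUT_CHARS = 2000  # threshold from the tier table

def choose_tier(task, tools_enabled, input_len):
    """Deterministic first-choice tier; the fallback walk may still move it."""
    if input_len > LONG_INPUT_CHARS:
        return 3          # long inputs route to Tier 3 (per the tier table)
    if task in COMPLEX_TASKS:
        return 2          # complex tasks bypass Tier 1
    if tools_enabled:
        return 2          # Tier 1 cannot reliably emit tool-call JSON
    return 1              # simple no-tool chat

def fallback_walk(start_tier, call, min_len=50, top=4):
    """Step up on empty/short responses, then walk back down (sketch)."""
    order = list(range(start_tier, top + 1)) + list(range(start_tier - 1, 0, -1))
    for tier in order:
        resp = call(tier)
        if resp and len(resp) >= min_len:
            return tier, resp
    return None, None  # all tiers failed
```

Starting at Tier 1, the walk tries 1 → 2 → 3 → 4 and has nothing below it to walk back to, matching the "all four tiers must fail" guarantee.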
5. Risk Engine
The risk engine computes a composite score (0-100) from 5 independent signals. Each signal is deterministic — no LLM involved in scoring. The LLM is only used upstream for fact extraction; the scoring itself is pure keyword matching and arithmetic.
Signal 1: Custody Keywords
CUSTODY_KEYWORDS = [
"custody", "visitation", "parenting time", "parenting plan",
"custodial", "non-custodial", "sole custody", "joint custody",
"physical custody", "legal custody", "child support",
"parental rights", "termination of parental rights", "TPR",
"guardian ad litem", "GAL", "best interests of the child",
"UCCJEA", "Hague Convention", "parental alienation",
"supervised visitation", "forensic evaluation", "child protective"
] # 23 primary keywords
CONTEXTUAL_TRIGGER = "children" # only fires near custody context
NEGATORS = [
"no children", "no kids", "childless", "not a parent",
"don't have children", "do not have children", "never had children",
"not about custody", "unrelated to custody", "not a custody matter",
"no custody issue", "custody is not", "custody isn't", "not seeking custody"
] # 14 negators that suppress false positives
IMMINENT_RISK = [
"emergency custody", "ex-parte", "TRO", "kidnapping",
"flee", "fleeing", "abduction", "taken my child"
] # 8 imminent danger keywords → score = 100 (hard ceiling)
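Signal 1's precedence (negators suppress, imminent-risk phrases hard-cap) can be sketched as below. The lists are abbreviated and the per-keyword weight is an assumption — the full lists and the real weighting live in the risk engine.

```python
CUSTODY_KEYWORDS = ["custody", "visitation", "parenting plan", "child support"]  # abbreviated
NEGATORS = ["no children", "not about custody", "not seeking custody"]           # abbreviated
IMMINENT_RISK = ["emergency custody", "taken my child", "abduction"]             # abbreviated

def custody_signal(message, per_hit=25):
    """Sketch of Signal 1: negators win, then imminent risk, then keyword hits."""
    text = message.lower()
    if any(n in text for n in NEGATORS):
        return 0                         # explicit negation suppresses the signal
    if any(p in text for p in IMMINENT_RISK):
        return 100                       # hard ceiling, cannot be diluted
    hits = sum(1 for k in CUSTODY_KEYWORDS if k in text)
    return min(hits * per_hit, 100)      # per_hit weight is assumed
```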
Signal 2: Area-of-Law Severity
| Area of Law | Base Score | Rationale |
| contract_review | 5 | Low risk, informational |
| small_claims | 10 | Limited jurisdiction, low stakes |
| landlord_tenant | 15 | Housing rights, some urgency |
| consumer_protection | 15 | UDAP, warranty claims |
| employment | 20 | EEOC deadlines, retaliation risk |
| debt_collection | 20 | FDCPA, SOL concerns |
| family_law | 25 | Divorce, property division |
| immigration | 30 | Deportation risk, complex procedures |
| personal_injury | 30 | SOL pressure, medical complexity |
| police_misconduct | 35 | §1983, qualified immunity, evidence loss |
| bankruptcy | 35 | Asset protection, means test |
| civil_rights | 40 | Constitutional claims, systemic |
| custody | 45 | Child welfare, emergency orders |
| criminal_defense | 50 | Liberty at stake, Miranda, plea implications |
Signal 3: Deadline Urgency
DEADLINE_TRIGGERS = {
"imminent": ["tomorrow", "today", "tonight", "this morning",
"right now", "happening now", "hours"], # +30
"urgent": ["this week", "next week", "few days",
"running out of time", "deadline"], # +20
"pressing": ["this month", "next month", "30 days",
"soon", "coming up"], # +10
"aware": ["eventually", "planning to", "thinking about",
"want to", "considering"] # +5
}
Signal 4: Prerequisite Gaps
Each area of law has REQUIRED prerequisites:
e.g., employment → [EEOC_charge_filed, right_to_sue_letter, SOL_check]
For each unmet prerequisite: +5 points
If prerequisite is URGENT: +15 points
If prerequisite is EXPIRED: +20 points
Maximum from this signal: 40 (capped)
Signal 5: Aggravating Factors
| Category | Keywords | Points | Risk Bump |
| Violence | hit, punch, assault, attack, weapon, gun, knife | +25 | → CRITICAL |
| Threats | threatened, threatening, intimidation, stalking, harassing | +20 | → HIGH |
| Children at risk | child abuse, neglect, CPS, ACS, foster care | +25 | → CRITICAL |
| Financial harm | stolen, fraud, scam, identity theft, drained account | +15 | → HIGH |
| Housing emergency | eviction notice, lockout, illegal eviction, marshal | +15 | → HIGH |
| Incarceration | arrested, jail, prison, bail, arraignment, warrant | +20 | → CRITICAL |
| Self-harm | hurt myself, end it, suicide, self-harm, give up | +25 | → CRITICAL + crisis referral |
Composite Formula
def compute_risk(signals):
    raw = sum(s.score for s in signals)
    composite = min(raw, 100)
    # _max_risk() override: the composite is floored at the highest
    # individual signal, so one catastrophic signal (e.g., imminent
    # custody = 100) can never be diluted by the other signals
    max_single = max(s.score for s in signals)
    return max(composite, max_single)  # whichever is higher
DESIGN NOTE
The _max_risk() function ensures that a single catastrophic signal (e.g., imminent custody = 100) cannot be diluted by low scores from other signals. A user mentioning "my ex is fleeing with my child" hits score 100 regardless of all other factors.
6. Legal Rails
The legal rails system prevents unauthorized practice of law (UPL) while maximizing the information Jeremy can safely provide. Two rule sets govern behavior: 8 prohibitions and 9 safe harbor permissions.
UPL Prohibitions (8 Rules)
PROHIBITIONS = [
"Do not tell the user what to do in their specific case",
"Do not predict case outcomes or chances of success",
"Do not recommend specific legal strategies as advice",
"Do not draft legal documents presented as final/ready-to-file",
"Do not interpret how a law applies to their specific facts",
"Do not recommend whether to accept or reject a settlement",
"Do not advise on plea deals or criminal defense strategy",
"Do not represent yourself as an attorney or legal professional"
]
Safe Harbor (9 Permitted)
SAFE_HARBOR = [
"Explain what a law says in plain language",
"Describe court procedures, filing steps, and deadlines",
"Provide general information about legal rights",
"Explain legal terms and concepts",
"Help organize facts and documents for their case",
"Provide form templates with blank fields",
"Describe what others in similar situations have done",
"Explain the pros and cons of different approaches generally",
"Direct to legal aid, bar associations, and court resources"
]
Disclaimer Strings
PERSISTENT_DISCLAIMER =
"I'm an AI legal assistant, not an attorney. This is legal
information, not legal advice. No attorney-client relationship
is formed. For your specific situation, consult a licensed
attorney in your jurisdiction."
FOOTER_DISCLAIMER =
"Pro Se Network provides legal information, not legal advice.
Jeremy is an AI assistant — not an attorney."
DOCUMENT_HEADER =
"TEMPLATE — NOT LEGAL ADVICE. This document is a template for
informational purposes only. Have an attorney review before filing."
enforce_rails() Gate Logic
enforce_rails(message, risk_level, area)
│
▼
┌──────────────────────┐
│ Check PROHIBITIONS │
│ against response │───── Match? ──── Redact + inject disclaimer
└──────────┬───────────┘
│ Clean
▼
┌──────────────────────┐
│ Check risk_level │
└──┬───┬───┬───┬───────┘
│ │ │ │
LOW MED HIGH CRIT
│ │ │ │
│ │ │ └─► Red banner + hard stop + attorney referral link
│ │ └─────► Orange banner + "strongly recommend attorney"
│ └─────────► Yellow banner + "consider consulting attorney"
└─────────────► Green banner + proceed normally
Risk Level Behavior
| Level | Score | Color | Banner | Behavior |
| LOW | 0-25 | Green | Informational | Full guidance, standard disclaimers |
| MEDIUM | 26-50 | Yellow | Caution | Guidance + attorney suggestion |
| HIGH | 51-75 | Orange | Warning | Limited guidance + strong attorney recommendation |
| CRITICAL | 76-100 | Red | Hard Stop | No guidance — referral only + crisis resources if applicable |
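The score-to-level banding in the table above is a straight threshold lookup; a minimal sketch:

```python
RISK_LEVELS = [
    (25, "LOW"),        # 0-25: green banner, proceed
    (50, "MEDIUM"),     # 26-50: yellow banner, add caution
    (75, "HIGH"),       # 51-75: orange banner, recommend attorney
    (100, "CRITICAL"),  # 76-100: red banner, hard stop + referral
]

def risk_level(score):
    """Map a composite 0-100 score to the banner level in the table above."""
    for ceiling, level in RISK_LEVELS:
        if score <= ceiling:
            return level
    return "CRITICAL"  # defensive: composite scores are already capped at 100
```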
7. Memory System
Jeremy's memory operates on two layers: a runtime SQLite database (memories.db) for session intelligence, and a persistent filesystem store (~/.jeremy/memory/) with FTS5 full-text search for cross-session recall.
SQLite Schema (4 Tables)
CREATE TABLE conversations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
role TEXT NOT NULL, -- 'user' | 'assistant'
content TEXT NOT NULL,
area_of_law TEXT,
risk_score INTEGER DEFAULT 0,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE entity_memory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
entity_type TEXT NOT NULL, -- 'person' | 'date' | 'amount' | 'org'
entity_name TEXT NOT NULL,
context TEXT,
first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE gate_decisions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
tool_name TEXT NOT NULL,
risk_level TEXT NOT NULL, -- 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
votes TEXT NOT NULL, -- JSON: [{guard, vote, reason}]
verdict TEXT NOT NULL, -- 'APPROVED' | 'DENIED'
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE personality_evolution (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trait TEXT NOT NULL, -- 'warmth' | 'directness' | 'humor' | 'formality'
score REAL DEFAULT 0.5,
trigger TEXT, -- what caused the shift
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
WAL Mode + Thread Safety
PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;
# Python-side:
_write_lock = threading.Lock()
def _write(self, sql, params):
with self._write_lock:
conn = sqlite3.connect(self.db_path)
conn.execute(sql, params)
conn.commit()
conn.close()
DESIGN NOTE
WAL mode allows concurrent reads while serializing writes through _write_lock. Each write opens and closes its own connection — no long-lived connection objects that could leak across threads.
Context Message Assembly Pipeline (6 Layers)
build_context(session_id, new_message)
│
▼
Layer 1: System Prompt
│ Jeremy's personality, role definition, Carter voice directives
│
▼
Layer 2: Legal Rails Injection
│ PROHIBITIONS + SAFE_HARBOR + risk-level-specific instructions
│
▼
Layer 3: Conversation History
│ Last 10 messages from conversations table
│ Ordered by timestamp ASC
│
▼
Layer 4: Entity Recall
│ All entities from entity_memory for this session
│ Injected as: "You know: [person: John], [date: March 15], ..."
│
▼
Layer 5: Cross-Session Summary
│ If returning user: summarize prior sessions from persistent memory
│ FTS5 search on user identifiers
│
▼
Layer 6: Current Message
User's new message appended as final context entry
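The 6-layer assembly above can be sketched as a list builder. The prompt strings and the session/entity field names here are placeholders, not the real server.py symbols.

```python
SYSTEM_PROMPT = "You are Jeremy..."              # placeholder for the real prompt
RAILS_PROMPT = "PROHIBITIONS + SAFE_HARBOR..."   # placeholder for Layer 2 content

def build_context(session, new_message, persistent_summary=None, max_history=10):
    """Assemble the 6-layer message list fed to the LLM (field names assumed)."""
    msgs = [{"role": "system", "content": SYSTEM_PROMPT}]        # Layer 1
    msgs.append({"role": "system", "content": RAILS_PROMPT})     # Layer 2
    msgs.extend(session["history"][-max_history:])               # Layer 3: last 10
    if session.get("entities"):                                  # Layer 4
        known = ", ".join(f"[{t}: {n}]" for t, n in session["entities"])
        msgs.append({"role": "system", "content": f"You know: {known}"})
    if persistent_summary:                                       # Layer 5
        msgs.append({"role": "system", "content": persistent_summary})
    msgs.append({"role": "user", "content": new_message})        # Layer 6
    return msgs
```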
8. Tool System
Jeremy has 27 registered tools managed through registry.py. Each tool has a risk level, a gate requirement, and a description. Tool execution is sandboxed with resource caps and command whitelists.
Tool Risk Matrix (27 Tools)
| Tool | Risk | Gate | Description |
| web_search | LOW | Auto | Search legal databases, court info |
| case_lookup | LOW | Auto | CourtListener case search |
| statute_lookup | LOW | Auto | Look up specific statutes |
| court_info | LOW | Auto | Court addresses, hours, procedures |
| deadline_calc | LOW | Auto | Calculate filing deadlines |
| fee_waiver_check | LOW | Auto | Check IFP eligibility |
| form_finder | LOW | Auto | Find court forms by jurisdiction |
| legal_aid_search | LOW | Auto | Find free legal aid near user |
| document_template | MEDIUM | Gate | Generate document templates |
| letter_draft | MEDIUM | Gate | Draft demand/complaint letters |
| evidence_checklist | LOW | Auto | Generate evidence preservation list |
| timeline_builder | LOW | Auto | Build chronological case timeline |
| risk_assessment | MEDIUM | Gate | Run full risk engine analysis |
| jurisdiction_check | LOW | Auto | Determine proper jurisdiction |
| prerequisite_check | LOW | Auto | Check filing prerequisites |
| memory_store | LOW | Auto | Store fact to persistent memory |
| memory_recall | LOW | Auto | Recall from persistent memory |
| send_email | HIGH | Gate + Security | Send email via SMTP vault |
| send_sms | HIGH | Gate + Security | Send SMS notification |
| file_read | MEDIUM | Gate | Read uploaded user files |
| file_write | HIGH | Gate + Security | Write/generate files |
| shell_exec | CRITICAL | Gate + Owner | Execute shell commands (sandboxed) |
| api_call | HIGH | Gate + Security | Make external API calls |
| db_query | MEDIUM | Gate | Query memories.db |
| personality_adjust | LOW | Auto | Adjust personality trait scores |
| gate_talk | MEDIUM | Gate | Initiate Warden terminal session |
| escalate | HIGH | Gate + Security | Escalate to human review |
Approval Flow
APPROVAL LEVELS:
LOW → Auto-approve, no gate check
MEDIUM → Gate check (4 guards vote)
HIGH → Gate check + security review (vault verification)
CRITICAL → Gate check + owner approval required (blocks until ACK)
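The four approval levels reduce to a small dispatch; step names are taken from the list above, the function itself is illustrative.

```python
def approval_path(tool_risk):
    """Map a tool's risk level to the checks it must pass before execution."""
    steps = []
    if tool_risk == "LOW":
        return steps                     # auto-approve, no gate check
    steps.append("gate_check")           # 4 guards vote (MEDIUM and above)
    if tool_risk == "HIGH":
        steps.append("security_review")  # vault verification
    if tool_risk == "CRITICAL":
        steps.append("owner_approval")   # blocks until ACK
    return steps
```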
Sandbox: ALLOWED_COMMANDS + BLOCKED_PATTERNS
ALLOWED_COMMANDS = [
"ls", "cat", "head", "tail", "grep", "wc", "find",
"date", "whoami", "pwd", "echo", "python3", "pip3", "curl"
] # 14 whitelisted commands — everything else blocked
BLOCKED_PATTERNS = [
"rm -rf", "rm -r", "rmdir", "mkfs", "dd if=",
"chmod 777", "curl.*|.*sh", "wget.*|.*sh",
"> /dev/", "sudo", "su ", "passwd", "useradd",
"kill", "pkill", "reboot", "shutdown"
] # 17 blocked patterns — checked before execution
RESOURCE_CAPS = {
"memory": "256MB", # ulimit -v
"timeout": "30s", # subprocess timeout
"network": "10s" # urllib timeout
}
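The pre-execution check — whitelist the binary, then scan the raw line for blocked patterns — can be sketched as below. The pattern list is abbreviated; the real check also applies the resource caps via ulimit and subprocess timeouts.

```python
import re
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "head", "tail", "grep", "wc", "find",
                    "date", "whoami", "pwd", "echo", "python3", "pip3", "curl"}
BLOCKED_PATTERNS = [r"rm -rf", r"sudo", r"curl.*\|.*sh", r"> /dev/"]  # abbreviated

def sandbox_check(cmdline):
    """Return (allowed, reason). Binary must be whitelisted AND line must be clean."""
    parts = shlex.split(cmdline)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return False, "command not whitelisted"
    for pat in BLOCKED_PATTERNS:
        if re.search(pat, cmdline):
            return False, f"blocked pattern: {pat}"
    return True, "ok"
```

Note the two layers are independent: `curl` is whitelisted, but `curl ... | sh` still dies on the pattern scan.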
Vault System
SMTP_ACCOUNTS = [
"account_1", # Primary sending account
"account_2", # Secondary
"account_3" # Fallback
]
ALLOWED_SMTP_HOSTS = frozenset([
"smtp.gmail.com",
"smtp.office365.com"
])
# Credentials loaded from environment:
# SMTP_USER_1, SMTP_PASS_1, SMTP_HOST_1
# SMTP_USER_2, SMTP_PASS_2, SMTP_HOST_2
# SMTP_USER_3, SMTP_PASS_3, SMTP_HOST_3
Rate Limiter
SQLite-backed token bucket:
SMS: 5/minute, 50/day
Email: 10/minute, 100/day
Implementation: rate_limiter.py
- One SQLite table: rate_limits (resource, tokens, last_refill, window)
- Token bucket refill on check
- Atomic decrement with _write_lock
- Separate buckets per resource type
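The token-bucket mechanics can be sketched in memory (the real implementation persists tokens in the rate_limits SQLite table and decrements under _write_lock). The injectable clock here is purely for testability.

```python
import time

class TokenBucket:
    """Minimal in-memory sketch of the SQLite-backed bucket in rate_limiter.py."""
    def __init__(self, per_minute, clock=time.monotonic):
        self.capacity = per_minute
        self.tokens = float(per_minute)
        self.rate = per_minute / 60.0   # tokens refilled per second
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Refill on check, then try to spend one token."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # atomic under _write_lock in the real code
            return True
        return False

# Separate buckets per resource type:
# buckets = {"sms": TokenBucket(5), "email": TokenBucket(10)}
```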
9. Quadrant Guard
The Quadrant Guard is a 4-guard gate system that governs tool execution. Each guard evaluates the request independently, casts an APPROVE or DENY vote with reasoning, and the majority verdict determines execution.
Intent Classification
4 QUADRANTS (keyword scoring):
Q1: INFORMATION — lookup, search, find, check, show, list, what is
Q2: CREATION — create, generate, draft, build, write, template
Q3: COMMUNICATION — send, email, text, notify, contact, message
Q4: EXECUTION — run, execute, shell, command, install, deploy
Each incoming tool request is scored against all 4 quadrants.
Highest score determines primary classification.
Multi-quadrant hits increase scrutiny level.
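The quadrant scoring above is plain keyword counting; a sketch:

```python
QUADRANTS = {
    "INFORMATION":   ["lookup", "search", "find", "check", "show", "list", "what is"],
    "CREATION":      ["create", "generate", "draft", "build", "write", "template"],
    "COMMUNICATION": ["send", "email", "text", "notify", "contact", "message"],
    "EXECUTION":     ["run", "execute", "shell", "command", "install", "deploy"],
}

def classify_intent(request_text):
    """Score against all 4 quadrants; the highest score is the primary class."""
    text = request_text.lower()
    scores = {q: sum(1 for kw in kws if kw in text) for q, kws in QUADRANTS.items()}
    primary = max(scores, key=scores.get)
    hit_quadrants = sum(1 for s in scores.values() if s > 0)
    return primary, scores, hit_quadrants  # hit_quadrants > 1 => extra scrutiny
```

A request like "draft and send a demand letter" hits both CREATION and COMMUNICATION, which is exactly the multi-quadrant case that raises the scrutiny level.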
Guard Vote Flow
gate_check(tool_name, args, session)
│
▼
┌───────────────────────────┐
│ Intent Classification │
│ Score against 4 quadrants │
└─────────────┬─────────────┘
│
┌─────────▼─────────┐
│ Invoke 4 Guards │
└──┬──┬──┬──┬───────┘
│ │ │ │
┌────▼┐┌▼──┐┌▼──┐┌▼─────┐
│ G1 ││G2 ││G3 ││ G4 │
│SAFE ││UPL││RISK││SCOPE │
└──┬──┘└─┬─┘└─┬─┘└──┬───┘
│ │ │ │
▼ ▼ ▼ ▼
APPROVE DENY APPROVE APPROVE ← example
│ │ │ │
└─────┼────┼─────┘
▼
┌─────────────────┐
│ MAJORITY VOTE │
│ 3/4 APPROVE │
│ Verdict: APPROVE │
└─────────────────┘
│
▼
┌─────────────────┐
│ Log to │
│ gate_decisions │
│ (memories.db) │
└─────────────────┘
Guard Responsibilities
| Guard | Name | Checks |
| G1 | Safety Guard | Resource caps, blocked patterns, sandbox compliance |
| G2 | UPL Guard | Unauthorized practice of law violations, disclaimer presence |
| G3 | Risk Guard | Current risk score vs tool risk level, escalation threshold |
| G4 | Scope Guard | Tool within session context, no scope creep, rate limits |
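The majority verdict and the vote record logged to gate_decisions can be sketched as below. The source does not say how a 2-2 tie resolves; this sketch assumes fail-closed (tie = DENIED).

```python
from collections import Counter

def gate_verdict(votes):
    """Majority of the 4 guard votes; a 2-2 tie is treated as DENY (assumed)."""
    tally = Counter(v["vote"] for v in votes)
    verdict = "APPROVED" if tally["APPROVE"] > tally["DENY"] else "DENIED"
    return verdict, dict(tally)

# Shape of the JSON stored in gate_decisions.votes:
votes = [
    {"guard": "G1_SAFETY", "vote": "APPROVE", "reason": "within resource caps"},
    {"guard": "G2_UPL",    "vote": "DENY",    "reason": "missing disclaimer"},
    {"guard": "G3_RISK",   "vote": "APPROVE", "reason": "risk LOW vs tool MEDIUM"},
    {"guard": "G4_SCOPE",  "vote": "APPROVE", "reason": "in session scope"},
]
```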
Warden Terminal
gate_talk.py provides a direct terminal interface to the gate system — the "Warden" mode. Used for manual gate overrides, audit log inspection, and guard diagnostics. Accessible only through the gate_talk tool (MEDIUM risk, requires gate approval itself).
10. Voice Pipeline
Jeremy speaks with the Steffan voice — measured, confident, narrator-tone. As of v1.2, voice is a tiered free-first TTS stack: Edge-TTS serves primary traffic sub-1s, Kokoro ONNX is a fully-local fallback, and ElevenLabs is held in reserve as a premium tier. The fallback walks top-down and returns a clean 503 if every tier fails (no more silent degradation to browser speechSynthesis).
Tiered TTS Stack
JEREMY VOICE — 3-TIER TTS FALLBACK
POST /api/tts { text: "..." }
│
▼
TIER 1 — Edge-TTS (primary, free)
│ Provider: Microsoft Azure Neural (free endpoint)
│ Voice: en-US-SteffanNeural
│ Latency: ~0.6s for 500 chars
│ Cost: $0.00
│ Returns: audio/mpeg (MP3)
│
▼ fail
TIER 2 — Kokoro ONNX (local, free)
│ Runtime: kokoro-onnx on CPU
│ Voice: am_michael (warm American male)
│ Model: kokoro-v1.0.onnx (310 MB) + voices-v1.0.bin (27 MB)
│ Latency: ~2s x86, much slower on ARM A1
│ Cost: $0.00
│ Returns: audio/wav
│
▼ fail
TIER 3 — ElevenLabs (premium fallback)
│ Provider: ElevenLabs API
│ Voice: Carter D (GorLj2SsI4u2JqL58gAA)
│ Model: eleven_v3
│ Cost: ~$0.01 / request (when credits available)
│ Gated: Only invoked if ELEVEN_API_KEY is set
│
▼ all fail
HTTP 503 { "error": "Voice unavailable — all TTS engines failed" }
Endpoint Contract
POST /api/tts
Body: { "text": "..." }
Max chars: 1,000 per request (cut at sentence boundary)
Timeout: 30s per tier
Returns: audio/mpeg (Edge) or audio/wav (Kokoro) blob
Errors: 400 no text | 503 all tiers failed
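Server-side, the tier walk reduces to iterating an ordered engine list; a sketch, with engine callables and return conventions assumed:

```python
def synthesize(text, engines):
    """Walk the TTS tiers top-down; return (mime, audio) or None if all fail.

    `engines` is an ordered list of (mime_type, callable) pairs, e.g. Edge-TTS,
    Kokoro, ElevenLabs. Each callable returns audio bytes, or None/raises on failure.
    """
    if not text:
        return None       # handler maps this to HTTP 400
    for mime, engine in engines:
        try:
            audio = engine(text)
            if audio:
                return mime, audio
        except Exception:
            continue      # fall through to the next tier
    return None           # handler maps this to HTTP 503

# engines = [("audio/mpeg", edge_tts_call), ("audio/wav", kokoro_call),
#            ("audio/mpeg", elevenlabs_call)]  # last entry only if ELEVEN_API_KEY set
```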
Why Edge-TTS, not Kokoro, as primary on the live VM
Kokoro ONNX is the highest-quality of the three, but the production VM is an Oracle ARM A1 Micro. In benchmarks, Kokoro inference on that hardware ran at roughly 1 second per character of input — a 300-char response would take on the order of 5 minutes of inference and blow through any sane proxy timeout. Edge-TTS, by contrast, is a network call to Microsoft's free neural endpoint that returns in under a second regardless of host CPU. So the order was inverted for the live deployment: Edge-TTS primary, Kokoro reserved as a local safety net in case the Edge endpoint is ever blocked, and ElevenLabs only if both fail and a key is configured.
cleanForTTS() — 11 Regex Passes
function cleanForTTS(text) {
  // Fenced code blocks must be stripped BEFORE inline code, or the
  // single-backtick pass mangles the ``` fences.
  text = text.replace(/```[\s\S]*?```/g, '');          // strip code blocks
  text = text.replace(/`([^`]+)`/g, '$1');             // strip inline code
  text = text.replace(/\*\*(.*?)\*\*/g, '$1');         // strip bold
  text = text.replace(/\*(.*?)\*/g, '$1');             // strip italic
  text = text.replace(/#{1,6}\s/g, '');                // strip headers
  text = text.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1'); // links → text
  text = text.replace(/[-*+]\s/g, '');                 // strip list markers
  text = text.replace(/\d+\.\s/g, '');                 // strip numbered lists
  text = text.replace(/>\s/g, '');                     // strip blockquotes
  text = text.replace(/\n{2,}/g, '. ');                // double newlines → period
  text = text.replace(/\n/g, ' ');                     // single newlines → space
  return text.trim().substring(0, 500);
}
KITT 32-Bar Equalizer
KITT EQUALIZER — Web Audio API
AudioContext → AnalyserNode (fftSize: 128)
│
▼
getByteFrequencyData() → Uint8Array[64]
│
▼
Take 32 bars (indices 0-31)
Map center-out: bar[0] at center, bar[31] at edges
│
▼
Render:
┌──────────────────────────────────────────────────────┐
│ ▉ ▉ │
│ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ │
└──────────────────────────────────────────────────────┘
Color: Gold gradient (#D4AF37 → #B08D57)
Glow: box-shadow at val > 100
Idle: Breathing animation (sine wave, 0.5-3px)
Decay: Staggered fade-out per bar (30ms delay each)
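The center-out mapping ("bar[0] at center, bar[31] at edges") is pure index math; here it is sketched in Python for clarity, though the real code runs client-side in JS. The even-right/odd-left fan direction is an assumed convention.

```python
def center_out_order(n_bars=32):
    """Display positions so bar 0 sits at center and high indices fan outward.

    Returns a list where result[display_pos] = bar_index. Even bar indices
    fan to the right of center, odd ones to the left (assumed convention).
    """
    positions = [0] * n_bars
    mid = n_bars // 2
    for bar in range(n_bars):
        offset = (bar + 1) // 2
        pos = mid + offset if bar % 2 == 0 else mid - offset
        positions[pos % n_bars] = bar  # wrap the final odd bar to position 0
    return positions
```

The render loop then reads FFT bin values in `center_out_order(32)` order, so low frequencies pulse at the center of the KITT display.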
speak() Function Flow
async function speak(text) {
  const clean = cleanForTTS(text);
  if (!clean) return;
  try {
    const res = await fetch('/api/tts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: clean })
    });
    if (!res.ok) throw new Error('TTS failed');
    const blob = await res.blob();
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    // Connect to Web Audio for KITT bars; resume the context in case the
    // browser suspended it before the first user gesture
    if (audioCtx.state === 'suspended') await audioCtx.resume();
    const source = audioCtx.createMediaElementSource(audio);
    source.connect(analyser);
    analyser.connect(audioCtx.destination);
    startKITT(); // begin animation loop
    await audio.play();
    audio.onended = () => {
      stopKITT();
      URL.revokeObjectURL(url); // release the blob URL to avoid leaking memory
    };
  } catch (e) {
    // v1.2: NO browser-synth fallback. If every server tier fails,
    // the UI fails silently rather than impersonate Jeremy with a
    // generic speechSynthesis voice. Voice integrity > voice presence.
    console.log('Voice unavailable — all server TTS engines failed');
    stopKITT();
  }
}
11. Infrastructure
Jeremy runs on a single Oracle Cloud ARM A1 Micro instance — free tier, no monthly cost. The entire stack is one process behind nginx.
VM Specification
| Property | Value |
| Provider | Oracle Cloud (Always Free) |
| Instance | ARM A1 Micro |
| IP | 129.159.169.37 |
| CPU | 1 ARM core |
| RAM | 6 GB |
| Disk | 50 GB boot volume |
| OS | Ubuntu 22.04 LTS (aarch64) |
| Cost | $0.00/mo |
nginx Configuration
server {
    listen 443 ssl;
    server_name prosenetwork.org www.prosenetwork.org;

    ssl_certificate /etc/letsencrypt/live/prosenetwork.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/prosenetwork.org/privkey.pem;

    root /home/ubuntu/pro-se-network/app;
    index index.html;

    location /api/ {
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 120s;
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}
Process Inventory
PID CMD RSS
----- ------------------------------- -------
XXXX python3 src/server.py ~65 MB
└── ThreadingMixIn threads ~2 MB each
└── SQLite WAL (memories.db) ~5 MB
Total runtime footprint: ~80 MB
Persistent Memory Filesystem
~/.jeremy/
└── memory/
├── persistent.db # FTS5-indexed SQLite
├── sessions/ # Per-session summaries
│ ├── abc123.json
│ └── def456.json
└── entities/ # Cross-session entity store
├── persons.json
└── dates.json
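A sketch of an upsert into the cross-session entity store. Only the entities/persons.json path comes from the layout above; the helper name and the one-list-of-notes-per-person shape are assumptions about a schema not shown here.

```python
import json
from pathlib import Path

def remember_person(memory_root, name, note):
    # Upsert into entities/persons.json: one key per person,
    # appending notes so entities accumulate across sessions.
    path = Path(memory_root) / "entities" / "persons.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    people = json.loads(path.read_text()) if path.exists() else {}
    people.setdefault(name, []).append(note)
    path.write_text(json.dumps(people, indent=2))
    return people
```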
FTS5 Index Schema
CREATE VIRTUAL TABLE memory_fts USING fts5(
    content,
    session_id,
    area_of_law,
    timestamp,
    tokenize='porter unicode61'
);
-- Queries:
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'custody AND brooklyn';
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'landlord NEAR tenant';
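A minimal sketch of the schema in use via Python's sqlite3 module (assumes the bundled SQLite was compiled with FTS5, which standard CPython builds are; the sample row is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE VIRTUAL TABLE memory_fts USING fts5(
        content, session_id, area_of_law, timestamp,
        tokenize='porter unicode61'
    )""")
conn.execute(
    "INSERT INTO memory_fts VALUES (?, ?, ?, ?)",
    ("Landlord ignored repair requests at the Brooklyn apartment",
     "abc123", "landlord_tenant", "2026-02-25T14:30:00Z"))

# unicode61 case-folds, so the query term 'brooklyn' matches 'Brooklyn'
rows = conn.execute(
    "SELECT session_id FROM memory_fts "
    "WHERE memory_fts MATCH 'landlord AND brooklyn'").fetchall()
print(rows)  # [('abc123',)]
```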
Bridge / Quarantine System
BRIDGE CONCEPT
Tools flagged DENY by the Quadrant Guard enter a "quarantine" state. The bridge lets an owner manually review quarantined actions, approve or permanently deny them, and optionally adjust guard parameters. Bridge state is stored in the gate_decisions table with verdict='QUARANTINED' until resolved.
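The resolution step can be sketched against a minimal gate_decisions table. Only the verdict column and the QUARANTINED value come from the description above; the other columns, the sample tool name, and the helper are assumptions (the real schema lives in memories.py).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gate_decisions "
             "(id INTEGER PRIMARY KEY, tool TEXT, verdict TEXT)")
conn.execute("INSERT INTO gate_decisions (tool, verdict) VALUES (?, ?)",
             ("send_email", "QUARANTINED"))

def resolve_quarantine(conn, decision_id, approve):
    # Owner review: flip QUARANTINED to APPROVED or DENIED, but never
    # touch a decision that has already been resolved.
    verdict = "APPROVED" if approve else "DENIED"
    cur = conn.execute(
        "UPDATE gate_decisions SET verdict = ? "
        "WHERE id = ? AND verdict = 'QUARANTINED'", (verdict, decision_id))
    return cur.rowcount == 1

print(resolve_quarantine(conn, 1, approve=True))  # True
```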
12. Data Pipeline
Jeremy runs dual logging: memories.py feeds the runtime SQLite brain, while conversation_log.py writes append-only JSONL for retraining. The two systems are independent — neither blocks the other.
Dual Logging Architecture
DUAL LOGGING
User Message
│
├──────────────────────────────┐
▼ ▼
memories.py → SQLite conversation_log.py → JSONL
(Runtime Brain) (Retraining Archive)
│ │
├── conversations table ├── session_start entry
├── entity_memory table ├── user_exchange entry
├── gate_decisions table ├── assistant_exchange entry
└── personality_evolution └── area + risk tags
│ │
▼ ▼
Powers: history recall, Powers: fine-tuning dataset,
entity injection, context area distribution analysis,
assembly, gate audits quality review, retraining
JSONL Entry Format
// Session start
{"type":"session_start","session_id":"abc123","timestamp":"2026-02-25T14:30:00Z"}
// User exchange
{"type":"user","session_id":"abc123","content":"I need help with my lease",
"area_of_law":"landlord_tenant","risk_score":15,"timestamp":"..."}
// Assistant exchange
{"type":"assistant","session_id":"abc123","content":"Let me help you understand...",
"area_of_law":"landlord_tenant","risk_score":15,"tier_used":0,"tokens":847,
"timestamp":"..."}
Retraining Flow
RETRAINING PIPELINE
conversations.jsonl (accumulates)
│
▼
Filter: quality threshold
│ - Remove short/empty exchanges
│ - Remove CRITICAL-risk sessions (too sensitive)
│ - Keep only GUIDANCE-state exchanges
│
▼
Tag: area_of_law + risk_level
│ - Balance across 12 areas
│ - Ensure jurisdiction diversity
│
▼
Format: instruction/response pairs
│ - System prompt + user message → assistant response
│
▼
Fine-tune: Phi-3 3.8B (current base)
│
▼
Merge: LoRA → full model
│ Current: israelburns/jeremy-v1-merged (7.64 GB)
│
▼
Deploy: HF Space or local inference
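The filter and format stages above can be sketched as a single pass over the JSONL. The CRITICAL threshold and the minimum-length cutoff here are assumptions for illustration; the real pipeline's values aren't documented in this section.

```python
import json

CRITICAL = 80  # assumed risk_score cutoff for CRITICAL sessions

def to_training_pairs(jsonl_lines, min_len=20):
    """Pair each user entry with the next assistant entry in the same
    session, dropping short exchanges and CRITICAL-risk messages."""
    pairs, pending = [], {}
    for line in jsonl_lines:
        e = json.loads(line)
        if e.get("risk_score", 0) >= CRITICAL:
            pending.pop(e.get("session_id"), None)  # too sensitive: discard
            continue
        if e["type"] == "user":
            pending[e["session_id"]] = e["content"]
        elif e["type"] == "assistant" and e["session_id"] in pending:
            user = pending.pop(e["session_id"])
            if len(user) >= min_len and len(e["content"]) >= min_len:
                pairs.append({"instruction": user, "input": "",
                              "output": e["content"]})
    return pairs
```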
Current Training Data
Dataset: 5,196 instruction/response pairs (v2)
Areas: 12 areas of law
Format: {"instruction": "...", "input": "...", "output": "..."}
Source: Hand-curated + synthetic + conversation logs
Artifact: israelburns/jeremy-v1-merged (Phi-3 3.8B LoRA merge, 7.64 GB on HF)
NOTE — Production vs. research track
The fine-tuned Phi-3 3.8B merged model is a research artifact, not the production inference path. Jeremy's live responses on prosenetwork.org are served 100% by the 4-tier API stack described in Section 4 (Gemini + Claude). The conversation logs and training pipeline exist to support a future self-hosted path — not to power the current build.
13. Code Metrics + Known Issues
Build Summary
JEREMY AI — BUILD METRICS
83 Python modules
41,635 lines of code
207+ HTTP endpoints (do_GET + do_POST handlers)
14 FSM states
5 risk signals
27 registered tools
4 gate guards
4 AI tiers (3 free Gemini + 1 premium Claude)
3 TTS tiers (Edge-TTS + Kokoro + ElevenLabs)
14 areas of law
8 supported jurisdictions
5,196 training pairs (v2 dataset, research track)
1 server process (~65 MB RSS)
$0.00/mo total operating cost
Senior Engineer Flags
KNOWN ISSUES — ARCHITECTURE REVIEW
The following are documented architectural concerns identified during code review. With one exception (the divorce body double-read, a genuine bug), they are not defects but tradeoffs made for speed of development on a single-developer, $0 infrastructure stack.
| Issue | Location | Severity | Detail |
| --- | --- | --- | --- |
| No session TTL / eviction | server.py | MEDIUM | In-memory session dict grows unbounded: no TTL, no LRU eviction. On a low-traffic legal aid site this is acceptable; at scale it would OOM. |
| No session auth | server.py | MEDIUM | Session IDs are 12-character UUIDs. No HMAC, no cookie signing, no CSRF token. Anyone with a valid session ID can resume that session. Acceptable for an informational tool; not for anything with PII. |
| No request-level lock on session mutations | server.py | LOW | ThreadingMixIn means concurrent requests can mutate the same session dict. In practice, users send one message at a time; the race is theoretically possible but practically unlikely. |
| Blocking synchronous HTTP in request threads | jeremy_client.py | LOW | LLM API calls use urllib (synchronous), so each request thread blocks for 5-30 s during inference. Fine at low concurrency; at scale this would need async I/O or a task queue. |
| Divorce endpoint double-reads body | server.py | LOW | The divorce POST handler reads the request body twice. In Python's HTTPServer the body stream is consumed on first read, so the second read returns empty, likely causing silent failures on the divorce workflow endpoint. |
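For reference, the session-TTL gap has a small fix if it ever becomes necessary. This is a sketch under the assumption that each session dict records a last_seen timestamp on every request; neither that field nor this helper exists in server.py today.

```python
import threading
import time

SESSION_TTL = 3600  # hypothetical cutoff: evict sessions idle for an hour
sessions_lock = threading.Lock()

def evict_stale(sessions, now=None):
    # Drop any session whose last_seen is older than SESSION_TTL,
    # holding one lock so eviction doesn't race request threads.
    now = time.time() if now is None else now
    with sessions_lock:
        stale = [sid for sid, s in sessions.items()
                 if now - s.get("last_seen", 0) > SESSION_TTL]
        for sid in stale:
            del sessions[sid]
    return len(stale)

sessions = {"abc123": {"last_seen": time.time()},
            "old456": {"last_seen": time.time() - 7200}}
print(evict_stale(sessions))  # 1 (only the stale session is dropped)
print(list(sessions))         # ['abc123']
```

A background thread calling this once a minute would cap memory without touching the request path.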
CONTEXT
These issues are documented, not hidden. Jeremy is a legal information tool serving low-traffic pro se litigants — not a high-concurrency SaaS platform. The architecture is appropriate for its current scale and cost constraints ($0 infrastructure, $5/mo TTS). Fixing these would add complexity without immediate user-facing benefit.