1. System Architecture
Jeremy runs as a monolithic Python HTTP server behind an nginx reverse proxy on a single Oracle ARM A1 Micro instance. No containers, no orchestration, no cloud functions. One process, one port, one database.
Full-Stack Layer Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ CLIENT LAYER │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ chat.html │ │ index.html │ │ divorce.html │ │
│ │ (Main UI) │ │ (Landing) │ │ (Workflow) │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └────────────┬────┴─────────────────┘ │
│ │ HTTPS :443 │
├──────────────────────┼──────────────────────────────────────────────┤
│ REVERSE PROXY │ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ nginx │ │
│ │ SSL termination (Let's Encrypt) │ │
│ │ proxy_pass → http://127.0.0.1:7860 │ │
│ │ SPA fallback: try_files $uri /index.html │ │
│ └───────────────────┬─────────────────────────────────┘ │
│ │ HTTP :7860 │
├──────────────────────┼──────────────────────────────────────────────┤
│ APPLICATION SERVER │ │
│ │ │
│ ┌───────────────────▼─────────────────────────────────┐ │
│ │ server.py (2,389 lines) │ │
│ │ HTTPServer + ThreadingMixIn │ │
│ │ In-memory session dict (12-char UUID keys) │ │
│ │ 207+ route handlers (do_GET / do_POST) │ │
│ └───┬──────────┬──────────┬──────────┬────────────────┘ │
│ │ │ │ │ │
├──────┼──────────┼──────────┼──────────┼─────────────────────────────┤
│ PROCESSING ENGINES │
│ │ │ │ │ │
│ ┌───▼──────┐ ┌─▼────────┐ ┌▼────────┐ ┌▼─────────────┐ │
│ │ state │ │ risk │ │ legal │ │ jeremy │ │
│ │ machine │ │ engine │ │ rails │ │ client │ │
│ │ (FSM) │ │ (5 sig) │ │ (UPL) │ │ (AI tiers) │ │
│ └──────────┘ └──────────┘ └─────────┘ └──────────────┘ │
│ │ │ │ │
│ ┌───▼──────┐ ┌─▼────────┐ ┌───▼────────────┐ │
│ │ rule │ │ gate │ │ conversation │ │
│ │ engine │ │ bridge │ │ log │ │
│ └──────────┘ └──────────┘ └────────────────┘ │
│ │
├────────────────────────────────────────────────────────────────────┤
│ DATA / PERSISTENCE │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ memories.db │ │ conversations │ │ persistent │ │
│ │ SQLite WAL mode │ │ .jsonl │ │ memory (FTS5) │ │
│ │ 4 tables │ │ retraining log │ │ ~/.jeremy/ │ │
│ └──────────────────┘ └──────────────────┘ └────────────────┘ │
│ │
├────────────────────────────────────────────────────────────────────┤
│ EXTERNAL SERVICES │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌───────────┐│
│ │ Gemini API │ │ Anthropic │ │ Edge-TTS │ │ Court ││
│ │ 2.5 / 3.1 │ │ Claude │ │ (primary) │ │ Listener ││
│ │ (Tiers 1-3) │ │ Sonnet (T4) │ │ + Kokoro │ │ (lookup) ││
│ └──────────────┘ └──────────────┘ └──────────────┘ └───────────┘│
└────────────────────────────────────────────────────────────────────┘
Key Architectural Decisions
- ThreadingMixIn — each request gets its own thread. No async, no event loop. Simple concurrency model for low-traffic legal guidance.
- No Docker — bare process on the VM. Eliminates container overhead on a resource-constrained 1-CPU ARM instance.
- SQLite WAL mode — concurrent readers, single writer. Sufficient for expected throughput. No Postgres dependency.
- In-memory sessions — session dict lives in process memory. Lost on restart. Acceptable tradeoff for a stateless-enough legal advisor.
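The server shape described above can be sketched in a few lines. This is a minimal illustration, not the actual server.py code — the handler class, route, and session fields here are assumed for the example.

```python
import threading
import uuid
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn

# In-memory session store: 12-char UUID keys, lost on restart (by design)
sessions = {}
_sessions_lock = threading.Lock()

def get_or_create_session(session_id=None):
    """Return an existing session dict, or create one keyed by a 12-char UUID."""
    with _sessions_lock:
        if session_id and session_id in sessions:
            return session_id, sessions[session_id]
        sid = uuid.uuid4().hex[:12]
        sessions[sid] = {"state": "GREETING", "history": [], "facts": {}}
        return sid, sessions[sid]

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True  # one thread per request, no event loop

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

# server = ThreadedHTTPServer(("127.0.0.1", 7860), Handler)
# server.serve_forever()
```

Because session state lives in a plain dict behind a lock, a restart wipes all sessions — the tradeoff the last bullet accepts.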
2. Request Lifecycle
A single user message traverses 17 processing steps from HTTP POST to rendered response. Every step is synchronous within the request thread.
USER MESSAGE LIFECYCLE — 17 STEPS
Browser (chat.html)
│
▼
① fetch('/api/chat', POST)
│ Body: { message, session_id }
│
▼
② nginx SSL termination
│ :443 → proxy_pass → :7860
│
▼
③ server.py do_POST('/api/chat')
│ Parse JSON body, extract message + session_id
│
▼
④ Session lookup
│ sessions[session_id] or create new (12-char UUID)
│ Load: state, history[], area_of_law, jurisdiction, facts{}
│
▼
⑤ Custody keyword check
│ Scan message against 23 CUSTODY_KEYWORDS
│ + 1 contextual ("children" near custody context)
│ − 14 negators that suppress false positives
│ → If matched: inject custody disclaimer, set risk_override
│
▼
⑥ State machine transition
│ current_state → validate against VALID_TRANSITIONS
│ If invalid: raise InvalidTransition (400 response)
│ If valid: update session.state
│
▼
⑦ Fact extraction (LLM)
│ Send message to AI tier for structured fact extraction
│ Returns: {entities, dates, amounts, relationships}
│ Merge into session.facts{}
│
▼
⑧ Area-of-law classification
│ 14 areas scored: contract_review(5) → criminal_defense(50)
│ Keyword matching + LLM confirmation
│ Set session.area_of_law
│
▼
⑨ Jurisdiction detection
│ State/city extraction from facts
│ Map to 8 supported jurisdictions
│ Affects: filing deadlines, court procedures, fee waivers
│
▼
⑩ Risk scoring (5 signals)
│ Signal 1: Custody keywords → 0-100
│ Signal 2: Area-of-law severity → 5-50
│ Signal 3: Deadline urgency → 0-30
│ Signal 4: Prerequisite gaps → 0-40
│ Signal 5: Aggravating factors → 0-25
│ Composite: min(sum, 100) OR _max_risk()
│
▼
⑪ enforce_rails()
│ Check UPL prohibitions (8 rules)
│ Apply safe harbor permissions (9 rules)
│ Inject disclaimers by risk level:
│ LOW (0-25): green banner, proceed
│ MEDIUM (26-50): yellow banner, add caution
│ HIGH (51-75): orange banner, recommend attorney
│ CRITICAL (76+): red banner, hard stop + referral
│
▼
⑫ Guidance generation (LLM)
│ Select AI tier based on complexity/input length
│ Inject: system prompt + legal rails + facts + history
│ Inject: personality directives (Carter voice)
│ Generate response with risk-appropriate guardrails
│
▼
⑬ Memory persistence
│ Write to memories.db: conversation entry, entity updates
│ Write to conversations.jsonl: retraining archive
│ Update personality_evolution if trait triggers detected
│
▼
⑭ JSON response assembly
│ { response, risk_level, risk_score, disclaimers[],
│ area_of_law, state, suggested_actions[] }
│
▼
⑮ formatText() — client-side
│ Markdown → HTML rendering
│ Legal citation linking
│ Risk banner injection into DOM
│
▼
⑯ speak() — client-side TTS
│ POST /api/tts with response text
│ Client: cleanForTTS(); Server: tiered TTS (Edge-TTS → Kokoro → ElevenLabs) → audio blob
│ Client: play audio, activate KITT bars
│
▼
⑰ KITT equalizer animation
32-bar Web Audio FFT-128 visualization
Center-out gold gradient, glow at val>100
Idle breathing animation on silence
3. State Machine (FSM)
Jeremy operates on a 14-state finite state machine. Each session tracks its current state, and transitions are validated against a whitelist. Invalid transitions raise InvalidTransition and return a 400.
State Inventory
| State | Module Owner | Entry Trigger | Exit Trigger |
| GREETING | server.py | New session created | User sends first message |
| INTAKE | server.py | First user message | Area of law classified |
| FACT_GATHERING | state_machine.py | Area classified | Minimum facts threshold met |
| AREA_CLASSIFICATION | state_machine.py | Facts sufficient | Area confirmed by LLM |
| JURISDICTION_CHECK | state_machine.py | Area confirmed | Jurisdiction resolved |
| RISK_ASSESSMENT | risk_engine.py | Jurisdiction set | Risk score computed |
| PREREQUISITE_CHECK | rule_engine.py | Risk assessed | All prerequisites evaluated |
| GUIDANCE | jeremy_client.py | Prerequisites clear | User asks follow-up or exits |
| DOCUMENT_PREP | server.py | User requests document | Document generated |
| FILING_GUIDANCE | rule_engine.py | Document ready | Filing instructions delivered |
| REFERRAL | legal_rails.py | Risk critical OR UPL trigger | Referral link provided |
| FOLLOW_UP | server.py | Post-guidance question | New topic or session end |
| ESCALATION | gate_bridge.py | Guard vote DENY | Owner review complete |
| CLOSED | server.py | User ends session | Terminal state |
VALID_TRANSITIONS Map
VALID_TRANSITIONS = {
GREETING → [INTAKE]
INTAKE → [FACT_GATHERING, REFERRAL]
FACT_GATHERING → [AREA_CLASSIFICATION, REFERRAL]
AREA_CLASSIFICATION→ [JURISDICTION_CHECK, FACT_GATHERING]
JURISDICTION_CHECK → [RISK_ASSESSMENT]
RISK_ASSESSMENT → [PREREQUISITE_CHECK, REFERRAL, ESCALATION]
PREREQUISITE_CHECK → [GUIDANCE, REFERRAL]
GUIDANCE → [DOCUMENT_PREP, FILING_GUIDANCE, FOLLOW_UP, CLOSED]
DOCUMENT_PREP → [FILING_GUIDANCE, GUIDANCE]
FILING_GUIDANCE → [FOLLOW_UP, CLOSED]
REFERRAL → [CLOSED]
FOLLOW_UP → [FACT_GATHERING, GUIDANCE, CLOSED]
ESCALATION → [GUIDANCE, REFERRAL, CLOSED]
CLOSED → [] // terminal
}
FSM Flow Diagram
┌──────────┐
│ GREETING │
└────┬─────┘
│
┌────▼─────┐
│ INTAKE │
└────┬─────┘
│
┌────────▼─────────┐
│ FACT_GATHERING │◄──────────────────┐
└────────┬─────────┘ │
│ │
┌───────────▼────────────┐ │
│ AREA_CLASSIFICATION │────────────────┘
└───────────┬────────────┘ (need more facts)
│
┌───────────▼────────────┐
│ JURISDICTION_CHECK │
└───────────┬────────────┘
│
┌───────────▼────────────┐
│ RISK_ASSESSMENT │──────────┐
└───────────┬────────────┘ │
│ ┌────▼──────┐
┌───────────▼────────────┐ │ ESCALATION│
│ PREREQUISITE_CHECK │ └────┬──────┘
└───────────┬────────────┘ │
│ ┌────────┘
┌────▼─────┐ │
┌───────►│ GUIDANCE │◄───────┘
│ └──┬───┬───┘
│ │ │
┌──────▼───┐ ┌────▼───▼──────┐ ┌──────────┐
│ FOLLOW_UP│ │ DOCUMENT_PREP │ │ REFERRAL │◄─── (any high-risk)
└──────┬───┘ └────┬──────────┘ └────┬──────┘
│ ┌────▼──────────┐ │
│ │FILING_GUIDANCE│ │
│ └────┬──────────┘ │
│ │ │
└───────────┼───────────────────┘
│
┌────▼───┐
│ CLOSED │
└────────┘
ENFORCEMENT
Any transition not in VALID_TRANSITIONS[current_state] raises InvalidTransition(current, attempted). The server catches this and returns HTTP 400 with the invalid transition pair logged. This prevents impossible state jumps — you cannot go from GREETING to DOCUMENT_PREP.
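The enforcement rule above can be sketched as a small validator. The dict is an abbreviated mirror of the VALID_TRANSITIONS map in this section; the InvalidTransition constructor signature is assumed from the text.

```python
class InvalidTransition(Exception):
    def __init__(self, current, attempted):
        super().__init__(f"{current} -> {attempted} is not allowed")
        self.current, self.attempted = current, attempted

# Abbreviated mirror of VALID_TRANSITIONS (full map above)
VALID_TRANSITIONS = {
    "GREETING": ["INTAKE"],
    "INTAKE": ["FACT_GATHERING", "REFERRAL"],
    "GUIDANCE": ["DOCUMENT_PREP", "FILING_GUIDANCE", "FOLLOW_UP", "CLOSED"],
    "CLOSED": [],  # terminal
}

def transition(session, new_state):
    """Validate and apply a state change; raise on impossible jumps."""
    current = session["state"]
    if new_state not in VALID_TRANSITIONS.get(current, []):
        # the server catches this and returns HTTP 400
        raise InvalidTransition(current, new_state)
    session["state"] = new_state
    return session
```

A GREETING → DOCUMENT_PREP jump fails exactly as the text describes, with both states preserved on the exception for logging.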
4. AI Tier Routing
Jeremy uses a 4-tier AI model hierarchy governed by the Prime Directive: save money first. The routing decision is deterministic based on task complexity, tool requirements, and input length. Free Gemini tiers handle the majority of traffic; paid Claude Sonnet is a premium fallback reserved for cases the free tiers cannot handle.
Tier Decision Tree
┌──────────────┐
│ Incoming Req │
└──────┬───────┘
│
┌───────────▼───────────┐
│ task in COMPLEX_TASKS? │
└───┬───────────────┬───┘
YES NO
│ │
┌────▼─────┐ ┌──────▼──────┐
│ TIER 2 │ │ tools on? │
│ Flash │ │ │
└──────────┘ └──┬───────┬──┘
YES NO
│ │
┌────▼────┐ ┌▼────────┐
│ TIER 2 │ │ TIER 1 │
│ Flash │ │ Flash │
│ │ │ Lite │
└─────────┘ └─────────┘
Fallback chain (step up on empty / error / short response):
TIER 1 → TIER 2 → TIER 3 → TIER 4 → then walk back down.
Tier Specifications
| Tier | Model | Max Tokens | Cost/Call | Use Case |
| 1 | Gemini 2.5 Flash Lite | 1,000 | $0.00 | Simple greetings, clarifications, no-tool chat |
| 2 | Gemini 2.5 Flash | 1,500 | $0.00 | Default for tool-enabled sessions and complex tasks |
| 3 | Gemini 3.1 Pro Preview | 2,000 | ~$0.01 | Long inputs (>2,000 chars) and tool-result follow-ups |
| 4 | Claude Sonnet 4 | 2,000 | ~$0.01 | Premium fallback when all Gemini tiers fail or degrade |
Implementation Details
- Singleton pattern — jeremy_client.py holds one JeremyClient instance. All four tiers share the same client and session state.
- Rate limiting — 100ms minimum between API calls (global, not per-tier). Prevents burst billing and API abuse.
- Minimum tier floor — Any session with tools enabled starts at Tier 2 minimum. Tier 1 (Flash Lite) cannot reliably emit tool-call JSON, so the router clamps it out whenever the tool system is loaded.
- Tiered fallback walk — If the chosen tier returns None or a response shorter than 50 chars, the router steps up one tier and retries. Once it reaches the top, it walks back down to lower tiers. All four tiers must fail before Jeremy returns an error.
- Anthropic availability check — Tier 4 is skipped entirely if ANTHROPIC_API_KEY is not set. The free Gemini stack is fully self-sufficient.
- COMPLEX_TASKS set — narrative_to_facts, contract_to_clauses, structured_to_guidance. These bypass Tier 1 and start at Tier 2.
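The decision tree and fallback walk above can be sketched as two small functions. Function names and the exact long-input routing are assumptions for illustration; the thresholds (2,000 chars, 50-char minimum) come from this section.

```python
COMPLEX_TASKS = {"narrative_to_facts", "contract_to_clauses", "structured_to_guidance"}
LONG_INPUT_CHARS = 2000  # threshold from the tier table

def choose_tier(task, tools_enabled, input_len):
    """Deterministic first-choice tier; the fallback walk may still move it."""
    if input_len > LONG_INPUT_CHARS:
        return 3          # long inputs route to Tier 3 (per the tier table)
    if task in COMPLEX_TASKS:
        return 2          # complex tasks bypass Tier 1
    if tools_enabled:
        return 2          # Tier 1 cannot reliably emit tool-call JSON
    return 1              # simple no-tool chat

def fallback_walk(start_tier, call, min_len=50, top=4):
    """Step up on empty/short responses, then walk back down (sketch)."""
    order = list(range(start_tier, top + 1)) + list(range(start_tier - 1, 0, -1))
    for tier in order:
        resp = call(tier)
        if resp and len(resp) >= min_len:
            return tier, resp
    return None, None  # all tiers failed
```

Starting at Tier 1, the walk tries 1 → 2 → 3 → 4 and has nothing below it to walk back to, matching the "all four tiers must fail" guarantee.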
5. Risk Engine
The risk engine computes a composite score (0-100) from 5 independent signals. Each signal is deterministic — no LLM involved in scoring. The LLM is only used upstream for fact extraction; the scoring itself is pure keyword matching and arithmetic.
Signal 1: Custody Keywords
CUSTODY_KEYWORDS = [
"custody", "visitation", "parenting time", "parenting plan",
"custodial", "non-custodial", "sole custody", "joint custody",
"physical custody", "legal custody", "child support",
"parental rights", "termination of parental rights", "TPR",
"guardian ad litem", "GAL", "best interests of the child",
"UCCJEA", "Hague Convention", "parental alienation",
"supervised visitation", "forensic evaluation", "child protective"
] # 23 primary keywords
CONTEXTUAL_TRIGGER = "children" # only fires near custody context
NEGATORS = [
"no children", "no kids", "childless", "not a parent",
"don't have children", "do not have children", "never had children",
"not about custody", "unrelated to custody", "not a custody matter",
"no custody issue", "custody is not", "custody isn't", "not seeking custody"
] # 14 negators that suppress false positives
IMMINENT_RISK = [
"emergency custody", "ex-parte", "TRO", "kidnapping",
"flee", "fleeing", "abduction", "taken my child"
] # 8 imminent danger keywords → score = 100 (hard ceiling)
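Signal 1's precedence (negators suppress, imminent-risk phrases hard-cap) can be sketched as below. The lists are abbreviated and the per-keyword weight is an assumption — the full lists and the real weighting live in the risk engine.

```python
CUSTODY_KEYWORDS = ["custody", "visitation", "parenting plan", "child support"]  # abbreviated
NEGATORS = ["no children", "not about custody", "not seeking custody"]           # abbreviated
IMMINENT_RISK = ["emergency custody", "taken my child", "abduction"]             # abbreviated

def custody_signal(message, per_hit=25):
    """Sketch of Signal 1: negators win, then imminent risk, then keyword hits."""
    text = message.lower()
    if any(n in text for n in NEGATORS):
        return 0                         # explicit negation suppresses the signal
    if any(p in text for p in IMMINENT_RISK):
        return 100                       # hard ceiling, cannot be diluted
    hits = sum(1 for k in CUSTODY_KEYWORDS if k in text)
    return min(hits * per_hit, 100)      # per_hit weight is assumed
```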
Signal 2: Area-of-Law Severity
| Area of Law | Base Score | Rationale |
| contract_review | 5 | Low risk, informational |
| small_claims | 10 | Limited jurisdiction, low stakes |
| landlord_tenant | 15 | Housing rights, some urgency |
| consumer_protection | 15 | UDAP, warranty claims |
| employment | 20 | EEOC deadlines, retaliation risk |
| debt_collection | 20 | FDCPA, SOL concerns |
| family_law | 25 | Divorce, property division |
| immigration | 30 | Deportation risk, complex procedures |
| personal_injury | 30 | SOL pressure, medical complexity |
| police_misconduct | 35 | §1983, qualified immunity, evidence loss |
| bankruptcy | 35 | Asset protection, means test |
| civil_rights | 40 | Constitutional claims, systemic |
| custody | 45 | Child welfare, emergency orders |
| criminal_defense | 50 | Liberty at stake, Miranda, plea implications |
Signal 3: Deadline Urgency
DEADLINE_TRIGGERS = {
"imminent": ["tomorrow", "today", "tonight", "this morning",
"right now", "happening now", "hours"], # +30
"urgent": ["this week", "next week", "few days",
"running out of time", "deadline"], # +20
"pressing": ["this month", "next month", "30 days",
"soon", "coming up"], # +10
"aware": ["eventually", "planning to", "thinking about",
"want to", "considering"] # +5
}
Signal 4: Prerequisite Gaps
Each area of law has REQUIRED prerequisites:
e.g., employment → [EEOC_charge_filed, right_to_sue_letter, SOL_check]
For each unmet prerequisite: +5 points
If prerequisite is URGENT: +15 points
If prerequisite is EXPIRED: +20 points
Maximum from this signal: 40 (capped)
Signal 5: Aggravating Factors
| Category | Keywords | Points | Risk Bump |
| Violence | hit, punch, assault, attack, weapon, gun, knife | +25 | → CRITICAL |
| Threats | threatened, threatening, intimidation, stalking, harassing | +20 | → HIGH |
| Children at risk | child abuse, neglect, CPS, ACS, foster care | +25 | → CRITICAL |
| Financial harm | stolen, fraud, scam, identity theft, drained account | +15 | → HIGH |
| Housing emergency | eviction notice, lockout, illegal eviction, marshal | +15 | → HIGH |
| Incarceration | arrested, jail, prison, bail, arraignment, warrant | +20 | → CRITICAL |
| Self-harm | hurt myself, end it, suicide, self-harm, give up | +25 | → CRITICAL + crisis referral |
Composite Formula
def compute_risk(signals):
    raw = sum(s.score for s in signals)
    composite = min(raw, 100)
    # _max_risk() override: the composite is floored at the highest
    # individual signal, so one catastrophic signal (e.g., imminent
    # custody = 100) can never be diluted by the other signals
    max_single = max(s.score for s in signals)
    return max(composite, max_single)  # whichever is higher
DESIGN NOTE
The _max_risk() function ensures that a single catastrophic signal (e.g., imminent custody = 100) cannot be diluted by low scores from other signals. A user mentioning "my ex is fleeing with my child" hits score 100 regardless of all other factors.
6. Legal Rails
The legal rails system prevents unauthorized practice of law (UPL) while maximizing the information Jeremy can safely provide. Two rule sets govern behavior: 8 prohibitions and 9 safe harbor permissions.
UPL Prohibitions (8 Rules)
PROHIBITIONS = [
"Do not tell the user what to do in their specific case",
"Do not predict case outcomes or chances of success",
"Do not recommend specific legal strategies as advice",
"Do not draft legal documents presented as final/ready-to-file",
"Do not interpret how a law applies to their specific facts",
"Do not recommend whether to accept or reject a settlement",
"Do not advise on plea deals or criminal defense strategy",
"Do not represent yourself as an attorney or legal professional"
]
Safe Harbor (9 Permitted)
SAFE_HARBOR = [
"Explain what a law says in plain language",
"Describe court procedures, filing steps, and deadlines",
"Provide general information about legal rights",
"Explain legal terms and concepts",
"Help organize facts and documents for their case",
"Provide form templates with blank fields",
"Describe what others in similar situations have done",
"Explain the pros and cons of different approaches generally",
"Direct to legal aid, bar associations, and court resources"
]
Disclaimer Strings
PERSISTENT_DISCLAIMER =
"I'm an AI legal assistant, not an attorney. This is legal
information, not legal advice. No attorney-client relationship
is formed. For your specific situation, consult a licensed
attorney in your jurisdiction."
FOOTER_DISCLAIMER =
"Pro Se Network provides legal information, not legal advice.
Jeremy is an AI assistant — not an attorney."
DOCUMENT_HEADER =
"TEMPLATE — NOT LEGAL ADVICE. This document is a template for
informational purposes only. Have an attorney review before filing."
enforce_rails() Gate Logic
enforce_rails(message, risk_level, area)
│
▼
┌──────────────────────┐
│ Check PROHIBITIONS │
│ against response │───── Match? ──── Redact + inject disclaimer
└──────────┬───────────┘
│ Clean
▼
┌──────────────────────┐
│ Check risk_level │
└──┬───┬───┬───┬───────┘
│ │ │ │
LOW MED HIGH CRIT
│ │ │ │
│ │ │ └─► Red banner + hard stop + attorney referral link
│ │ └─────► Orange banner + "strongly recommend attorney"
│ └─────────► Yellow banner + "consider consulting attorney"
└─────────────► Green banner + proceed normally
Risk Level Behavior
| Level | Score | Color | Banner | Behavior |
| LOW | 0-25 | Green | Informational | Full guidance, standard disclaimers |
| MEDIUM | 26-50 | Yellow | Caution | Guidance + attorney suggestion |
| HIGH | 51-75 | Orange | Warning | Limited guidance + strong attorney recommendation |
| CRITICAL | 76-100 | Red | Hard Stop | No guidance — referral only + crisis resources if applicable |
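The score-to-level banding in the table above is a straight threshold lookup; a minimal sketch:

```python
RISK_LEVELS = [
    (25, "LOW"),        # 0-25: green banner, proceed
    (50, "MEDIUM"),     # 26-50: yellow banner, add caution
    (75, "HIGH"),       # 51-75: orange banner, recommend attorney
    (100, "CRITICAL"),  # 76-100: red banner, hard stop + referral
]

def risk_level(score):
    """Map a composite 0-100 score to the banner level in the table above."""
    for ceiling, level in RISK_LEVELS:
        if score <= ceiling:
            return level
    return "CRITICAL"  # defensive: composite scores are already capped at 100
```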
7. Memory System
Jeremy's memory operates on two layers: a runtime SQLite database (memories.db) for session intelligence, and a persistent filesystem store (~/.jeremy/memory/) with FTS5 full-text search for cross-session recall.
SQLite Schema (4 Tables)
CREATE TABLE conversations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
role TEXT NOT NULL, -- 'user' | 'assistant'
content TEXT NOT NULL,
area_of_law TEXT,
risk_score INTEGER DEFAULT 0,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE entity_memory (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
entity_type TEXT NOT NULL, -- 'person' | 'date' | 'amount' | 'org'
entity_name TEXT NOT NULL,
context TEXT,
first_seen DATETIME DEFAULT CURRENT_TIMESTAMP,
last_seen DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE gate_decisions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
tool_name TEXT NOT NULL,
risk_level TEXT NOT NULL, -- 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
votes TEXT NOT NULL, -- JSON: [{guard, vote, reason}]
verdict TEXT NOT NULL, -- 'APPROVED' | 'DENIED'
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE personality_evolution (
id INTEGER PRIMARY KEY AUTOINCREMENT,
trait TEXT NOT NULL, -- 'warmth' | 'directness' | 'humor' | 'formality'
score REAL DEFAULT 0.5,
trigger TEXT, -- what caused the shift
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
WAL Mode + Thread Safety
PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;
# Python-side:
_write_lock = threading.Lock()
def _write(self, sql, params):
with self._write_lock:
conn = sqlite3.connect(self.db_path)
conn.execute(sql, params)
conn.commit()
conn.close()
DESIGN NOTE
WAL mode allows concurrent reads while serializing writes through _write_lock. Each write opens and closes its own connection — no long-lived connection objects that could leak across threads.
Context Message Assembly Pipeline (6 Layers)
build_context(session_id, new_message)
│
▼
Layer 1: System Prompt
│ Jeremy's personality, role definition, Carter voice directives
│
▼
Layer 2: Legal Rails Injection
│ PROHIBITIONS + SAFE_HARBOR + risk-level-specific instructions
│
▼
Layer 3: Conversation History
│ Last 10 messages from conversations table
│ Ordered by timestamp ASC
│
▼
Layer 4: Entity Recall
│ All entities from entity_memory for this session
│ Injected as: "You know: [person: John], [date: March 15], ..."
│
▼
Layer 5: Cross-Session Summary
│ If returning user: summarize prior sessions from persistent memory
│ FTS5 search on user identifiers
│
▼
Layer 6: Current Message
User's new message appended as final context entry
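The 6-layer assembly above can be sketched as a list builder. The prompt strings and the session/entity field names here are placeholders, not the real server.py symbols.

```python
SYSTEM_PROMPT = "You are Jeremy..."              # placeholder for the real prompt
RAILS_PROMPT = "PROHIBITIONS + SAFE_HARBOR..."   # placeholder for Layer 2 content

def build_context(session, new_message, persistent_summary=None, max_history=10):
    """Assemble the 6-layer message list fed to the LLM (field names assumed)."""
    msgs = [{"role": "system", "content": SYSTEM_PROMPT}]        # Layer 1
    msgs.append({"role": "system", "content": RAILS_PROMPT})     # Layer 2
    msgs.extend(session["history"][-max_history:])               # Layer 3: last 10
    if session.get("entities"):                                  # Layer 4
        known = ", ".join(f"[{t}: {n}]" for t, n in session["entities"])
        msgs.append({"role": "system", "content": f"You know: {known}"})
    if persistent_summary:                                       # Layer 5
        msgs.append({"role": "system", "content": persistent_summary})
    msgs.append({"role": "user", "content": new_message})        # Layer 6
    return msgs
```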
8. Tool System
Jeremy has 27 registered tools managed through registry.py. Each tool has a risk level, a gate requirement, and a description. Tool execution is sandboxed with resource caps and command whitelists.
Tool Risk Matrix (27 Tools)
| Tool | Risk | Gate | Description |
| web_search | LOW | Auto | Search legal databases, court info |
| case_lookup | LOW | Auto | CourtListener case search |
| statute_lookup | LOW | Auto | Look up specific statutes |
| court_info | LOW | Auto | Court addresses, hours, procedures |
| deadline_calc | LOW | Auto | Calculate filing deadlines |
| fee_waiver_check | LOW | Auto | Check IFP eligibility |
| form_finder | LOW | Auto | Find court forms by jurisdiction |
| legal_aid_search | LOW | Auto | Find free legal aid near user |
| document_template | MEDIUM | Gate | Generate document templates |
| letter_draft | MEDIUM | Gate | Draft demand/complaint letters |
| evidence_checklist | LOW | Auto | Generate evidence preservation list |
| timeline_builder | LOW | Auto | Build chronological case timeline |
| risk_assessment | MEDIUM | Gate | Run full risk engine analysis |
| jurisdiction_check | LOW | Auto | Determine proper jurisdiction |
| prerequisite_check | LOW | Auto | Check filing prerequisites |
| memory_store | LOW | Auto | Store fact to persistent memory |
| memory_recall | LOW | Auto | Recall from persistent memory |
| send_email | HIGH | Gate + Security | Send email via SMTP vault |
| send_sms | HIGH | Gate + Security | Send SMS notification |
| file_read | MEDIUM | Gate | Read uploaded user files |
| file_write | HIGH | Gate + Security | Write/generate files |
| shell_exec | CRITICAL | Gate + Owner | Execute shell commands (sandboxed) |
| api_call | HIGH | Gate + Security | Make external API calls |
| db_query | MEDIUM | Gate | Query memories.db |
| personality_adjust | LOW | Auto | Adjust personality trait scores |
| gate_talk | MEDIUM | Gate | Initiate Warden terminal session |
| escalate | HIGH | Gate + Security | Escalate to human review |
Approval Flow
APPROVAL LEVELS:
LOW → Auto-approve, no gate check
MEDIUM → Gate check (4 guards vote)
HIGH → Gate check + security review (vault verification)
CRITICAL → Gate check + owner approval required (blocks until ACK)
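The four approval levels reduce to a small dispatch; step names are taken from the list above, the function itself is illustrative.

```python
def approval_path(tool_risk):
    """Map a tool's risk level to the checks it must pass before execution."""
    steps = []
    if tool_risk == "LOW":
        return steps                     # auto-approve, no gate check
    steps.append("gate_check")           # 4 guards vote (MEDIUM and above)
    if tool_risk == "HIGH":
        steps.append("security_review")  # vault verification
    if tool_risk == "CRITICAL":
        steps.append("owner_approval")   # blocks until ACK
    return steps
```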
Sandbox: ALLOWED_COMMANDS + BLOCKED_PATTERNS
ALLOWED_COMMANDS = [
"ls", "cat", "head", "tail", "grep", "wc", "find",
"date", "whoami", "pwd", "echo", "python3", "pip3", "curl"
] # 14 whitelisted commands — everything else blocked
BLOCKED_PATTERNS = [
"rm -rf", "rm -r", "rmdir", "mkfs", "dd if=",
"chmod 777", "curl.*|.*sh", "wget.*|.*sh",
"> /dev/", "sudo", "su ", "passwd", "useradd",
"kill", "pkill", "reboot", "shutdown"
] # 17 blocked patterns — checked before execution
RESOURCE_CAPS = {
"memory": "256MB", # ulimit -v
"timeout": "30s", # subprocess timeout
"network": "10s" # urllib timeout
}
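The pre-execution check — whitelist the binary, then scan the raw line for blocked patterns — can be sketched as below. The pattern list is abbreviated; the real check also applies the resource caps via ulimit and subprocess timeouts.

```python
import re
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "head", "tail", "grep", "wc", "find",
                    "date", "whoami", "pwd", "echo", "python3", "pip3", "curl"}
BLOCKED_PATTERNS = [r"rm -rf", r"sudo", r"curl.*\|.*sh", r"> /dev/"]  # abbreviated

def sandbox_check(cmdline):
    """Return (allowed, reason). Binary must be whitelisted AND line must be clean."""
    parts = shlex.split(cmdline)
    if not parts or parts[0] not in ALLOWED_COMMANDS:
        return False, "command not whitelisted"
    for pat in BLOCKED_PATTERNS:
        if re.search(pat, cmdline):
            return False, f"blocked pattern: {pat}"
    return True, "ok"
```

Note the two layers are independent: `curl` is whitelisted, but `curl ... | sh` still dies on the pattern scan.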
Vault System
SMTP_ACCOUNTS = [
"account_1", # Primary sending account
"account_2", # Secondary
"account_3" # Fallback
]
ALLOWED_SMTP_HOSTS = frozenset([
"smtp.gmail.com",
"smtp.office365.com"
])
# Credentials loaded from environment:
# SMTP_USER_1, SMTP_PASS_1, SMTP_HOST_1
# SMTP_USER_2, SMTP_PASS_2, SMTP_HOST_2
# SMTP_USER_3, SMTP_PASS_3, SMTP_HOST_3
Rate Limiter
SQLite-backed token bucket:
SMS: 5/minute, 50/day
Email: 10/minute, 100/day
Implementation: rate_limiter.py
- One SQLite table: rate_limits (resource, tokens, last_refill, window)
- Token bucket refill on check
- Atomic decrement with _write_lock
- Separate buckets per resource type
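The token-bucket mechanics can be sketched in memory (the real implementation persists tokens in the rate_limits SQLite table and decrements under _write_lock). The injectable clock here is purely for testability.

```python
import time

class TokenBucket:
    """Minimal in-memory sketch of the SQLite-backed bucket in rate_limiter.py."""
    def __init__(self, per_minute, clock=time.monotonic):
        self.capacity = per_minute
        self.tokens = float(per_minute)
        self.rate = per_minute / 60.0   # tokens refilled per second
        self.clock = clock
        self.last = clock()

    def allow(self):
        """Refill on check, then try to spend one token."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # atomic under _write_lock in the real code
            return True
        return False

# Separate buckets per resource type:
# buckets = {"sms": TokenBucket(5), "email": TokenBucket(10)}
```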
9. Quadrant Guard
The Quadrant Guard is a 4-guard gate system that governs tool execution. Each guard evaluates the request independently, casts an APPROVE or DENY vote with reasoning, and the majority verdict determines execution.
Intent Classification
4 QUADRANTS (keyword scoring):
Q1: INFORMATION — lookup, search, find, check, show, list, what is
Q2: CREATION — create, generate, draft, build, write, template
Q3: COMMUNICATION — send, email, text, notify, contact, message
Q4: EXECUTION — run, execute, shell, command, install, deploy
Each incoming tool request is scored against all 4 quadrants.
Highest score determines primary classification.
Multi-quadrant hits increase scrutiny level.
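The quadrant scoring above is plain keyword counting; a sketch:

```python
QUADRANTS = {
    "INFORMATION":   ["lookup", "search", "find", "check", "show", "list", "what is"],
    "CREATION":      ["create", "generate", "draft", "build", "write", "template"],
    "COMMUNICATION": ["send", "email", "text", "notify", "contact", "message"],
    "EXECUTION":     ["run", "execute", "shell", "command", "install", "deploy"],
}

def classify_intent(request_text):
    """Score against all 4 quadrants; the highest score is the primary class."""
    text = request_text.lower()
    scores = {q: sum(1 for kw in kws if kw in text) for q, kws in QUADRANTS.items()}
    primary = max(scores, key=scores.get)
    hit_quadrants = sum(1 for s in scores.values() if s > 0)
    return primary, scores, hit_quadrants  # hit_quadrants > 1 => extra scrutiny
```

A request like "draft and send a demand letter" hits both CREATION and COMMUNICATION, which is exactly the multi-quadrant case that raises the scrutiny level.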
Guard Vote Flow
gate_check(tool_name, args, session)
│
▼
┌───────────────────────────┐
│ Intent Classification │
│ Score against 4 quadrants │
└─────────────┬─────────────┘
│
┌─────────▼─────────┐
│ Invoke 4 Guards │
└──┬──┬──┬──┬───────┘
│ │ │ │
┌────▼┐┌▼──┐┌▼──┐┌▼─────┐
│ G1 ││G2 ││G3 ││ G4 │
│SAFE ││UPL││RISK││SCOPE │
└──┬──┘└─┬─┘└─┬─┘└──┬───┘
│ │ │ │
▼ ▼ ▼ ▼
APPROVE DENY APPROVE APPROVE ← example
│ │ │ │
└─────┼────┼─────┘
▼
┌─────────────────┐
│ MAJORITY VOTE │
│ 3/4 APPROVE │
│ Verdict: APPROVE │
└─────────────────┘
│
▼
┌─────────────────┐
│ Log to │
│ gate_decisions │
│ (memories.db) │
└─────────────────┘
Guard Responsibilities
| Guard | Name | Checks |
| G1 | Safety Guard | Resource caps, blocked patterns, sandbox compliance |
| G2 | UPL Guard | Unauthorized practice of law violations, disclaimer presence |
| G3 | Risk Guard | Current risk score vs tool risk level, escalation threshold |
| G4 | Scope Guard | Tool within session context, no scope creep, rate limits |
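The majority verdict and the vote record logged to gate_decisions can be sketched as below. The source does not say how a 2-2 tie resolves; this sketch assumes fail-closed (tie = DENIED).

```python
from collections import Counter

def gate_verdict(votes):
    """Majority of the 4 guard votes; a 2-2 tie is treated as DENY (assumed)."""
    tally = Counter(v["vote"] for v in votes)
    verdict = "APPROVED" if tally["APPROVE"] > tally["DENY"] else "DENIED"
    return verdict, dict(tally)

# Shape of the JSON stored in gate_decisions.votes:
votes = [
    {"guard": "G1_SAFETY", "vote": "APPROVE", "reason": "within resource caps"},
    {"guard": "G2_UPL",    "vote": "DENY",    "reason": "missing disclaimer"},
    {"guard": "G3_RISK",   "vote": "APPROVE", "reason": "risk LOW vs tool MEDIUM"},
    {"guard": "G4_SCOPE",  "vote": "APPROVE", "reason": "in session scope"},
]
```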
Warden Terminal
gate_talk.py provides a direct terminal interface to the gate system — the "Warden" mode. Used for manual gate overrides, audit log inspection, and guard diagnostics. Accessible only through the gate_talk tool (MEDIUM risk, requires gate approval itself).
10. Voice Pipeline
Jeremy speaks with the Steffan voice — measured, confident, narrator-tone. As of v1.2, voice is a tiered free-first TTS stack: Edge-TTS serves primary traffic sub-1s, Kokoro ONNX is a fully-local fallback, and ElevenLabs is held in reserve as a premium tier. The fallback walks top-down and returns a clean 503 if every tier fails (no more silent degradation to browser speechSynthesis).
Tiered TTS Stack
JEREMY VOICE — 3-TIER TTS FALLBACK
POST /api/tts { text: "..." }
│
▼
TIER 1 — Edge-TTS (primary, free)
│ Provider: Microsoft Azure Neural (free endpoint)
│ Voice: en-US-SteffanNeural
│ Latency: ~0.6s for 500 chars
│ Cost: $0.00
│ Returns: audio/mpeg (MP3)
│
▼ fail
TIER 2 — Kokoro ONNX (local, free)
│ Runtime: kokoro-onnx on CPU
│ Voice: am_michael (warm American male)
│ Model: kokoro-v1.0.onnx (310 MB) + voices-v1.0.bin (27 MB)
│ Latency: ~2s x86, much slower on ARM A1
│ Cost: $0.00
│ Returns: audio/wav
│
▼ fail
TIER 3 — ElevenLabs (premium fallback)
│ Provider: ElevenLabs API
│ Voice: Carter D (GorLj2SsI4u2JqL58gAA)
│ Model: eleven_v3
│ Cost: ~$0.01 / request (when credits available)
│ Gated: Only invoked if ELEVEN_API_KEY is set
│
▼ all fail
HTTP 503 { "error": "Voice unavailable — all TTS engines failed" }
Endpoint Contract
POST /api/tts
Body: { "text": "..." }
Max chars: 1,000 per request (cut at sentence boundary)
Timeout: 30s per tier
Returns: audio/mpeg (Edge) or audio/wav (Kokoro) blob
Errors: 400 no text | 503 all tiers failed
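Server-side, the tier walk reduces to iterating an ordered engine list; a sketch, with engine callables and return conventions assumed:

```python
def synthesize(text, engines):
    """Walk the TTS tiers top-down; return (mime, audio) or None if all fail.

    `engines` is an ordered list of (mime_type, callable) pairs, e.g. Edge-TTS,
    Kokoro, ElevenLabs. Each callable returns audio bytes, or None/raises on failure.
    """
    if not text:
        return None       # handler maps this to HTTP 400
    for mime, engine in engines:
        try:
            audio = engine(text)
            if audio:
                return mime, audio
        except Exception:
            continue      # fall through to the next tier
    return None           # handler maps this to HTTP 503

# engines = [("audio/mpeg", edge_tts_call), ("audio/wav", kokoro_call),
#            ("audio/mpeg", elevenlabs_call)]  # last entry only if ELEVEN_API_KEY set
```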
Why Edge-TTS, not Kokoro, as primary on the live VM
Kokoro ONNX is the highest-quality of the three, but the production VM is an Oracle ARM A1 Micro. In benchmarks, Kokoro inference on that hardware ran at roughly 1 second per character of input — a 300-char response would take on the order of 5 minutes of inference and blow through any sane proxy timeout. Edge-TTS, by contrast, is a network call to Microsoft's free neural endpoint that returns in under a second regardless of host CPU. So the order was inverted for the live deployment: Edge-TTS primary, Kokoro reserved as a local safety net in case the Edge endpoint is ever blocked, and ElevenLabs only if both fail and a key is configured.
cleanForTTS() — 11 Regex Passes
function cleanForTTS(text) {
  // Fenced code blocks must be stripped BEFORE inline code, or the
  // single-backtick pass mangles the ``` fences.
  text = text.replace(/```[\s\S]*?```/g, '');          // strip code blocks
  text = text.replace(/`([^`]+)`/g, '$1');             // strip inline code
  text = text.replace(/\*\*(.*?)\*\*/g, '$1');         // strip bold
  text = text.replace(/\*(.*?)\*/g, '$1');             // strip italic
  text = text.replace(/#{1,6}\s/g, '');                // strip headers
  text = text.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1'); // links → text
  text = text.replace(/[-*+]\s/g, '');                 // strip list markers
  text = text.replace(/\d+\.\s/g, '');                 // strip numbered lists
  text = text.replace(/>\s/g, '');                     // strip blockquotes
  text = text.replace(/\n{2,}/g, '. ');                // double newlines → period
  text = text.replace(/\n/g, ' ');                     // single newlines → space
  return text.trim().substring(0, 500);
}
KITT 32-Bar Equalizer
KITT EQUALIZER — Web Audio API
AudioContext → AnalyserNode (fftSize: 128)
│
▼
getByteFrequencyData() → Uint8Array[64]
│
▼
Take 32 bars (indices 0-31)
Map center-out: bar[0] at center, bar[31] at edges
│
▼
Render:
┌──────────────────────────────────────────────────────┐
│ ▉ ▉ │
│ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ │
│ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ ▉ │
└──────────────────────────────────────────────────────┘
Color: Gold gradient (#D4AF37 → #B08D57)
Glow: box-shadow at val > 100
Idle: Breathing animation (sine wave, 0.5-3px)
Decay: Staggered fade-out per bar (30ms delay each)
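The center-out mapping ("bar[0] at center, bar[31] at edges") is pure index math; here it is sketched in Python for clarity, though the real code runs client-side in JS. The even-right/odd-left fan direction is an assumed convention.

```python
def center_out_order(n_bars=32):
    """Display positions so bar 0 sits at center and high indices fan outward.

    Returns a list where result[display_pos] = bar_index. Even bar indices
    fan to the right of center, odd ones to the left (assumed convention).
    """
    positions = [0] * n_bars
    mid = n_bars // 2
    for bar in range(n_bars):
        offset = (bar + 1) // 2
        pos = mid + offset if bar % 2 == 0 else mid - offset
        positions[pos % n_bars] = bar  # wrap the final odd bar to position 0
    return positions
```

The render loop then reads FFT bin values in `center_out_order(32)` order, so low frequencies pulse at the center of the KITT display.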
speak() Function Flow
async function speak(text) {
  const clean = cleanForTTS(text);
  if (!clean) return;
  try {
    const res = await fetch('/api/tts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: clean })
    });
    if (!res.ok) throw new Error('TTS failed');
    const blob = await res.blob();
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    // Connect to Web Audio for KITT bars; resume the context in case the
    // browser suspended it before the first user gesture
    if (audioCtx.state === 'suspended') await audioCtx.resume();
    const source = audioCtx.createMediaElementSource(audio);
    source.connect(analyser);
    analyser.connect(audioCtx.destination);
    startKITT(); // begin animation loop
    await audio.play();
    audio.onended = () => {
      stopKITT();
      URL.revokeObjectURL(url); // release the blob URL to avoid leaking memory
    };
  } catch (e) {
    // v1.2: NO browser-synth fallback. If every server tier fails,
    // the UI fails silently rather than impersonate Jeremy with a
    // generic speechSynthesis voice. Voice integrity > voice presence.
    console.log('Voice unavailable — all server TTS engines failed');
    stopKITT();
  }
}
11. Infrastructure
Jeremy runs on a single Oracle Cloud ARM A1 Micro instance — free tier, no monthly cost. The entire stack is one process behind nginx.
VM Specification
| Property | Value |
| Provider | Oracle Cloud (Always Free) |
| Instance | ARM A1 Micro |
| IP | 129.159.169.37 |
| CPU | 1 ARM core |
| RAM | 6 GB |
| Disk | 50 GB boot volume |
| OS | Ubuntu 22.04 LTS (aarch64) |
| Cost | $0.00/mo |
nginx Configuration
server {
    listen 443 ssl;
    server_name prosenetwork.org www.prosenetwork.org;

    ssl_certificate /etc/letsencrypt/live/prosenetwork.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/prosenetwork.org/privkey.pem;

    root /home/ubuntu/pro-se-network/app;
    index index.html;

    location /api/ {
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 120s;
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}
Process Inventory
PID CMD RSS
----- ------------------------------- -------
XXXX python3 src/server.py ~65 MB
└── ThreadingMixIn threads ~2 MB each
└── SQLite WAL (memories.db) ~5 MB
Total runtime footprint: ~80 MB
Persistent Memory Filesystem
~/.jeremy/
└── memory/
├── persistent.db # FTS5-indexed SQLite
├── sessions/ # Per-session summaries
│ ├── abc123.json
│ └── def456.json
└── entities/ # Cross-session entity store
├── persons.json
└── dates.json
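A sketch of an upsert into the cross-session entity store. Only the entities/persons.json path comes from the layout above; the helper name and the one-list-of-notes-per-person shape are assumptions about a schema not shown here.

```python
import json
from pathlib import Path

def remember_person(memory_root, name, note):
    # Upsert into entities/persons.json: one key per person,
    # appending notes so entities accumulate across sessions.
    path = Path(memory_root) / "entities" / "persons.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    people = json.loads(path.read_text()) if path.exists() else {}
    people.setdefault(name, []).append(note)
    path.write_text(json.dumps(people, indent=2))
    return people
```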
FTS5 Index Schema
CREATE VIRTUAL TABLE memory_fts USING fts5(
    content,
    session_id,
    area_of_law,
    timestamp,
    tokenize='porter unicode61'
);
-- Queries:
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'custody AND brooklyn';
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'landlord NEAR tenant';
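A minimal sketch of the schema in use via Python's sqlite3 module (assumes the bundled SQLite was compiled with FTS5, which standard CPython builds are; the sample row is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE VIRTUAL TABLE memory_fts USING fts5(
        content, session_id, area_of_law, timestamp,
        tokenize='porter unicode61'
    )""")
conn.execute(
    "INSERT INTO memory_fts VALUES (?, ?, ?, ?)",
    ("Landlord ignored repair requests at the Brooklyn apartment",
     "abc123", "landlord_tenant", "2026-02-25T14:30:00Z"))

# unicode61 case-folds, so the query term 'brooklyn' matches 'Brooklyn'
rows = conn.execute(
    "SELECT session_id FROM memory_fts "
    "WHERE memory_fts MATCH 'landlord AND brooklyn'").fetchall()
print(rows)  # [('abc123',)]
```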
Bridge / Quarantine System
BRIDGE CONCEPT
Tools flagged DENY by the Quadrant Guard enter a "quarantine" state. The bridge lets an owner manually review quarantined actions, approve or permanently deny them, and optionally adjust guard parameters. Bridge state is stored in the gate_decisions table with verdict='QUARANTINED' until resolved.
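The resolution step can be sketched against a minimal gate_decisions table. Only the verdict column and the QUARANTINED value come from the description above; the other columns, the sample tool name, and the helper are assumptions (the real schema lives in memories.py).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gate_decisions "
             "(id INTEGER PRIMARY KEY, tool TEXT, verdict TEXT)")
conn.execute("INSERT INTO gate_decisions (tool, verdict) VALUES (?, ?)",
             ("send_email", "QUARANTINED"))

def resolve_quarantine(conn, decision_id, approve):
    # Owner review: flip QUARANTINED to APPROVED or DENIED, but never
    # touch a decision that has already been resolved.
    verdict = "APPROVED" if approve else "DENIED"
    cur = conn.execute(
        "UPDATE gate_decisions SET verdict = ? "
        "WHERE id = ? AND verdict = 'QUARANTINED'", (verdict, decision_id))
    return cur.rowcount == 1

print(resolve_quarantine(conn, 1, approve=True))  # True
```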
12. Data Pipeline
Jeremy runs dual logging: memories.py feeds the runtime SQLite brain, while conversation_log.py writes append-only JSONL for retraining. The two systems are independent — neither blocks the other.
Dual Logging Architecture
DUAL LOGGING
User Message
│
├──────────────────────────────┐
▼ ▼
memories.py → SQLite conversation_log.py → JSONL
(Runtime Brain) (Retraining Archive)
│ │
├── conversations table ├── session_start entry
├── entity_memory table ├── user_exchange entry
├── gate_decisions table ├── assistant_exchange entry
└── personality_evolution └── area + risk tags
│ │
▼ ▼
Powers: history recall, Powers: fine-tuning dataset,
entity injection, context area distribution analysis,
assembly, gate audits quality review, retraining
JSONL Entry Format
// Session start
{"type":"session_start","session_id":"abc123","timestamp":"2026-02-25T14:30:00Z"}
// User exchange
{"type":"user","session_id":"abc123","content":"I need help with my lease",
"area_of_law":"landlord_tenant","risk_score":15,"timestamp":"..."}
// Assistant exchange
{"type":"assistant","session_id":"abc123","content":"Let me help you understand...",
"area_of_law":"landlord_tenant","risk_score":15,"tier_used":0,"tokens":847,
"timestamp":"..."}
Retraining Flow
RETRAINING PIPELINE
conversations.jsonl (accumulates)
│
▼
Filter: quality threshold
│ - Remove short/empty exchanges
│ - Remove CRITICAL-risk sessions (too sensitive)
│ - Keep only GUIDANCE-state exchanges
│
▼
Tag: area_of_law + risk_level
│ - Balance across 12 areas
│ - Ensure jurisdiction diversity
│
▼
Format: instruction/response pairs
│ - System prompt + user message → assistant response
│
▼
Fine-tune: Phi-3 3.8B (current base)
│
▼
Merge: LoRA → full model
│ Current: israelburns/jeremy-v1-merged (7.64 GB)
│
▼
Deploy: HF Space or local inference
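The filter and format stages above can be sketched as a single pass over the JSONL. The CRITICAL threshold and the minimum-length cutoff here are assumptions for illustration; the real pipeline's values aren't documented in this section.

```python
import json

CRITICAL = 80  # assumed risk_score cutoff for CRITICAL sessions

def to_training_pairs(jsonl_lines, min_len=20):
    """Pair each user entry with the next assistant entry in the same
    session, dropping short exchanges and CRITICAL-risk messages."""
    pairs, pending = [], {}
    for line in jsonl_lines:
        e = json.loads(line)
        if e.get("risk_score", 0) >= CRITICAL:
            pending.pop(e.get("session_id"), None)  # too sensitive: discard
            continue
        if e["type"] == "user":
            pending[e["session_id"]] = e["content"]
        elif e["type"] == "assistant" and e["session_id"] in pending:
            user = pending.pop(e["session_id"])
            if len(user) >= min_len and len(e["content"]) >= min_len:
                pairs.append({"instruction": user, "input": "",
                              "output": e["content"]})
    return pairs
```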
Current Training Data
Dataset: 5,196 instruction/response pairs (v2)
Areas: 12 areas of law
Format: {"instruction": "...", "input": "...", "output": "..."}
Source: Hand-curated + synthetic + conversation logs
Artifact: israelburns/jeremy-v1-merged (Phi-3 3.8B LoRA merge, 7.64 GB on HF)
NOTE — Production vs. research track
The fine-tuned Phi-3 3.8B merged model is a research artifact, not the production inference path. Jeremy's live responses on prosenetwork.org are served 100% by the 4-tier API stack described in Section 4 (Gemini + Claude). The conversation logs and training pipeline exist to support a future self-hosted path — not to power the current build.
13. Code Metrics + Known Issues
Build Summary
JEREMY AI — BUILD METRICS
83 Python modules
41,635 lines of code
207+ HTTP endpoints (do_GET + do_POST handlers)
14 FSM states
5 risk signals
27 registered tools
4 gate guards
4 AI tiers (3 free Gemini + 1 premium Claude)
3 TTS tiers (Edge-TTS + Kokoro + ElevenLabs)
14 areas of law
8 supported jurisdictions
5,196 training pairs (v2 dataset, research track)
1 server process (~65 MB RSS)
$0.00/mo total operating cost
Senior Engineer Flags
KNOWN ISSUES — ARCHITECTURE REVIEW
The following are documented architectural concerns identified during code review. With one exception (the divorce body double-read, a genuine bug), they are not defects but tradeoffs made for speed of development on a single-developer, $0 infrastructure stack.
| Issue | Location | Severity | Detail |
| --- | --- | --- | --- |
| No session TTL / eviction | server.py | MEDIUM | In-memory session dict grows unbounded: no TTL, no LRU eviction. On a low-traffic legal aid site this is acceptable; at scale it would OOM. |
| No session auth | server.py | MEDIUM | Session IDs are 12-character UUIDs. No HMAC, no cookie signing, no CSRF token. Anyone with a valid session ID can resume that session. Acceptable for an informational tool; not for anything with PII. |
| No request-level lock on session mutations | server.py | LOW | ThreadingMixIn means concurrent requests can mutate the same session dict. In practice, users send one message at a time; the race is theoretically possible but practically unlikely. |
| Blocking synchronous HTTP in request threads | jeremy_client.py | LOW | LLM API calls use urllib (synchronous), so each request thread blocks for 5-30 s during inference. Fine at low concurrency; at scale this would need async I/O or a task queue. |
| Divorce endpoint double-reads body | server.py | LOW | The divorce POST handler reads the request body twice. In Python's HTTPServer the body stream is consumed on first read, so the second read returns empty, likely causing silent failures on the divorce workflow endpoint. |
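For reference, the session-TTL gap has a small fix if it ever becomes necessary. This is a sketch under the assumption that each session dict records a last_seen timestamp on every request; neither that field nor this helper exists in server.py today.

```python
import threading
import time

SESSION_TTL = 3600  # hypothetical cutoff: evict sessions idle for an hour
sessions_lock = threading.Lock()

def evict_stale(sessions, now=None):
    # Drop any session whose last_seen is older than SESSION_TTL,
    # holding one lock so eviction doesn't race request threads.
    now = time.time() if now is None else now
    with sessions_lock:
        stale = [sid for sid, s in sessions.items()
                 if now - s.get("last_seen", 0) > SESSION_TTL]
        for sid in stale:
            del sessions[sid]
    return len(stale)

sessions = {"abc123": {"last_seen": time.time()},
            "old456": {"last_seen": time.time() - 7200}}
print(evict_stale(sessions))  # 1 (only the stale session is dropped)
print(list(sessions))         # ['abc123']
```

A background thread calling this once a minute would cap memory without touching the request path.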
CONTEXT
These issues are documented, not hidden. Jeremy is a legal information tool serving low-traffic pro se litigants — not a high-concurrency SaaS platform. The architecture is appropriate for its current scale and cost constraints ($0 infrastructure, $5/mo TTS). Fixing these would add complexity without immediate user-facing benefit.