INTERNAL ENGINEERING DOCUMENT

JEREMY AI

Technical Architecture v1.2

Pro Se Network Engineering Reference

BUILD  83 modules | 41,635 LOC | 207+ endpoints | 14-state FSM
STACK  Python 3.12 + SQLite3 WAL + Gemini API + Anthropic API + Edge-TTS + Kokoro ONNX
INFRA  Oracle ARM A1 Micro | nginx + Let's Encrypt | No Docker (bare process)
COST   $0.00/mo (all-free tier stack as of v1.2)

1. System Architecture

Jeremy runs as a monolithic Python HTTP server behind an nginx reverse proxy on a single Oracle ARM A1 Micro instance. No containers, no orchestration, no cloud functions. One process, one port, one database.

Full-Stack Layer Diagram

CLIENT LAYER
  chat.html (Main UI) | index.html (Landing) | divorce.html (Workflow)
  ↓ HTTPS :443
REVERSE PROXY
  nginx: SSL termination (Let's Encrypt)
  proxy_pass → http://127.0.0.1:7860
  SPA fallback: try_files $uri /index.html
  ↓ HTTP :7860
APPLICATION SERVER
  server.py (2,389 lines): HTTPServer + ThreadingMixIn
  In-memory session dict (12-char UUID keys)
  207+ route handlers (do_GET / do_POST)
PROCESSING ENGINES
  state machine (FSM) | risk engine (5 signals) | legal rails (UPL) | jeremy client (AI tiers)
  rule engine | gate bridge | conversation log
DATA / PERSISTENCE
  memories.db (SQLite WAL mode, 4 tables) | conversations.jsonl (retraining log) | persistent memory (FTS5, ~/.jeremy/)
EXTERNAL SERVICES
  Gemini API 2.5 / 3.1 (Tiers 1-3) | Anthropic Claude Sonnet (Tier 4) | Edge-TTS (primary) + Kokoro | Court Listener (case lookup)

Key Architectural Decisions

  • ThreadingMixIn — each request gets its own thread. No async, no event loop. Simple concurrency model for low-traffic legal guidance.
  • No Docker — bare process on the VM. Eliminates container overhead on a resource-constrained 1-CPU ARM instance.
  • SQLite WAL mode — concurrent readers, single writer. Sufficient for expected throughput. No Postgres dependency.
  • In-memory sessions — session dict lives in process memory. Lost on restart. Acceptable tradeoff for a stateless-enough legal advisor.

2. Request Lifecycle

A single user message traverses 17 processing steps from HTTP POST to rendered response. Every step is synchronous within the request thread.

USER MESSAGE LIFECYCLE — 17 STEPS

① fetch('/api/chat', POST) from the browser (chat.html); body: { message, session_id }
② nginx SSL termination; :443 → proxy_pass → :7860
③ server.py do_POST('/api/chat'); parse JSON body, extract message + session_id
④ Session lookup; sessions[session_id] or create new (12-char UUID); load state, history[], area_of_law, jurisdiction, facts{}
⑤ Custody keyword check; scan message against 23 CUSTODY_KEYWORDS, + 1 contextual ("children" near custody context), − 14 negators that suppress false positives; if matched: inject custody disclaimer, set risk_override
⑥ State machine transition; validate current_state against VALID_TRANSITIONS; if invalid: raise InvalidTransition (400 response); if valid: update session.state
⑦ Fact extraction (LLM); send message to AI tier for structured fact extraction; returns {entities, dates, amounts, relationships}, merged into session.facts{}
⑧ Area-of-law classification; 14 areas scored, contract_review(5) → criminal_defense(50); keyword matching + LLM confirmation; set session.area_of_law
⑨ Jurisdiction detection; state/city extraction from facts, mapped to 8 supported jurisdictions; affects filing deadlines, court procedures, fee waivers
⑩ Risk scoring (5 signals); custody keywords 0-100, area-of-law severity 5-50, deadline urgency 0-30, prerequisite gaps 0-40, aggravating factors 0-25; composite: min(sum, 100) OR _max_risk()
⑪ enforce_rails(); check UPL prohibitions (8 rules), apply safe harbor permissions (9 rules); inject disclaimers by risk level: LOW (0-25) green banner, proceed; MEDIUM (26-50) yellow banner, add caution; HIGH (51-75) orange banner, recommend attorney; CRITICAL (76+) red banner, hard stop + referral
⑫ Guidance generation (LLM); select AI tier by complexity/input length; inject system prompt + legal rails + facts + history + personality directives (Carter voice); generate response with risk-appropriate guardrails
⑬ Memory persistence; write conversation entry and entity updates to memories.db; append retraining archive entry to conversations.jsonl; update personality_evolution if trait triggers detected
⑭ JSON response assembly; { response, risk_level, risk_score, disclaimers[], area_of_law, state, suggested_actions[] }
⑮ formatText() (client-side); Markdown → HTML rendering, legal citation linking, risk banner injection into DOM
⑯ speak() (client-side TTS); POST /api/tts with response text; server runs cleanForTTS(), then the tiered TTS stack (Edge-TTS → Kokoro → ElevenLabs); client plays audio, activates KITT bars
⑰ KITT equalizer animation; 32-bar Web Audio FFT-128 visualization, center-out gold gradient, glow at val > 100, idle breathing animation on silence

3. State Machine (FSM)

Jeremy operates on a 14-state finite state machine. Each session tracks its current state, and transitions are validated against a whitelist. Invalid transitions raise InvalidTransition and return a 400.

State Inventory

State | Module Owner | Entry Trigger | Exit Trigger
GREETING | server.py | New session created | User sends first message
INTAKE | server.py | First user message | Area of law classified
FACT_GATHERING | state_machine.py | Area classified | Minimum facts threshold met
AREA_CLASSIFICATION | state_machine.py | Facts sufficient | Area confirmed by LLM
JURISDICTION_CHECK | state_machine.py | Area confirmed | Jurisdiction resolved
RISK_ASSESSMENT | risk_engine.py | Jurisdiction set | Risk score computed
PREREQUISITE_CHECK | rule_engine.py | Risk assessed | All prerequisites evaluated
GUIDANCE | jeremy_client.py | Prerequisites clear | User asks follow-up or exits
DOCUMENT_PREP | server.py | User requests document | Document generated
FILING_GUIDANCE | rule_engine.py | Document ready | Filing instructions delivered
REFERRAL | legal_rails.py | Risk critical OR UPL trigger | Referral link provided
FOLLOW_UP | server.py | Post-guidance question | New topic or session end
ESCALATION | gate_bridge.py | Guard vote DENY | Owner review complete
CLOSED | server.py | User ends session | Terminal state

VALID_TRANSITIONS Map

VALID_TRANSITIONS = {
    GREETING            → [INTAKE]
    INTAKE              → [FACT_GATHERING, REFERRAL]
    FACT_GATHERING      → [AREA_CLASSIFICATION, REFERRAL]
    AREA_CLASSIFICATION → [JURISDICTION_CHECK, FACT_GATHERING]
    JURISDICTION_CHECK  → [RISK_ASSESSMENT]
    RISK_ASSESSMENT     → [PREREQUISITE_CHECK, REFERRAL, ESCALATION]
    PREREQUISITE_CHECK  → [GUIDANCE, REFERRAL]
    GUIDANCE            → [DOCUMENT_PREP, FILING_GUIDANCE, FOLLOW_UP, CLOSED]
    DOCUMENT_PREP       → [FILING_GUIDANCE, GUIDANCE]
    FILING_GUIDANCE     → [FOLLOW_UP, CLOSED]
    REFERRAL            → [CLOSED]
    FOLLOW_UP           → [FACT_GATHERING, GUIDANCE, CLOSED]
    ESCALATION          → [GUIDANCE, REFERRAL, CLOSED]
    CLOSED              → []    // terminal
}

FSM Flow Diagram

Happy path:
  GREETING → INTAKE → FACT_GATHERING → AREA_CLASSIFICATION → JURISDICTION_CHECK
  → RISK_ASSESSMENT → PREREQUISITE_CHECK → GUIDANCE → CLOSED

Loops and branches:
  AREA_CLASSIFICATION → FACT_GATHERING        (need more facts)
  GUIDANCE → DOCUMENT_PREP → FILING_GUIDANCE  (document path)
  GUIDANCE ↔ FOLLOW_UP                        (follow-up loop; FOLLOW_UP can also reopen FACT_GATHERING)
  RISK_ASSESSMENT → ESCALATION → GUIDANCE     (guard DENY path)
  any high-risk state → REFERRAL → CLOSED

ENFORCEMENT

Any transition not in VALID_TRANSITIONS[current_state] raises InvalidTransition(current, attempted). The server catches this and returns HTTP 400 with the invalid transition pair logged. This prevents impossible state jumps — you cannot go from GREETING to DOCUMENT_PREP.
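A minimal sketch of that enforcement, with an abbreviated transition map (the full 14-state map appears earlier in this section); the transition() helper and its signature are illustrative:

```python
class InvalidTransition(Exception):
    """Raised when the attempted transition is not whitelisted."""
    def __init__(self, current: str, attempted: str):
        super().__init__(f"{current} -> {attempted}")
        self.current, self.attempted = current, attempted

VALID_TRANSITIONS = {
    "GREETING": ["INTAKE"],
    "INTAKE":   ["FACT_GATHERING", "REFERRAL"],
    "REFERRAL": ["CLOSED"],
    "CLOSED":   [],  # terminal
}  # abbreviated: the full map covers all 14 states

def transition(session: dict, new_state: str) -> None:
    cur = session["state"]
    if new_state not in VALID_TRANSITIONS.get(cur, []):
        raise InvalidTransition(cur, new_state)  # server maps this to HTTP 400
    session["state"] = new_state
```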


4. AI Tier Routing

Jeremy uses a 4-tier AI model hierarchy governed by the Prime Directive: save money first. The routing decision is deterministic based on task complexity, tool requirements, and input length. Free Gemini tiers handle the majority of traffic; paid Claude Sonnet is a premium fallback reserved for cases the free tiers cannot handle.

Tier Decision Tree

Incoming request
  task in COMPLEX_TASKS?
    YES → TIER 2 (Flash)
    NO  → tools enabled?
            YES → TIER 2 (Flash)
            NO  → TIER 1 (Flash Lite)

Fallback chain (step up on empty / error / short response):
  TIER 1 → TIER 2 → TIER 3 → TIER 4 → then walk back down.

Tier Specifications

Tier | Model | Max Tokens | Cost/Call | Use Case
1 | Gemini 2.5 Flash Lite | 1,000 | $0.00 | Simple greetings, clarifications, no-tool chat
2 | Gemini 2.5 Flash | 1,500 | $0.00 | Default for tool-enabled sessions and complex tasks
3 | Gemini 3.1 Pro Preview | 2,000 | ~$0.01 | Long inputs (>2,000 chars) and tool-result follow-ups
4 | Claude Sonnet 4 | 2,000 | ~$0.01 | Premium fallback when all Gemini tiers fail or degrade

Implementation Details

  • Singleton pattern — jeremy_client.py holds one JeremyClient instance. All four tiers share the same client and session state.
  • Rate limiting — 100ms minimum between API calls (global, not per-tier). Prevents burst billing and API abuse.
  • Minimum tier floor — Any session with tools enabled starts at Tier 2 minimum. Tier 1 (Flash Lite) cannot reliably emit tool-call JSON, so the router clamps it out whenever the tool system is loaded.
  • Tiered fallback walk — If the chosen tier returns None or a response shorter than 50 chars, the router steps up one tier and retries. Once it reaches the top, it walks back down to lower tiers. All four tiers must fail before Jeremy returns an error.
  • Anthropic availability check — Tier 4 is skipped entirely if ANTHROPIC_API_KEY is not set. The free Gemini stack is fully self-sufficient.
  • COMPLEX_TASKS set — narrative_to_facts, contract_to_clauses, structured_to_guidance. These bypass Tier 1 and start at Tier 2.
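The up-then-down fallback walk can be sketched as follows. The call() callable, tier indices, and return convention are assumptions; only the 50-char threshold and the walk order come from the text above.

```python
def call_with_fallback(prompt, call, start_tier=0, min_len=50):
    """Try tiers upward from start_tier, then walk back down below it.
    A response counts as good if it is non-empty and at least min_len chars.
    `call(tier, prompt)` is a stand-in for the per-tier API wrapper."""
    order = list(range(start_tier, 4)) + list(range(start_tier - 1, -1, -1))
    for tier in order:
        try:
            resp = call(tier, prompt)
        except Exception:
            continue  # API error: keep walking
        if resp and len(resp) >= min_len:
            return tier, resp
    return None, None  # all four tiers failed
```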

5. Risk Engine

The risk engine computes a composite score (0-100) from 5 independent signals. Each signal is deterministic — no LLM involved in scoring. The LLM is only used upstream for fact extraction; the scoring itself is pure keyword matching and arithmetic.

Signal 1: Custody Keywords

CUSTODY_KEYWORDS = [
    "custody", "visitation", "parenting time", "parenting plan", "custodial",
    "non-custodial", "sole custody", "joint custody", "physical custody",
    "legal custody", "child support", "parental rights",
    "termination of parental rights", "TPR", "guardian ad litem", "GAL",
    "best interests of the child", "UCCJEA", "Hague Convention",
    "parental alienation", "supervised visitation", "forensic evaluation",
    "child protective",
]  # 23 primary keywords

CONTEXTUAL_TRIGGER = "children"  # only fires near custody context

NEGATORS = [
    "no children", "no kids", "childless", "not a parent",
    "don't have children", "do not have children", "never had children",
    "not about custody", "unrelated to custody", "not a custody matter",
    "no custody issue", "custody is not", "custody isn't", "not seeking custody",
]  # 14 negators that suppress false positives

IMMINENT_RISK = [
    "emergency custody", "ex-parte", "TRO", "kidnapping", "flee", "fleeing",
    "abduction", "taken my child",
]  # 8 imminent danger keywords → score = 100 (hard ceiling)
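A minimal sketch of Signal 1's ordering: imminent-risk ceiling first, then negator suppression, then keyword hits. The lists here are truncated subsets, and the per-hit weight of 25 is an assumption; the source does not specify how hit counts map onto the 0-100 range.

```python
CUSTODY_KEYWORDS = ["custody", "visitation", "parenting time", "child support"]  # subset of 23
NEGATORS = ["no children", "no kids", "not about custody"]                       # subset of 14
IMMINENT_RISK = ["emergency custody", "abduction", "taken my child"]             # subset of 8

def custody_signal(message: str) -> int:
    m = message.lower()
    if any(k in m for k in IMMINENT_RISK):
        return 100                    # hard ceiling: imminent danger
    if any(n in m for n in NEGATORS):
        return 0                      # negator suppresses false positives
    hits = sum(k in m for k in CUSTODY_KEYWORDS)
    return min(hits * 25, 100)        # per-hit weight is an assumption
```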

Signal 2: Area-of-Law Severity

Area of Law | Base Score | Rationale
contract_review | 5 | Low risk, informational
small_claims | 10 | Limited jurisdiction, low stakes
landlord_tenant | 15 | Housing rights, some urgency
consumer_protection | 15 | UDAP, warranty claims
employment | 20 | EEOC deadlines, retaliation risk
debt_collection | 20 | FDCPA, SOL concerns
family_law | 25 | Divorce, property division
immigration | 30 | Deportation risk, complex procedures
personal_injury | 30 | SOL pressure, medical complexity
police_misconduct | 35 | §1983, qualified immunity, evidence loss
bankruptcy | 35 | Asset protection, means test
civil_rights | 40 | Constitutional claims, systemic
custody | 45 | Child welfare, emergency orders
criminal_defense | 50 | Liberty at stake, Miranda, plea implications

Signal 3: Deadline Urgency

DEADLINE_TRIGGERS = {
    "imminent": ["tomorrow", "today", "tonight", "this morning",
                 "right now", "happening now", "hours"],                       # +30
    "urgent":   ["this week", "next week", "few days",
                 "running out of time", "deadline"],                           # +20
    "pressing": ["this month", "next month", "30 days", "soon", "coming up"],  # +10
    "aware":    ["eventually", "planning to", "thinking about",
                 "want to", "considering"],                                    # +5
}

Signal 4: Prerequisite Gaps

Each area of law has REQUIRED prerequisites, e.g.:
  employment → [EEOC_charge_filed, right_to_sue_letter, SOL_check]

Scoring:
  Each unmet prerequisite:           +5 points
  If the prerequisite is URGENT:     +15 points
  If the prerequisite is EXPIRED:    +20 points
  Maximum from this signal: 40 (capped)
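The scoring rules above translate directly into a small function; the (met, status) input shape is an assumption made for the sketch:

```python
def prerequisite_signal(prereqs) -> int:
    """prereqs: list of (met: bool, status: 'ok' | 'urgent' | 'expired')."""
    score = 0
    for met, status in prereqs:
        if met:
            continue
        score += 5                    # unmet prerequisite
        if status == "urgent":
            score += 15
        elif status == "expired":
            score += 20
    return min(score, 40)             # Signal 4 cap
```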

Signal 5: Aggravating Factors

Category | Keywords | Points | Risk Bump
Violence | hit, punch, assault, attack, weapon, gun, knife | +25 | → CRITICAL
Threats | threatened, threatening, intimidation, stalking, harassing | +20 | → HIGH
Children at risk | child abuse, neglect, CPS, ACS, foster care | +25 | → CRITICAL
Financial harm | stolen, fraud, scam, identity theft, drained account | +15 | → HIGH
Housing emergency | eviction notice, lockout, illegal eviction, marshal | +15 | → HIGH
Incarceration | arrested, jail, prison, bail, arraignment, warrant | +20 | → CRITICAL
Self-harm | hurt myself, end it, suicide, self-harm, give up | +25 | → CRITICAL + crisis referral

Composite Formula

def compute_risk(signals):
    raw = sum(s.score for s in signals)
    composite = min(raw, 100)
    # _max_risk() override: if ANY signal alone is high enough,
    # take the highest individual signal score
    max_single = max(s.score for s in signals)
    return max(composite, max_single)  # whichever is higher

DESIGN NOTE

The _max_risk() function ensures that a single catastrophic signal (e.g., imminent custody = 100) cannot be diluted by low scores from other signals. A user mentioning "my ex is fleeing with my child" hits score 100 regardless of all other factors.


6. Legal Rails

The legal rails system prevents unauthorized practice of law (UPL) while maximizing the information Jeremy can safely provide. Two rule sets govern behavior: 8 prohibitions and 9 safe harbor permissions.

UPL Prohibitions (8 Rules)

PROHIBITIONS = [
    "Do not tell the user what to do in their specific case",
    "Do not predict case outcomes or chances of success",
    "Do not recommend specific legal strategies as advice",
    "Do not draft legal documents presented as final/ready-to-file",
    "Do not interpret how a law applies to their specific facts",
    "Do not recommend whether to accept or reject a settlement",
    "Do not advise on plea deals or criminal defense strategy",
    "Do not represent yourself as an attorney or legal professional",
]

Safe Harbor (9 Permitted)

SAFE_HARBOR = [
    "Explain what a law says in plain language",
    "Describe court procedures, filing steps, and deadlines",
    "Provide general information about legal rights",
    "Explain legal terms and concepts",
    "Help organize facts and documents for their case",
    "Provide form templates with blank fields",
    "Describe what others in similar situations have done",
    "Explain the pros and cons of different approaches generally",
    "Direct to legal aid, bar associations, and court resources",
]

Disclaimer Strings

PERSISTENT_DISCLAIMER = (
    "I'm an AI legal assistant, not an attorney. This is legal information, "
    "not legal advice. No attorney-client relationship is formed. For your "
    "specific situation, consult a licensed attorney in your jurisdiction."
)

FOOTER_DISCLAIMER = (
    "Pro Se Network provides legal information, not legal advice. "
    "Jeremy is an AI assistant — not an attorney."
)

DOCUMENT_HEADER = (
    "TEMPLATE — NOT LEGAL ADVICE. This document is a template for "
    "informational purposes only. Have an attorney review before filing."
)

enforce_rails() Gate Logic

enforce_rails(message, risk_level, area)
  1. Check PROHIBITIONS against the response
       match → redact + inject disclaimer
       clean → continue
  2. Check risk_level
       LOW      → green banner, proceed normally
       MEDIUM   → yellow banner + "consider consulting attorney"
       HIGH     → orange banner + "strongly recommend attorney"
       CRITICAL → red banner + hard stop + attorney referral link

Risk Level Behavior

Level | Score | Color | Banner | Behavior
LOW | 0-25 | Green | Informational | Full guidance, standard disclaimers
MEDIUM | 26-50 | Yellow | Caution | Guidance + attorney suggestion
HIGH | 51-75 | Orange | Warning | Limited guidance + strong attorney recommendation
CRITICAL | 76-100 | Red | Hard Stop | No guidance; referral only + crisis resources if applicable
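The score-to-level mapping is a straight threshold ladder; the function and dict names here are illustrative, not the production identifiers:

```python
def risk_level(score: int) -> str:
    """Map a 0-100 composite risk score to its level band."""
    if score <= 25:
        return "LOW"
    if score <= 50:
        return "MEDIUM"
    if score <= 75:
        return "HIGH"
    return "CRITICAL"

BANNERS = {"LOW": "green", "MEDIUM": "yellow", "HIGH": "orange", "CRITICAL": "red"}
```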

7. Memory System

Jeremy's memory operates on two layers: a runtime SQLite database (memories.db) for session intelligence, and a persistent filesystem store (~/.jeremy/memory/) with FTS5 full-text search for cross-session recall.

SQLite Schema (4 Tables)

CREATE TABLE conversations (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id  TEXT NOT NULL,
    role        TEXT NOT NULL,              -- 'user' | 'assistant'
    content     TEXT NOT NULL,
    area_of_law TEXT,
    risk_score  INTEGER DEFAULT 0,
    timestamp   DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE entity_memory (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id  TEXT NOT NULL,
    entity_type TEXT NOT NULL,              -- 'person' | 'date' | 'amount' | 'org'
    entity_name TEXT NOT NULL,
    context     TEXT,
    first_seen  DATETIME DEFAULT CURRENT_TIMESTAMP,
    last_seen   DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE gate_decisions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    tool_name  TEXT NOT NULL,
    risk_level TEXT NOT NULL,               -- 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
    votes      TEXT NOT NULL,               -- JSON: [{guard, vote, reason}]
    verdict    TEXT NOT NULL,               -- 'APPROVED' | 'DENIED'
    timestamp  DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE personality_evolution (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    trait     TEXT NOT NULL,                -- 'warmth' | 'directness' | 'humor' | 'formality'
    score     REAL DEFAULT 0.5,
    trigger   TEXT,                         -- what caused the shift
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

WAL Mode + Thread Safety

PRAGMA journal_mode = WAL;
PRAGMA busy_timeout = 5000;

# Python-side:
_write_lock = threading.Lock()

def _write(self, sql, params):
    with self._write_lock:
        conn = sqlite3.connect(self.db_path)
        conn.execute(sql, params)
        conn.commit()
        conn.close()

DESIGN NOTE

WAL mode allows concurrent reads while serializing writes through _write_lock. Each write opens and closes its own connection — no long-lived connection objects that could leak across threads.

Context Message Assembly Pipeline (6 Layers)

build_context(session_id, new_message) assembles six layers in order:
  1. System Prompt: Jeremy's personality, role definition, Carter voice directives
  2. Legal Rails Injection: PROHIBITIONS + SAFE_HARBOR + risk-level-specific instructions
  3. Conversation History: last 10 messages from the conversations table, ordered by timestamp ASC
  4. Entity Recall: all entities from entity_memory for this session, injected as "You know: [person: John], [date: March 15], ..."
  5. Cross-Session Summary: for returning users, prior sessions summarized from persistent memory (FTS5 search on user identifiers)
  6. Current Message: the user's new message appended as the final context entry
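The six layers can be sketched as a single assembly function over chat-style message dicts. The message shape and parameter names are assumptions for illustration:

```python
def build_context(system_prompt, rails, history, entities, cross_session, new_message):
    """Assemble the 6-layer context in order. history is a list of
    {"role": ..., "content": ...} dicts; entities is a list of (type, name)."""
    msgs = [{"role": "system", "content": system_prompt}]           # Layer 1
    msgs.append({"role": "system", "content": rails})               # Layer 2
    msgs.extend(history[-10:])                                      # Layer 3: last 10
    if entities:                                                    # Layer 4
        known = ", ".join(f"[{t}: {n}]" for t, n in entities)
        msgs.append({"role": "system", "content": f"You know: {known}"})
    if cross_session:                                               # Layer 5
        msgs.append({"role": "system", "content": cross_session})
    msgs.append({"role": "user", "content": new_message})           # Layer 6
    return msgs
```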

8. Tool System

Jeremy has 27 registered tools managed through registry.py. Each tool has a risk level, a gate requirement, and a description. Tool execution is sandboxed with resource caps and command whitelists.

Tool Risk Matrix (27 Tools)

Tool | Risk | Gate | Description
web_search | LOW | Auto | Search legal databases, court info
case_lookup | LOW | Auto | CourtListener case search
statute_lookup | LOW | Auto | Look up specific statutes
court_info | LOW | Auto | Court addresses, hours, procedures
deadline_calc | LOW | Auto | Calculate filing deadlines
fee_waiver_check | LOW | Auto | Check IFP eligibility
form_finder | LOW | Auto | Find court forms by jurisdiction
legal_aid_search | LOW | Auto | Find free legal aid near user
document_template | MEDIUM | Gate | Generate document templates
letter_draft | MEDIUM | Gate | Draft demand/complaint letters
evidence_checklist | LOW | Auto | Generate evidence preservation list
timeline_builder | LOW | Auto | Build chronological case timeline
risk_assessment | MEDIUM | Gate | Run full risk engine analysis
jurisdiction_check | LOW | Auto | Determine proper jurisdiction
prerequisite_check | LOW | Auto | Check filing prerequisites
memory_store | LOW | Auto | Store fact to persistent memory
memory_recall | LOW | Auto | Recall from persistent memory
send_email | HIGH | Gate + Security | Send email via SMTP vault
send_sms | HIGH | Gate + Security | Send SMS notification
file_read | MEDIUM | Gate | Read uploaded user files
file_write | HIGH | Gate + Security | Write/generate files
shell_exec | CRITICAL | Gate + Owner | Execute shell commands (sandboxed)
api_call | HIGH | Gate + Security | Make external API calls
db_query | MEDIUM | Gate | Query memories.db
personality_adjust | LOW | Auto | Adjust personality trait scores
gate_talk | MEDIUM | Gate | Initiate Warden terminal session
escalate | HIGH | Gate + Security | Escalate to human review

Approval Flow

APPROVAL LEVELS:
  LOW      → Auto-approve, no gate check
  MEDIUM   → Gate check (4 guards vote)
  HIGH     → Gate check + security review (vault verification)
  CRITICAL → Gate check + owner approval required (blocks until ACK)

Sandbox: ALLOWED_COMMANDS + BLOCKED_PATTERNS

ALLOWED_COMMANDS = [
    "ls", "cat", "head", "tail", "grep", "wc", "find",
    "date", "whoami", "pwd", "echo", "python3", "pip3", "curl",
]  # 14 whitelisted commands — everything else blocked

BLOCKED_PATTERNS = [
    "rm -rf", "rm -r", "rmdir", "mkfs", "dd if=", "chmod 777",
    "curl.*|.*sh", "wget.*|.*sh", "> /dev/", "sudo", "su ", "passwd",
    "useradd", "kill", "pkill", "reboot", "shutdown",
]  # 17 blocked patterns — checked before execution

RESOURCE_CAPS = {
    "memory":  "256MB",  # ulimit -v
    "timeout": "30s",    # subprocess timeout
    "network": "10s",    # urllib timeout
}
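A minimal sketch of the pre-execution check: blocked-pattern scan first, then a whitelist match on the command head. The blocked list here is a small subset, and plain substring matching is a simplification of the production pattern checks:

```python
import shlex

ALLOWED = {"ls", "cat", "head", "tail", "grep", "wc", "find", "date",
           "whoami", "pwd", "echo", "python3", "pip3", "curl"}
BLOCKED_SUBSTRINGS = ("rm -rf", "sudo", "shutdown", "> /dev/")  # subset shown

def command_allowed(cmd: str) -> bool:
    low = cmd.lower()
    if any(p in low for p in BLOCKED_SUBSTRINGS):
        return False                      # blocked pattern wins outright
    try:
        head = shlex.split(cmd)[0]        # first token = the command itself
    except (ValueError, IndexError):
        return False                      # empty or malformed command line
    return head in ALLOWED
```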

Vault System

SMTP_ACCOUNTS = [
    "account_1",  # Primary sending account
    "account_2",  # Secondary
    "account_3",  # Fallback
]

ALLOWED_SMTP_HOSTS = frozenset([
    "smtp.gmail.com",
    "smtp.office365.com",
])

# Credentials loaded from environment:
#   SMTP_USER_1, SMTP_PASS_1, SMTP_HOST_1
#   SMTP_USER_2, SMTP_PASS_2, SMTP_HOST_2
#   SMTP_USER_3, SMTP_PASS_3, SMTP_HOST_3

Rate Limiter

SQLite-backed token bucket:
  SMS:   5/minute, 50/day
  Email: 10/minute, 100/day

Implementation (rate_limiter.py):
  - One SQLite table: rate_limits (resource, tokens, last_refill, window)
  - Token bucket refill on check
  - Atomic decrement with _write_lock
  - Separate buckets per resource type
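The token-bucket core, minus the SQLite persistence, can be sketched as below; TokenBucket is an illustrative name, and the production version stores tokens/last_refill in the rate_limits table instead of instance attributes:

```python
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)     # start full
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def take(self) -> bool:
        """Refill based on elapsed time, then try to spend one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Usage mirrors the configured limits, e.g. `TokenBucket(5, 5 / 60)` for the 5-per-minute SMS bucket.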

9. Quadrant Guard

The Quadrant Guard is a 4-guard gate system that governs tool execution. Each guard evaluates the request independently, casts an APPROVE or DENY vote with reasoning, and the majority verdict determines execution.

Intent Classification

4 QUADRANTS (keyword scoring):
  Q1 INFORMATION:   lookup, search, find, check, show, list, what is
  Q2 CREATION:      create, generate, draft, build, write, template
  Q3 COMMUNICATION: send, email, text, notify, contact, message
  Q4 EXECUTION:     run, execute, shell, command, install, deploy

Each incoming tool request is scored against all 4 quadrants. The highest score determines the primary classification; multi-quadrant hits increase the scrutiny level.
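The quadrant scoring can be sketched as plain substring counting; classify_intent is an illustrative name, and tie-breaking behavior is an assumption:

```python
QUADRANTS = {
    "INFORMATION":   ["lookup", "search", "find", "check", "show", "list", "what is"],
    "CREATION":      ["create", "generate", "draft", "build", "write", "template"],
    "COMMUNICATION": ["send", "email", "text", "notify", "contact", "message"],
    "EXECUTION":     ["run", "execute", "shell", "command", "install", "deploy"],
}

def classify_intent(request: str):
    """Score the request against all 4 quadrants; highest score wins."""
    low = request.lower()
    scores = {q: sum(kw in low for kw in kws) for q, kws in QUADRANTS.items()}
    primary = max(scores, key=scores.get)  # ties resolve by dict order (assumption)
    return primary, scores
```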

Guard Vote Flow

gate_check(tool_name, args, session)
  1. Intent classification: score the request against the 4 quadrants
  2. Invoke the 4 guards: G1 SAFETY, G2 UPL, G3 RISK, G4 SCOPE
  3. Each guard casts APPROVE or DENY with reasoning
     (example: APPROVE, DENY, APPROVE, APPROVE)
  4. Majority vote: 3/4 APPROVE → verdict APPROVE
  5. Log the decision to gate_decisions (memories.db)

Guard Responsibilities

Guard | Name | Checks
G1 | Safety Guard | Resource caps, blocked patterns, sandbox compliance
G2 | UPL Guard | Unauthorized practice of law violations, disclaimer presence
G3 | Risk Guard | Current risk score vs tool risk level, escalation threshold
G4 | Scope Guard | Tool within session context, no scope creep, rate limits
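The majority verdict reduces to a one-liner; treating a 2/2 tie as DENIED is an assumption, since the source only says "majority verdict":

```python
def gate_verdict(votes) -> str:
    """votes: list of (guard, vote, reason) tuples.
    Strict majority of APPROVE votes approves; a tie denies (assumption)."""
    approvals = sum(1 for _, v, _ in votes if v == "APPROVE")
    return "APPROVED" if approvals * 2 > len(votes) else "DENIED"
```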

Warden Terminal

gate_talk.py provides a direct terminal interface to the gate system — the "Warden" mode. Used for manual gate overrides, audit log inspection, and guard diagnostics. Accessible only through the gate_talk tool (MEDIUM risk, requires gate approval itself).


10. Voice Pipeline

Jeremy speaks with the Steffan voice: measured, confident, narrator-tone. As of v1.2, voice runs on a tiered free-first TTS stack: Edge-TTS serves primary traffic in under a second, Kokoro ONNX is a fully local fallback, and ElevenLabs is held in reserve as a premium tier. The fallback walks top-down and returns a clean 503 if every tier fails (no more silent degradation to browser speechSynthesis).

Tiered TTS Stack

JEREMY VOICE — 3-TIER TTS FALLBACK

POST /api/tts { text: "..." }

TIER 1: Edge-TTS (primary, free)
  Provider: Microsoft Azure Neural (free endpoint)
  Voice:    en-US-SteffanNeural
  Latency:  ~0.6s for 500 chars
  Cost:     $0.00
  Returns:  audio/mpeg (MP3)
    ↓ on failure
TIER 2: Kokoro ONNX (local, free)
  Runtime:  kokoro-onnx on CPU
  Voice:    am_michael (warm American male)
  Model:    kokoro-v1.0.onnx (310 MB) + voices-v1.0.bin (27 MB)
  Latency:  ~2s on x86, much slower on ARM A1
  Cost:     $0.00
  Returns:  audio/wav
    ↓ on failure
TIER 3: ElevenLabs (premium fallback)
  Provider: ElevenLabs API
  Voice:    Carter D (GorLj2SsI4u2JqL58gAA)
  Model:    eleven_v3
  Cost:     ~$0.01 / request (when credits available)
  Gated:    only invoked if ELEVEN_API_KEY is set
    ↓ all tiers fail
HTTP 503 { "error": "Voice unavailable — all TTS engines failed" }

Endpoint Contract

POST /api/tts
  Body:      { "text": "..." }
  Max chars: 1,000 per request (cut at sentence boundary)
  Timeout:   30s per tier
  Returns:   audio/mpeg (Edge) or audio/wav (Kokoro) blob
  Errors:    400 no text | 503 all tiers failed
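The sentence-boundary cut can be sketched as follows; truncate_at_sentence is an illustrative name, and the exact set of sentence terminators is an assumption:

```python
def truncate_at_sentence(text: str, limit: int = 1000) -> str:
    """Cap text at `limit` chars, preferring to cut at the last full sentence."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # last occurrence of a sentence terminator followed by a space
    best = max(cut.rfind(". "), cut.rfind("! "), cut.rfind("? "))
    return cut[:best + 1] if best != -1 else cut  # hard cut if no boundary found
```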

Why Edge-TTS, not Kokoro, as primary on the live VM

Kokoro ONNX is the highest-quality of the three, but the production VM is an Oracle ARM A1 Micro. In benchmarks, Kokoro inference on that hardware runs at roughly 1 second per character of input; a 300-char response would take about 5 minutes of inference and blow through any sane proxy timeout. Edge-TTS, by contrast, is a network call to Microsoft's free neural endpoint that returns in under a second regardless of host CPU. So the order was inverted for the live deployment: Edge-TTS primary, Kokoro reserved as a local safety net in case the Edge endpoint is ever blocked, and ElevenLabs only if both fail and a key is configured.

cleanForTTS() — 11 Regex Passes

function cleanForTTS(text) {
  text = text.replace(/\*\*(.*?)\*\*/g, '$1');         // strip bold
  text = text.replace(/\*(.*?)\*/g, '$1');             // strip italic
  text = text.replace(/#{1,6}\s/g, '');                // strip headers
  text = text.replace(/\[([^\]]+)\]\([^)]+\)/g, '$1'); // links → text
  text = text.replace(/`([^`]+)`/g, '$1');             // strip inline code
  text = text.replace(/```[\s\S]*?```/g, '');          // strip code blocks
  text = text.replace(/[-*+]\s/g, '');                 // strip list markers
  text = text.replace(/\d+\.\s/g, '');                 // strip numbered lists
  text = text.replace(/>\s/g, '');                     // strip blockquotes
  text = text.replace(/\n{2,}/g, '. ');                // double newlines → period
  text = text.replace(/\n/g, ' ');                     // single newlines → space
  return text.trim().substring(0, 500);
}

KITT 32-Bar Equalizer

KITT EQUALIZER — Web Audio API

AudioContext → AnalyserNode (fftSize: 128)
  ↓
getByteFrequencyData() → Uint8Array[64]
  ↓
Take 32 bars (indices 0-31), mapped center-out: bar[0] at center, bar[31] at edges
  ↓
Render:
  Color: gold gradient (#D4AF37 → #B08D57)
  Glow:  box-shadow at val > 100
  Idle:  breathing animation (sine wave, 0.5-3px)
  Decay: staggered fade-out per bar (30ms delay each)

speak() Function Flow

async function speak(text) {
  const clean = cleanForTTS(text);
  if (!clean) return;
  try {
    const res = await fetch('/api/tts', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ text: clean })
    });
    if (!res.ok) throw new Error('TTS failed');
    const blob = await res.blob();
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    // Connect to Web Audio for KITT bars
    const source = audioCtx.createMediaElementSource(audio);
    source.connect(analyser);
    analyser.connect(audioCtx.destination);
    startKITT();  // begin animation loop
    audio.play();
    audio.onended = () => stopKITT();
  } catch (e) {
    // v1.2: NO browser-synth fallback. If every server tier fails,
    // the UI fails silently rather than impersonate Jeremy with a
    // generic speechSynthesis voice. Voice integrity > voice presence.
    console.log('Voice unavailable — all server TTS engines failed');
    stopKITT();
  }
}

11. Infrastructure

Jeremy runs on a single Oracle Cloud ARM A1 Micro instance — free tier, no monthly cost. The entire stack is one process behind nginx.

VM Specification

Property | Value
Provider | Oracle Cloud (Always Free)
Instance | ARM A1 Micro
IP | 129.159.169.37
CPU | 1 ARM core
RAM | 6 GB
Disk | 50 GB boot volume
OS | Ubuntu 22.04 LTS (aarch64)
Cost | $0.00/mo

nginx Configuration

server {
    listen 443 ssl;
    server_name prosenetwork.org www.prosenetwork.org;

    ssl_certificate     /etc/letsencrypt/live/prosenetwork.org/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/prosenetwork.org/privkey.pem;

    root /home/ubuntu/pro-se-network/app;
    index index.html;

    location /api/ {
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 120s;
    }

    location / {
        try_files $uri $uri/ /index.html;
    }
}

Process Inventory

PID    CMD                               RSS
-----  --------------------------------  -------
XXXX   python3 src/server.py             ~65 MB
       ├── ThreadingMixIn threads        ~2 MB each
       └── SQLite WAL (memories.db)      ~5 MB

Total runtime footprint: ~80 MB

Persistent Memory Filesystem

~/.jeremy/
└── memory/
    ├── persistent.db        # FTS5-indexed SQLite
    ├── sessions/            # Per-session summaries
    │   ├── abc123.json
    │   └── def456.json
    └── entities/            # Cross-session entity store
        ├── persons.json
        └── dates.json

FTS5 Index Schema

CREATE VIRTUAL TABLE memory_fts USING fts5(
    content,
    session_id,
    area_of_law,
    timestamp,
    tokenize='porter unicode61'
);

-- Queries:
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'custody AND brooklyn';
-- SELECT * FROM memory_fts WHERE memory_fts MATCH 'NEAR(landlord tenant)';
-- (FTS5 uses the NEAR(...) function form; the infix "a NEAR b" syntax
--  is FTS3/4 only.)
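The recall path over this index can be exercised end to end with Python's bundled sqlite3 module (assuming it was compiled with FTS5, as stock Python 3.12 builds are). The table and columns match the schema above; the sample rows are illustrative only.

```python
# Minimal FTS5 recall sketch against the memory_fts schema above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE VIRTUAL TABLE memory_fts USING fts5(
        content, session_id, area_of_law, timestamp,
        tokenize='porter unicode61'
    )
""")
rows = [
    ("custody hearing scheduled in brooklyn family court", "abc123",
     "family_law", "2026-02-25T14:30:00Z"),
    ("landlord refused repairs; tenant withheld rent", "def456",
     "landlord_tenant", "2026-02-26T09:12:00Z"),
]
conn.executemany("INSERT INTO memory_fts VALUES (?, ?, ?, ?)", rows)

# Boolean AND narrows to rows containing both terms.
hits = conn.execute(
    "SELECT session_id FROM memory_fts "
    "WHERE memory_fts MATCH 'custody AND brooklyn'"
).fetchall()
print(hits)  # [('abc123',)]
```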

Bridge / Quarantine System

BRIDGE CONCEPT

Tools flagged by the Quadrant Guard as DENY enter a "quarantine" state. The bridge lets the owner manually review quarantined actions, approve or permanently deny them, and optionally adjust guard parameters. Bridge state is stored in the gate_decisions table with verdict='QUARANTINED' until resolved.
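The lifecycle can be sketched against the gate_decisions table. Only the table name and verdict='QUARANTINED' come from the text above; the other column names (tool, session_id, resolved_verdict) are assumptions for illustration.

```python
# Hedged sketch of the bridge lifecycle: quarantine, then owner review.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE gate_decisions (
        id INTEGER PRIMARY KEY,
        tool TEXT,
        session_id TEXT,
        verdict TEXT,            -- ALLOW / DENY / QUARANTINED
        resolved_verdict TEXT    -- set by owner review via the bridge
    )
""")

# Quadrant Guard flags a tool call: it enters quarantine, not execution.
conn.execute(
    "INSERT INTO gate_decisions (tool, session_id, verdict) "
    "VALUES ('file_write', 'abc123', 'QUARANTINED')"
)

# Owner reviews via the bridge and approves (or permanently denies).
conn.execute(
    "UPDATE gate_decisions SET resolved_verdict = 'APPROVED' "
    "WHERE verdict = 'QUARANTINED' AND tool = 'file_write'"
)

pending = conn.execute(
    "SELECT COUNT(*) FROM gate_decisions "
    "WHERE verdict = 'QUARANTINED' AND resolved_verdict IS NULL"
).fetchone()[0]
print(pending)  # 0
```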


12. Data Pipeline

Jeremy runs dual logging: memories.py feeds the runtime SQLite brain, while conversation_log.py writes append-only JSONL for retraining. The two systems are independent — neither blocks the other.
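The "neither blocks the other" contract amounts to fanning each exchange out to both sinks under independent exception handling. A minimal sketch, with hypothetical write_memory()/write_jsonl() standing in for memories.py and conversation_log.py:

```python
# Each sink gets its own try/except; a failure in one never
# prevents the other from writing.
import json
import logging

sqlite_rows, jsonl_lines = [], []

def write_memory(entry):          # stand-in for memories.py -> SQLite
    sqlite_rows.append(entry)

def write_jsonl(entry):           # stand-in for conversation_log.py -> JSONL
    jsonl_lines.append(json.dumps(entry))

def log_exchange(entry: dict) -> None:
    """Fan one exchange out to both sinks independently."""
    for sink in (write_memory, write_jsonl):
        try:
            sink(entry)
        except Exception:
            logging.exception("sink %s failed; continuing", sink.__name__)

log_exchange({"type": "user", "session_id": "abc123", "content": "hi"})
print(len(sqlite_rows), len(jsonl_lines))  # 1 1
```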

Dual Logging Architecture

DUAL LOGGING

                      User Message
                           │
           ┌───────────────┴────────────────┐
           ▼                                ▼
 memories.py → SQLite          conversation_log.py → JSONL
   (Runtime Brain)                (Retraining Archive)
           │                                │
 ├── conversations table       ├── session_start entry
 ├── entity_memory table       ├── user_exchange entry
 ├── gate_decisions table      ├── assistant_exchange entry
 └── personality_evolution     └── area + risk tags
           │                                │
           ▼                                ▼
 Powers: history recall,       Powers: fine-tuning dataset,
 entity injection, context     area distribution analysis,
 assembly, gate audits         quality review, retraining

JSONL Entry Format

// Session start
{"type":"session_start","session_id":"abc123","timestamp":"2026-02-25T14:30:00Z"}

// User exchange
{"type":"user","session_id":"abc123","content":"I need help with my lease",
 "area_of_law":"landlord_tenant","risk_score":15,"timestamp":"..."}

// Assistant exchange
{"type":"assistant","session_id":"abc123","content":"Let me help you understand...",
 "area_of_law":"landlord_tenant","risk_score":15,"tier_used":0,"tokens":847,
 "timestamp":"..."}
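Writing and reading this format is one JSON object per line, appended and never rewritten. A minimal sketch (an in-memory buffer stands in for an open('conversations.jsonl', 'a') handle; the keys match the samples above):

```python
# Append-only writer/reader for the JSONL entry format.
import io
import json

log = io.StringIO()  # stand-in for an append-mode file handle

def append_entry(entry: dict) -> None:
    # One compact JSON object per line; never seek, never rewrite.
    log.write(json.dumps(entry, separators=(",", ":")) + "\n")

append_entry({"type": "session_start", "session_id": "abc123",
              "timestamp": "2026-02-25T14:30:00Z"})
append_entry({"type": "user", "session_id": "abc123",
              "content": "I need help with my lease",
              "area_of_law": "landlord_tenant", "risk_score": 15})

entries = [json.loads(line) for line in log.getvalue().splitlines()]
print([e["type"] for e in entries])  # ['session_start', 'user']
```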

Retraining Flow

RETRAINING PIPELINE

conversations.jsonl (accumulates)
        │
        ▼
Filter: quality threshold
        │  - Remove short/empty exchanges
        │  - Remove CRITICAL-risk sessions (too sensitive)
        │  - Keep only GUIDANCE-state exchanges
        ▼
Tag: area_of_law + risk_level
        │  - Balance across 12 areas
        │  - Ensure jurisdiction diversity
        ▼
Format: instruction/response pairs
        │  - System prompt + user message → assistant response
        ▼
Fine-tune: Phi-3 3.8B (current base)
        │
        ▼
Merge: LoRA → full model
        │  Current: israelburns/jeremy-v1-merged (7.64 GB)
        ▼
Deploy: HF Space or local inference
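The filter stage reduces to a predicate over each logged exchange. A sketch under stated assumptions: an exchange is a dict with 'content', 'risk_level', and 'state' keys, and the minimum-length threshold is illustrative.

```python
# Filter-stage predicate mirroring the three rules in the pipeline above.
def keep(exchange: dict, min_len: int = 20) -> bool:
    if len(exchange.get("content", "").strip()) < min_len:
        return False                              # remove short/empty exchanges
    if exchange.get("risk_level") == "CRITICAL":
        return False                              # too sensitive to retrain on
    return exchange.get("state") == "GUIDANCE"    # keep GUIDANCE-state only

sample = [
    {"content": "hi", "risk_level": "LOW", "state": "GUIDANCE"},
    {"content": "My landlord will not return my deposit",
     "risk_level": "LOW", "state": "GUIDANCE"},
    {"content": "Detailed but critical-risk conversation text here",
     "risk_level": "CRITICAL", "state": "GUIDANCE"},
]
print(sum(keep(x) for x in sample))  # 1
```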

Current Training Data

Dataset:   5,196 instruction/response pairs (v2)
Areas:     12 areas of law
Format:    {"instruction": "...", "input": "...", "output": "..."}
Source:    Hand-curated + synthetic + conversation logs
Artifact:  israelburns/jeremy-v1-merged (Phi-3 3.8B LoRA merge, 7.64 GB on HF)

NOTE — Production vs. research track

The fine-tuned Phi-3 3.8B merged model is a research artifact, not the production inference path. Jeremy's live responses on prosenetwork.org are served 100% by the 4-tier API stack described in Section 4 (Gemini + Claude). The conversation logs and training pipeline exist to support a future self-hosted path — not to power the current build.


13. Code Metrics + Known Issues

Build Summary

JEREMY AI — BUILD METRICS

 83        Python modules
 41,635    lines of code
 207+      HTTP endpoints (do_GET + do_POST handlers)
 14        FSM states
 5         risk signals
 27        registered tools
 4         gate guards
 4         AI tiers (3 free Gemini + 1 premium Claude)
 3         TTS tiers (Edge-TTS + Kokoro + ElevenLabs)
 14        areas of law
 8         supported jurisdictions
 5,196     training pairs (v2 dataset, research track)
 1         server process (~65 MB RSS)
 $0.00/mo  total operating cost

Senior Engineer Flags

KNOWN ISSUES — ARCHITECTURE REVIEW

The following are documented architectural concerns identified during code review. Most are not bugs but deliberate tradeoffs made for speed of development on a single-developer, $0-infrastructure stack; the divorce-endpoint entry is a genuine bug.

Issue: No session TTL / eviction
Location: server.py · Severity: MEDIUM
The in-memory session dict grows unbounded: no TTL, no LRU eviction. On a low-traffic legal aid site this is acceptable; at scale it would OOM.

Issue: No session auth
Location: server.py · Severity: MEDIUM
Session IDs are 12-character UUIDs with no HMAC, no cookie signing, no CSRF token. Anyone holding a valid session ID can resume that session. Acceptable for an informational tool; not for anything handling PII.

Issue: No request-level lock on session mutations
Location: server.py · Severity: LOW
ThreadingMixIn means concurrent requests can mutate the same session dict. In practice users send one message at a time, so the race is theoretically possible but practically unlikely.

Issue: Blocking synchronous HTTP in request threads
Location: jeremy_client.py · Severity: LOW
LLM API calls use urllib (synchronous), so each request thread blocks for 5-30 s during inference. With ThreadingMixIn this is fine at low concurrency; at scale it would need async I/O or a task queue.

Issue: Divorce endpoint double-reads body
Location: server.py · Severity: LOW
The divorce POST handler reads the request body twice. In Python's HTTPServer the body stream is consumed on the first read, so the second read returns empty. This is a genuine bug and likely causes silent failures on the divorce workflow endpoint.

CONTEXT

These issues are documented, not hidden. Jeremy is a legal information tool serving low-traffic pro se litigants — not a high-concurrency SaaS platform. The architecture is appropriate for its current scale and cost constraints ($0 infrastructure, $5/mo TTS). Fixing these would add complexity without immediate user-facing benefit.