🧠 Prism Coder
Persistent memory + tool-calling intelligence for AI agents. (formerly Prism MCP)
A Model Context Protocol server that gives Claude, Cursor, and other AI tools a Mind Palace — long-term memory that survives across sessions, with semantic search, cognitive routing, a visual dashboard, and the prism-coder:1b7 / prism-coder:8b / prism-coder:14b / prism-coder:32b LLM fleet for offline tool-calling.
Renamed in v14.0.0: the project is now Prism Coder, covering both the Mind Palace memory server and the `prism-coder:1b7` / `prism-coder:8b` / `prism-coder:14b` / `prism-coder:32b` LLM fleet on HuggingFace + Ollama. The npm package stays `prism-mcp-server`, so existing install URLs and `mcp.json` entries keep working; the `prism-coder` binary has been the canonical entry point since v12.
What Prism Coder does
💾 Your AI remembers across sessions
Every conversation feeds the Mind Palace. Next session, your AI agent loads the right context automatically — no re-explaining.
🔍 Semantic search over your history
Ask "what did I decide about the auth flow last month?" and get the answer with citations. Vector search + keyword + graph traversal.
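The README does not say how the vector, keyword, and graph results are merged. A common way to combine several ranked lists is reciprocal rank fusion (RRF). The sketch below assumes RRF with the conventional constant k = 60. The function name `fuseRankings` and the memory IDs are hypothetical, not Prism's API.

```typescript
// Reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank), rank starting at 1.
function fuseRankings(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Highest fused score first; ties broken alphabetically so output is deterministic.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([id]) => id);
}

// Hypothetical hits from three retrievers for "what did I decide about the auth flow?"
const vectorHits = ["mem-auth-decision", "mem-login-bug", "mem-oauth-notes"];
const keywordHits = ["mem-oauth-notes", "mem-auth-decision"];
const graphHits = ["mem-auth-decision", "mem-session-design"];

console.log(fuseRankings([vectorHits, keywordHits, graphHits]));
```

RRF uses only ranks. This means vector similarities, keyword scores and graph distances never have to be put on a common scale.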
🧬 Cognitive routing
Different memory types live in different stores: episodic (what happened), semantic (what's true), procedural (how to do X). The router picks where to store and where to retrieve.
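The README does not describe how the router classifies a memory. The sketch below is a rough illustration only. It assumes a simple keyword heuristic, and `classifyMemory` and its patterns are hypothetical. It shows how text could be assigned to one of the three stores; Prism's actual router is not shown here.

```typescript
type MemoryType = "episodic" | "semantic" | "procedural";

// Naive keyword heuristic standing in for the real router (assumption).
function classifyMemory(text: string): MemoryType {
  const t = text.toLowerCase();
  // "How to do X" content goes to the procedural store.
  if (/\b(how to|steps?|install|deploy|run)\b/.test(t)) return "procedural";
  // Time-anchored events ("what happened") go to the episodic store.
  if (/\b(yesterday|today|last (week|month)|we (decided|tried|fixed))\b/.test(t)) return "episodic";
  // Everything else is treated as a standing fact ("what's true").
  return "semantic";
}

console.log(classifyMemory("How to deploy the staging server"));
console.log(classifyMemory("Yesterday we decided to drop JWT auth"));
console.log(classifyMemory("The billing service stores invoices in Postgres"));
```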
🔄 Proactive session drift detection (new in v15)
Your AI agent can now detect when it has drifted from your original goals — mid-session, automatically — and self-correct before you notice the problem.
Three direct Prism calls:
- `session_save_ledger`: snapshot the current state
- `session_cognitive_route`: compare current work against the original goals; returns `on_track`, `minor_drift`, or `major_drift`
- `session_compact_ledger`: if drifted, compress the ledger and reload only what matters
When major drift is detected, the alert routes to the Synalux portal so it's visible across sessions and devices — not just in the current conversation.
Real example it caught: A training session promised BFCL ≥90% for three AI models. The agent spent 3 hours debugging audio bugs instead. The drift check surfaced: "Training goal unmet. Layer3 corpus missing from all training sets. 0 BFCL scores measured." The session immediately re-aligned.
No scripts. No cron. No hooks. Three tool calls, Prism handles the rest.
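The three-call drift check can be sketched from the client side as follows. This is a minimal sketch with several assumptions:

- `CallTool` stands in for whatever MCP client you use.
- The `project` argument is hypothetical.
- The `{ status }` response shape is hypothetical.

Only the tool names and the three status values come from this README.

```typescript
type DriftStatus = "on_track" | "minor_drift" | "major_drift";

// Hypothetical MCP client hook: send a tool call, get back the parsed result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

// One drift check: snapshot, compare against the original goals, compact only if drifted.
async function checkDrift(callTool: CallTool, project: string): Promise<DriftStatus> {
  await callTool("session_save_ledger", { project });
  // Assumed response shape: { status: DriftStatus }.
  const route = (await callTool("session_cognitive_route", { project })) as { status: DriftStatus };
  if (route.status !== "on_track") {
    await callTool("session_compact_ledger", { project });
  }
  return route.status;
}
```

When the session is on track, the loop makes only two calls. The compaction step runs only after `session_cognitive_route` reports drift.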
🛡 Local-first — security + speed
Free tier runs entirely on your machine — SQLite, local embedding model, no API keys, no cloud. Paid tier adds cloud sync via the Synalux portal.
Why local models matter:
| | Cloud LLM | Local prism-coder |
|---|---|---|
| Tool-call latency | 200ms–3s | ~1.6s (1.7B) / ~1.1s (14B) |
| API key required | Yes | No |
| Data sent externally | Every prompt | Nothing |
| Works offline | ❌ | ✅ |
| Cost at scale | $0.002–0.06/call | $0 |
| HIPAA | Requires BAA | On-prem = no BAA |
Install in one command — no config, no keys, no vendor agreements:
ollama pull dcostenco/prism-coder:14b # 9 GB · default router · Mac M2+ / iPad Pro
ollama pull dcostenco/prism-coder:4b # 2.5 GB · verifier · iPhone 15/16 Pro
ollama pull dcostenco/prism-coder:1b7 # 2.2 GB · ultra-low RAM / Apple Watch
ollama pull dcostenco/prism-coder:32b # 19 GB · complex tasks · Mac M2 Ultra+
ollama pull dcostenco/prism-coder:8b # 4.7 GB · balanced · iPhone/iPad 8GB
Prism MCP detects both the namespaced (`dcostenco/prism-coder:14b`) and bare (`prism-coder:14b`) Ollama tag forms automatically, so there is nothing else to configure. If you want bare-tag aliases so you can call `ollama run prism-coder:14b` directly, run:
prism register-models # aliases */prism-coder:* → prism-coder:* via `ollama cp`
prism register-models --dry-run # preview what would be aliased
Cascade architecture
Three-tier local cascade with cloud fallback:
A query arrives and flows through the tiers in order:
1. `prism-coder:14b` routes the query (100% on eval_300) and serves it (~3s, 9 GB, free).
2. `prism-coder:4b` verifies claims against RAG context from `knowledge_search` and returns a grounded response (2.5 GB, <1s).
3. `prism-coder:32b` handles deep reasoning for complex tasks only, when `ceiling="32b"` is set explicitly (~8s, 19 GB, free).
4. Cloud fallback when local models are insufficient: Claude Sonnet 4 → Claude Opus 4.7 (cloud, ~$0.01/request).

| Tier | Model | Role | RAM | Latency | Cost |
|---|---|---|---|---|---|
| Default | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
| Verifier | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
| Complex | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
| Cloud | Sonnet → Opus | Fallback for max quality | — | ~5-10s | ~$0.01 |
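The escalation order described above can be sketched as a hypothetical orchestration. This is not Prism's code, and it rests on three assumptions:

- Each tier is modeled as an async function that may decline by returning `null`.
- The 4b verifier gates local answers only.
- Cloud answers are returned as-is.

All type and function names are invented for illustration.

```typescript
// Hypothetical tier handler: resolves to an answer, or null if it declines.
type Handler = (query: string) => Promise<string | null>;

interface Cascade {
  router: Handler;                              // prism-coder:14b (default)
  deep: Handler;                                // prism-coder:32b (opt-in)
  cloud: Handler[];                             // e.g. Sonnet first, then Opus
  verify: (answer: string) => Promise<boolean>; // prism-coder:4b grounding check
}

async function runCascade(query: string, c: Cascade, ceiling?: "32b"): Promise<string> {
  // Local tiers: 32b is tried only when the caller explicitly raises the ceiling.
  const local = ceiling === "32b" ? [c.router, c.deep] : [c.router];
  for (const tier of local) {
    const answer = await tier(query);
    if (answer !== null && (await c.verify(answer))) return answer;
  }
  // Cloud fallback, in order, when no local answer passed verification.
  for (const tier of c.cloud) {
    const answer = await tier(query);
    if (answer !== null) return answer;
  }
  throw new Error("no tier produced an answer");
}
```

Keeping 32b behind an explicit ceiling means the 19 GB model is loaded only when a caller asks for it. The cloud tiers are reached only after every local option has failed.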
Mobile / offline cascade (Prism AAC iOS):
prism-coder:14b (iPad Pro 16 GB) → prism-coder:4b (iPhone 8 GB) → prism-coder:1b7 (any device, always fits)
Knowledge ingestion — teach Prism your codebase
Your code knowledge lives in the knowledge graph, not in model weights. Ingesting code therefore never retrains the router, and routing accuracy is unaffected.
bash scripts/knowledge-ingest/setup.sh # one-time setup
# Then every git commit auto-indexes changed files into the knowledge graph
Three entry points:
- MCP tool: `knowledge_ingest` (the AI says "learn this code")
- GitHub webhook: `POST /api/github/webhook` (runs automatically on push)
- REST API: `POST /api/v1/prism/ingest` (open interface)
See KNOWLEDGE_INGESTION.md for the full setup guide.
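Prism's webhook handler is not shown here. As a sketch, assume the handler only needs to know which paths changed. It could then reduce a GitHub push event to files to re-index, using the `added` / `modified` / `removed` arrays that GitHub includes for each commit. The function name `filesToIngest` is hypothetical. Processing commits oldest-first is also an assumption.

```typescript
// Minimal slice of GitHub's push event payload; the real payload has many more fields.
interface PushCommit { added: string[]; modified: string[]; removed: string[]; }
interface PushEvent { ref: string; commits: PushCommit[]; }

// Reduce a push to files to (re)index and files to drop; later commits win.
function filesToIngest(event: PushEvent): { upsert: string[]; remove: string[] } {
  const upsert = new Set<string>();
  const remove = new Set<string>();
  for (const c of event.commits) {
    for (const f of c.added.concat(c.modified)) { upsert.add(f); remove.delete(f); }
    for (const f of c.removed) { remove.add(f); upsert.delete(f); }
  }
  return { upsert: Array.from(upsert).sort(), remove: Array.from(remove).sort() };
}

const push: PushEvent = {
  ref: "refs/heads/main",
  commits: [
    { added: ["src/auth.ts"], modified: ["README.md"], removed: [] },
    { added: [], modified: ["src/auth.ts"], removed: ["src/legacy.ts"] },
  ],
};
console.log(filesToIngest(push));
```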
Routing accuracy
Head-to-head: prism-coder:14b vs Claude Opus (25-case benchmark, production system prompt, May 2026):
| Metric | prism-coder:14b | Claude Opus 4 |
|---|---|---|
| Overall accuracy | 96% (24/25) | 88% (22/25) |
| Tool routing (15 tests) | 93% (14/15) | 80% (12/15) |
| Abstention (10 tests) | 100% (10/10) | 100% (10/10) |
| Avg latency | 0.8s | 5.5s |
| Cost per query | $0 | ~$0.017 |
| Annual @ 1K/day | $0 | ~$6,100 |
prism-coder:14b beats Opus on tool routing — 7x faster, free, runs offline.
eval_300 (300 cases, 17 tools + NO_TOOL, 9 categories, 3-seed validated):
| Model | eval_300 strict | Size | Latency |
|---|---|---|---|
| prism-coder:32b | 300/300 (100%) | 19 GB | ~1.4s |
| prism-coder:14b | 299/300 (99.7%) | 9 GB | ~0.8s |
| prism-coder:4b | 300/300 (100%) | 2.5 GB | ~0.5s |
| prism-coder:1b7 | 300/300 (100%) | 2.2 GB | ~1.6s |
Categories: abstention, adversarial traps, cascade, disambiguation, edge cases, multi-intent, natural phrasing, parameter extraction, verifier prompts.
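The README doesn't define "strict" or how the three seeds are combined. The sketch below rests on two assumptions:

- "Strict" means an exact tool-name match, with `NO_TOOL` as the expected answer for abstention cases.
- A case must pass on every seed.

The data shapes are hypothetical.

```typescript
// One eval case: the expected tool name, or "NO_TOOL" for abstention cases.
interface EvalCase { id: string; expected: string; }

// predictions[s][caseId] = tool name the model emitted for that case on seed s.
function strictScore(cases: EvalCase[], predictions: Record<string, string>[]) {
  // Assumption: a case passes only if the exact tool name matches on every seed.
  const passed = cases.filter((c) => predictions.every((seed) => seed[c.id] === c.expected)).length;
  return { passed, total: cases.length, accuracy: passed / cases.length };
}

const cases: EvalCase[] = [
  { id: "c1", expected: "knowledge_search" },
  { id: "c2", expected: "NO_TOOL" },
];
const seeds = [
  { c1: "knowledge_search", c2: "NO_TOOL" },
  { c1: "knowledge_search", c2: "session_save_ledger" }, // one seed calls a tool on an abstention case
  { c1: "knowledge_search", c2: "NO_TOOL" },
];
console.log(strictScore(cases, seeds)); // { passed: 1, total: 2, accuracy: 0.5 }
```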
What this means: a child in a hospital without WiFi, a nonverbal adult on an airplane, or a family on a budget gets Claude-grade routing accuracy with zero cloud dependency — the AAC path routes correctly 100% of the time across all tiers.
What it does NOT mean: these scores measure routing precision on a 17-tool taxonomy, not general intelligence. Claude outperforms on everything outside this task. The value is offline reliability at zero cost, not replacing Claude. Code and clinical knowledge come from RAG via knowledge_search.
🔍 L3 Grounding Verifier
When prism_infer receives an evidence payload, the grounding verifier automatically checks the model's response against the provided evidence before returning to the caller. Unverified or hallucinated claims are flagged. This is the third layer (L3) of the cascade — after tool routing (L1) and confidence gating (L2).
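The verifier itself is model-based (the cascade pairs it with `prism-coder:4b`), so the code below is only a stand-in that shows the shape of the check. It splits the response into sentences and flags any sentence whose content words are mostly absent from the evidence. The lexical-overlap rule, the threshold, and the `verifyClaims` name are assumptions, not Prism's method.

```typescript
interface Verdict { claim: string; supported: boolean; }

// Lowercased words of 4+ letters/digits; short function words are ignored.
const contentWords = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]{4,}/g) ?? [];

// Stand-in for a model-based verifier: a sentence counts as supported when at
// least `threshold` of its content words also appear in the evidence.
function verifyClaims(response: string, evidence: string, threshold = 0.6): Verdict[] {
  const known = new Set(contentWords(evidence));
  return response
    .split(/[.!?]+\s*/)
    .filter((claim) => claim.trim().length > 0)
    .map((claim) => {
      const words = contentWords(claim);
      const hits = words.filter((w) => known.has(w)).length;
      return { claim, supported: words.length > 0 && hits / words.length >= threshold };
    });
}

const evidence = "The patient has a peanut allergy and takes cetirizine daily.";
const answer = "The patient has a peanut allergy. The patient takes insulin twice weekly.";
console.log(verifyClaims(answer, evidence).map((v) => v.supported)); // [ true, false ]
```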
🧠 HRR Semantic Drift Detection (v17.0)
Detects when long AI agent sessions drift from their original goal — using Holographic Reduced Representations for temporal trajectory encoding and anomaly detection.
Three domains, one detector:
| Domain | Signals | Safety |
|---|---|---|
| BCBA/Clinical | Client specificity decay, function-intervention alignment (4 functions), contraindication detection (epilepsy/pica/dysphagia/diabetes) | PHI-safe, deterministic |
| Coding | File scope entropy, summary vagueness, test coverage ratio, trajectory HRR divergence | Adaptive threshold for refactors |
| AAC | Prediction accuracy, vocabulary stagnation, topic divergence | Emergency phrases always ≥ 0.95 |
Research-backed: trajectory association (Frady et al. 2018), HDAD anomaly detection (Wang et al. 2021), unit-modulus projection (Ganesan et al., NeurIPS 2021). 306 tests across 8 files, zero failures. Use `session_detect_drift` with the optional `domain` parameter.
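The exact signal formulas are not given. One plausible reading of the "file scope entropy" signal in the Coding row is Shannon entropy over the top-level directories a session has edited. Under that assumption, the computation looks like this. The function name and the directory bucketing are hypothetical.

```typescript
// Shannon entropy (bits) of how edits spread across top-level directories.
function fileScopeEntropy(paths: string[]): number {
  if (paths.length === 0) return 0;
  const counts = new Map<string, number>();
  for (const p of paths) {
    const top = p.split("/")[0];
    counts.set(top, (counts.get(top) ?? 0) + 1);
  }
  let h = 0;
  counts.forEach((count) => {
    const q = count / paths.length;
    h -= q * Math.log2(q);
  });
  return h;
}

console.log(fileScopeEntropy(["src/auth.ts", "src/session.ts"]));                  // 0
console.log(fileScopeEntropy(["src/a.ts", "docs/b.md", "test/c.ts", "ci/d.yml"])); // 2
```

Entropy is 0 when every edit lands in one directory. It reaches log2(d) when edits are spread evenly over d directories. A broad refactor legitimately produces that kind of spread, which the table's adaptive refactor threshold presumably accounts for.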
⚡ Zero-search retrieval (new in v15.8)
Holographic Reduced Representations (HRR) via Rust WASM for instant memory retrieval without a database query.
Three adaptive strategies:
- GloVe embeddings (offline, 50K words) — 87% Top-1 accuracy, stable at 200+ concepts
- API embeddings (Gemini/Voyage) — 90%+ accuracy when online
- NeurIPS 2021 projection — unit-modulus normalization for numerical stability
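Prism's HRR engine is Rust WASM and is not reproduced here. The sketch below shows the classic Plate-style HRR operations such an engine builds on. It uses random Gaussian vectors rather than GloVe or API embeddings, and it omits the unit-modulus projection. The steps are:

1. Bind each key-value pair by circular convolution.
2. Superpose the bound pairs into one fixed-size hologram.
3. Read a value back by binding the hologram with the key's approximate inverse.
4. Pick the nearest known vector.

All names are illustrative.

```typescript
// Seeded PRNG (mulberry32) so the demo is reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    let t = (a += 0x6d2b79f5);
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Random vector with i.i.d. N(0, 1/n) entries (Box-Muller), as in Plate's HRR.
function randomVector(n: number, rand: () => number): number[] {
  return Array.from({ length: n }, () =>
    (Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand())) / Math.sqrt(n));
}

// Binding = circular convolution. O(n^2) for clarity; an FFT makes it O(n log n).
function bind(a: number[], b: number[]): number[] {
  const n = a.length;
  const out = new Array<number>(n).fill(0);
  for (let i = 0; i < n; i++)
    for (let j = 0; j < n; j++) out[(i + j) % n] += a[i] * b[j];
  return out;
}

// Approximate inverse (involution): inv[i] = a[-i mod n].
const involution = (a: number[]) => a.map((_, i) => a[(a.length - i) % a.length]);
const add = (a: number[], b: number[]) => a.map((x, i) => x + b[i]);
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / Math.sqrt(na * nb);
}

const n = 512;
const rand = mulberry32(42);
const vocab = new Map<string, number[]>();
for (const w of ["user", "city", "lang", "alice", "paris", "rust"]) vocab.set(w, randomVector(n, rand));
const v = (w: string) => vocab.get(w)!;

// One fixed-size hologram holds all three key-value pairs, superposed.
const memory = [["user", "alice"], ["city", "paris"], ["lang", "rust"]]
  .map(([k, val]) => bind(v(k), v(val)))
  .reduce(add);

// Unbind with the key's inverse, then "clean up" to the nearest known value.
function recall(key: string): string {
  const noisy = bind(memory, involution(v(key)));
  return ["alice", "paris", "rust"].reduce((best, w) =>
    cosine(noisy, v(w)) > cosine(noisy, v(best)) ? w : best);
}

console.log(recall("user"), recall("city"), recall("lang"));
```

Lookup never touches a database: it is one binding plus a nearest-neighbor check against a small vocabulary. That is why a probe can stay in the sub-millisecond range once binding uses an FFT.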
Retrieval cascade: HRR (~0.2ms) → FTS5 (~50ms) → Supabase (~200ms)
| Metric | HRR (WASM) | FTS5 | Supabase Vector |
|---|---|---|---|
| Latency | 0.2ms | 50ms | 200ms |
| Latency vs. HRR | 1x | 250x | 1000x |
| Offline | Yes | Yes | No |
| Accuracy (GloVe) | 87% Top-1 | 95%+ | 95%+ |
| Hologram size | 8KB | Index varies | Cloud |
HRR acts as Tier 0 — if confidence is high, FTS5 is skipped entirely. Falls through gracefully when HRR has no match. 97 dedicated tests (72 system + 25 API/client). Built with Rust + rustfft + wasm-bindgen (229KB binary).
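The Tier 0 fall-through can be sketched as below. The sketch assumes each tier returns hits sorted by a comparable confidence score and has its own stopping threshold. The interface, names, and threshold values are hypothetical, not Prism's API.

```typescript
interface Hit { id: string; score: number; }

// Hypothetical retrieval tier; hits are assumed sorted by descending score.
interface RetrievalTier {
  name: string;
  minScore: number; // confidence needed to stop at this tier
  search: (q: string) => Promise<Hit[]>;
}

// Try tiers cheapest-first (HRR → FTS5 → vector); stop at the first confident one.
async function cascadeSearch(q: string, tiers: RetrievalTier[]): Promise<{ tier: string; hits: Hit[] }> {
  for (const t of tiers) {
    const hits = await t.search(q);
    if (hits.length > 0 && hits[0].score >= t.minScore) return { tier: t.name, hits };
  }
  return { tier: "none", hits: [] };
}

const fixed = (hits: Hit[]) => async (_q: string) => hits;
const tiers: RetrievalTier[] = [
  { name: "hrr", minScore: 0.9, search: fixed([]) },                          // no HRR match
  { name: "fts5", minScore: 0.5, search: fixed([{ id: "m1", score: 0.4 }]) }, // weak match
  { name: "vector", minScore: 0, search: fixed([{ id: "m2", score: 0.8 }]) }, // last resort
];
cascadeSearch("auth flow decision", tiers).then((r) => console.log(r.tier, r.hits[0]?.id));
```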
HRR AAC prediction benchmark — real-world impact on Prism AAC word prediction (10 scenarios, 54 integration tests):
| Scenario | Baseline Top-1 | +HRR Top-1 | Top-1 Lift (relative) | MRR Lift |
|---|---|---|---|---|
| Core AAC phrases | 36.7% | 46.7% | +27.3% | +6.0% |
| Personal vocabulary | 70.4% | 81.5% | +15.8% | +9.2% |
| Mixed (all phrases) | 47.2% | 56.9% | +20.6% | +5.7% |
| Cross-session recall | 80.0% | 80.0% | +0.0% | +0.0% |
Top-1 = correct word is tile #1. MRR = Mean Reciprocal Rank. Zero Top-5 regressions in any scenario. HRR encodes bigrams + trigrams from every spoken phrase; probes take ~0.2ms — safe on every keystroke. All Synalux apps (clinical, AAC, PrismCoach) share HRR via the portal /api/v1/hrr endpoint.
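The Top-1 Lift column is a relative change. For personal vocabulary, for example, (81.5 - 70.4) / 70.4 ≈ 15.8%. The README does not say whether MRR Lift is relative or absolute. Below is a minimal sketch of the two metrics and the lift. It assumes each prediction is summarized by the 1-based rank of the correct word, or `null` if the word was not shown. The function names are hypothetical.

```typescript
// rank = 1-based tile position of the correct word, or null if it was not offered.
function top1(ranks: (number | null)[]): number {
  return ranks.filter((r) => r === 1).length / ranks.length;
}

// Mean Reciprocal Rank: average of 1/rank, counting a miss as 0.
function mrr(ranks: (number | null)[]): number {
  return ranks.reduce<number>((sum, r) => sum + (r === null ? 0 : 1 / r), 0) / ranks.length;
}

// Relative lift, as in the Top-1 Lift column: (after - before) / before.
function relativeLift(before: number, after: number): number {
  return (after - before) / before;
}

const ranks = [1, 2, null, 1]; // correct word at tile 1, tile 2, missing, tile 1
console.log(top1(ranks), mrr(ranks)); // 0.5 0.625
console.log((relativeLift(70.4, 81.5) * 100).toFixed(1)); // 15.8
```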
Competitive comparison:
| System | Retrieval | Offline | Cost | La
…