
Prism

Zero-config persistent memory for AI agents with local SQLite. Mind Palace web dashboard, time travel (rewind/replay sessions), agent telepathy (cross-client memory sharing), code mode templates, morning briefings, and progressive context loading. 25 tools, 6 resources, 4 prompt…

knowledge-memory · sqlite · ai · agent
By dcostenco
14324 · Updated 5 days ago · TypeScript · AGPL-3.0

Installation

npx -y prism-mcp

Configuration

{
  "mcpServers": {
    "prism-mcp": {
      "command": "npx",
      "args": ["-y", "prism-mcp"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes

🧠 Prism Coder


Persistent memory + tool-calling intelligence for AI agents. (formerly Prism MCP)

A Model Context Protocol server that gives Claude, Cursor, and other AI tools a Mind Palace — long-term memory that survives across sessions, with semantic search, cognitive routing, a visual dashboard, and the prism-coder:1b7 / prism-coder:8b / prism-coder:14b / prism-coder:32b LLM fleet for offline tool-calling.


Renamed in v14.0.0: the project is now Prism Coder to cover both the Mind Palace memory server and the prism-coder:1b7 / prism-coder:8b / prism-coder:14b / prism-coder:32b LLM fleet on HuggingFace + Ollama. The npm package stays prism-mcp-server so existing install URLs and mcp.json entries keep working — the prism-coder binary has been the canonical entry point since v12.


What Prism Coder does

💾 Your AI remembers across sessions

Every conversation feeds the Mind Palace. Next session, your AI agent loads the right context automatically — no re-explaining.

🔍 Semantic search over your history

Ask "what did I decide about the auth flow last month?" and get the answer with citations. Vector search + keyword + graph traversal.

🧬 Cognitive routing

Different memory types live in different stores: episodic (what happened), semantic (what's true), procedural (how to do X). The router picks where to store and where to retrieve.

🔄 Proactive session drift detection (new in v15)

Your AI agent can now detect when it has drifted from your original goals — mid-session, automatically — and self-correct before you notice the problem.

Three direct Prism calls:

  1. session_save_ledger — snapshot current state
  2. session_cognitive_route — compare current work against original goals, returns on_track / minor_drift / major_drift
  3. session_compact_ledger — if drifted, compress and reload only what matters

When major drift is detected, the alert routes to the Synalux portal so it's visible across sessions and devices — not just in the current conversation.
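
The three calls can be chained as in the sketch below. The tool names and the three status values come from the list above; the argument shapes, the `ToolCaller` type, and the choice to compact on any drift (minor or major) are assumptions made for illustration. A real MCP client call would be asynchronous; a synchronous stub keeps the example short.

```typescript
// Sketch only: tool names are from the README; arguments and result shapes are assumed.
type DriftStatus = "on_track" | "minor_drift" | "major_drift";

// Hypothetical stand-in for an MCP client's tool-call method.
type ToolCaller = (tool: string, args: Record<string, unknown>) => unknown;

function checkDrift(callTool: ToolCaller, project: string, summary: string): DriftStatus {
  // 1. Snapshot the current state.
  callTool("session_save_ledger", { project, summary });

  // 2. Compare current work against the original goals.
  const driftStatus = callTool("session_cognitive_route", { project }) as DriftStatus;

  // 3. Compact only when the session has drifted (assumption: any drift level).
  if (driftStatus !== "on_track") {
    callTool("session_compact_ledger", { project });
  }
  return driftStatus;
}

// Usage with a stub that records calls and always reports major drift.
const recordedCalls: string[] = [];
const stubCaller: ToolCaller = (tool) => {
  recordedCalls.push(tool);
  return tool === "session_cognitive_route" ? "major_drift" : null;
};
const driftResult = checkDrift(stubCaller, "demo-project", "debugging audio bugs");
console.log(driftResult, recordedCalls);
```

With the stub above, all three tools are called in order because the route call reports drift; an `on_track` result would skip the compaction step.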

Real example it caught: A training session promised BFCL ≥90% for three AI models. The agent spent 3 hours debugging audio bugs instead. The drift check surfaced: "Training goal unmet. Layer3 corpus missing from all training sets. 0 BFCL scores measured." The session immediately re-aligned.

No scripts. No cron. No hooks. Three tool calls, Prism handles the rest.

🛡 Local-first — security + speed

Free tier runs entirely on your machine — SQLite, local embedding model, no API keys, no cloud. Paid tier adds cloud sync via Synalux portal.

Why local models matter:

| | Cloud LLM | Local prism-coder |
|---|---|---|
| Tool-call latency | 200ms–3s | ~1.6s (1.7B) / ~1.1s (14B) |
| API key required | Yes | No |
| Data sent externally | Every prompt | Nothing |
| Works offline | No | Yes |
| Cost at scale | $0.002–0.06/call | $0 |
| HIPAA | Requires BAA | On-prem = no BAA |

Install in one command — no config, no keys, no vendor agreements:

ollama pull dcostenco/prism-coder:14b   # 9 GB  · default router · Mac M2+ / iPad Pro
ollama pull dcostenco/prism-coder:4b    # 2.5 GB · verifier · iPhone 15/16 Pro
ollama pull dcostenco/prism-coder:1b7   # 2.2 GB · ultra-low RAM / Apple Watch
ollama pull dcostenco/prism-coder:32b   # 19 GB  · complex tasks · Mac M2 Ultra+
ollama pull dcostenco/prism-coder:8b    # 4.7 GB · balanced · iPhone/iPad 8GB

Prism MCP detects both the namespaced (dcostenco/prism-coder:14b) and bare (prism-coder:14b) Ollama tag forms automatically, so there is nothing else to configure. If you want the bare tags as aliases, so that ollama run prism-coder:14b works directly, run:

prism register-models           # aliases */prism-coder:* → prism-coder:* via `ollama cp`
prism register-models --dry-run # preview what would be aliased
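
How Prism matches the two tag forms is not shown here. One plausible approach, sketched as an assumption, is to strip any namespace prefix before comparing; `bareTag` and `findInstalled` are hypothetical helper names, not part of the Prism CLI.

```typescript
// Assumption: two tags match when they agree after removing an optional "namespace/" prefix.
function bareTag(tag: string): string {
  const slash = tag.lastIndexOf("/");
  return slash === -1 ? tag : tag.slice(slash + 1);
}

// Return the first installed tag that matches the wanted one in either form.
function findInstalled(wanted: string, installed: string[]): string | undefined {
  const target = bareTag(wanted);
  for (const tag of installed) {
    if (bareTag(tag) === target) return tag;
  }
  return undefined;
}

const installedTags = ["llama3:8b", "dcostenco/prism-coder:14b"];
console.log(findInstalled("prism-coder:14b", installedTags));
```

Here the bare request `prism-coder:14b` resolves to the namespaced tag that is actually installed, which is the behaviour the paragraph above describes.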

Cascade architecture

Three-tier local cascade with cloud fallback:

Query arrives
  │
  ▼
prism-coder:14b ── routes (100% eval_300) ──▶  serve  (~3s, 9GB, FREE)
  │                                              │
  │                                    knowledge_search (RAG context)
  │                                              │
  ▼                                              ▼
prism-coder:4b ── verifies claims ──────────▶  grounded response
  │                 (2.5GB, <1s)
  │
  ▼  (complex tasks only, explicit ceiling="32b")
prism-coder:32b ── deep reasoning ──────────▶  serve  (~8s, 19GB, FREE)
  │
  ▼  (cloud fallback when local insufficient)
Claude Sonnet 4 → Claude Opus 4.7 ─────────▶  serve  (cloud, ~$0.01/req)

| Tier | Model | Role | RAM | Latency | Cost |
|---|---|---|---|---|---|
| Default | prism-coder:14b | Router + general inference | 9 GB | ~3s | $0 |
| Verifier | prism-coder:4b | Grounding claims check | 2.5 GB | <1s | $0 |
| Complex | prism-coder:32b | Deep reasoning (on-demand) | 19 GB | ~8s | $0 |
| Cloud | Sonnet → Opus | Fallback for max quality | n/a | ~5-10s | ~$0.01 |
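
The tier choice in the cascade can be read as a small decision function. The sketch below is one interpretation, not Prism's routing code: the `CascadeRequest` fields `complex` and `localSufficient` are hypothetical flags, and only the `ceiling` value mirrors the `ceiling="32b"` note in the diagram.

```typescript
type CascadeTier = "prism-coder:14b" | "prism-coder:32b" | "cloud";

interface CascadeRequest {
  complex: boolean;          // hypothetical: the task needs deep reasoning
  ceiling?: "14b" | "32b";   // explicit opt-in for the 32b tier, as in the diagram
  localSufficient: boolean;  // hypothetical: a local tier is able to answer
}

function pickTier(req: CascadeRequest): CascadeTier {
  // Cloud is the last resort, used only when local models are insufficient.
  if (!req.localSufficient) return "cloud";
  // The 32b tier runs only for complex tasks with an explicit ceiling.
  if (req.complex && req.ceiling === "32b") return "prism-coder:32b";
  // Everything else is served by the default router.
  return "prism-coder:14b";
}

console.log(pickTier({ complex: true, ceiling: "32b", localSufficient: true }));
```

The verifier tier (prism-coder:4b) is left out on purpose: per the diagram it checks claims in a response rather than competing to serve the request.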

Mobile / offline cascade (Prism AAC iOS):

prism-coder:14b (iPad Pro 16GB) → prism-coder:4b (iPhone 8GB)
  → prism-coder:1b7 (any device, always fits)

Knowledge ingestion — teach Prism your codebase

Your code knowledge lives in the knowledge graph, not in model weights, so ingesting it never retrains the models. Routing stays at 100%.

bash scripts/knowledge-ingest/setup.sh   # one-time setup
# Then every git commit auto-indexes changed files into the knowledge graph

Three entry points:

  • MCP tool: knowledge_ingest — AI says "learn this code"
  • GitHub webhook: POST /api/github/webhook — auto on push
  • REST API: POST /api/v1/prism/ingest — open interface

See KNOWLEDGE_INGESTION.md for full setup guide.
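
For the REST entry point, a request could be assembled roughly as follows. Only the method and path (`POST /api/v1/prism/ingest`) come from the list above. The base URL, the `files` payload shape, and the helper name `buildIngestRequest` are assumptions for illustration; KNOWLEDGE_INGESTION.md is the authority on the real schema and on any authentication headers.

```typescript
// Hypothetical payload: field names are illustrative, not Prism's documented schema.
interface IngestFile {
  path: string;
  content: string;
}

interface HttpRequestSpec {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildIngestRequest(baseUrl: string, files: IngestFile[]): HttpRequestSpec {
  return {
    method: "POST",
    // Trim trailing slashes so the path is joined exactly once.
    url: baseUrl.replace(/\/+$/, "") + "/api/v1/prism/ingest",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ files }),
  };
}

// "http://localhost:3000/" is a placeholder host, not a documented default.
const ingestSpec = buildIngestRequest("http://localhost:3000/", [
  { path: "src/auth.ts", content: "export {};" },
]);
console.log(ingestSpec.url);
```

The function only builds the request description; passing it to an HTTP client of your choice is left out so the sketch stays independent of any runtime.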

Routing accuracy

Head-to-head: prism-coder:14b vs Claude Opus (25-case benchmark, production system prompt, May 2026):

| Metric | prism-coder:14b | Claude Opus 4 |
|---|---|---|
| Overall accuracy | 96% (24/25) | 88% (22/25) |
| Tool routing (15 tests) | 93% (14/15) | 80% (12/15) |
| Abstention (10 tests) | 100% (10/10) | 100% (10/10) |
| Avg latency | 0.8s | 5.5s |
| Cost per query | $0 | ~$0.017 |
| Annual @ 1K/day | $0 | ~$6,100 |

prism-coder:14b beats Opus on tool routing — 7x faster, free, runs offline.
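
The percentages and the "7x" figure follow directly from the raw counts and latencies in the table. A quick check, using only numbers quoted above:

```typescript
// Percentage of passed cases, rounded to the nearest whole percent.
const pct = (hits: number, total: number): number => Math.round((hits / total) * 100);

const overall14b = pct(24, 25);   // overall accuracy, prism-coder:14b
const overallOpus = pct(22, 25);  // overall accuracy, Claude Opus 4
const routing14b = pct(14, 15);   // tool routing, prism-coder:14b
const routingOpus = pct(12, 15);  // tool routing, Claude Opus 4

// Latency ratio: Opus average latency divided by prism-coder:14b average latency.
const latencyRatio = 5.5 / 0.8;

console.log(overall14b, overallOpus, routing14b, routingOpus, latencyRatio.toFixed(1));
```

The latency ratio comes out just under 7, which the text rounds to "7x faster".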

eval_300 (300 cases, 17 tools + NO_TOOL, 9 categories, 3-seed validated):

| Model | eval_300 strict | Size | Latency |
|---|---|---|---|
| prism-coder:32b | 300/300 (100%) | 19 GB | ~1.4s |
| prism-coder:14b | 299/300 (99.7%) | 9 GB | ~0.8s |
| prism-coder:4b | 300/300 (100%) | 2.5 GB | ~0.5s |
| prism-coder:1b7 | 300/300 (100%) | 2.2 GB | ~1.6s |

Categories: abstention, adversarial traps, cascade, disambiguation, edge cases, multi-intent, natural phrasing, parameter extraction, verifier prompts.

What this means: a child in a hospital without WiFi, a nonverbal adult on an airplane, or a family on a budget gets Claude-grade routing accuracy with zero cloud dependency — the AAC path routes correctly 100% of the time across all tiers.

What it does NOT mean: these scores measure routing precision on a 17-tool taxonomy, not general intelligence. Claude outperforms on everything outside this task. The value is offline reliability at zero cost, not replacing Claude. Code and clinical knowledge come from RAG via knowledge_search.

🔍 L3 Grounding Verifier

When prism_infer receives an evidence payload, the grounding verifier automatically checks the model's response against the provided evidence before returning to the caller. Unverified or hallucinated claims are flagged. This is the third layer (L3) of the cascade — after tool routing (L1) and confidence gating (L2).
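
The verifier's actual method is not described here. To make the idea of "check each claim against the evidence and flag what is unsupported" concrete, the sketch below uses a deliberately naive stand-in: word overlap between a claim and each evidence passage. The function names and the 0.6 threshold are arbitrary assumptions, and a model-based verifier such as prism-coder:4b would judge meaning rather than shared words.

```typescript
// Stand-in technique: word overlap. This is NOT Prism's grounding verifier.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

// Return the claims whose best overlap with any evidence passage is below minOverlap.
function flagUnsupported(claims: string[], evidence: string[], minOverlap = 0.6): string[] {
  const evidenceTokens = evidence.map(tokenize);
  return claims.filter((claim) => {
    const claimTokens = tokenize(claim);
    if (claimTokens.length === 0) return false;
    let best = 0;
    for (const tokens of evidenceTokens) {
      const shared = claimTokens.filter((t) => tokens.indexOf(t) !== -1).length;
      best = Math.max(best, shared / claimTokens.length);
    }
    return best < minOverlap;
  });
}

const evidenceDocs = ["The auth flow uses short-lived JWT tokens."];
const flaggedClaims = flagUnsupported(
  ["The auth flow uses JWT tokens.", "Passwords are stored in plain text."],
  evidenceDocs,
);
console.log(flaggedClaims);
```

The first claim is fully covered by the evidence and passes; the second shares no words with it and is flagged.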

🧠 HRR Semantic Drift Detection (v17.0)

Detects when long AI agent sessions drift from their original goal — using Holographic Reduced Representations for temporal trajectory encoding and anomaly detection.

Three domains, one detector:

| Domain | Signals | Safety |
|---|---|---|
| BCBA/Clinical | Client specificity decay, function-intervention alignment (4 functions), contraindication detection (epilepsy/pica/dysphagia/diabetes) | PHI-safe, deterministic |
| Coding | File scope entropy, summary vagueness, test coverage ratio, trajectory HRR divergence | Adaptive threshold for refactors |
| AAC | Prediction accuracy, vocabulary stagnation, topic divergence | Emergency phrases always ≥ 0.95 |

Research-backed: trajectory association (Frady et al. 2018), HDAD anomaly detection (Wang et al. 2021), unit-modulus projection (Ganesan et al. NeurIPS 2021). 306 tests across 8 files, zero failures. Use session_detect_drift with optional domain parameter.

⚡ Zero-search retrieval (new in v15.8)

Holographic Reduced Representations (HRR) via Rust WASM for instant memory retrieval without a database query.

Three adaptive strategies:

  • GloVe embeddings (offline, 50K words) — 87% Top-1 accuracy, stable at 200+ concepts
  • API embeddings (Gemini/Voyage) — 90%+ accuracy when online
  • NeurIPS 2021 projection — unit-modulus normalization for numerical stability
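
For readers new to HRR, the core operations are binding (circular convolution) and unbinding (circular correlation). The sketch below is a textbook-style illustration in plain TypeScript, not the Rust WASM implementation: it uses random sign vectors instead of GloVe or API embeddings, an O(n²) convolution instead of an FFT, and skips the unit-modulus projection. All names and the dimension of 512 are assumptions.

```typescript
const DIM = 512;

// Small deterministic LCG so the example is reproducible.
function makeRng(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}
const rng = makeRng(42);

// Random vector of +1/-1 entries.
function randomVector(): number[] {
  const v: number[] = [];
  for (let i = 0; i < DIM; i++) v.push(rng() < 0.5 ? -1 : 1);
  return v;
}

// Binding: circular convolution of a and b.
function bind(a: number[], b: number[]): number[] {
  const out: number[] = [];
  for (let k = 0; k < DIM; k++) {
    let sum = 0;
    for (let j = 0; j < DIM; j++) sum += a[j] * b[(k - j + DIM) % DIM];
    out.push(sum);
  }
  return out;
}

// Unbinding: circular correlation, the approximate inverse of bind.
function unbind(key: number[], memory: number[]): number[] {
  const out: number[] = [];
  for (let k = 0; k < DIM; k++) {
    let sum = 0;
    for (let j = 0; j < DIM; j++) sum += key[j] * memory[(j + k) % DIM];
    out.push(sum);
  }
  return out;
}

function add(a: number[], b: number[]): number[] {
  return a.map((x, i) => x + b[i]);
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < DIM; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
  return dot / Math.sqrt(na * nb);
}

// Store three key/value pairs in one superposed vector (the "hologram").
const vocab = [
  { word: "jwt", vec: randomVector() },
  { word: "sqlite", vec: randomVector() },
  { word: "dashboard", vec: randomVector() },
];
const keyAuth = randomVector(), keyDb = randomVector(), keyUi = randomVector();
const hologram = add(bind(keyAuth, vocab[0].vec), add(bind(keyDb, vocab[1].vec), bind(keyUi, vocab[2].vec)));

// Probe with one key and pick the closest vocabulary item.
const probe = unbind(keyDb, hologram);
let bestWord = "";
let bestScore = -Infinity;
for (const entry of vocab) {
  const score = cosine(probe, entry.vec);
  if (score > bestScore) { bestScore = score; bestWord = entry.word; }
}
console.log(bestWord, bestScore.toFixed(2));
```

Probing with `keyDb` should recover "sqlite" as the nearest item, because the other two bound pairs only contribute noise. Retrieval is a fixed amount of arithmetic on one vector with no index lookup, which is the property the "zero-search" name refers to.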

Retrieval cascade: HRR (~0.2ms) → FTS5 (~50ms) → Supabase (~200ms)

| Metric | HRR (WASM) | FTS5 | Supabase Vector |
|---|---|---|---|
| Latency | 0.2ms | 50ms | 200ms |
| Speedup | 1x | 250x slower | 1000x slower |
| Offline | Yes | Yes | No |
| Accuracy (GloVe) | 87% Top-1 | 95%+ | 95%+ |
| Hologram size | 8KB | Index varies | Cloud |

HRR acts as Tier 0 — if confidence is high, FTS5 is skipped entirely. Falls through gracefully when HRR has no match. 97 dedicated tests (72 system + 25 API/client). Built with Rust + rustfft + wasm-bindgen (229KB binary).
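
The fall-through behaviour can be sketched as a loop over tiers ordered from fastest to slowest. The interfaces, the stub tiers, and the 0.75 confidence threshold are assumptions for illustration; the real tiers (FTS5, Supabase) would be asynchronous, and Prism's actual confidence rule is not stated here.

```typescript
interface Hit {
  text: string;
  confidence: number;
}

interface RetrievalTier {
  name: string;
  lookup: (query: string) => Hit | null;
}

// Try each tier in order; return the first hit that clears the confidence bar.
function retrieve(
  query: string,
  tiers: RetrievalTier[],
  minConfidence: number,
): { tier: string; hit: Hit } | null {
  for (const tier of tiers) {
    const hit = tier.lookup(query);
    if (hit !== null && hit.confidence >= minConfidence) {
      return { tier: tier.name, hit };
    }
  }
  return null;
}

// Stub tiers: HRR only knows one phrase, FTS5 always finds something.
const hrrTier: RetrievalTier = {
  name: "hrr",
  lookup: (query) =>
    query === "auth flow" ? { text: "Use short-lived JWT tokens", confidence: 0.93 } : null,
};
const ftsTier: RetrievalTier = {
  name: "fts5",
  lookup: (query) => ({ text: "full-text match for " + query, confidence: 0.8 }),
};

const fastPath = retrieve("auth flow", [hrrTier, ftsTier], 0.75);
const fallThrough = retrieve("deploy steps", [hrrTier, ftsTier], 0.75);
console.log(fastPath?.tier, fallThrough?.tier);
```

The first query is answered by the HRR stub and never reaches FTS5; the second has no HRR match and falls through to the next tier.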

HRR AAC prediction benchmark — real-world impact on Prism AAC word prediction (10 scenarios, 54 integration tests):

| Scenario | Baseline Top-1 | +HRR Top-1 | Top-1 Lift | MRR Lift |
|---|---|---|---|---|
| Core AAC phrases | 36.7% | 46.7% | +27.3% | +6.0% |
| Personal vocabulary | 70.4% | 81.5% | +15.8% | +9.2% |
| Mixed (all phrases) | 47.2% | 56.9% | +20.6% | +5.7% |
| Cross-session recall | 80.0% | 80.0% | +0.0% | +0.0% |

Top-1 = correct word is tile #1. MRR = Mean Reciprocal Rank. Zero Top-5 regressions in any scenario. HRR encodes bigrams + trigrams from every spoken phrase; probes take ~0.2ms — safe on every keystroke. All Synalux apps (clinical, AAC, PrismCoach) share HRR via the portal /api/v1/hrr endpoint.
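
The metrics above can be computed as in this sketch. The function names are illustrative, and reading "Top-1 Lift" as a relative improvement over the baseline is an inference from the table, whose figures it reproduces up to rounding.

```typescript
// ranks[i] is the 1-based position of the correct word in trial i (0 = not predicted at all).
function top1Accuracy(ranks: number[]): number {
  return ranks.filter((r) => r === 1).length / ranks.length;
}

// Mean Reciprocal Rank: average of 1/rank, counting misses as 0.
function meanReciprocalRank(ranks: number[]): number {
  let sum = 0;
  for (const r of ranks) {
    if (r > 0) sum += 1 / r;
  }
  return sum / ranks.length;
}

// Relative lift: improvement expressed as a fraction of the baseline.
function relativeLift(baseline: number, improved: number): number {
  return (improved - baseline) / baseline;
}

const sampleRanks = [1, 2, 4, 0];
console.log(top1Accuracy(sampleRanks), meanReciprocalRank(sampleRanks));

// Core AAC phrases row: 36.7% baseline, 46.7% with HRR.
console.log((relativeLift(36.7, 46.7) * 100).toFixed(1) + "%");
```

For the core AAC row the rounded inputs give a lift of roughly 27%, in line with the table's +27.3%; the small difference comes from the table being computed on unrounded accuracies.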

Competitive comparison:

| System | Retrieval | Offline | Cost | La
