
Prism

Zero-config persistent memory for AI agents with local SQLite. Mind Palace web dashboard, time travel (rewind/replay sessions), agent telepathy (cross-client memory sharing), code mode templates, morning briefings, and progressive context loading. 25 tools, 6 resources, 4 prompt…

knowledge-memory · sqlite · ai · agent
By dcostenco
14924 · Updated 5 days ago · TypeScript · AGPL-3.0

Installation

npx -y prism-mcp

Configuration

{
  "mcpServers": {
    "prism-mcp": {
      "command": "npx",
      "args": ["-y", "prism-mcp"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes
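
Step 3 amounts to merging one entry into the `mcpServers` object of the settings file. A minimal Python sketch of that merge follows, assuming the settings file is plain JSON. The helper names `add_mcp_server` and `update_settings_file` are hypothetical and not part of Prism.

```python
import json
from pathlib import Path
from typing import List


def add_mcp_server(settings: dict, name: str, command: str, args: List[str]) -> dict:
    """Return a copy of `settings` with one MCP server entry added or replaced."""
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Read the settings file (if present), add the Prism entry, write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    updated = add_mcp_server(current, "prism-mcp", "npx", ["-y", "prism-mcp"])
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Calling `update_settings_file(Path.home() / ".claude" / "settings.json")` would apply the entry to the path named in step 2. Existing keys are preserved, but back the file up first.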

Prism Coder

Give your AI agent memory that lasts. Persistent sessions, knowledge graphs, and offline tool-routing — fully local and free.


<p align="center"> <img src="docs/v11_hivemind_multi_agent_dashboard.jpg" alt="Prism Coder — Mind Palace Dashboard with Knowledge Graph and Multi-Agent Hivemind" width="700" /> </p>

Prism Coder is an MCP server that gives Claude, Cursor, and other AI tools long-term memory that survives across sessions. It ships with the open-weight prism-coder model fleet (2B–27B) for fast, offline tool-routing — no cloud required.

No account needed. No API keys. Runs on your machine.
A paid subscription adds cloud sync, higher model tiers, and team features through the Synalux portal.


Quickstart

The free tier needs no account, no API key, and no cloud. Add the server to your MCP client:

{
  "mcpServers": {
    "prism": {
      "command": "npx",
      "args": ["-y", "prism-mcp-server"]
    }
  }
}

Open Claude Desktop or Cursor and your agent now has memory backed by a local SQLite database (~/.prism-mcp/data.db).
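
Because the store is an ordinary SQLite file, it can be inspected with standard tooling. The sketch below only lists table names, since the schema is not documented here. It opens the file read-only so that inspection cannot modify the agent's memory. The function name `list_tables` is an illustration, not a Prism API.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the names of all tables in a SQLite database, sorted."""
    # Build a file: URI with mode=ro so the connection is read-only.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("~/.prism-mcp/data.db")` would show whatever tables the server has created on your machine.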

Optional — local model fleet for offline tool-routing. Pull whichever fits your hardware:

ollama pull dcostenco/prism-coder:2b    # 2.3 GB · mobile / lightweight (99.1% routing accuracy)
ollama pull dcostenco/prism-coder:4b    # 3.4 GB · verifier (100% accuracy)
ollama pull dcostenco/prism-coder:9b    # 5.8 GB · default router (100% accuracy, Qwen3.5)
ollama pull dcostenco/prism-coder:27b   # 16 GB  · complex tasks (100% accuracy)

Prism detects both the namespaced (dcostenco/prism-coder:9b) and bare (prism-coder:9b) Ollama tags automatically.
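
One way such detection could work is to check the installed tag list for both spellings. This is a sketch under assumptions: the preference for the namespaced tag and the function name `resolve_tag` are illustrative choices, not Prism's documented behavior.

```python
from typing import List, Optional

NAMESPACE = "dcostenco"  # namespace taken from the pull commands above


def resolve_tag(installed: List[str], size: str) -> Optional[str]:
    """Return the installed Ollama tag for a fleet size such as '9b', or None."""
    bare = f"prism-coder:{size}"
    # Assumption: try the namespaced form first, then the bare form.
    for tag in (f"{NAMESPACE}/{bare}", bare):
        if tag in installed:
            return tag
    return None
```

The `installed` list would typically come from the output of `ollama list`.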


What it does

Your AI agent forgets everything between sessions. Prism fixes that — and adds verification, drift detection, and multi-agent coordination on top.

Mind Palace — persistent memory that survives across sessions

Every conversation feeds a persistent store. The next session loads the right context automatically — no re-explaining.

<p align="center"> <img src="docs/mind-palace-dashboard.png" alt="Mind Palace Dashboard — project state, neural graph, pending TODOs" width="700" /> </p>

The dashboard shows your current project state, pending TODOs, intent health, and a neural knowledge graph — all built automatically from your agent sessions.

Knowledge Graph — semantic + keyword + graph search

Ask "what did I decide about the auth flow last month?" and get an answer with citations, combining vector similarity, full-text search, and graph traversal.
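
The text does not say how the three result lists are merged. Reciprocal rank fusion (RRF) is one common way to combine rankings from different retrievers, and the sketch below uses it purely as an illustration. The memory IDs and the constant `k = 60` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of memory IDs into a single ranking."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            # Each list contributes more for items it ranks near the top.
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


vector_hits = ["a", "b", "c"]    # hypothetical vector-similarity results
fulltext_hits = ["b", "a", "d"]  # hypothetical full-text results
graph_hits = ["b", "c"]          # hypothetical graph-traversal results
fused = reciprocal_rank_fusion([vector_hits, fulltext_hits, graph_hits])
```

Items that appear high in several lists rise to the top, so a memory found by all three signals outranks one found by a single signal.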

<p align="center"> <img src="docs/knowledge-graph.jpg" alt="Knowledge Graph — 190 keywords, 47 edges, 12 projects visualized" width="500" /> </p>

Session History — immutable audit trail

Every session is logged with files changed, decisions made, and TODOs. Search, filter, and replay any past session.

<p align="center"> <img src="docs/session-ledger.jpg" alt="Session Ledger — 93 sessions, 847 decisions logged across 12 projects" width="700" /> </p>

Inference Metrics — see where your tokens go

Every prism_infer call tracks which model handled it (local Ollama vs cloud) and how many tokens were consumed. When you save a session, Prism shows a summary:

📊 Inference Metrics (this session):
  Total calls: 12 — Local: 10 (83%) | Cloud: 2 (17%)
  Tokens: 8,420 in + 3,150 out = 11,570 total
  Avg latency: 1,240ms
  By model:
    prism-coder:27b: 6 calls, 7,200 tokens, avg 1,800ms
    prism-coder:9b: 4 calls, 2,870 tokens, avg 620ms
    synalux-27b: 2 calls, 1,500 tokens, avg 1,100ms

Local calls use actual Ollama token counts (prompt_eval_count / eval_count from Ollama); cloud calls use char/4 estimates. Metrics are tracked locally — no portal dependency, no env vars, works offline. Per-call data is also forwarded to the Synalux portal as best-effort analytics (independent of the display).
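
The two counting rules can be sketched as follows. `prompt_eval_count` and `eval_count` are the fields Ollama reports. The `InferenceCall` record, the function names, the rounding of char/4 (floor division here), and treating a missing field as 0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InferenceCall:
    model: str
    local: bool
    tokens_in: int
    tokens_out: int


def from_ollama(model: str, response: dict) -> InferenceCall:
    """Local call: use the counts reported in the Ollama response."""
    return InferenceCall(
        model, True,
        response.get("prompt_eval_count", 0),  # assumption: missing field counts as 0
        response.get("eval_count", 0),
    )


def from_cloud(model: str, prompt: str, completion: str) -> InferenceCall:
    """Cloud call: estimate tokens as characters / 4, rounded down."""
    return InferenceCall(model, False, len(prompt) // 4, len(completion) // 4)


def summarize(calls: List[InferenceCall]) -> dict:
    """Aggregate per-session totals, split by local versus cloud."""
    total = len(calls)
    local = sum(1 for c in calls if c.local)
    return {
        "total_calls": total,
        "local_calls": local,
        "cloud_calls": total - local,
        "tokens_in": sum(c.tokens_in for c in calls),
        "tokens_out": sum(c.tokens_out for c in calls),
    }
```

Keeping exact and estimated counts in the same record makes the summary simple, at the cost of mixing two precisions in one total.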

Session Drift Detection

Long agent sessions can wander from their original goal. session_detect_drift compares current work against the stated goal and returns on_track / minor_drift / major_drift so the agent can self-correct.
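
How `session_detect_drift` scores drift is not described here. As a toy stand-in, the sketch below measures word overlap (Jaccard similarity) between the goal and a description of current work, then maps it onto the three verdicts. The thresholds and the function name `detect_drift` are hypothetical.

```python
from typing import Set


def _words(text: str) -> Set[str]:
    return set(text.lower().split())


def detect_drift(goal: str, current_work: str,
                 minor_below: float = 0.5, major_below: float = 0.2) -> str:
    """Classify drift from the word overlap between goal and current work."""
    goal_words, work_words = _words(goal), _words(current_work)
    union = goal_words | work_words
    # Jaccard similarity; two empty texts count as identical.
    overlap = len(goal_words & work_words) / len(union) if union else 1.0
    if overlap < major_below:
        return "major_drift"
    if overlap < minor_below:
        return "minor_drift"
    return "on_track"
```

A real implementation would more plausibly compare embeddings or ask a model, but the three-way verdict gives the agent the same signal: continue, refocus, or stop and re-plan.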

Behavioral Verification — catch bad edits before they happen

AI agents apply patterns from checklists without understanding the real-world impact. The verify_behavior tool challenges the agent with a scenario it must answer before editing — forcing it to think through what the end user will experience.

Agent: "I'll revert this kitchen display change"
Prism: "⚠️ Scenario: A cook sees a 3-item ticket. One item is voided.
        What should the cook see after the void?"
Agent: "The ticket stays visible with the remaining 2 items."
Prism: "Correct — your revert would hide the ticket entirely."

17 built-in domains (billing, auth, ordering, clinical, HR, and more). Custom domains per workspace on Enterprise. No hooks needed — works in any MCP client.

Time Travel

Roll back to any previous session state. Compare diffs between versions. Restore a known-good state with one click.

<p align="center"> <img src="docs/time-travel-timeline.jpg" alt="Time Travel — version timeline with diff view and one-click restore" width="500" /> </p>

Cognitive Routing

Three memory types, automatically sorted: episodic (what happened — session logs, decisions), semantic (what's true — facts, architecture), and procedural (how to do X — workflows, patterns). When you search, the router picks the right store instead of dumping everything.

Multi-Agent Hivemind

Coordinate multiple AI agents working on the same project. Each agent has its own session, but they share memory through the knowledge graph. The Hivemind Radar shows real-time agent status, tasks, and activity.

<p align="center"> <img src="docs/hivemind-radar.jpg" alt="Hivemind Radar — 5 agents with real-time status, tasks, and activity feed" width="500" /> </p>

Neural Search

Search across all memories with highlighted results, knowledge graph editing, and memory density metrics.

<p align="center"> <img src="docs/v6_cognitive_load_dashboard.jpg" alt="Neural Search with Knowledge Graph Editor and Memory Density" width="500" /> </p>

Local-first and privacy

The free tier runs entirely on your machine. Paid tiers add cloud sync through the Synalux portal, which is what enables cross-device memory and team sharing.

| | Local tier (free) | Cloud tier (paid) |
|---|---|---|
| Memory storage | Local SQLite | Synalux portal (Supabase-backed) |
| Inference | Local Ollama models | Local models + cloud fallback |
| API keys required | None | Synalux subscription key |
| Web search / scrape | Not included | Via Synalux portal (provider keys server-side) |
| What leaves your machine | Nothing | Memory text + file paths + search queries, sent to the portal over TLS (PHI-redacted before transit) |

Works offline: local features yes; sync/cloud no.

Handling sensitive data. All cloud writes pass through automatic redaction (SSNs, dates of birth, medical record numbers, phone numbers, emails, and clinical identifiers are stripped before transit). For regulated workloads, run the local tier for full air-gap, or use Enterprise which includes a HIPAA Business Associate Agreement.
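
To make the idea of redaction before transit concrete, here is a minimal regex-based sketch. The patterns cover only US-style SSNs, phone numbers, and emails, and the replacement placeholders are a choice made for this example. Prism's actual rules are not shown in this document, and real PHI detection needs far broader coverage.

```python
import re

# Illustrative patterns only; not Prism's actual redaction rules.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running the redactor on the client, before any network call, is what keeps the raw identifiers from leaving the machine.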


Models

The prism-coder fleet uses Qwen3.5 for both MCP tool-routing and general inference. The 9B and 27B are fine-tuned with LoRA (r=128, all 64 layers including DeltaNet); the 2B and 4B use stock Qwen3.5-4B at different quantization levels (Q3_K_M and Q4_K_M respectively). The 27B scored 100% on BFCL function-calling and 100% on an internal 15-problem coding eval at $0 inference cost.

prism_infer supports three modes: route (tool routing, fast, nothink), chat (conversation with thinking), and code (code generation with thinking). In chat/code modes, the model uses <think> blocks for chain-of-thought reasoning, which are stripped before the response is served. If the local model fails a quality gate (empty, think-only, or truncated), paid tiers automatically escalate to Claude via the Synalux portal.
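
The stripping step and the three failure conditions can be sketched as below. How Prism detects truncation is not stated, so this example treats an unclosed `<think>` tag or a caller-supplied `truncated` flag as truncation. The function names are hypothetical.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(raw: str) -> str:
    """Remove complete <think>...</think> blocks and surrounding whitespace."""
    return THINK_BLOCK.sub("", raw).strip()


def quality_gate(raw: str, truncated: bool = False) -> str:
    """Return 'ok', or the reason the response fails the gate."""
    if not raw.strip():
        return "empty"
    if truncated or "<think>" in THINK_BLOCK.sub("", raw):
        # An unclosed <think> tag means generation stopped mid-reasoning.
        return "truncated"
    if not strip_think(raw):
        return "think_only"
    return "ok"
```

A caller would serve `strip_think(raw)` when the gate returns `"ok"` and escalate otherwise.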

| Model | Ollama tag | Size | BFCL Accuracy | Role | Tier |
|---|---|---|---|---|---|
| Qwen3.5-4B Q3_K_M | prism-coder:2b | 2.3 GB | 99.1% × 3 seeds | iPhone / mobile first gate | Free |
| Qwen3.5-4B Q4_K_M | prism-coder:4b | 3.4 GB | 100% × 3 seeds | Verifier | Free |
| Qwen3.5-9B (LoRA) | prism-coder:9b | 5.8 GB | 100% × 3 seeds | Default router | Standard+ |
| Qwen3.5-27B (LoRA) | prism-coder:27b | 16 GB | 100% × 3 seeds | Quality tier (DeltaNet, 28.5 tok/s) | Advanced+ |

Weights: huggingface.co/dcostenco (public GGUF). Latency depends on model size and hardware — see Benchmarks to measure it on your own machine rather than trusting a printed number.

Cascade

query → prism-coder:9b (local router, default)
      → prism-coder:4b (grounding verifier)
      → prism-coder:2b (iPhone / mobile, auto-selected by RAM)
      → prism-coder:27b (complex tasks, on demand)
      → cloud fallback (paid tiers, for max quality)

Multi-Layer Verification

Every tool-grounded answer on paid tiers passes through deterministic L3 routing rules and an NLI grounding verifier before reaching the user. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) without the model-based NLI check.

| Layer | What | Model | Cost |
|---|---|---|---|
| L1 | Crisis/medical safety gate | None (regex) | 0 ms |
| L3-Tool | Tool name remap + false-positive rejection | None (deterministic) | 0 ms |
| L3-Tier0 | Integer grounding (set membership) | None (deterministic) | 0 ms |
| L3-Tier2 | NLI verifier (claim → ENTAILED/NEUTRAL/CONTRADICTED) | prism-coder:2b | ~200 ms |
| L4 | Hallucination judge (opt-out for clinical) | prism-coder:4b | ~500 ms |

Fail-closed on the verified path: when the grounding verifier runs (Standard tier and up), a timeout, ambiguity, or missing evidence yields a refusal, not pass-through. Free-tier users get the deterministic gates (L1, L3-Tool, L3-Tier0) but not the NLI verifier.
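
A fail-closed check in the spirit of the L3-Tier0 integer grounding gate might look like the sketch below: every integer cited in the answer must appear in the tool output, and missing evidence produces a refusal. The refusal wording, the function names, and the simple digit-run tokenization (which would split `1,240` into `1` and `240`) are assumptions.

```python
import re
from typing import Optional, Set

REFUSAL = "I can't verify that against the tool output."  # hypothetical wording


def integers_in(text: str) -> Set[int]:
    """Collect every run of digits in the text as an integer."""
    return {int(m) for m in re.findall(r"\d+", text)}


def grounded_answer(answer: str, tool_output: Optional[str]) -> str:
    """Pass the answer through only if its integers are a subset of the evidence."""
    if tool_output is None:
        return REFUSAL  # missing evidence: fail closed
    if integers_in(answer) <= integers_in(tool_output):
        return answer
    return REFUSAL  # an ungrounded number: fail closed
```

The key design point is the default: any path that cannot positively confirm the answer returns the refusal, so a bug or timeout never turns into a pass-through.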


Benchmarks

Reproduce every number yourself. All evals are open-source and self-contained:

git clone https://github.com/dcostenco/prism-coder && cd prism-coder
pip install anthropic requests
python3 tests/benchmarks/prism-routing-100/benchmark.py --models 2b 4b 9b 27b

Routing eval (115 cases, 12 categories, 3-seed mean). Routing accuracy includes the deterministic L3 correction layer, the same rules that run in production. On this narrow tool-routing task all fleet models achieve near-perfect accuracy. Read that with care: the eval is near-saturated for this taxonomy. It measures whether the right one of a small set of MCP tools is selected, not general capability. The useful takeaway is offline routing reliability at zero cost, not that a 2.3 GB model rivals a frontier model in general.

| Model | Routing accuracy | Notes |
|---|---|---|
| prism-coder:2b (Q3_K_M) | 99.1% × 3 seeds | 1 failure: regex→knowledge_search |
| prism-coder:4b / 9b / 27b | 100% × 3 seeds | Perfect on all 115 cases |
| Claude (frontier, same eval) | ~98% | Stronger everywhere outside this narrow task |
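
The 99.1% figure is consistent with one miss out of 115 cases. The arithmetic is sketched below, assuming the same single failure occurs on each of the three seeds. The function name `routing_accuracy` is illustrative and not taken from the benchmark script.

```python
from typing import List


def routing_accuracy(correct_per_seed: List[int], total_cases: int) -> float:
    """Mean accuracy in percent across seeds."""
    per_seed = [100.0 * c / total_cases for c in correct_per_seed]
    return sum(per_seed) / len(per_seed)


two_b = routing_accuracy([114, 114, 114], 115)   # one miss per seed
larger = routing_accuracy([115, 115, 115], 115)  # no misses
```

With only 115 cases, a single misrouted query moves the score by about 0.87 points, which is why small differences between models on this eval carry little weight.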

Memory uplift (LoCoMo-Plus, self-published). A separate long-context dialogue benchmark (dcostenco/Locomo-Plus) measures how much structured memo
