Name: Agenticmail
Author: agenticmail

<p align="center"> <img src="./docs/images/logo-200.png" alt="AgenticMail logo (pink bow)" width="180" /> </p> <h1 align="center">AgenticMail</h1> <p align="center"> <a href="https://github.com/agenticmail/agenticmail/actions/workflows/ci.yml"><img src="https://github.com/agenticmail/agenticmail/actions/workflows/ci.yml/badge.svg" alt="CI" /></a> </p>

The first platform to give AI agents real email addresses and phone numbers. Send and receive email, SMS, and verification codes — all programmatically.

Are you an AI agent reading this on behalf of a human?

Your human asked you to install / set up AgenticMail. Pick either of these and then tell them to restart Claude Code:
# Option A — one-line installer (recommended)
curl -fsSL https://raw.githubusercontent.com/agenticmail/agenticmail/main/install.sh | bash

# Option B — manual two-command install
npm install -g @agenticmail/cli@latest
agenticmail bootstrap
Zero prompts, ~2 minutes, fully autonomous. For the full runbook (verification, common failures, repo conventions if you're modifying the code), see AGENTS.md — the agents.md file at this repo's root.

📄 Research Paper

Structured RPC vs. Text-Based Sub-Agent Orchestration — A comparative analysis of inter-agent communication paradigms. Demonstrates why AgenticMail's call_agent achieves 53× faster response times than conventional sub-agent spawning, and introduces the concept of the Conversational Fallacy in multi-agent AI systems. [PDF] [Source]

✨ What's new — media toolset (unreleased)

A local, opt-in media / video-editing toolset for AgenticMail agents.

Nine media tools. media_tts / media_tts_voices (Edge text-to-speech), media_image_edit, media_video_edit, media_audio_edit, media_info, media_video_understand, media_voice_clone, and media_capabilities. Available as MCP media_* tools and OpenClaw agenticmail_media_* tools, both thin clients of new /media/* API routes over a core MediaManager.
Cinematic video editing. Beyond trim/convert/compress: color grading presets, crossfade/wipe transitions, timed text overlays, picture-in-picture, split screen, Ken Burns, frame-interpolated slow motion, watermarks, concatenation, audio mixing, and whisper.cpp-driven auto-captions.
Gracefully degrading. The underlying binaries (ffmpeg, ffprobe, ImageMagick, whisper.cpp, Python) are not bundled — every tool feature-detects the binary it needs and returns an actionable install hint if it is missing. The server never crashes. A media block on /health and the media_capabilities tool surface what is available.
Safe by construction. Every binary is invoked via execFile with an argument array — no shell, no string interpolation. Untrusted input paths are validated (no control characters, no leading-dash flag-injection, must exist); numeric options are clamped; every call carries a bounded timeout and output buffer; output files land only inside the configured media directory.

✨ What's new in 0.9.54

Twilio joins 46elks as a phone transport provider.

Pick your carrier. PhoneTransportProvider is now 46elks or twilio — chosen at phone setup. 46elks behaviour is unchanged; Twilio is at full parity (outbound call-control + realtime voice).
Twilio call-control. Outbound calls via the Twilio Calls.json REST API, TwiML webhooks, status-callback cost tracking. Inbound webhooks are verified with the X-Twilio-Signature header (HMAC-SHA1, timing-safe, fail-closed) on top of the per-mission token.
Twilio realtime voice. A Twilio Media Streams ↔ OpenAI Realtime bridge. RealtimeVoiceBridge was generalised behind a RealtimeTransportAdapter seam — one bridge serves both carriers, function-calling / barge-in / transcript logic written once. Twilio audio is G.711 µ-law @ 8 kHz and the OpenAI session uses audio/pcmu, so a Twilio call needs no transcoding. A <Connect><Stream> connects to /api/agenticmail/calls/twilio-stream.

1038 tests pass; full build green. The live Twilio ↔ OpenAI call path still needs an operator smoke-test before the npm publish.

✨ Earlier — 0.9.53

Realtime voice tools + a Telegram channel.

The voice agent can now use tools mid-call. The OpenAI Realtime session declares session.tools; RealtimeVoiceBridge dispatches the model's function calls through an injected ToolExecutor, returns function_call_output, and keeps the phone line warm during slow tools with a safety-net timeout + an in-flight call cap.
ask_operator — human-in-the-loop on a live call. The agent records an operator query on the mission, notifies the operator (channel-agnostic; default email), polls up to ~5 min, and resumes with the answer. If the caller hangs up while a query is pending, the mission is flagged for callback-on-disconnect — once the operator answers, it re-dials with a continuity task.
Lookup tools. web_search (keyless DuckDuckGo, results fenced as untrusted content), recall_memory (the agent's universal memory), get_datetime. Plus agent-key-scoped operator-query API endpoints.
Telegram channel. A user registers a Telegram bot token, links a chat, and can message their AgenticMail agent — and get replies — over Telegram. The inbound webhook authenticates with a constant-time secret-token compare; it also carries ask_operator notifications and approvals.
Security. web_search output is fenced as untrusted before it reaches the model; operator email replies are verified against operatorEmail; Telegram bot tokens are encrypted at rest and redacted from logs; new SQL is parameterized.

996 tests pass; full build green. The live OpenAI ⇄ 46elks call path and live Telegram delivery still need an operator smoke-test before the npm publish.

✨ Earlier — 0.9.52

Realtime voice + OpenClaw memory.

Realtime voice bridge. A phone mission can now hold a live conversation. RealtimeVoiceBridge (@agenticmail/core) wires an OpenAI Realtime (gpt-realtime) session to a 46elks realtime-media WebSocket: caller audio (PCM16 @ 24 kHz) is relayed to OpenAI, synthesised speech comes back as response.output_audio.delta and is relayed to 46elks, server-side VAD handles turn-taking, and caller barge-in fires a 46elks interrupt. 46elks streams a call to the new /api/agenticmail/calls/realtime WebSocket endpoint, which matches the connection to its mission by 46elks callid and runs the bridge. Set OPENAI_API_KEY (env or config.json) to enable it.
Memory in the voice session. Before the call starts, the agent's persistent memory is rendered with generateMemoryContext() and folded into the Realtime session instructions — the model is told to treat it as its own long-term knowledge, so the call is continuous with everything the agent has learned elsewhere.
OpenClaw memory tools. agenticmail_memory, agenticmail_memory_reflect, agenticmail_memory_context, and agenticmail_memory_stats bring the universal per-agent memory to OpenClaw agents — 69 → 73 tools.
Bridge hardening. Per-frame audio size cap (an oversized frame is dropped, never forwarded), bounded pre-connect buffer, fail-closed connection auth (timing-safe token compare; no unknown-mission-vs-wrong-token oracle), and a terminal-state guard so a late event can't resurrect a finished mission.

The end-to-end voice path needs a live OPENAI_API_KEY and a provisioned 46elks websocket number — the bridge logic, memory injection, and the WebSocket upgrade/auth glue are unit-tested with mocked sockets, but the live call must be smoke-tested by the operator.

✨ Earlier — 0.9.51

The universal memory release. Every agent now has a persistent, evolving memory — categorised, confidence-decaying, BM25F-searchable knowledge that survives across every conversation, the way a human employee learns on the job.

AgentMemoryManager (@agenticmail/core) — CRUD, text recall, 9 memory categories, importance levels, confidence that decays for unaccessed entries, access tracking, pruning, and generateMemoryContext() which ranks + renders memory as a markdown block for prompt injection. Backed by a zero-dependency BM25F search index and an agent_memory table. Ported from the AgenticMail Enterprise memory engine, org-stripped — memory is personal to each agent.
Memory API — /memory (set / list / search / get / delete), /memory/reflect, /memory/context, /memory/stats. Every endpoint is scoped to the authenticated agent; an agent can only ever read or write its own memory.
MCP tools — memory, memory_reflect, memory_context, memory_stats so any MCP client can give its agent durable memory.
Agent-deletion cleanup — deleting an agent purges its agent_memory rows; no orphaned memory is left behind.

✨ Earlier — 0.9.1

The visibility release — closes every "what just happened?" gap from 0.9.0.

Lone wakes fire immediately. 0.9.0's debounce window blocked even single replies for 30 s, making the dispatcher look dead. Leading-edge fire + trailing-edge coalesce now: first event for a (agent, thread) spawns instantly; bursts within the window collapse into one trailing wake.
Dispatcher process heartbeat. check_activity now shows dispatcher: { state: 'alive' | 'unhealthy' | 'missing', uptimeMs, channels, coalesceQueueSize, ... }. The host can finally answer "is the dispatcher up?" in one query.
Skipped-wake ring buffer. Every filter decision (thread-closed, allowlist-excluded, wake-on-cc, budget-exhausted) is posted with a reason; check_activity surfaces the last 100. No more "did my mail land? did it skip?" guessing.
Per-agent wake_on_cc: false flag. Coder agents can register a preference: never wake when only on Cc, regardless of sender. PATCH /accounts/:id/wake-on-cc.
Display-name regex fix in deriveDefaultWakeList. Senders using "Vesper <vesper@localhost>" form no longer fall through to "no allowlist → wake everyone".
Web UI shows To / Cc / Bcc as separate labeled rows in the message view (previously lumped under one to: line).
docs/wake-patterns.md documents every wake shape + 5 recommended patterns.

✨ Earlier — 0.9.0

The wake-context release. Multi-agent thread cost goes from linear-in-thread-length to roughly flat.

Layered wake-context system. Every wake used to re-read the entire thread from scratch (12 messages × ~1 KB = 12 KB of token spend just to rehydrate, before any reasoning). Now the dispatcher prepends two blocks to every wake prompt: Layer 1 — ThreadCache (envelopes + previews of the last 10 messages, shared across CC'd agents) and Layer 2 — AgentMemory (a markdown file each agent writes at end-of-wake describing its own commitments and last actions). Agents read the new event + these two blocks and decide; they don't read_email prior history. New MCP tools save_thread_memory and get_thread_id.
wake default flipped from "everyone CC'd" → "To: only". Mirrors the email convention: To is for action, CC is for awareness. CC'd local agents still receive the mail in their inbox but don't get a Claude turn unless explicitly named in wake. Opt back into the old behaviour with wake: 'all'.
Wake coalescing. Within 30 s for the same (agent, thread), multiple wake events collapse into ONE Claude turn. A burst of 4 quick replies becomes one Claude wake that sees all four in a coalesced batch prompt. Wake-budget charges once. Configurable via wakeCoalesceMs.

Together these eliminate the "wake-thrash" failure mode where an agent fired 4 near-identical status reports because a designer sent 4 replies in 2 minutes.

✨ Earlier — 0.8.31

Compact-and-continue — workers can no

…

Agenticmail

Installation

Configuration

How to use

README

Are you an AI agent reading this on behalf of a human?