Media Gen

TypeScript MCP server for OpenAI Images/Videos and Google GenAI (Veo) media generation, editing, and asset downloads.

Tags: multimedia-process, typescript, go, ai
By strato-space
84 · Updated 5 months ago · TypeScript · MIT

Installation

npx -y media-gen-mcp

Configuration

{
  "mcpServers": {
    "media-gen-mcp": {
      "command": "npx",
      "args": ["-y", "media-gen-mcp"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes
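
Step 3 amounts to merging one entry into the existing mcpServers object without disturbing other settings. Below is a minimal TypeScript sketch of that merge, assuming the settings file has already been parsed into an object. The `addMcpServer` function and the `Settings` type are hypothetical names used for illustration only; they are not part of Claude Code or media-gen-mcp.

```typescript
// Hypothetical shape of one MCP server entry, matching the JSON shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Hypothetical settings shape: an optional mcpServers map plus any other keys.
interface Settings {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown;
}

// Return a new settings object with the server added; the input is not mutated.
function addMcpServer(settings: Settings, name: string, entry: McpServerEntry): Settings {
  return {
    ...settings,
    mcpServers: { ...(settings.mcpServers ?? {}), [name]: entry },
  };
}

// Example: an existing settings object that already has another server configured.
const existingSettings: Settings = {
  theme: "dark",
  mcpServers: { other: { command: "node", args: ["other.js"] } },
};

const updatedSettings = addMcpServer(existingSettings, "media-gen-mcp", {
  command: "npx",
  args: ["-y", "media-gen-mcp"],
});

console.log(JSON.stringify(updatedSettings, null, 2));
```

Returning a fresh object rather than editing in place makes it easy to compare the old and new settings before writing anything back to disk.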

media-gen-mcp

Media Gen MCP is a strict TypeScript Model Context Protocol (MCP) server for OpenAI Images (gpt-image-1.5, gpt-image-1), OpenAI Videos (Sora), and Google GenAI Videos (Veo). It can generate and edit images, create and remix video jobs, and fetch media from URLs or disk, choosing between resource_link and inline image outputs and optionally processing images with sharp. It is production-focused (full strict typecheck, ESLint + Vitest CI) and works with fast-agent, Claude Desktop, ChatGPT, Cursor, VS Code, Windsurf, and any MCP-compatible client.

Design principle: spec-first, type-safe image tooling – strict OpenAI Images API + MCP compliance with fully static TypeScript types and flexible result placements/response formats for different clients.

  • Generate images from text prompts using OpenAI's gpt-image-1.5 model (gpt-image-1 is also supported; DALL·E support is planned for future versions).
  • Edit images (inpainting, outpainting, compositing) from 1 up to 16 images at once, with advanced prompt control.
  • Generate videos via OpenAI Videos (sora-2, sora-2-pro) with job create/remix/list/retrieve/delete and asset downloads.
  • Generate videos via Google GenAI (Veo) with operation polling and file-first downloads.
  • Fetch & compress images from HTTP(S) URLs or local file paths with smart size/quality optimization.
  • Fetch documents from HTTP(S) URLs or local file paths and return resource_link/resource outputs.
  • Debug MCP output shapes with a test-images tool that mirrors production result placement (content, structuredContent, toplevel).
  • Integrates with: fast-agent, Windsurf, Claude Desktop, Cursor, VS Code, and any MCP-compatible client.

✨ Features

  • Strict MCP spec support
    Tool outputs are first-class CallToolResult objects from the latest MCP schema, including: content items (text, image, resource_link, resource), optional structuredContent, optional top-level files, and the isError flag for failures.

  • Full gpt-image-1.5 and sora-2/sora-2-pro parameter coverage (generate & edit)

    • openai-images-generate mirrors the OpenAI Images create API for gpt-image-1.5 (and gpt-image-1) (background, moderation, size, quality, output_format, output_compression, n, user, etc.).
    • openai-images-edit mirrors the OpenAI Images createEdit API for gpt-image-1.5 (and gpt-image-1) (image, mask, n, quality, size, user).
  • OpenAI Videos (Sora) job tooling (create / remix / list / retrieve / delete / content)

  • Google GenAI (Veo) operations + downloads (generate / retrieve operation / retrieve content)

  • Fetch and process images from URLs or files
    fetch-images tool loads images from HTTP(S) URLs or local file paths with optional, user-controlled compression (disabled by default). Supports parallel processing of up to 20 images.

  • Fetch videos from URLs or files
    fetch-videos tool lists local videos or downloads remote video URLs to disk and returns MCP resource_link (default) or embedded resource blocks (via tool_result).

  • Fetch documents from URLs or files
    fetch-document tool downloads remote files or reuses local paths and returns MCP resource_link (default) or embedded resource blocks (via tool_result).

  • Mix and edit up to 16 images
    openai-images-edit accepts image as a single string or an array of 1–16 file paths/base64 strings, matching the OpenAI spec for GPT Image models (gpt-image-1.5, gpt-image-1) image edits.

  • Smart image compression
    Built-in compression using sharp iteratively reduces quality and dimensions to fit MCP payload limits while preserving as much visual quality as possible.

  • Resource-aware file output with resource_link

    • Automatic switch from inline base64 to file when the total response size exceeds a safe threshold.
    • Outputs are written to disk using output_<time_t>_media-gen__<tool>_<id>.<ext> filenames (images/documents use a generated UUID; videos use the OpenAI video_id) and exposed to MCP clients via content[] depending on tool_result (resource_link/image for images, resource_link/resource for video/document downloads).
  • Built-in test-images tool for MCP client debugging
    test-images reads sample images from a configured directory and returns them using the same result-building logic as production tools. Use tool_result and response_format parameters to test how different MCP clients handle content[] and structuredContent.

  • Structured MCP error handling
    All tool errors (validation, OpenAI API failures, I/O) are returned as MCP errors with isError: true and content: [{ type: "text", text: <error message> }], making failures easy to parse and surface in MCP clients.
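
Two of the conventions above, the error result shape and the output filename pattern, are small enough to sketch directly. The TypeScript below is an illustrative sketch and not the server's actual code: `toToolError` and `buildOutputFilename` are hypothetical names, the interfaces are simplified stand-ins for the MCP SDK types, and it assumes `<time_t>` means Unix time in whole seconds.

```typescript
// Simplified stand-ins for the MCP result types (the real server would use SDK types).
interface TextContent {
  type: "text";
  text: string;
}

interface ToolErrorResult {
  isError: true;
  content: TextContent[];
}

// Convert any thrown value into the documented error shape:
// isError: true plus a single text content item carrying the message.
function toToolError(err: unknown): ToolErrorResult {
  const message = err instanceof Error ? err.message : String(err);
  return { isError: true, content: [{ type: "text", text: message }] };
}

// Build a name following output_<time_t>_media-gen__<tool>_<id>.<ext>.
// Assumption: <time_t> is seconds since the Unix epoch.
function buildOutputFilename(tool: string, id: string, ext: string, now: Date = new Date()): string {
  const timeT = Math.floor(now.getTime() / 1000);
  return `output_${timeT}_media-gen__${tool}_${id}.${ext}`;
}

console.log(JSON.stringify(toToolError(new Error("Invalid size"))));
console.log(buildOutputFilename("openai-images-generate", "1234", "png", new Date(0)));
// second line prints: output_0_media-gen__openai-images-generate_1234.png
```

Because every failure has the same shape, a client can check `isError` first and then surface `content[0].text` without any tool-specific handling.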


🚀 Installation

git clone https://github.com/strato-space/media-gen-mcp.git
cd media-gen-mcp

npm install
npm run build

Build modes:

  • npm run build – strict TypeScript build with all strict flags enabled, including skipLibCheck: false. Incremental builds via .tsbuildinfo (~2-3s on warm cache).
  • npm run esbuild – fast bundling via esbuild (no type checking, useful for rapid iteration).

Development mode (no build required)

For development or when TypeScript compilation fails due to memory constraints:

npm run dev  # Uses tsx to run TypeScript directly

Quality checks

npm run lint        # ESLint with typescript-eslint
npm run typecheck   # Strict tsc --noEmit
npm run test        # Unit tests (vitest)
npm run test:watch  # Watch mode for TDD
npm run ci          # lint + typecheck + test

Unit tests

The project uses vitest for unit testing. Tests are located in test/.

Covered modules:

| Module | Tests | Description |
|---|---|---|
| compression | 12 | Image format detection, buffer processing, file I/O |
| helpers | 31 | URL/path validation, output resolution, result placement, resource links |
| env | 19 | Configuration parsing, env validation, defaults |
| logger | 10 | Structured logging + truncation safety |
| pricing | 5 | Sora pricing estimate helpers |
| schemas | 69 | Zod schema validation for all tools, type inference |
| fetch-images (integration) | 3 | End-to-end MCP tool call behavior |
| fetch-videos (integration) | 3 | End-to-end MCP tool call behavior |

Test categories:

  • compression: isCompressionAvailable, detectImageFormat, processBufferWithCompression, readAndProcessImage
  • helpers: isHttpUrl, isAbsolutePath, isBase64Image, ensureDirectoryWritable, resolveOutputPath, getResultPlacement, buildResourceLinks
  • env: config loading and validation for MEDIA_GEN_* / MEDIA_GEN_MCP_* settings
  • logger: truncation and error formatting behavior
  • schemas: validation for openai-images-*, openai-videos-*, fetch-images, fetch-videos, and test-images inputs, plus boundary testing (prompt length, image count limits, path validation)

npm run test
# ✓ test/compression.test.ts (12 tests)
# ✓ test/helpers.test.ts (31 tests)
# ✓ test/env.test.ts (19 tests)
# ✓ test/logger.test.ts (10 tests)
# ✓ test/pricing.test.ts (5 tests)
# ✓ test/schemas.test.ts (69 tests)
# ✓ test/fetch-images.integration.test.ts (3 tests)
# ✓ test/fetch-videos.integration.test.ts (3 tests)
# Tests: 152 passed
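
To give a feel for the kind of URL/path validation the helpers tests cover, here is a hypothetical sketch of two such checks. The function names match those listed above, but the bodies are illustrative guesses written for this example and may differ from the project's real implementations.

```typescript
// Hypothetical re-implementation: true for strings that start with http:// or https://
// and contain no whitespace.
function isHttpUrl(value: string): boolean {
  return /^https?:\/\/\S+$/i.test(value);
}

// Hypothetical re-implementation: true for POSIX absolute paths ("/...")
// and Windows drive paths ("C:\..." or "C:/...").
function isAbsolutePath(value: string): boolean {
  return value.startsWith("/") || /^[A-Za-z]:[\\/]/.test(value);
}

const samples = [
  "https://example.com/a.png",
  "ftp://example.com/a.png",
  "/tmp/a.png",
  "relative/a.png",
];

for (const sample of samples) {
  console.log(sample, { http: isHttpUrl(sample), absolute: isAbsolutePath(sample) });
}
```

Checks like these are cheap to unit test because they are pure functions of a string, which is likely why this module can carry a large number of small test cases.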

Run directly via npx (no local clone)

You can also run the server straight from a remote repo using npx:

npx -y github:strato-space/media-gen-mcp --env-file /path/to/media-gen.env

The --env-file argument tells the server which env file to load (e.g. when you keep secrets outside the cloned directory). The file should contain OPENAI_API_KEY, optional Azure variables, and any MEDIA_GEN_MCP_* settings.
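
As an illustration of what such a file contains, the sketch below parses dotenv-style KEY=VALUE text and checks that OPENAI_API_KEY is present. It is a simplified stand-in and not the server's actual loader: `parseEnvFile` is a hypothetical helper, and `MEDIA_GEN_MCP_EXAMPLE_SETTING` is a placeholder name, not a real setting.

```typescript
// Hypothetical minimal parser for dotenv-style text: KEY=VALUE lines,
// "#" comments, blank lines ignored, optional matching quotes stripped.
function parseEnvFile(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value[0];
    const last = value[value.length - 1];
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    result[key] = value;
  }
  return result;
}

// Sample contents; the second variable name is a placeholder for illustration.
const sampleEnvText = [
  "# secrets kept outside the cloned directory",
  "OPENAI_API_KEY=sk-placeholder",
  'MEDIA_GEN_MCP_EXAMPLE_SETTING="value"',
].join("\n");

const parsedEnv = parseEnvFile(sampleEnvText);
const missingKeys = ["OPENAI_API_KEY"].filter((key) => !parsedEnv[key]);

console.log(missingKeys.length === 0 ? "env file looks complete" : `missing: ${missingKeys.join(", ")}`);
```

A check like this can be run before launching the server to catch a missing key early, instead of discovering it on the first API call.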

secrets.yaml (optional)

You can keep API keys (and optional Google Vertex AI settings) in a secrets.yaml file (compatible with the fast-agent secrets template):

openai:
  api_key: <your-api-key-here>
anthropic:
  api_key: <your-api-key-here>
google:
  api_key: <your-api-key-here>
  vertex_ai:
    enabled: true
    project_id: your-gcp-project-id
    location: europe-west4

media-gen-mcp loads `secr
