
Ai Distiller

Extracts essential code structure from large codebases into AI-digestible format, helping AI agents write code that correctly uses existing APIs on the first attempt.

developer-tools · api · ai · agent
By janreges
16119 · Updated 1 month ago · C · MIT

Installation

npx -y ai-distiller

Configuration

{
  "mcpServers": {
    "ai-distiller": {
      "command": "npx",
      "args": ["-y", "ai-distiller"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes
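Steps 2 and 3 can also be scripted. The following is a minimal sketch, assuming the settings file is plain JSON with a top-level `mcpServers` object as shown in the Configuration section; the helper names `add_mcp_server` and `update_settings_file` are hypothetical and not part of AI Distiller or Claude Code.

```python
import json
from pathlib import Path


def add_mcp_server(settings: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `settings` with one entry added under "mcpServers".

    Existing keys and other configured servers are left untouched.
    """
    merged = dict(settings)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Read the settings file if it exists, add the entry, and write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    updated = add_mcp_server(current, "ai-distiller", "npx", ["-y", "ai-distiller"])
    path.write_text(json.dumps(updated, indent=2) + "\n")
```

Calling `update_settings_file(Path.home() / ".claude" / "settings.json")` would apply the configuration above. Back up the file first, since this sketch rewrites it in place.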

AI Distiller (aid)

Note: This is the very first version of this tool. We would be very grateful for any feedback in the form of a discussion or by creating an issue on GitHub. Thank you!

🚀 MCP Server Available: Install the Model Context Protocol server for AI Distiller from NPM: @janreges/ai-distiller-mcp - seamlessly integrate with Claude, Cursor, and other MCP-compatible AI tools!

<p align="center"> <img src="docs/assets/aid-mascot-300.png" alt="AI Distiller (aid) Mascot" width="200"> </p> <p align="center"> <img src="https://img.shields.io/badge/Languages-12+-blue" alt="12+ Languages"> <img src="https://img.shields.io/badge/Performance-5k+_files/sec-green" alt="Performance"> <img src="https://img.shields.io/badge/Compression-90%25+-orange" alt="Compression"> <img src="https://img.shields.io/badge/Tests-1211_passed-purple" alt="Tests"> </p>

🤔 Why AI Distiller?

Do you work with large-scale projects that have thousands of files and functions? Do you struggle with AI tools like Claude Code, Gemini, Copilot, or Cursor frequently "hallucinating" and generating code that looks correct at first glance but is actually incompatible with your project?

The problem is context. AI models have a limited context window and cannot take in your entire codebase. Instead, AI agents search files, "grep" for keywords, look at a few lines before and after each match, and try to guess the interface of your classes and functions, often but not always correctly. The result? Code full of errors that guesses parameters, returns incorrect data types, and ignores the existing architecture. If you are an experienced user of AI agents (a vibe coder), you know you can compensate by instructing the agent to consistently write and run tests, use static code analysis, pre-commit hooks, and so on. The agent will usually fix the code itself, but that can take 20 steps and 5 minutes. On the other hand, if you pay for each AI request (and a large context is expensive) and are not pressed for time, you may not mind this limited-context approach.

AI Distiller (or aid for short) helps solve this problem. Its main function is code "distillation" – a process where it extracts only the most essential information from the entire project (ideally from the main source folder, or a specific module subdirectory for extremely large projects) that the AI needs to write code correctly on the first try. This distillation usually generates a context that is only 5-20% of the original source code volume, allowing AI tools to include it in their context. As a result, the AI uses the existing code exactly as it was designed, not by trial and error.

Put simply, by default aid keeps only the public parts of the interface, along with input and output data types, and discards method implementations and non-public structures. All of this is configurable via CLI Options.
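To make the idea of distillation concrete, here is a small illustrative sketch that does something similar for Python source only, using the standard `ast` module. It is not how aid is implemented (aid uses tree-sitter parsers and handles many languages); the function name `distill` and the rule "a leading underscore means non-public" are assumptions made for this example.

```python
import ast


def distill(source: str) -> str:
    """Keep public class and function signatures; drop bodies and private names.

    Assumption for this sketch: any name starting with "_" is non-public,
    which also hides dunder methods such as __init__.
    """
    lines = []

    def visit(nodes, indent):
        for node in nodes:
            if isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
                lines.append(f"{indent}class {node.name}:")
                visit(node.body, indent + "    ")
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.name.startswith("_"):
                    continue
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                lines.append(f"{indent}{keyword} {node.name}({ast.unparse(node.args)}){returns}")

    visit(ast.parse(source).body, "")
    return "\n".join(lines)


SAMPLE = '''
class Cart:
    def add(self, sku: str, qty: int) -> None:
        self._items[sku] = qty

    def _recount(self):
        pass

def total(cart: Cart) -> float:
    return 0.0
'''

print(distill(SAMPLE))
# class Cart:
#     def add(self, sku: str, qty: int) -> None
# def total(cart: Cart) -> float
```

The bodies and the private `_recount` helper are gone, but everything a caller needs to use `Cart` and `total` correctly is still there. Requires Python 3.9+ for `ast.unparse`.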


✨ Key Features

| Feature | Description |
|---|---|
| 🚀 Extreme Speed | Processes tens of megabytes of code in hundreds of milliseconds. By default, it uses 80% of available CPU cores, but can be configured, e.g., with --workers=1 to use only a single CPU core. |
| 🧠 Intelligent Distillation | Understands 12+ programming languages and extracts only public APIs (methods, properties, types). |
| ⚙️ High Configurability | Allows including private, protected, and internal members, implementation, or comments. |
| 🤖 AI Prompt Generation | Generates ready-to-use prompts with distilled code for AI analysis. The tool creates files with prompts that AI agents can then execute for security audits, refactoring, etc. See --ai-action switch. |
| 📋 Analysis Automation | Creates a complete checklist and directory structure for AI agents, who can then systematically analyze the entire project. See the flow-for-* actions for the --ai-action switch. |
| 📜 Git Analysis | Processes commit history and prepares data for in-depth analysis of development quality and team dynamics. |
| 💻 Multi-platform | A single binary file with no dependencies for Windows, Linux, and macOS (x64 & ARM). |
| 🔌 Integration via MCP | Can be integrated into tools like Claude Code, VS Code, Cursor, Windsurf and others thanks to the included MCP server. |

🎯 Intelligent Filtering

Control exactly what to include with our new granular flag system:

Visibility Control:

  • --public=1 (default) - Include public members
  • --protected=0 (default) - Exclude protected members
  • --internal=0 (default) - Exclude internal/package-private
  • --private=0 (default) - Exclude private members

Content Control:

  • --comments=0 (default) - Exclude comments
  • --docstrings=1 (default) - Include documentation
  • --implementation=0 (default) - Exclude function/method bodies
  • --imports=1 (default) - Include import/use statements

Default behavior: Shows only public API signatures with basic documentation - perfect for AI understanding while maintaining maximum compression.
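If you call aid from a script, the flags above map naturally onto a small argument builder. This is only a sketch: the helper `build_aid_args` is hypothetical, and it assumes the `aid <path> --flag=value` form used elsewhere in this README. It emits only flags that differ from the documented defaults.

```python
# Documented defaults for the visibility and content flags listed above.
DEFAULTS = {
    "public": True,
    "protected": False,
    "internal": False,
    "private": False,
    "comments": False,
    "docstrings": True,
    "implementation": False,
    "imports": True,
}


def build_aid_args(path: str, **overrides: bool) -> list:
    """Build an argv list for aid, adding only flags that change a default."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown filter flag(s): {sorted(unknown)}")
    args = ["aid", path]
    for name, default in DEFAULTS.items():
        value = overrides.get(name, default)
        if value != default:
            args.append(f"--{name}={int(value)}")
    return args


# Include private members and function bodies in addition to the public API.
print(build_aid_args("./src", private=True, implementation=True))
# ['aid', './src', '--private=1', '--implementation=1']
```

The resulting list can be passed to `subprocess.run`. Keeping the defaults in one dictionary makes it obvious which switches a given run actually changed.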

🤖 AI-Powered Analysis Prompt Generation

AI Distiller generates specialized prompts combined with distilled code for AI-driven analysis:

  • --ai-action=flow-for-deep-file-to-file-analysis - Generates task lists and prompts for systematic file-by-file analysis
  • --ai-action=flow-for-multi-file-docs - Creates documentation workflow prompts with code structure
  • Output to files - Prompts are saved to .aid/ directory (or use --stdout for small codebases)
  • Ready for AI execution - Generated files contain both the analysis prompt and distilled code
  • AI agent instructions - Output includes guidance for AI agents to read and process the generated files
  • Gemini advantage - 1M token context window perfect for larger codebase analysis

Note: AI Distiller doesn't perform the analysis itself - it prepares optimized prompts that AI agents (Claude, Gemini, ChatGPT) then execute. Users often need to explicitly ask their AI agent to process the generated file or copy its contents to web-based AI tools.

📝 Multiple Output Formats

  • Text (--format text) - Ultra-compact for AI consumption (default)
  • Markdown (--format md) - Clean, structured Markdown
  • JSON Structured (--format json-structured) - Rich semantic data for tools
  • JSONL (--format jsonl) - Streaming format
  • XML (--format xml) - Legacy system compatible

📊 Smart Summary Output

After each distillation, AI Distiller displays a summary showing compression efficiency and processing speed:

# Default: Visual progress bar for interactive terminals (green dots = saved, red dots = remaining)
✨ Distilled 970 files [░░░░░░░░░░░░░░░] 98% (10M → 256K) in 231ms 💰 ~2.4M tokens saved (~64k remaining)

# Choose your preferred format with --summary-type
aid ./src --summary-type=stock-ticker
📊 AID 97.6% ▲ │ SIZE: 10M→256K │ TIME: 231ms │ EST: ~2.4M tokens saved

# JSON output
aid ./src --summary-type=json

{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "duration_ms": 6,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151,
  "file_count": 9,
  "output_path": "/home/user/project/.aid/aid.processor.txt",
  "tokenizer": "cl100k_base"
}

Available formats:

  • visual-progress-bar (default) - Shows compression as a progress bar
  • stock-ticker - Compact stock market style display
  • speedometer-dashboard - Multi-line dashboard with metrics
  • minimalist-sparkline - Single line with all essential info
  • ci-friendly - Clean format for CI/CD pipelines
  • json - Machine-readable JSON output
  • off - Disable summary output

Use --no-emoji to remove emojis from any format.
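The JSON summary is convenient for scripts and CI checks. The sketch below parses the sample output shown above and recomputes the percentages. The formula `(1 - after / before) * 100` is an inference from the sample numbers, not a documented definition, and the helper `savings_pct` is hypothetical.

```python
import json
import math

# Sample summary copied from the --summary-type=json example above.
SUMMARY = """
{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "duration_ms": 6,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151,
  "file_count": 9,
  "output_path": "/home/user/project/.aid/aid.processor.txt",
  "tokenizer": "cl100k_base"
}
"""


def savings_pct(before: int, after: int) -> float:
    """Percentage of the original size removed by distillation."""
    return (1 - after / before) * 100


summary = json.loads(SUMMARY)
byte_savings = savings_pct(summary["original_bytes"], summary["distilled_bytes"])
token_savings = savings_pct(summary["tokens_before"], summary["tokens_after"])

# The recomputed values agree with the reported fields for this sample.
assert math.isclose(byte_savings, summary["savings_pct"])
assert math.isclose(token_savings, summary["token_savings_pct"])
assert summary["tokens_before"] - summary["tokens_after"] == summary["tokens_saved"]

print(f"{summary['file_count']} files, {byte_savings:.1f}% smaller")
```

A CI job could use the same fields to fail a build when, for example, `tokens_after` exceeds the context budget of the model you target.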

📁 Smart Project Root Detection

AI Distiller automatically detects your project root and centralizes all outputs in a .aid/ directory:

  • Automatic detection: Searches upward for .aidrc, go.mod, package.json, .git, etc.
  • Consistent location: All outputs go to <project-root>/.aid/ regardless of where you run aid
  • Cache management: MCP cache stored in .aid/cache/ for better organization
  • Easy cleanup: Add .aid/ to .gitignore to keep outputs out of version control

Detection priority:

  1. .aidrc file - Create this empty file to explicitly mark your project root
  2. Language markers - go.mod, package.json, pyproject.toml, etc.
  3. Version control - .git directory
  4. Environment variable - AID_PROJECT_ROOT (fallback if no markers found)
  5. Current directory - Final fallback with warning

# Mark a specific directory as project root (recommended)
touch /my/project/.aidrc

# Run from anywhere in your project - outputs always go to project root
cd deep/nested/directory
aid ../../../src  # Output: <project-root>/.aid/aid.src.txt

# Use environment variable as fallback (useful for CI/CD)
AID_PROJECT_ROOT=/build/workspace aid src/
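The detection priority above can be sketched in a few lines. This is an illustration of the documented order, not aid's actual code: the function names are hypothetical, the language marker list contains only the three files named above (the README's "etc." is left open), and it assumes each priority level is searched upward through all parent directories before moving on to the next level.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

# Only the markers named in this README; the real list is longer ("etc.").
LANGUAGE_MARKERS = ("go.mod", "package.json", "pyproject.toml")


def find_upward(start: Path, names: Iterable[str]) -> Optional[Path]:
    """Return the nearest directory at or above `start` containing any of `names`."""
    names = tuple(names)
    for directory in (start, *start.parents):
        if any((directory / name).exists() for name in names):
            return directory
    return None


def detect_project_root(start: Path, env: Optional[Mapping[str, str]] = None) -> Path:
    """Apply the documented priority: .aidrc, language markers, .git, env var, cwd."""
    env = os.environ if env is None else env
    start = start.resolve()
    for names in ((".aidrc",), LANGUAGE_MARKERS, (".git",)):
        found = find_upward(start, names)
        if found is not None:
            return found
    if env.get("AID_PROJECT_ROOT"):
        # Fallback used only when no marker was found anywhere above `start`.
        return Path(env["AID_PROJECT_ROOT"])
    # Final fallback: the starting directory (aid prints a warning here).
    return start
```

With this ordering, an `.aidrc` at the repository root wins over a `package.json` in a nested package, which matches the recommendation to create `.aidrc` when you want to pin the root explicitly.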

🌍 Language Support

Currently supports 12 languages via tree-sitter:

  • Full Support: Python, Go, JavaScript, PHP, Ruby
  • Beta: TypeScript, Java, C#, Rust, Kotlin, Swift, C++
  • Coming Soon: Zig, Scala, Clojure

Language-Specific Documentation:

  • C++ - C++11/14/17/20 support with templates, namespaces, modern features
  • C# - Complete C# 12 support with records, nullable reference types, pattern matching
  • Go - Full Go support with interfaces, goroutines, generics (1.18+)
  • Java - Java 8-21 support with records, sealed classes, pattern matching
  • JavaScript - ES6+ support with classes, modules, async/await
  • Kotlin - Kotlin 1.x support with coroutines, data classes, sealed classes
  • PHP - PHP 7.4+ with PHP 8.x features (attributes, union types, enums)
  • Python - Full Python 3.x support with type hints, async/await, decorators
  • Ruby - Ruby 2.x/3.x support with blocks, modules, metaprogramming
  • Rust - Rust 2018/2021 editions with traits, lifetimes, async
  • Swift - Swift 5.x support with protocols, extensions, property wrappers
  • TypeScript - TypeScript 4.x/5.x with generics, decorators, type system

🎯 How It Works

  1. Scans your codebase recursively for supported file types (12 languages)
  2. Parses each file using language-specific tree-sitter parsers (all bundled, no dependencies)
  3. Extracts only what you need: public APIs, type signatures, class hierarchies

…
