Name: Ai Distiller
Author: janreges

AI Distiller (`aid`)

Note: This is the very first version of this tool. We would be very grateful for any feedback in the form of a discussion or by creating an issue on GitHub. Thank you!

🚀 MCP Server Available: Install the Model Context Protocol server for AI Distiller from NPM: @janreges/ai-distiller-mcp - seamlessly integrate with Claude, Cursor, and other MCP-compatible AI tools!

<p align="center"> <img src="docs/assets/aid-mascot-300.png" alt="AI Distiller (aid) Mascot" width="200"> </p> <p align="center"> <img src="https://img.shields.io/badge/Languages-12+-blue" alt="12+ Languages"> <img src="https://img.shields.io/badge/Performance-5k+_files/sec-green" alt="Performance"> <img src="https://img.shields.io/badge/Compression-90%25+-orange" alt="Compression"> <img src="https://img.shields.io/badge/Tests-1211_passed-purple" alt="Tests"> </p>

🤔 Why AI Distiller?

Do you work with large-scale projects that have thousands of files and functions? Do you struggle with AI tools like Claude Code, Gemini, Copilot, or Cursor frequently "hallucinating" and generating code that looks correct at first glance but is actually incompatible with your project?

The problem is context. AI models have a limited context window and cannot take in your entire codebase. Instead, AI agents search files, "grep" for keywords, look at a few lines before and after each match, and then try to guess the interface of your classes and functions (often correctly, but not always). The result? Code that guesses parameters, returns incorrect data types, and ignores the existing architecture. If you are an experienced user of AI agents (a "vibe coder"), you know that you can compensate by instructing the agent to consistently write and run tests, use static code analysis, pre-commit hooks, and so on. The agent will usually fix the code itself, but in the meantime it will take 20 steps and 5 minutes. On the other hand, if you pay for each AI request (and a large context is expensive) and are not pressed for time, this limited-context approach may suit you fine.

AI Distiller (or aid for short) helps solve this problem. Its main function is code "distillation" – a process where it extracts only the most essential information from the entire project (ideally from the main source folder, or a specific module subdirectory for extremely large projects) that the AI needs to write code correctly on the first try. This distillation usually generates a context that is only 5-20% of the original source code volume, allowing AI tools to include it in their context. As a result, the AI uses the existing code exactly as it was designed, not by trial and error.

Put simply, by default the distillation keeps only the public parts of the interface, along with input and output data types, and discards method implementations and non-public structures. All of this is configurable via CLI options.
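To make the idea of distillation concrete, here is a minimal sketch for Python sources only. It is not how aid is implemented: it uses Python's standard `ast` module (Python 3.9+) rather than tree-sitter, and it assumes the common convention that a leading underscore marks a non-public name. The function name `distill` and the `SAMPLE` source are hypothetical.

```python
import ast

def distill(source: str) -> str:
    """Keep public class and function signatures; drop bodies and private names."""
    lines = []

    def visit(nodes, indent=""):
        for node in nodes:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                if node.name.startswith("_"):
                    continue  # assumption: leading underscore means non-public
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
                lines.append(f"{indent}{keyword} {node.name}({ast.unparse(node.args)}){returns}")
            elif isinstance(node, ast.ClassDef):
                if node.name.startswith("_"):
                    continue
                bases = ", ".join(ast.unparse(base) for base in node.bases)
                header = f"class {node.name}({bases}):" if bases else f"class {node.name}:"
                lines.append(indent + header)
                visit(node.body, indent + "    ")  # recurse into methods

    visit(ast.parse(source).body)
    return "\n".join(lines)

# Hypothetical input used only for illustration.
SAMPLE = '''
class UserService:
    def find(self, user_id: int) -> str:
        return self._load(user_id)

    def _load(self, user_id):
        return "x"

def _helper():
    pass
'''

if __name__ == "__main__":
    print(distill(SAMPLE))
```

The output keeps the `UserService` class and the signature of `find`, while `_load`, `_helper`, and every function body are gone. That is the same trade the default aid settings make: the interface survives, the implementation does not.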

✨ Key Features

Feature	Description
🚀 Extreme Speed	Processes tens of megabytes of code in hundreds of milliseconds. By default, it uses 80% of available CPU cores, but can be configured, e.g., with `--workers=1` to use only a single CPU core.
🧠 Intelligent Distillation	Understands 12+ programming languages and extracts only public APIs (methods, properties, types).
⚙️ High Configurability	Allows including private, protected, and internal members, implementation, or comments.
🤖 AI Prompt Generation	Generates ready-to-use prompts with distilled code for AI analysis. The tool creates files with prompts that AI agents can then execute for security audits, refactoring, etc. See `--ai-action` switch.
📋 Analysis Automation	Creates a complete checklist and directory structure for AI agents, who can then systematically analyze the entire project. See the flow-for-* actions for the `--ai-action` switch.
📜 Git Analysis	Processes commit history and prepares data for in-depth analysis of development quality and team dynamics.
💻 Multi-platform	A single binary file with no dependencies for Windows, Linux, and macOS (x64 & ARM).
🔌 Integration via MCP	Can be integrated into tools like Claude Code, VS Code, Cursor, Windsurf and others thanks to the included MCP server.

🎯 Intelligent Filtering

Control exactly what to include with our new granular flag system:

Visibility Control:

--public=1 (default) - Include public members
--protected=0 (default) - Exclude protected members
--internal=0 (default) - Exclude internal/package-private
--private=0 (default) - Exclude private members

Content Control:

--comments=0 (default) - Exclude comments
--docstrings=1 (default) - Include documentation
--implementation=0 (default) - Exclude function/method bodies
--imports=1 (default) - Include import/use statements

Default behavior: Shows only public API signatures with basic documentation - perfect for AI understanding while maintaining maximum compression.
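If you call aid from a script, a small helper can turn the flags above into a command line. The sketch below only assembles the argument list and does not run anything. The flag names and defaults are copied from the lists above; the helper `build_aid_command` is a hypothetical name, and emitting only non-default flags is a design choice of this sketch, not something aid requires.

```python
import shlex

# Defaults as listed in the visibility and content control sections above.
DEFAULTS = {
    "public": 1, "protected": 0, "internal": 0, "private": 0,
    "comments": 0, "docstrings": 1, "implementation": 0, "imports": 1,
}

def build_aid_command(path: str, **overrides: int) -> list[str]:
    """Build an aid argument list, adding only flags that differ from the defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown flag(s): {sorted(unknown)}")
    args = ["aid", path]
    for name, default in DEFAULTS.items():  # dict order keeps the flag order stable
        value = overrides.get(name, default)
        if value != default:
            args.append(f"--{name}={value}")
    return args

if __name__ == "__main__":
    # Include private members and function bodies on top of the default public API.
    print(shlex.join(build_aid_command("./src", private=1, implementation=1)))
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with paths that contain spaces.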

🤖 AI-Powered Analysis Prompt Generation

AI Distiller generates specialized prompts combined with distilled code for AI-driven analysis:

--ai-action=flow-for-deep-file-to-file-analysis - Generates task lists and prompts for systematic file-by-file analysis
--ai-action=flow-for-multi-file-docs - Creates documentation workflow prompts with code structure
Output to files - Prompts are saved to .aid/ directory (or use --stdout for small codebases)
Ready for AI execution - Generated files contain both the analysis prompt and distilled code
AI agent instructions - Output includes guidance for AI agents to read and process the generated files
Gemini advantage - 1M token context window perfect for larger codebase analysis

Note: AI Distiller doesn't perform the analysis itself - it prepares optimized prompts that AI agents (Claude, Gemini, ChatGPT) then execute. Users often need to explicitly ask their AI agent to process the generated file or copy its contents to web-based AI tools.

📝 Multiple Output Formats

Text (--format text) - Ultra-compact for AI consumption (default)
Markdown (--format md) - Clean, structured Markdown
JSON Structured (--format json-structured) - Rich semantic data for tools
JSONL (--format jsonl) - Streaming format
XML (--format xml) - Legacy system compatible

📊 Smart Summary Output

After each distillation, AI Distiller displays a summary showing compression efficiency and processing speed:

# Default: Visual progress bar for interactive terminals (green dots = saved, red dots = remaining)
✨ Distilled 970 files [░░░░░░░░░░░░░░░] 98% (10M → 256K) in 231ms 💰 ~2.4M tokens saved (~64k remaining)

# Choose your preferred format with --summary-type
aid ./src --summary-type=stock-ticker
📊 AID 97.6% ▲ │ SIZE: 10M→256K │ TIME: 231ms │ EST: ~2.4M tokens saved

# JSON output
aid ./src --summary-type=json

{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "duration_ms": 6,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151,
  "file_count": 9,
  "output_path": "/home/user/project/.aid/aid.processor.txt",
  "tokenizer": "cl100k_base"
}

Available formats:

visual-progress-bar (default) - Shows compression as a progress bar
stock-ticker - Compact stock market style display
speedometer-dashboard - Multi-line dashboard with metrics
minimalist-sparkline - Single line with all essential info
ci-friendly - Clean format for CI/CD pipelines
json - Machine-readable JSON output
off - Disable summary output

Use --no-emoji to remove emojis from any format.
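The JSON summary is easy to check or post-process in a script. The sketch below parses the sample shown above and recomputes the derived fields. How you capture the JSON from an actual run is not covered here, and the helper name `check_summary` is hypothetical. The field names are taken from the sample.

```python
import json

# The sample summary from above, verbatim.
SUMMARY_JSON = """
{
  "original_bytes": 70020,
  "distilled_bytes": 8244,
  "savings_pct": 88.22622107969151,
  "duration_ms": 6,
  "tokens_before": 17505,
  "tokens_after": 2061,
  "tokens_saved": 15444,
  "token_savings_pct": 88.22622107969151,
  "file_count": 9,
  "output_path": "/home/user/project/.aid/aid.processor.txt",
  "tokenizer": "cl100k_base"
}
"""

def check_summary(summary: dict) -> dict:
    """Recompute the derived values from the raw byte and token counts."""
    return {
        # Share of bytes removed by distillation, in percent.
        "byte_savings_pct": (1 - summary["distilled_bytes"] / summary["original_bytes"]) * 100,
        # Tokens removed by distillation.
        "tokens_saved": summary["tokens_before"] - summary["tokens_after"],
        # Ratio of bytes to tokens in the original input.
        "bytes_per_token": summary["original_bytes"] / summary["tokens_before"],
    }

if __name__ == "__main__":
    print(check_summary(json.loads(SUMMARY_JSON)))
```

For this sample, the recomputed savings match `savings_pct`, and the token counts work out to exactly one token per four bytes. Whether that ratio holds for other inputs is not something the sample alone can tell you.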

📁 Smart Project Root Detection

AI Distiller automatically detects your project root and centralizes all outputs in a .aid/ directory:

Automatic detection: Searches upward for .aidrc, go.mod, package.json, .git, etc.
Consistent location: All outputs go to <project-root>/.aid/ regardless of where you run aid
Cache management: MCP cache stored in .aid/cache/ for better organization
Easy cleanup: Add .aid/ to .gitignore to keep outputs out of version control

Detection priority:

.aidrc file - Create this empty file to explicitly mark your project root
Language markers - go.mod, package.json, pyproject.toml, etc.
Version control - .git directory
Environment variable - AID_PROJECT_ROOT (fallback if no markers found)
Current directory - Final fallback with warning

# Mark a specific directory as project root (recommended)
touch /my/project/.aidrc

# Run from anywhere in your project - outputs always go to project root
cd deep/nested/directory
aid ../../../src  # Output: <project-root>/.aid/aid.src.txt

# Use environment variable as fallback (useful for CI/CD)
AID_PROJECT_ROOT=/build/workspace aid src/
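The detection priority can be sketched as a tiered upward search. This is an illustration of the list above, not aid's actual code. It assumes each tier is checked across all ancestor directories before the next tier is tried, which the description does not spell out. The marker tuple is partial because the source list ends in "etc.", and the function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

# Partial list: the documentation names these three and then says "etc."
LANGUAGE_MARKERS = ("go.mod", "package.json", "pyproject.toml")

def find_upward(start: Path, names: Iterable[str]) -> Optional[Path]:
    """Return the nearest directory at or above `start` containing any of `names`."""
    names = tuple(names)
    for directory in (start, *start.parents):
        if any((directory / name).exists() for name in names):
            return directory
    return None

def detect_project_root(start: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Apply the documented priority: .aidrc, language markers, .git, env var, cwd."""
    start = start.resolve()
    # Assumption: each tier scans the whole ancestor chain before the next tier runs.
    for tier in ((".aidrc",), LANGUAGE_MARKERS, (".git",)):
        root = find_upward(start, tier)
        if root is not None:
            return root
    env_root = env.get("AID_PROJECT_ROOT")
    if env_root:
        return Path(env_root)
    return start  # final fallback; the documentation says aid warns in this case

if __name__ == "__main__":
    print(detect_project_root(Path.cwd()))
```

Under this reading, an `.aidrc` file higher up the tree wins over a `package.json` in a nearer directory, which is why creating `.aidrc` is the recommended way to pin the root in a monorepo.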

🌍 Language Support

Currently supports 12 languages via tree-sitter:

Full Support: Python, Go, JavaScript, PHP, Ruby
Beta: TypeScript, Java, C#, Rust, Kotlin, Swift, C++
Coming Soon: Zig, Scala, Clojure

Language-Specific Documentation:

C++ - C++11/14/17/20 support with templates, namespaces, modern features
C# - Complete C# 12 support with records, nullable reference types, pattern matching
Go - Full Go support with interfaces, goroutines, generics (1.18+)
Java - Java 8-21 support with records, sealed classes, pattern matching
JavaScript - ES6+ support with classes, modules, async/await
Kotlin - Kotlin 1.x support with coroutines, data classes, sealed classes
PHP - PHP 7.4+ with PHP 8.x features (attributes, union types, enums)
Python - Full Python 3.x support with type hints, async/await, decorators
Ruby - Ruby 2.x/3.x support with blocks, modules, metaprogramming
Rust - Rust 2018/2021 editions with traits, lifetimes, async
Swift - Swift 5.x support with protocols, extensions, property wrappers
TypeScript - TypeScript 4.x/5.x with generics, decorators, type system

🎯 How It Works

Scans your codebase recursively for supported file types (12+ languages)
Parses each file using language-specific tree-sitter parsers (all bundled, no dependencies)
Extracts only what you need: public APIs, type signatures, class hierarchies

…
