PinRAG
Overview
PinRAG is for when you want to learn about something and your materials are scattered—PDFs and ebooks, GitHub repos, YouTube videos, Discord discussions, and plain notes. You index those materials into one shared RAG index, then ask questions from Cursor, VS Code (GitHub Copilot), or any MCP-capable assistant and get answers with citations pointing back to pages, timestamps, files, or threads.
Under the hood it is Retrieval-Augmented Generation built with LangChain and exposed as an MCP (Model Context Protocol) server: add documents from the editor, query with natural language, list or remove what you indexed. Supported inputs include PDFs, local text files and directories, Discord exports, YouTube (transcript from URL, playlist, or ID), and GitHub repo URLs. For YouTube you can optionally add vision so on-screen code, diagrams, and UI text are merged with the transcript in the same chunks—see YouTube vision enrichment.
Features
- Multi-format indexing: PDF (.pdf), local files or directories, plain text (.txt), Discord export (.txt), YouTube (video or playlist URL, or video ID), GitHub repo (URL), web documentation sites (URL)
- Optional YouTube vision: Off by default. When enabled, runs a vision model (OpenAI, Anthropic, or OpenRouter native video) and merges structured on-screen context with the transcript so RAG chunks carry searchable code names, labels, and diagrams, not speech alone. OpenRouter mode avoids local ffmpeg/video download; openai/anthropic use scene keyframes and require pinrag[vision] + ffmpeg (see YouTube vision enrichment)
- RAG with citations: Answers cite source context: PDF page, YouTube timestamp, document name for plain text and Discord, chunk index for GitHub repos, source URL for web documentation
- Document tags: Tag documents at index time (e.g. AMIGA, PI_PICO) for filtered search
- Metadata filtering: query_tool supports document_id, tag, document_type, PDF page_min / page_max, and response_style (thorough or concise)
- MCP tools: add_document_tool, query_tool, list_documents_tool, remove_document_tool, set_document_tag_tool, list_collections_tool; optional collection on tools overrides PINRAG_COLLECTION_NAME for that call
- MCP resources: pinrag://documents (indexed documents) and pinrag://server-config (env vars and config); click in Cursor’s MCP panel to view
- MCP prompt: use_pinrag (parameter: request) for querying, indexing, listing, or removing documents
- Configurable LLM: OpenRouter (default, free openrouter/free router), OpenAI, Anthropic, or Cerebras Inference (OpenAI-compatible API); set via PINRAG_LLM_PROVIDER and PINRAG_LLM_MODEL in MCP env or your shell
- Local embeddings: Nomic (PINRAG_EMBEDDING_MODEL, default nomic-embed-text-v1.5); no API key; first run downloads model weights (~270 MB, cached)
- Retrieval & chunking options: Structure-aware chunking (on by default); optional FlashRank re-ranking, multi-query expansion, and parent-child chunks for PDFs (see Configuration)
- Observability: MCP tool notifications (ctx.log) plus optional LangSmith tracing
- Built with: LangChain, Chroma; optional OpenRouter, OpenAI, Anthropic, FlashRank
Installation
Add PinRAG as an MCP server in your editor. Install uv and ensure uvx is on your PATH—that runs PinRAG from PyPI without a prior pip install.
Cursor: add this under mcpServers in ~/.cursor/mcp.json:
    {
      "mcpServers": {
        "pinrag": {
          "command": "uvx",
          "args": ["--refresh", "pinrag"],
          "env": {
            "OPENROUTER_API_KEY": "your-openrouter-api-key-here",
            "PINRAG_PERSIST_DIR": "/absolute/path/to/your/pinrag-data"
          }
        }
      }
    }

VS Code (GitHub Copilot): run MCP: Open User Configuration from the Command Palette (or add .vscode/mcp.json in a workspace), then merge this shape (the top-level key is servers):
    {
      "servers": {
        "pinrag": {
          "command": "uvx",
          "args": ["--refresh", "pinrag"],
          "env": {
            "OPENROUTER_API_KEY": "your-openrouter-api-key-here",
            "PINRAG_PERSIST_DIR": "/absolute/path/to/your/pinrag-data"
          }
        }
      }
    }

Quick Start
HTTP server mode
For clients that speak MCP over HTTP (e.g. pinrag-cli with --server), run:
    pinrag server [--host 127.0.0.1] [--port 8765]

This starts a streamable-HTTP MCP endpoint at http://<host>:<port>/mcp. The default pinrag stdio command for editors is unchanged; pinrag server is additive. Connect pinrag-cli with --server http://127.0.0.1:8765/mcp.
Configure MCP server
Put API keys and any PinRAG settings in the MCP entry’s env block. The server does not load .env files when the editor launches it.
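Because the server does not read .env files when the editor launches it, one option is to copy those values into the env block yourself. The following is a minimal sketch, assuming a simple KEY=VALUE .env format; the helpers parse_dotenv and merge_env_into_config are hypothetical and not part of PinRAG.

```python
import json


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def merge_env_into_config(config: dict, env: dict[str, str],
                          top_key: str = "mcpServers",
                          server: str = "pinrag") -> dict:
    """Return a copy of config with env merged into the server's env block."""
    merged = json.loads(json.dumps(config))  # cheap deep copy for JSON data
    entry = merged.setdefault(top_key, {}).setdefault(
        server, {"command": "uvx", "args": ["--refresh", "pinrag"]}
    )
    entry.setdefault("env", {}).update(env)
    return merged


if __name__ == "__main__":
    dotenv = 'OPENROUTER_API_KEY="your-openrouter-api-key-here"\n# comment\n'
    print(json.dumps(merge_env_into_config({}, parse_dotenv(dotenv)), indent=2))
```

For VS Code, pass top_key="servers" so the entry lands under the key that editor expects.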
Use in chat
| Action | Tool |
|---|---|
| Index files, directories, or URLs | add_document_tool — required paths: list of local paths (PDFs, plain or DiscordChatExporter .txt, directories) or URLs (YouTube videos, playlist URLs, GitHub repos, web documentation sites; bare YouTube video IDs allowed). Optional tags (one per path). For GitHub URLs only: branch, include_patterns, exclude_patterns. |
| List indexed documents | list_documents_tool — returns documents (IDs) and total_chunks; accepts an optional tag filter. document_details may include document_type, tags, page / message / segment counts, titles, aggregated bytes, and upload_timestamp when present in metadata. |
| Query with filters | query_tool — required query. Optional document_id, tag, document_type, page_min / page_max (PDF ranges), response_style (thorough or concise; leave empty to use PINRAG_RESPONSE_STYLE). |
| Remove a document | remove_document_tool — required document_id (exact value from list_documents_tool). |
| View resources (read-only) | In the MCP panel, open Resources and choose pinrag://documents (indexed docs) or pinrag://server-config (effective config, including PINRAG_VERSION). |
Ask in chat: "Add /path/to/amiga-book.pdf with tag AMIGA", "Index https://youtu.be/xyz and ask what it says", "Index https://github.com/owner/repo and ask about the codebase", or "Index https://docs.langchain.com/ and summarize its memory APIs". The AI will invoke the tools for you. Citations show page numbers for PDFs, timestamps (e.g. t. 1:23) for YouTube, document names for plain text and Discord exports, chunk index labels for GitHub, and source URLs for web documentation.
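When calling the tools from your own MCP client rather than through chat, the argument rules in the table above can be checked before the request is sent. This is a rough client-side sketch only: the parameter names come from the table, but the builder functions and their validation are assumptions, and the server may validate differently.

```python
from typing import Optional

RESPONSE_STYLES = {"thorough", "concise"}
QUERY_FILTERS = {"document_id", "tag", "document_type",
                 "page_min", "page_max", "response_style"}


def build_add_document_args(paths: list[str],
                            tags: Optional[list[str]] = None) -> dict:
    """Assemble add_document_tool arguments (hypothetical helper)."""
    if not paths:
        raise ValueError("paths is required and must not be empty")
    args: dict = {"paths": list(paths)}
    if tags is not None:
        # Tags are described as one per path, so the lengths must match.
        if len(tags) != len(paths):
            raise ValueError("tags must contain one entry per path")
        args["tags"] = list(tags)
    return args


def build_query_args(query: str, **filters) -> dict:
    """Assemble query_tool arguments, dropping filters left as None."""
    if not query.strip():
        raise ValueError("query is required")
    unknown = set(filters) - QUERY_FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    args: dict = {"query": query}
    args.update({k: v for k, v in filters.items() if v is not None})
    lo, hi = args.get("page_min"), args.get("page_max")
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("page_min must not exceed page_max")
    style = args.get("response_style")
    if style is not None and style not in RESPONSE_STYLES:
        raise ValueError("response_style must be 'thorough' or 'concise'")
    return args
```

Omitting response_style from the arguments leaves the choice to PINRAG_RESPONSE_STYLE, as the table describes.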
GitHub indexing
Index a repo with add_document_tool and a URL in paths, e.g. https://github.com/owner/repo, https://github.com/owner/repo/tree/branch, or github.com/owner/repo (scheme optional).
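To illustrate the accepted URL shapes, here is a small parser sketch. It is not PinRAG's own parsing code; the RepoRef type and parse_github_url function are hypothetical, and the regular expression assumes that everything after /tree/ is the branch name.

```python
import re
from typing import NamedTuple, Optional


class RepoRef(NamedTuple):
    owner: str
    repo: str
    branch: Optional[str]


# Scheme is optional; an optional /tree/<branch> suffix selects a branch.
_GITHUB_RE = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/"
    r"(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?"
    r"(?:/tree/(?P<branch>\S+?))?/?$"
)


def parse_github_url(url: str) -> RepoRef:
    """Split a GitHub repo URL into owner, repo, and optional branch."""
    match = _GITHUB_RE.match(url.strip())
    if match is None:
        raise ValueError(f"not a recognised GitHub repo URL: {url!r}")
    return RepoRef(match["owner"], match["repo"], match["branch"])
```

A branch of None here means the URL did not name one, in which case the separate branch option (or the repository default) would apply.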
GitHub-only options: branch, include_patterns / exclude_patterns — defaults already favor common text and source files and skip bulky artifacts; use patterns when you need files outside that set. Files over PINRAG_GITHUB_MAX_FILE_BYTES (default 512 KiB) are skipped.
Auth: Set GITHUB_TOKEN in MCP env (or the shell) for private repos or fewer rate-limit hits on big indexes; small public runs often work without it. Use a classic or fine-grained PAT with repo read access; there is no OAuth in PinRAG.
Web documentation indexing
Point add_document_tool at any documentation site URL, e.g. https://docs.langchain.com/, https://docs.crewai.com/, or https://picocomputer.github.io/. PinRAG discovers pages via (in order) llms.txt / llms-full.txt (Mintlify-style), sitemap.xml (including robots.txt Sitemap: hints and nested sitemap indexes), then a scoped BFS crawl from the seed URL.
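The discovery order reads as an ordered fallback chain. The sketch below shows that pattern under one assumption that the text does not confirm: the first strategy that yields any URLs wins and later ones are not consulted. The discover_pages function and the strategy callables are hypothetical stand-ins, not PinRAG internals.

```python
from typing import Callable, Iterable, Optional

# A strategy takes the seed URL and returns the page URLs it found.
Discoverer = Callable[[str], list[str]]


def discover_pages(
    seed_url: str,
    strategies: Iterable[tuple[str, Discoverer]],
) -> tuple[Optional[str], list[str]]:
    """Try each named strategy in order; return the first non-empty result."""
    for name, strategy in strategies:
        urls = strategy(seed_url)
        if urls:
            return name, urls
    return None, []
```

In practice the list would hold three entries in the documented order: an llms.txt reader, a sitemap reader, and the scoped crawl as the last resort.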
Scope: exact host match (no subdomains) plus path prefix derived from the seed — e.g. https://docs.example.com/guide/ only indexes pages under /guide/. Use the site root URL to capture the full docs tree.
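The scope rule can be sketched as a host comparison followed by a path-prefix check. This is an illustration rather than PinRAG's implementation: path_prefix and in_scope are hypothetical names, and the way a prefix is derived from a seed that does not end in a slash is an assumption.

```python
from urllib.parse import urlsplit


def path_prefix(seed_url: str) -> str:
    """Directory-like prefix of the seed path, always ending with '/'."""
    path = urlsplit(seed_url).path or "/"
    if path.endswith("/"):
        return path
    # Assumption: a seed such as /guide/intro.html scopes to /guide/.
    return path.rsplit("/", 1)[0] + "/"


def in_scope(seed_url: str, candidate_url: str) -> bool:
    """True when candidate shares the seed's exact host and path prefix."""
    seed, cand = urlsplit(seed_url), urlsplit(candidate_url)
    if (seed.hostname or "").lower() != (cand.hostname or "").lower():
        return False  # exact host match: subdomains are out of scope
    return (cand.path or "/").startswith(path_prefix(seed_url))
```

With the seed https://docs.example.com/guide/, a page under /guide/ is in scope, while /api/ on the same host and any page on a subdomain are not.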
Extraction: text/markdown responses (from llms.txt fast paths) pass through; HTML runs through trafilatura with a BeautifulSoup + markdownify fallback that scopes to <main> / <article> / [role=main].
Limits & politeness: controlled by PINRAG_WEB_MAX_PAGES (default 200), PINRAG_WEB_MAX_DEPTH (5), PINRAG_WEB_MAX_PAGE_BYTES (1 MiB), PINRAG_WEB_CONCURRENCY (4), PINRAG_WEB_RATE_LIMIT_PER_HOST (2.0/sec), and PINRAG_WEB_RESPECT_ROBOTS (true). Some sites (e.g. Cloudflare-protected pages) may return 403 to pure-Python clients; that's a known limitation.
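To make the defaults concrete, the settings above can be modelled as a small config object read from the environment. The variable names and default values are taken from the paragraph above; the WebLimits class, load_web_limits function, and the boolean parsing rule are assumptions for illustration, not PinRAG's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WebLimits:
    max_pages: int = 200
    max_depth: int = 5
    max_page_bytes: int = 1024 * 1024  # 1 MiB
    concurrency: int = 4
    rate_limit_per_host: float = 2.0   # requests per second
    respect_robots: bool = True

    @property
    def min_interval(self) -> float:
        """Seconds between requests to one host implied by the rate limit."""
        return 1.0 / self.rate_limit_per_host


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; the real parser may differ.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_web_limits(env: Optional[Mapping[str, str]] = None) -> WebLimits:
    """Read PINRAG_WEB_* values, falling back to the documented defaults."""
    env = os.environ if env is None else env
    d = WebLimits()
    return WebLimits(
        max_pages=int(env.get("PINRAG_WEB_MAX_PAGES", d.max_pages)),
        max_depth=int(env.get("PINRAG_WEB_MAX_DEPTH", d.max_depth)),
        max_page_bytes=int(env.get("PINRAG_WEB_MAX_PAGE_BYTES", d.max_page_bytes)),
        concurrency=int(env.get("PINRAG_WEB_CONCURRENCY", d.concurrency)),
        rate_limit_per_host=float(
            env.get("PINRAG_WEB_RATE_LIMIT_PER_HOST", d.rate_limit_per_host)
        ),
        respect_robots=_as_bool(env.get("PINRAG_WEB_RESPECT_ROBOTS", "true")),
    )
```

At the default rate of 2.0 requests per second, min_interval works out to half a second between requests to the same host.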
Citations: web chunks carry a source_url metadata field; answers cite per-page URLs, and the document_id is <host><path_prefix> so remove_document_tool / set_document_tag_tool operate on the whole site at once.
YouTube indexing and IP blocking
Transcript-heavy indexing—especially from cloud or high-volume IPs—may return errors like "YouTube is blocking requests from your IP". Point youtube-transcript-api at a proxy via MCP env (or your shell):
    PINRAG_YT_PROXY_HTTP_URL=http://user:pass@proxy.example.com:80
    PINRAG_YT_PROXY_HTTPS_URL=http://user:pass@proxy.example.com:80

PINRAG_YT_PROXY_* affects transcript fetches only; yt-dlp steps (titles, playlists) do not use it. Residential or rotating proxies usually fare better than raw datacenter IPs.
When some paths fail (e.g. a few videos in a playlist), add_document_tool includes fail_summary with counts keyed by blocked, disabled, missing_transcript, and other.
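A summary of this shape can be produced by bucketing each failure and counting per bucket. The sketch below uses the four category keys named above, but the message-matching rules in classify_failure and the choice to report zero counts are assumptions; PinRAG's real classification is not shown in this document.

```python
from collections import Counter
from typing import Iterable

CATEGORIES = ("blocked", "disabled", "missing_transcript", "other")


def classify_failure(message: str) -> str:
    """Map an error message to a category (hypothetical matching rules)."""
    text = message.lower()
    if "blocking requests from your ip" in text:
        return "blocked"
    if "disabled" in text:
        return "disabled"
    if "no transcript" in text:
        return "missing_transcript"
    return "other"


def summarize_failures(messages: Iterable[str]) -> dict[str, int]:
    """Count failures per category, including categories with zero hits."""
    counts = Counter(classify_failure(m) for m in messages)
    return {category: counts.get(category, 0) for category in CATEGORIES}
```

A high blocked count across a playlist is the usual signal to configure the proxy variables described above.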
YouTube vision enrichment (optional)
Default indexing is transcript-only. Set PINRAG_YT_VISION_ENABLED=true to add vision captions for on-screen content, time-aligned with the transcript and chunked with metadata such as has_visual, frame_count, and visual_source.
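As a rough picture of what time alignment could look like, the sketch below attaches each vision caption to the transcript segment whose time range contains it. The metadata keys has_visual, frame_count, and visual_source are from the text above; the Segment and Caption types, the enrich_segments function, the "[On screen]" marker, and the exact meaning given to each key are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class Caption:
    time: float   # timestamp of the analysed frame, in seconds
    text: str


def enrich_segments(segments: list[Segment], captions: list[Caption],
                    visual_source: str = "openai") -> list[dict]:
    """Merge captions into the transcript segment covering their timestamp."""
    chunks = []
    for seg in segments:
        hits = [c for c in captions if seg.start <= c.time < seg.end]
        text = seg.text
        if hits:
            text += "\n[On screen] " + " | ".join(c.text for c in hits)
        chunks.append({
            "text": text,
            "metadata": {
                "start": seg.start,
                "has_visual": bool(hits),
                "frame_count": len(hits),
                # Assumption: only set when visual content was merged in.
                "visual_source": visual_source if hits else None,
            },
        })
    return chunks
```

Keeping the caption text in the same chunk as the speech is what lets a query for an on-screen function name retrieve the moment it was discussed.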
PINRAG_YT_VISION_PROVIDER:
- openai (default) or anthropic: yt-dlp download → scene-based frames → one multimodal call per frame. Needs pinrag[vision], ffmpeg/ffprobe on PATH, and OPENAI_API_KEY or ANTHROPIC_API_KEY (install the extra in the same env as pinrag, e.g. uv sync --extra vision or pip install 'pinrag[vision]').
- openrouter: one OpenRouter request per video via video_url (default google/gemini-2.5-flash). Needs OPENROUTER_API_KEY only; no download, ffmpeg, or pinrag[vision]. Choose a video-capable model if you override PINRAG_YT_VISION_MODEL.
Ops: Re-index after changing vision settings. For openai/anthropic, tune cost and timeouts with PINRAG_YT_VISION_MAX_FRAMES and optional PINRAG_YT_VISION_IMAGE_DETAIL=high (clearer small text, more tokens)
…