Name: Pinrag
Author: ndjordjevic

PinRAG

Overview

PinRAG is for when you want to learn about something and your materials are scattered—PDFs and ebooks, GitHub repos, YouTube videos, Discord discussions, and plain notes. You index those materials into one shared RAG index, then ask questions from Cursor, VS Code (GitHub Copilot), or any MCP-capable assistant and get answers with citations pointing back to pages, timestamps, files, or threads.

Under the hood it is Retrieval-Augmented Generation built with LangChain and exposed as an MCP (Model Context Protocol) server: add documents from the editor, query with natural language, list or remove what you indexed. Supported inputs include PDFs, local text files and directories, Discord exports, YouTube (transcript from URL, playlist, or ID), GitHub repo URLs, and web documentation sites (URL). For YouTube you can optionally add vision so on-screen code, diagrams, and UI text are merged with the transcript in the same chunks; see YouTube vision enrichment.

Features

Multi-format indexing — PDF (.pdf), local files or directories, plain text (.txt), Discord export (.txt), YouTube (video or playlist URL, or video ID), GitHub repo (URL), web documentation sites (URL)
Optional YouTube vision — Off by default. When enabled, runs a vision model (OpenAI, Anthropic, or OpenRouter native video) and merges structured on-screen context with the transcript so RAG chunks carry searchable code names, labels, and diagrams—not speech alone. OpenRouter mode avoids local ffmpeg/video download; openai/anthropic use scene keyframes and require pinrag[vision] + ffmpeg (see YouTube vision enrichment)
RAG with citations — Answers cite source context: PDF page, YouTube timestamp, document name for plain text and Discord, chunk index for GitHub repos, source URL for web documentation
Document tags — Tag documents at index time (e.g. AMIGA, PI_PICO) for filtered search
Metadata filtering — query_tool supports document_id, tag, document_type, PDF page_min/page_max, and response_style (thorough or concise)
MCP tools — add_document_tool, query_tool, list_documents_tool, remove_document_tool, set_document_tag_tool, list_collections_tool; optional collection on tools overrides PINRAG_COLLECTION_NAME for that call
MCP resources — pinrag://documents (indexed documents) and pinrag://server-config (env vars and config); click in Cursor’s MCP panel to view
MCP prompt — use_pinrag (parameter: request) for querying, indexing, listing, or removing documents
Configurable LLM — OpenRouter (default, free openrouter/free router), OpenAI, Anthropic, or Cerebras Inference (OpenAI-compatible API); set via PINRAG_LLM_PROVIDER and PINRAG_LLM_MODEL in MCP env or your shell
Local embeddings — Nomic (PINRAG_EMBEDDING_MODEL, default nomic-embed-text-v1.5); no API key; first run downloads model weights (~270 MB, cached)
Retrieval & chunking options — Structure-aware chunking (on by default); optional FlashRank re-ranking, multi-query expansion, and parent-child chunks for PDFs (see Configuration)
Observability — MCP tool notifications (ctx.log) plus optional LangSmith tracing
Built with — LangChain, Chroma; optional OpenRouter, OpenAI, Anthropic, FlashRank

Installation

Add PinRAG as an MCP server in your editor. Install uv and make sure uvx is on your PATH; uvx runs PinRAG from PyPI without a prior pip install.

Cursor: add this under mcpServers in ~/.cursor/mcp.json:

{
  "mcpServers": {
    "pinrag": {
      "command": "uvx",
      "args": ["--refresh", "pinrag"],
      "env": {
        "OPENROUTER_API_KEY": "your-openrouter-api-key-here",
        "PINRAG_PERSIST_DIR": "/absolute/path/to/your/pinrag-data"
      }
    }
  }
}

VS Code (GitHub Copilot): run MCP: Open User Configuration from the Command Palette (or add .vscode/mcp.json in a workspace), then merge this shape—top-level key is servers:

{
  "servers": {
    "pinrag": {
      "command": "uvx",
      "args": ["--refresh", "pinrag"],
      "env": {
        "OPENROUTER_API_KEY": "your-openrouter-api-key-here",
        "PINRAG_PERSIST_DIR": "/absolute/path/to/your/pinrag-data"
      }
    }
  }
}

Quick Start

HTTP server mode

For clients that speak MCP over HTTP (e.g. pinrag-cli with --server), run:

pinrag server [--host 127.0.0.1] [--port 8765]

This starts a streamable-HTTP MCP endpoint at http://<host>:<port>/mcp. The default pinrag stdio command for editors is unchanged; pinrag server is additive. Connect pinrag-cli with --server http://127.0.0.1:8765/mcp.
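
If you launch the HTTP server from a script, a small helper can keep the command line and the client URL in sync. This is a minimal sketch: the function names are hypothetical, and only the `pinrag server` flags, the default host and port, and the `/mcp` path come from the text above.

```python
def server_command(host: str = "127.0.0.1", port: int = 8765) -> list[str]:
    """Build the argv for `pinrag server` (suitable for subprocess.Popen)."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return ["pinrag", "server", "--host", host, "--port", str(port)]


def mcp_endpoint(host: str = "127.0.0.1", port: int = 8765) -> str:
    """Return the streamable-HTTP MCP URL that clients should connect to."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"http://{host}:{port}/mcp"
```

The value returned by `mcp_endpoint()` is what you would pass to pinrag-cli via --server.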

Configure MCP server

Put API keys and any PinRAG settings in the MCP entry’s env block. The server does not load .env files when the editor launches it.
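
Because every setting travels in the `env` block, it can be convenient to generate the entry instead of editing JSON by hand. The sketch below reproduces the two shapes shown under Installation (`mcpServers` for Cursor, `servers` for VS Code); the helper names and the `editor` labels are hypothetical, and the script only prints JSON rather than touching your config files.

```python
import json


def pinrag_entry(env: dict[str, str]) -> dict:
    """Server entry that runs PinRAG through uvx with the given environment."""
    return {"command": "uvx", "args": ["--refresh", "pinrag"], "env": dict(env)}


def editor_config(editor: str, env: dict[str, str]) -> dict:
    """Wrap the entry under the top-level key each editor expects."""
    top_keys = {"cursor": "mcpServers", "vscode": "servers"}
    if editor not in top_keys:
        raise ValueError(f"unknown editor: {editor!r}")
    return {top_keys[editor]: {"pinrag": pinrag_entry(env)}}


if __name__ == "__main__":
    env = {
        "OPENROUTER_API_KEY": "your-openrouter-api-key-here",
        "PINRAG_PERSIST_DIR": "/absolute/path/to/your/pinrag-data",
    }
    print(json.dumps(editor_config("cursor", env), indent=2))
```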

Use in chat

Action	Tool
Index files, directories, or URLs	`add_document_tool` — required `paths`: list of local paths (PDFs, plain or DiscordChatExporter `.txt`, directories) or URLs (YouTube videos, playlist URLs, GitHub repos, web documentation sites; bare YouTube video IDs allowed). Optional `tags` (one per path). For GitHub URLs only: `branch`, `include_patterns`, `exclude_patterns`.
List indexed documents	`list_documents_tool` — returns `documents` (IDs), `total_chunks`, and optional `tag` filter. `document_details` may include `document_type`, tags, page / message / segment counts, titles, aggregated `bytes`, and `upload_timestamp` when present in metadata.
Query with filters	`query_tool` — required `query`. Optional `document_id`, `tag`, `document_type`, `page_min` / `page_max` (PDF ranges), `response_style` (`thorough` or `concise`; leave empty to use `PINRAG_RESPONSE_STYLE`).
Remove a document	`remove_document_tool` — required `document_id` (exact value from `list_documents_tool`).
View resources (read-only)	In the MCP panel, open Resources and choose `pinrag://documents` (indexed docs) or `pinrag://server-config` (effective config, including `PINRAG_VERSION`).
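
The assistant normally fills in tool arguments for you, but it helps to see the shape of the payloads. The following is a sketch of argument dictionaries based only on the parameter names in the table above; the helper functions and their validation rules are assumptions for illustration, not PinRAG's own code.

```python
from typing import Optional

# Optional query_tool filters named in the table above.
QUERY_FILTERS = {"document_id", "tag", "document_type", "page_min", "page_max", "response_style"}


def add_document_args(paths: list[str], tags: Optional[list[str]] = None) -> dict:
    """Arguments for add_document_tool: required paths, optional one-tag-per-path."""
    if not paths:
        raise ValueError("paths must contain at least one path or URL")
    args: dict = {"paths": list(paths)}
    if tags is not None:
        if len(tags) != len(paths):
            raise ValueError("tags must have one entry per path")
        args["tags"] = list(tags)
    return args


def query_args(query: str, **filters) -> dict:
    """Arguments for query_tool: required query plus any supported filters."""
    unknown = set(filters) - QUERY_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    style = filters.get("response_style")
    if style not in (None, "", "thorough", "concise"):
        raise ValueError("response_style must be 'thorough' or 'concise'")
    return {"query": query, **filters}
```

For example, `query_args("How does the blitter work?", tag="AMIGA", page_min=10, page_max=40)` restricts a question to a page range of documents tagged AMIGA.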

Ask in chat: "Add /path/to/amiga-book.pdf with tag AMIGA", "Index https://youtu.be/xyz and ask what it says", "Index https://github.com/owner/repo and ask about the codebase", or "Index https://docs.langchain.com/ and summarize its memory APIs". The AI will invoke the tools for you. Citations show page numbers for PDFs, timestamps (e.g. t. 1:23) for YouTube, document names for plain text and Discord exports, chunk index labels for GitHub, and source URLs for web documentation.
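
To make the YouTube citation format concrete, here is a small sketch that turns a segment offset in seconds into a label like the `t. 1:23` example above. The function name is hypothetical, and the hour format for long videos is an assumption; PinRAG's actual formatting may differ.

```python
def youtube_citation(seconds: float) -> str:
    """Format a transcript offset as a 't. M:SS' style label."""
    total = int(seconds)
    if total < 0:
        raise ValueError("seconds must be non-negative")
    minutes, secs = divmod(total, 60)
    hours, minutes = divmod(minutes, 60)
    if hours:
        # Assumed format for offsets of an hour or more.
        return f"t. {hours}:{minutes:02d}:{secs:02d}"
    return f"t. {minutes}:{secs:02d}"
```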

GitHub indexing

Index a repo with add_document_tool and a URL in paths, e.g. https://github.com/owner/repo, https://github.com/owner/repo/tree/branch, or github.com/owner/repo (scheme optional).
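
The three accepted URL forms can be reduced to an owner, a repo, and an optional branch. The parser below is an illustrative sketch using only the standard library, not PinRAG's own parsing; the function name is hypothetical, and it treats everything after `/tree/` as the branch name, which is ambiguous when a URL also contains a subdirectory.

```python
from typing import Optional
from urllib.parse import urlparse


def parse_github_url(url: str) -> tuple[str, str, Optional[str]]:
    """Split a GitHub repo URL into (owner, repo, branch or None)."""
    if "://" not in url:
        url = "https://" + url  # the scheme is optional in the accepted forms
    parsed = urlparse(url)
    if (parsed.hostname or "").lower() not in ("github.com", "www.github.com"):
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected owner/repo in: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    branch = None
    if len(parts) >= 4 and parts[2] == "tree":
        branch = "/".join(parts[3:])
    return owner, repo, branch
```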

GitHub-only options: branch, include_patterns / exclude_patterns — defaults already favor common text and source files and skip bulky artifacts; use patterns when you need files outside that set. Files over PINRAG_GITHUB_MAX_FILE_BYTES (default 512 KiB) are skipped.
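
The interaction between the size limit and the include/exclude patterns can be pictured as a single predicate. This is a sketch under assumptions: the text does not say which glob dialect PinRAG uses or how include and exclude are combined, so the example uses `fnmatch`-style patterns with exclude taking precedence, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Default of PINRAG_GITHUB_MAX_FILE_BYTES as stated above (512 KiB).
MAX_FILE_BYTES = 512 * 1024


def should_index(path: str, size: int, include: list[str], exclude: list[str],
                 max_bytes: int = MAX_FILE_BYTES) -> bool:
    """Return True if a repo file passes the size limit and the patterns."""
    if size > max_bytes:
        return False  # files over the limit are skipped
    if any(fnmatchcase(path, pattern) for pattern in exclude):
        return False  # assumption: exclude wins over include
    return any(fnmatchcase(path, pattern) for pattern in include)
```

Note that in `fnmatch`, `*` also matches `/`, so `*.py` matches `src/main.py`.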

Auth: Set GITHUB_TOKEN in MCP env (or the shell) for private repos or fewer rate-limit hits on big indexes; small public runs often work without it. Use a classic or fine-grained PAT with repo read access; there is no OAuth in PinRAG.

Web documentation indexing

Point add_document_tool at any documentation site URL, e.g. https://docs.langchain.com/, https://docs.crewai.com/, or https://picocomputer.github.io/. PinRAG discovers pages via (in order) llms.txt / llms-full.txt (Mintlify-style), sitemap.xml (including robots.txt Sitemap: hints and nested sitemap indexes), then a scoped BFS crawl from the seed URL.
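
The last fallback, a breadth-first crawl, can be sketched without any network code. In the example below, `get_links` is a hypothetical callable that returns the links found on a page, and the way depth and page limits are counted is an assumption rather than PinRAG's documented behavior; the default values mirror PINRAG_WEB_MAX_PAGES and PINRAG_WEB_MAX_DEPTH.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_discover(seed: str, get_links: Callable[[str], Iterable[str]],
                 max_pages: int = 200, max_depth: int = 5) -> list[str]:
    """Visit pages breadth-first from seed, bounded by page count and depth."""
    seen = {seed}
    visited: list[str] = []
    queue = deque([(seed, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

A real crawler would also apply scope, robots.txt, and rate-limit checks before queueing each link.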

Scope: exact host match (no subdomains) plus path prefix derived from the seed — e.g. https://docs.example.com/guide/ only indexes pages under /guide/. Use the site root URL to capture the full docs tree.
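
The scope rule can be expressed as a short check: same host, and a path under the seed's prefix. This is a minimal sketch, assuming the prefix is simply the seed path treated as a directory; the function names are hypothetical and PinRAG may derive the prefix differently.

```python
from urllib.parse import urlparse


def path_prefix(seed: str) -> str:
    """Seed path normalized to end with '/', e.g. '/guide/' or '/'."""
    return urlparse(seed).path.rstrip("/") + "/"


def in_scope(seed: str, candidate: str) -> bool:
    """True if candidate is on the exact seed host and under the seed path."""
    s, c = urlparse(seed), urlparse(candidate)
    if (s.hostname or "").lower() != (c.hostname or "").lower():
        return False  # exact host match, so subdomains are out of scope
    prefix = path_prefix(seed)
    path = c.path or "/"
    return path.startswith(prefix) or path == prefix.rstrip("/")
```

With the seed https://docs.example.com/guide/, a page at /guide/intro is in scope, while /guidebook and anything on another subdomain are not.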

Extraction: text/markdown responses (from llms.txt fast paths) pass through; HTML runs through trafilatura with a BeautifulSoup + markdownify fallback that scopes to <main> / <article> / [role=main].

Limits & politeness: controlled by PINRAG_WEB_MAX_PAGES (default 200), PINRAG_WEB_MAX_DEPTH (5), PINRAG_WEB_MAX_PAGE_BYTES (1 MiB), PINRAG_WEB_CONCURRENCY (4), PINRAG_WEB_RATE_LIMIT_PER_HOST (2.0/sec), and PINRAG_WEB_RESPECT_ROBOTS (true). Some sites (e.g. Cloudflare-protected pages) may return 403 to pure-Python clients; that's a known limitation.
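
To see the defaults in one place, the settings above can be modeled as a small dataclass populated from the environment. This is a sketch: the class, the field names, and the accepted boolean spellings are assumptions, while the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WebLimits:
    max_pages: int = 200
    max_depth: int = 5
    max_page_bytes: int = 1024 * 1024  # 1 MiB
    concurrency: int = 4
    rate_limit_per_host: float = 2.0  # requests per second
    respect_robots: bool = True


def _as_bool(raw: str) -> bool:
    # Assumed spellings; PinRAG may accept a different set.
    return raw.strip().lower() in ("1", "true", "yes", "on")


def load_web_limits(env: Optional[Mapping[str, str]] = None) -> WebLimits:
    """Read PINRAG_WEB_* values, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = WebLimits()

    def get(name, cast, default):
        raw = env.get(name)
        return default if raw is None or raw == "" else cast(raw)

    return WebLimits(
        max_pages=get("PINRAG_WEB_MAX_PAGES", int, defaults.max_pages),
        max_depth=get("PINRAG_WEB_MAX_DEPTH", int, defaults.max_depth),
        max_page_bytes=get("PINRAG_WEB_MAX_PAGE_BYTES", int, defaults.max_page_bytes),
        concurrency=get("PINRAG_WEB_CONCURRENCY", int, defaults.concurrency),
        rate_limit_per_host=get("PINRAG_WEB_RATE_LIMIT_PER_HOST", float,
                                defaults.rate_limit_per_host),
        respect_robots=get("PINRAG_WEB_RESPECT_ROBOTS", _as_bool, defaults.respect_robots),
    )
```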

Citations: web chunks carry a source_url metadata field; answers cite per-page URLs, and the document_id is <host><path_prefix> so remove_document_tool / set_document_tag_tool operate on the whole site at once.

YouTube indexing and IP blocking

Transcript-heavy indexing—especially from cloud or high-volume IPs—may return errors like "YouTube is blocking requests from your IP". Point youtube-transcript-api at a proxy via MCP env (or your shell):

PINRAG_YT_PROXY_HTTP_URL=http://user:pass@proxy.example.com:80
PINRAG_YT_PROXY_HTTPS_URL=http://user:pass@proxy.example.com:80

PINRAG_YT_PROXY_* affects transcript fetches only; yt-dlp steps (titles, playlists) do not use it. Residential or rotating proxies usually fare better than raw datacenter IPs.
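
For a quick sanity check of your proxy settings, the two variables can be collected into a scheme-to-URL mapping of the kind many Python HTTP clients accept. This is only a sketch with a hypothetical function name; how PinRAG passes these values to youtube-transcript-api is not described here.

```python
import os
from typing import Mapping, Optional


def transcript_proxies(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Map 'http'/'https' to the configured proxy URLs, skipping unset ones."""
    env = os.environ if env is None else env
    names = {
        "http": "PINRAG_YT_PROXY_HTTP_URL",
        "https": "PINRAG_YT_PROXY_HTTPS_URL",
    }
    proxies = {}
    for scheme, name in names.items():
        value = env.get(name, "").strip()
        if value:
            proxies[scheme] = value
    return proxies
```

An empty result means transcript requests would go out directly from your own IP.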

When some paths fail (e.g. a few videos in a playlist), add_document_tool includes fail_summary with counts keyed by blocked, disabled, missing_transcript, and other.
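
As an illustration of what such a summary represents, the sketch below tallies per-video failure categories into counts. The four category names come from the text above; the function name, the input shape, and the choice to omit zero counts are assumptions, so the real `fail_summary` may look different.

```python
from collections import Counter
from typing import Iterable

CATEGORIES = ("blocked", "disabled", "missing_transcript", "other")


def summarize_failures(categories: Iterable[str]) -> dict[str, int]:
    """Count failures per category; unknown labels are folded into 'other'."""
    counts = Counter(c if c in CATEGORIES else "other" for c in categories)
    return {name: counts[name] for name in CATEGORIES if counts[name]}


def proxy_might_help(summary: dict[str, int]) -> bool:
    """IP blocks are the failures a proxy can address."""
    return summary.get("blocked", 0) > 0
```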

YouTube vision enrichment (optional)

Default indexing is transcript-only. Set PINRAG_YT_VISION_ENABLED=true to add vision captions for on-screen content, time-aligned with the transcript and chunked with metadata such as has_visual, frame_count, and visual_source.

PINRAG_YT_VISION_PROVIDER:

openai (default) or anthropic: yt-dlp download → scene-based frames → one multimodal call per frame. Needs pinrag[vision], ffmpeg/ffprobe on PATH, and OPENAI_API_KEY or ANTHROPIC_API_KEY (install the extra in the same env as pinrag, e.g. uv sync --extra vision or pip install 'pinrag[vision]').
openrouter: one OpenRouter request per video via video_url (default google/gemini-2.5-flash). OPENROUTER_API_KEY only—no download, ffmpeg, or pinrag[vision]; choose a video-capable model if you override PINRAG_YT_VISION_MODEL.
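
The provider requirements above lend themselves to a preflight check before you enable vision. The sketch below encodes them in a lookup table and reports what is missing; the function name and the injectable `which` parameter are hypothetical, and the check does not verify that the pinrag[vision] extra is installed.

```python
import shutil
from typing import Callable, Mapping, Optional

# Requirements per PINRAG_YT_VISION_PROVIDER, as described above.
REQUIREMENTS = {
    "openai": {"api_key": "OPENAI_API_KEY", "needs_ffmpeg": True},
    "anthropic": {"api_key": "ANTHROPIC_API_KEY", "needs_ffmpeg": True},
    "openrouter": {"api_key": "OPENROUTER_API_KEY", "needs_ffmpeg": False},
}


def missing_vision_prereqs(provider: str, env: Mapping[str, str],
                           which: Callable[[str], Optional[str]] = shutil.which) -> list[str]:
    """List the API key and binaries that the chosen provider still needs."""
    if provider not in REQUIREMENTS:
        raise ValueError(f"unknown provider: {provider!r}")
    req = REQUIREMENTS[provider]
    missing = []
    if not env.get(req["api_key"]):
        missing.append(req["api_key"])
    if req["needs_ffmpeg"]:
        missing.extend(tool for tool in ("ffmpeg", "ffprobe") if which(tool) is None)
    return missing
```

Calling `missing_vision_prereqs("openai", os.environ)` (after `import os`) returns an empty list when the key is set and both binaries are on PATH.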

Ops: Re-index after changing vision settings. For openai/anthropic, tune cost and timeouts with PINRAG_YT_VISION_MAX_FRAMES and optional PINRAG_YT_VISION_IMAGE_DETAIL=high (clearer small text, more tokens).

…
