
Webpeel

Smart web fetcher for AI agents with auto-escalation from HTTP to headless browser to stealth mode. Includes 9 MCP tools: fetch, search, crawl, map, extract, batch, screenshot, jobs, and agent. Achieved 100% success rate on a 30-URL benchmark.

search-data-extraction · browser · ai · agent
By webpeel
1110 · Updated today · TypeScript · NOASSERTION

Installation

npx -y webpeel

Configuration

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel"]
    }
  }
}

How to use

  1. Run the installation command above (if needed)
  2. Open your Claude Code settings file (~/.claude/settings.json)
  3. Add the configuration to the mcpServers section
  4. Restart Claude Code to apply changes

The web data layer for AI agents. Fetch, search, crawl, extract, screenshot: one call, zero boilerplate.

The Problem

Every AI agent that touches the web rebuilds the same brittle stack: HTTP fetch → headless browser → anti-bot bypass → HTML cleanup → markdown conversion → token budgeting. Each layer fails differently. Sites change. Cloudflare rotates challenges. Your agent gets empty strings at 2 AM and your pipeline breaks.

WebPeel replaces that entire stack with one function call. It handles engine selection, anti-bot escalation, domain-specific extraction, and token optimization so your agent gets clean, structured data every time — without managing browsers, proxies, or parsing logic.


Quick Start

# Zero-install — just run it
npx webpeel "https://example.com"

# Search the web
npx webpeel search "latest AI agent frameworks"

# Crawl an entire site
npx webpeel crawl docs.example.com --max-pages 50

# Screenshot any page
npx webpeel screenshot "https://stripe.com/pricing" --full-page

# Ask a question about any page
npx webpeel ask "https://arxiv.org/abs/2401.00001" "What is the main contribution?"

Or install globally:

npm install -g webpeel

Use as a library:

import { peel } from 'webpeel';

const result = await peel('https://news.ycombinator.com');
console.log(result.markdown);   // Clean markdown, ready for your LLM
console.log(result.metadata);   // Title, tokens saved, timing, etc.

Use via API:

curl "https://api.webpeel.dev/v1/fetch?url=https://stripe.com/pricing" \
  -H "Authorization: Bearer $WEBPEEL_API_KEY"

{
  "url": "https://stripe.com/pricing",
  "markdown": "# Stripe Pricing\n\n**Integrated per-transaction fees**...",
  "metadata": {
    "title": "Pricing & Fees | Stripe",
    "tokens": 420,
    "tokensOriginal": 8200,
    "savingsPct": 94.9
  }
}
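
When the target URL carries its own query string, it is safer to percent-encode it before placing it in the `url` parameter. The sketch below builds the request URL for the `GET /v1/fetch` endpoint shown above and types only the response fields visible in the sample. `FetchResponse`, `buildFetchUrl`, and `summarize` are hypothetical names for illustration, not part of the WebPeel SDK, and the real response may contain more fields.

```typescript
// Response fields as they appear in the sample above (assumed, not exhaustive).
interface FetchResponse {
  url: string;
  markdown: string;
  metadata: {
    title: string;
    tokens: number;
    tokensOriginal: number;
    savingsPct: number;
  };
}

// Build the GET /v1/fetch URL with the target percent-encoded, so that a "?"
// or "&" inside the target is not parsed as part of the API's own query.
function buildFetchUrl(target: string, base = "https://api.webpeel.dev"): string {
  const endpoint = new URL("/v1/fetch", base);
  endpoint.searchParams.set("url", target);
  return endpoint.toString();
}

// One-line summary of a response, using only the fields typed above.
function summarize(r: FetchResponse): string {
  return `${r.metadata.title}: ${r.metadata.tokens} of ${r.metadata.tokensOriginal} tokens`;
}

const sample: FetchResponse = {
  url: "https://stripe.com/pricing",
  markdown: "# Stripe Pricing",
  metadata: { title: "Pricing & Fees | Stripe", tokens: 420, tokensOriginal: 8200, savingsPct: 94.9 },
};

console.log(buildFetchUrl("https://example.com/search?q=a&page=2"));
console.log(summarize(sample));
```

Pass the resulting string to any HTTP client together with the `Authorization: Bearer` header from the curl example.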

Get your free API key → · No credit card required · 500 requests/week free


Why WebPeel

🧠 55+ Domain Extractors — Not Just HTML-to-Markdown

Generic scrapers convert raw HTML to markdown and call it a day. WebPeel has purpose-built extractors for 55+ domains — Reddit, GitHub, YouTube, Amazon, ArXiv, Hacker News, Wikipedia, StackOverflow, Zillow, Polymarket, ESPN, and more. Each extractor understands the site's structure and returns clean, structured data without browser rendering.

⚡ 65–98% Token Savings

Domain extractors strip navigation, ads, sidebars, and boilerplate before content reaches your agent. Less context consumed means lower costs, faster inference, and longer agent chains.

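| Site | Raw HTML tokens | WebPeel tokens | Savings |
| --- | --- | --- | --- |
| News article | 18,000 | 640 | 96% |
| Reddit thread | 24,000 | 890 | 96% |
| Wikipedia page | 31,000 | 2,100 | 93% |
| GitHub README | 5,200 | 1,800 | 65% |
| E-commerce product | 14,000 | 310 | 98% |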
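
The savings column follows from the two token counts: savings = (1 - WebPeel tokens / raw HTML tokens) × 100. A small sketch that recomputes it from the figures above (`savingsPct` here is a local helper for illustration, not an SDK function):

```typescript
// Percentage of tokens removed relative to the raw HTML token count.
function savingsPct(rawTokens: number, cleanedTokens: number): number {
  if (rawTokens <= 0) throw new RangeError("rawTokens must be positive");
  return (1 - cleanedTokens / rawTokens) * 100;
}

// Rows copied from the table above: [label, raw HTML tokens, WebPeel tokens].
const rows: Array<[string, number, number]> = [
  ["News article", 18000, 640],
  ["Reddit thread", 24000, 890],
  ["Wikipedia page", 31000, 2100],
  ["GitHub README", 5200, 1800],
  ["E-commerce product", 14000, 310],
];

for (const [label, raw, cleaned] of rows) {
  console.log(`${label}: ${Math.round(savingsPct(raw, cleaned))}%`);
}
```

The same formula reproduces the `savingsPct` of 94.9 in the API sample (420 of 8,200 tokens).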

🔄 6-Layer Engine Escalation

WebPeel doesn't just try one method — it automatically escalates through 6 engines until it gets a good result:

Simple HTTP → Domain API → Browser render → Stealth browser → Cloaked browser → Search cache fallback

No manual --render flags for most sites. WebPeel knows which sites need JavaScript, which need stealth, and which have anti-bot protection — and picks the right engine automatically.
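
The escalation idea can be sketched as a loop over engines ordered from cheapest to most expensive, stopping at the first result that passes a quality check. This is a minimal illustration under stated assumptions, not WebPeel's actual ranker: the `Engine` type, the `looksUsable` threshold, and the stub engines are all hypothetical, and real engines would be asynchronous.

```typescript
// One fetch strategy. Returns extracted content, or null when it fails.
// Synchronous here only so the control flow is easy to follow.
type Engine = { name: string; run: (url: string) => string | null };

interface EscalationResult {
  engine: string;     // engine that produced the accepted content
  content: string;
  attempts: string[]; // engines tried, in order
}

// Hypothetical quality gate: reject missing or near-empty bodies.
function looksUsable(content: string | null, minChars = 50): content is string {
  return content !== null && content.trim().length >= minChars;
}

// Try each engine in order and return the first usable result.
function escalate(url: string, engines: Engine[]): EscalationResult | null {
  const attempts: string[] = [];
  for (const engine of engines) {
    attempts.push(engine.name);
    let content: string | null = null;
    try {
      content = engine.run(url);
    } catch {
      content = null; // a thrown error counts as a failed attempt
    }
    if (looksUsable(content)) {
      return { engine: engine.name, content, attempts };
    }
  }
  return null; // every engine failed
}

// Stub engines standing in for the first layers of the chain.
const engines: Engine[] = [
  { name: "http", run: () => "" }, // empty body
  { name: "browser", run: () => { throw new Error("challenge page"); } },
  { name: "stealth", run: (url) => `# Page at ${url}\n\n` + "Body text. ".repeat(10) },
];

const outcome = escalate("https://example.com", engines);
console.log(outcome?.engine, outcome?.attempts);
```

Ordering engines by cost keeps the common case cheap: the expensive browser layers run only when a simpler one returns nothing usable.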

🔌 Firecrawl-Compatible Migration Path

Already using Firecrawl-style workflows? WebPeel supports compatible /v1/scrape, /v2/scrape, /v1/crawl, /v1/search, and /v1/map endpoints, which makes migration dramatically easier than rebuilding your pipeline from scratch.
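
If the compatible endpoints accept the same request shape, migration could be as small as changing the base URL. The sketch below builds a Firecrawl-style `POST /v1/scrape` request against a configurable host. It rests on assumptions: that the compatible routes are served from `https://api.webpeel.dev` (the host used in the Quick Start), and that the body follows Firecrawl's v1 scrape shape of `{ url, formats }`. `buildScrapeRequest` is a hypothetical helper, so check the docs before relying on either assumption.

```typescript
interface ScrapeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a Firecrawl-style scrape request against a configurable base URL.
// Assumption: the body uses Firecrawl's v1 scrape shape ({ url, formats }).
function buildScrapeRequest(baseUrl: string, apiKey: string, target: string): ScrapeRequest {
  return {
    url: new URL("/v1/scrape", baseUrl).toString(),
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url: target, formats: ["markdown"] }),
  };
}

// Only the base URL and key change between providers in this sketch.
const before = buildScrapeRequest("https://api.firecrawl.dev", "fc_key", "https://example.com");
const after = buildScrapeRequest("https://api.webpeel.dev", "wp_key", "https://example.com");
console.log(before.url);
console.log(after.url);
```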


Agent-Native Integrations

MCP Server (Claude, Cursor, Windsurf, VS Code)

Give any MCP-compatible AI the ability to browse, search, and extract from the web.

{
  "mcpServers": {
    "webpeel": {
      "command": "npx",
      "args": ["-y", "webpeel", "mcp"],
      "env": { "WEBPEEL_API_KEY": "wp_your_key_here" }
    }
  }
}

7 MCP tools exposed: webpeel_read · webpeel_find · webpeel_see · webpeel_extract · webpeel_monitor · webpeel_act · webpeel_crawl

Full MCP setup guide →

LangChain

import { WebPeelLoader } from 'webpeel/integrations/langchain';

const loader = new WebPeelLoader({ url: 'https://example.com', render: true });
const docs = await loader.load();

LlamaIndex

import { WebPeelReader } from 'webpeel/integrations/llamaindex';

const reader = new WebPeelReader();
const docs = await reader.loadData('https://example.com');

Python SDK

pip install webpeel

from webpeel import WebPeel

wp = WebPeel(api_key="wp_...")
result = wp.fetch("https://example.com")
print(result.markdown)

Full Feature Set

| Capability | CLI | API | Details |
| --- | --- | --- | --- |
| Fetch & extract | webpeel "url" | GET /v1/fetch | Clean markdown from any URL |
| Web search | webpeel search "query" | GET /v1/search | DuckDuckGo (free) or Brave (BYOK) |
| Smart search | | POST /v1/search/smart | AI-powered structured results |
| Crawl sites | webpeel crawl "url" | POST /v1/crawl | Depth/page limits, rate control |
| Screenshots | webpeel screenshot "url" | POST /v1/screenshot | Full-page, multi-viewport, visual diff, filmstrip |
| Structured extraction | --extract-schema | POST /v1/extract | JSON schema → structured data |
| Q&A | webpeel ask "url" "q" | POST /v1/answer | Answer questions about any page |
| Deep research | | POST /v1/deep-research | Multi-query autonomous research |
| Content monitoring | webpeel monitor "url" | POST /v1/watch | Change detection with webhooks |
| Browser sessions | | POST /v1/session | Persistent sessions for login flows |
| Browser actions | --action 'click:.btn' | actions field | Click, type, scroll, wait |
| Batch scrape | webpeel batch file | POST /v1/batch/scrape | Parallel multi-URL processing |
| URL discovery | webpeel map "url" | POST /v1/map | Sitemap and link discovery |
| YouTube transcripts | auto-detected | auto-detected | Multiple export formats |
| PDF extraction | auto-detected | auto-detected | Text, tables, structure |
| Research agent | | POST /v1/agent | Autonomous multi-step research |

Use Cases for Agent Builders

RAG pipelines — Fetch docs, articles, or entire sites as clean markdown ready for chunking, embedding, and retrieval.

Price monitoring — Track product pages across major commerce sites with structured extraction and change detection.

Competitive intel — Monitor competitor pages, pricing tables, and job boards. Visual diff screenshots catch layout changes CSS selectors would miss.

Research agents — Give Claude, Codex, Cursor, or your own agent grounded web access through the API or MCP server.

Lead enrichment — Pull company details, public links, and page structure from business sites without writing per-site parsers.

Content aggregation — Crawl and extract from communities, docs sites, and publications with domain-native extractors that understand each site's structure.


Architecture

Your Agent
    ↓
 WebPeel (npm / API / MCP)
    ↓
┌─────────────────────────────────┐
│  Engine Ranker                  │
│  HTTP → Domain API → Browser   │
│  → Stealth → Cloaked → Cache   │
├─────────────────────────────────┤
│  55+ Domain Extractors          │
│  reddit · github · youtube      │
│  amazon · arxiv · zillow · ...  │
├─────────────────────────────────┤
│  Content Pipeline               │
│  Readability → Turndown →       │
│  Token budgeting → Chunking     │
└─────────────────────────────────┘
    ↓
 Clean markdown / structured JSON
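
The last two stages of the content pipeline, token budgeting and chunking, can be illustrated with a greedy paragraph packer. This is a sketch under stated assumptions rather than WebPeel's implementation: `estimateTokens` uses the rough heuristic of about 4 characters per token instead of a real tokenizer, and `chunkMarkdown` is a hypothetical helper.

```typescript
// Rough token estimate: about 4 characters per token for English text.
// A common heuristic, not WebPeel's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Split markdown on blank lines and pack paragraphs greedily into chunks
// whose estimated size stays within maxTokens. A single paragraph larger
// than the budget becomes its own oversized chunk rather than being cut.
function chunkMarkdown(markdown: string, maxTokens: number): string[] {
  const paragraphs = markdown
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    const candidate = current === "" ? p : `${current}\n\n${p}`;
    if (current !== "" && estimateTokens(candidate) > maxTokens) {
      chunks.push(current); // budget exceeded: close the current chunk
      current = p;          // and start a new one with this paragraph
    } else {
      current = candidate;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

const doc = "# Title\n\n" + "a".repeat(40) + "\n\n" + "b".repeat(40);
const chunks = chunkMarkdown(doc, 15);
console.log(chunks.length, chunks.map(estimateTokens));
```

Splitting on paragraph boundaries keeps headings attached to the text that follows them whenever the budget allows, which tends to help retrieval quality in RAG pipelines.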

Reliability

WebPeel is built for production agent workflows, not just one-off demos.

  • Automated evals in-repo — smart search and fetch eval suites ship with the codebase
  • Post-deploy gate — critical checks run before calling a deploy healthy
  • Engine fallback chain — when one fetch method fails, WebPeel escalates instead of giving up
  • Multiple surfaces, one core — CLI, API, SDK, and MCP all ride the same extraction pipeline

Security

  • SSRF protection — blocks localhost, private IPs, metadata endpoints, file:// schemes
  • Helmet.js — HSTS, X-Frame-Options, nosniff, XSS protection on all responses
  • Webhook signing — HMAC-SHA256 on all outbound webhooks
  • API key hashing — SHA-256 with granular scopes
  • Rate limiting — sliding window, per-tier
  • Audit logging — every API call logged with IP, key, and action
  • GDPR compliant: DELETE /v1/account for full data erasure

Security policy → · SLA (99.9% uptime) →
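
On the receiving side, an HMAC-SHA256 webhook signature is typically checked by recomputing the digest over the raw request body and comparing it in constant time. The sketch below uses Node's built-in `crypto` module. How WebPeel transmits the signature (header name, hex versus base64 encoding) is not stated here, so the hex encoding and the function names are assumptions for illustration.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of a raw request body.
function signPayload(secret: string, rawBody: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Compare the received signature with the expected one in constant time.
function verifySignature(secret: string, rawBody: string, receivedHex: string): boolean {
  const expected = Buffer.from(signPayload(secret, rawBody), "hex");
  const received = Buffer.from(receivedHex, "hex");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}

const secret = "whsec_example"; // hypothetical shared secret
const body = JSON.stringify({ event: "page.changed", url: "https://example.com" });
const signature = signPayload(secret, body);

console.log(verifySignature(secret, body, signature));       // valid signature
console.log(verifySignature(secret, body + " ", signature)); // tampered body
```

Verify against the raw bytes of the request body, before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the digest.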

Why teams choose WebPeel instead of stitching a stack together

| Approach | What it gives you | Where it breaks down |
| --- | --- | --- |
| Raw HTTP + HTML parsing | Cheap, simple fetches | Falls apart on JS-heavy sites, anti-bot pages, and noisy HTML |
| Pure browser automation | Maximum control | Expensive, slow, fragile, and high-maintenance for large-scale use |
| Search-only APIs | Great discovery | Weak page extraction, limited structured output, limited downstream actions |
| Single-purpose scrapers | Fast on one job | You end up composing 4–6 tools for real agent workflows |
| WebPeel | Fetch + search + crawl + extraction + screenshots + monitoring in one layer | Opinionated toward agent workflows rather than generic scraping |

Links

📖 Documentation · 💰 Pricing · 🎮 Playground · 📝 Blog · 💬 Discussions · 🚀 Releases · 📊 Status · 🔒 Security · 📋 Changelog


Contributing

Pull requests welcome. Please open an issue first to discuss major changes.

git clone https://github.com/webpeel/webpeel.git
cd webpeel && npm install
npm run build && npm test

License

WebPeel SDK License — free for personal and commercial use with attribution.
