Back to Skills

Cloudflare Workers Ai

Cloudflare Workers AI for serverless GPU inference. Use for LLMs, text/image generation, embeddings, or encountering AI_ERROR, rate limits, token exceeded errors.

cloudflareaillmembedding
By secondsky
17928Updated 1 day agoTypeScriptMIT

Skill Content

# Cloudflare Workers AI - Complete Reference

Production-ready knowledge domain for building AI-powered applications with Cloudflare Workers AI.

**Status**: Production Ready āœ…
**Last Updated**: 2025-11-21
**Dependencies**: cloudflare-worker-base (for Worker setup)
**Latest Versions**: wrangler@4.43.0, @cloudflare/workers-types@4.20251014.0

---

## Table of Contents

1. [Quick Start (5 minutes)](#quick-start-5-minutes)
2. [Workers AI API Reference](#workers-ai-api-reference)
3. [Model Selection Guide](#model-selection-guide)
4. [Common Patterns](#common-patterns)
5. [AI Gateway Integration](#ai-gateway-integration)
6. [Rate Limits & Pricing](#rate-limits--pricing)
7. [Production Checklist](#production-checklist)

---

## Quick Start (5 minutes)

### 1. Add AI Binding

**wrangler.jsonc:**
```jsonc
{
  "ai": {
    "binding": "AI"
  }
}
```

### 2. Run Your First Model

```typescript
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
      prompt: 'What is Cloudflare?',
    });

    return Response.json(response);
  },
};
```

### 3. Add Streaming (Recommended)

```typescript
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [{ role: 'user', content: 'Tell me a story' }],
  stream: true, // Always use streaming for text generation!
});

return new Response(stream, {
  headers: { 'content-type': 'text/event-stream' },
});
```

**Why streaming?**
- Prevents buffering large responses in memory
- Faster time-to-first-token
- Better user experience for long-form content
- Avoids Worker timeout issues

---

## Workers AI API Reference

### Core API: `env.AI.run()`

```typescript
const response = await env.AI.run(model, inputs, options?);
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `model` | string | Model ID (e.g., `@cf/meta/llama-3.1-8b-instruct`) |
| `inputs` | object | Model-specific inputs (see model type below) |
| `options.gateway.id` | string | AI Gateway ID for caching/logging |
| `options.gateway.skipCache` | boolean | Skip AI Gateway cache |

**Returns**: `Promise<ModelOutput>` (non-streaming) or `ReadableStream` (streaming)

### Input Types by Model Category

| Category | Key Inputs | Output |
|----------|------------|--------|
| **Text Generation** | `messages[]`, `stream`, `max_tokens`, `temperature` | `{ response: string }` |
| **Embeddings** | `text: string \| string[]` | `{ data: number[][], shape: number[] }` |
| **Image Generation** | `prompt`, `num_steps`, `guidance` | Binary PNG |
| **Vision** | `messages[].content[].image_url` | `{ response: string }` |

šŸ“„ **Full model details**: Load `references/models-catalog.md` for complete model list, parameters, and rate limits.

---

## Model Selection Guide

### Text Generation (LLMs)

| Model | Best For | Rate Limit | Size |
|-------|----------|------------|------|
| `@cf/meta/llama-3.1-8b-instruct` | General purpose, fast | 300/min | 8B |
| `@cf/meta/llama-3.2-1b-instruct` | Ultra-fast, simple tasks | 300/min | 1B |
| `@cf/qwen/qwen1.5-14b-chat-awq` | High quality, complex reasoning | 150/min | 14B |
| `@cf/deepseek-ai/deepseek-r1-distill-qwen-32b` | Coding, technical content | 300/min | 32B |
| `@hf/thebloke/mistral-7b-instruct-v0.1-awq` | Fast, efficient | 400/min | 7B |

### Text Embeddings

| Model | Dimensions | Best For | Rate Limit |
|-------|-----------|----------|------------|
| `@cf/baai/bge-base-en-v1.5` | 768 | General purpose RAG | 3000/min |
| `@cf/baai/bge-large-en-v1.5` | 1024 | High accuracy search | 1500/min |
| `@cf/baai/bge-small-en-v1.5` | 384 | Fast, low storage | 3000/min |

### Image Generation

| Model | Best For | Rate Limit | Speed |
|-------|----------|------------|-------|
| `@cf/black-forest-labs/flux-1-schnell` | High quality, photorealistic | 720/min | Fast |
| `@cf/stabilityai/stable-diffusion-xl-base-1.0` | General purpose | 720/min | Medium |
| `@cf/lykon/dreamshaper-8-lcm` | Artistic, stylized | 720/min | Fast |

### Vision Models

| Model | Best For | Rate Limit |
|-------|----------|------------|
| `@cf/meta/llama-3.2-11b-vision-instruct` | Image understanding | 720/min |
| `@cf/unum/uform-gen2-qwen-500m` | Fast image captioning | 720/min |

---

## Common Patterns

### Pattern 1: Chat with Streaming

```typescript
app.post('/chat', async (c) => {
  const { messages } = await c.req.json<{ messages: Array<{ role: string; content: string }> }>();
  const stream = await c.env.AI.run('@cf/meta/llama-3.1-8b-instruct', { messages, stream: true });
  return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
});
```

### Pattern 2: RAG (Retrieval Augmented Generation)

```typescript
// 1. Generate embedding for query
const embeddings = await env.AI.run('@cf/baai/bge-base-en-v1.5', { text: [userQuery] });
// 2. Search Vectorize
const matches = await env.VECTORIZE.query(embeddings.data[0], { topK: 3 });
// 3. Build context
const context = matches.matches.map((m) => m.metadata.text).join('\n\n');
// 4. Generate with context
const stream = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
  messages: [
    { role: 'system', content: `Answer using this context:\n${context}` },
    { role: 'user', content: userQuery },
  ],
  stream: true,
});
return new Response(stream, { headers: { 'content-type': 'text/event-stream' } });
```

šŸ“„ **More patterns**: Load `references/best-practices.md` for structured output, image generation, multi-model consensus, and production patterns.

---

## AI Gateway Integration

Enable caching, logging, and cost tracking with AI Gateway:

```typescript
const response = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', { prompt: 'Hello' }, {
  gateway: { id: 'my-gateway', skipCache: false },
});
```

**Benefits**: Cost tracking, response caching (50-90% savings on repeated queries), request logging, rate limiting, analytics.

---

## Rate Limits & Pricing

**Information last verified**: 2025-01-14

Rate limits and pricing vary significantly by model. Always check the official documentation for the most current information:

- **Rate Limits**: https://developers.cloudflare.com/workers-ai/platform/limits/
- **Pricing**: https://developers.cloudflare.com/workers-ai/platform/pricing/

**Free Tier**: 10,000 neurons/day
**Paid Tier**: $0.011 per 1,000 neurons

šŸ“„ **Per-model details**: See `references/models-catalog.md` for specific rate limits and pricing for each model.

---

## Production Checklist

**Essential before deploying:**
- [ ] Enable AI Gateway for cost tracking
- [ ] Implement streaming for text generation
- [ ] Add rate limit retry with exponential backoff
- [ ] Validate input length (prevent token limit errors)
- [ ] Add input sanitization (prevent prompt injection)

šŸ“„ **Full checklist**: Load `references/best-practices.md` for complete production checklist, error handling patterns, monitoring, and cost optimization.

---

## External SDK Integrations

Workers AI supports OpenAI SDK compatibility and Vercel AI SDK:

```typescript
// OpenAI SDK - use same patterns with Workers AI models
const openai = new OpenAI({
  apiKey: env.CLOUDFLARE_API_KEY,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,
});

// Vercel AI SDK - native integration
import { createWorkersAI } from 'workers-ai-provider';
const workersai = createWorkersAI({ binding: env.AI });
```

šŸ“„ **Full integration guide**: Load `references/integrations.md` for OpenAI SDK, Vercel AI SDK, and REST API examples.

---

## Limits Summary

| Feature | Limit |
|---------|-------|
| Concurrent requests | No hard limit (rate limits apply) |
| Max input tokens | Varies by model (typically 2K-128K) |
| Max output tokens | Varies by model (typically 512-2048) |
| Streaming chunk size | ~1 KB |
| Image size (output) | ~5 MB |
| Request timeout | Workers timeout applies (30s default, 5m max CPU) |
| Daily free neurons | 10,000 |
| Rate limits | See "Rate Limits & Pricing" section |

---

## When to Load References

| Reference File | Load When... |
|----------------|--------------|
| `references/models-catalog.md` | Choosing a model, checking rate limits, comparing model capabilities |
| `references/best-practices.md` | Production deployment, error handling, cost optimization, security |
| `references/integrations.md` | Using OpenAI SDK, Vercel AI SDK, or REST API instead of native binding |

---

## References

- [Workers AI Docs](https://developers.cloudflare.com/workers-ai/)
- [Models Catalog](https://developers.cloudflare.com/workers-ai/models/)
- [AI Gateway](https://developers.cloudflare.com/ai-gateway/)
- [Pricing](https://developers.cloudflare.com/workers-ai/platform/pricing/)

How to use

  1. Copy the skill content above
  2. Create a .claude/skills directory in your project
  3. Save as .claude/skills/claude-skills-cloudflare-workers-ai.md
  4. Use /claude-skills-cloudflare-workers-ai in Claude Code to invoke this skill

Claude Code Skills Collection

170 production-ready skills for Claude Code CLI

Version 3.3.1 | Last Updated: 2026-05-14

<div align="center">

šŸ”Œ Platform Support

This repository uses Claude Plugin Patterns — natively supported by:

PlatformStatusNotes
Claude Codeāœ… NativeFull marketplace support
Factory Droidāœ… NativeFull marketplace support
</div> **For all other Platforms like opencode, codex and others, you can use https://github.com/enulus/OpenPackage **

A curated collection of battle-tested skills for building modern web applications with Cloudflare, AI integrations, React, Tailwind, and more.

PS: if skills.sh warns about any skill: Their scan process is a outdated LLM which flags newest versions pins (like in ZOD) as non existent and by that potentially malicous.


Quick Start

Marketplace Installation (Recommended)

# Add the marketplace
/plugin marketplace add https://github.com/secondsky/claude-skills

# Install individual skills as needed
/plugin install cloudflare-d1@claude-skills
/plugin install tailwind-v4-shadcn@claude-skills
/plugin install ai-sdk-core@claude-skills

See MARKETPLACE.md for complete catalog of all 170 skills.

Bulk Installation (Contributors)

# Clone the repository
git clone https://github.com/secondsky/claude-skills.git
cd claude-skills

# Install all 170 skills at once
./scripts/install-all.sh

# Or install individual skills
./scripts/install-skill.sh cloudflare-d1

Repository Structure

This repository contains 170 production-tested skills for Claude Code, each focused on a specific technology or capability.

Individual Skills: Each skill is a standalone unit with:

  • SKILL.md - Core knowledge and guidance
  • Templates - Working code examples
  • References - Extended documentation
  • Scripts - Helper utilities

Installation Options:

  1. Individual - Install only the skills you need via marketplace
  2. Bulk - Install all 170 skills using ./scripts/install-all.sh

Available Skills (170 Individual Skills)

Each skill is individually installable. Install only the skills you need.

Full Catalog: See MARKETPLACE.md for detailed listings.

Categories

CategorySkillsExamples
tooling29turborepo, plan-interview, code-review
frontend26nuxt-v4, nuxt-v5, tailwind-v4-shadcn, tanstack-query, nuxt-studio, maz-ui, threejs
cloudflare21cloudflare-d1, cloudflare-workers-ai, cloudflare-agents
ai20openai-agents, claude-api, ai-sdk-core
api16api-design-principles, graphql-implementation
web10hono-routing, firecrawl-scraper, web-performance
mobile7swift-best-practices, react-native-app, react-native-skills
database6drizzle-orm-d1, neon-vercel-postgres, supabase-postgres-best-practices
security6csrf-protection, access-control-rbac
auth4better-auth
testing4vitest-testing, playwright-testing
design4design-review, design-system-creation
woocommerce4woocommerce-backend-dev
cms4hugo, sveltia-cms, wordpress-plugin-core
architecture3microservices-patterns, architecture-patterns
data3sql-query-optimization, recommendation-engine
seo2seo-optimizer, seo-keyword-cluster-builder
documentation1technical-specification

How It Works

Auto-Discovery

Claude Code automatically checks ~/.claude/skills/ for relevant skills before planning tasks:

User: "Set up a Cloudflare Worker with D1 database"
           ↓
Claude: [Checks skills automatically]
           ↓
Claude: "Found cloudflare-d1 skills.
         These prevent 12 documented errors. Use them?"
           ↓
User: "Yes"
           ↓
Result: Production-ready setup, zero errors, ~65% token savings

Note: Due to token limits, not all skills may be visible at once. See āš ļø Important: Token Limits below.

Skill Structure

Each skill includes:

skills/[skill-name]/
ā”œā”€ā”€ SKILL.md              # Complete documentation
ā”œā”€ā”€ .claude-plugin/
│   └── plugin.json       # Plugin metadata
ā”œā”€ā”€ templates/            # Ready-to-copy templates
ā”œā”€ā”€ scripts/              # Automation scripts
└── references/           # Extended documentation

Recent Additions

May 2026

Supply Chain Security (cross-cutting):

  • dependency-upgrade expanded with Socket CLI integration — proactive malicious package detection, typosquatting alerts, and CI/CD security gates. New 418-line reference guide, 2 GitHub Actions templates, and expanded supply chain security comparison (3 tools)
  • 31 skills now include "Secure Installation" guidance — contextually-tailored security sections across all high-risk skill categories (scaffolding, MCP/agent SDKs, multi-provider installs, Docker, CI/CD). Covers 8 Bun skills, 5 Nuxt skills, 6 Cloudflare skills, 4 AI/agent skills, and 8 frontend/tooling skills
  • Supply chain security is now a first-class cross-cutting concern woven into the skill collection — not a standalone topic

February - April 2026

Full-Stack Frameworks:

  • nuxt-v5 (v1.0.0) - Full Nuxt 5 support with 4 skills (core, data, server, production), 3 diagnostic agents, and interactive setup wizard
  • supabase-postgres-best-practices - 30 Postgres optimization rules from Supabase across 8 categories
  • threejs (v1.0.0) - 3D web graphics: scenes, geometries, shaders, animations, post-processing

Infrastructure:

  • JSON schema validation - Automated plugin.json validation with CI support
  • GitHub issue templates - Skill-specific issue templates for bug reports, feature requests, and submissions

Plugin Enhancements:

  • mutation-testing - Added Bun native runner support
  • dependency-upgrade - Added supply chain security content

December 2025 - January 2026

Frontend Expansion:

  • nuxt-studio (v1.0.0) - Visual CMS for Nuxt Content with live preview, OAuth auth, and R2 storage integration
  • maz-ui (v1.0.0) - 50+ Vue/Nuxt components with theming, i18n, form generation, and 14 composables

Developer Workflow:

  • plan-interview (v2.0.0) - Adaptive interview-driven spec generation with autonomous quality review
  • turborepo (v2.8.0) - Updated to official Vercel skill with enhanced monorepo build optimization

Mobile Development:

  • react-native-skills (v1.0.0) - React Native & Expo best practices with performance optimization patterns

Enhanced Authentication:

  • better-auth (v2.2.0) - Expanded to 18 framework integrations with 30+ authentication plugins

āš ļø Important: Token Limits

Skill Visibility Constraint

Claude Code has a 15,000 character limit for the total size of skill descriptions in the system prompt. This limit also applies to commands and agents.

What this means:

  • Not all 170 skills may be visible in Claude's context at once
  • Skills are loaded based on relevance and available token budget
  • You can verify how many skills Claude currently sees by asking: "How many skills do you see in your system prompt?"

Checking Visible Skills

To verify which skills are currently loaded:

# Ask Claude Code directly
"Check what skills/plugins you see in your system prompt"

Claude will report something like: "85 of 170 skills visible due to token limits"

Workaround: Increase Token Budget

You can double the headroom for skill descriptions by setting an environment variable:

# Increase limit to 30,000 characters
export SLASH_COMMAND_TOOL_CHAR_BUDGET=30000

# Then launch Claude Code
claude

This gives you approximately 2x more skill visibility in the system prompt.

Note: This is a temporary workaround. The Claude Code team is working on better solutions for skill discovery and loading.


Token Efficiency

MetricManual SetupWith SkillsSavings
Average Tokens12,000-15,0004,000-5,000~65%
Typical Errors2-4 per service0 (prevented)100%
Setup Time2-4 hours15-45 minutes~80%

Across all 170 skills: 400+ documented errors prevented.


Contributing

Prerequisites for Contributors

Install the official plugin development toolkit:

/plugin install plugin-dev@claude-code-marketplace

This provides:

  • /plugin-dev:create-plugin command (8-phase guided workflow)
  • 7 comprehensive skills (hooks, MCP, structure, agents, commands, skills)
  • 2 specialized agents (agent-creator, plugin-validator)

Quick Steps

  1. Create skill directory in plugins/
  2. Add SKILL.md with YAML frontmatter
  3. Run ./scripts/sync-plugins.sh
  4. Submit pull request

See CONTRIBUTING.md and PLUGIN_DEV_BEST_PRACTICES.md for detailed guidelines.


Documentation

DocumentPurpose
START_HERE.mdStart here! Quick navigation guide
PLUGIN_DEV_BEST_PRACTICES.mdRepository-specific best practices (marketplace, budget, quality)
MARKETPLACE.mdFull skill catalog and installation guide
MARKETPLACE_MANAGEMENT.mdTechnical infrastructure (plugin.json, scripts, validation)
CLAUDE.mdProject context and development standards
CONTRIBUTING.mdContribution guidelines

Links


Built with ā¤ļø by Claude Skills Maintainers

View source on GitHub