Senior Prompt Engineer

Name: Senior Prompt Engineer
Author: alirezarezvani

Use when the user asks to optimize prompts, design prompt templates, evaluate LLM outputs with an eval set, measure RAG retrieval quality, validate agent/tool configurations, analyze token usage, or design structured-output contracts. Covers eval-driven prompt iteration, RAG met…

python · ai · llm · rag · agent


19k · 2.7k · Updated 3 days ago · Python · MIT

Skill Content

# Senior Prompt Engineer

Eval-driven prompt engineering, RAG quality measurement, and agent workflow validation. Everything here is **model-agnostic by design**: techniques are framed by what they do, not by which model generation they were observed on, and the tools never hardcode model IDs or pricing — you supply your provider's current rates when you want dollar figures.

## Operating Rules

1. **Never change a prompt without a baseline.** Capture metrics first (`--analyze --output baseline.json`), then compare every iteration against it.
2. **Eval set before optimization.** 10–20 representative cases with expected outputs minimum. If the user has no eval set, build one with them before touching the prompt — optimizing against vibes is the #1 failure mode.
3. **Prefer platform features over prompt hacks.** If the provider offers native structured outputs / JSON schema enforcement, tool-use APIs, or prompt caching, use those instead of "respond ONLY with JSON" incantations. Prompt-level format enforcement is the fallback, not the default.
4. **Current-generation models need less scaffolding.** Don't add chain-of-thought boilerplate, role framing, or few-shot examples reflexively — frontier models often do worse with redundant scaffolding. Add each element only when the eval set shows it helps.
5. **Cost numbers are always user-supplied.** Look up the provider's current per-Mtok pricing and pass it via `--price-per-mtok` (never trust a cached price table — including any you remember).

## Tools (exact CLIs, all stdlib)

### 1. Prompt Optimizer — `scripts/prompt_optimizer.py`

Static analysis: token estimate, clarity/structure scores (0–100), ambiguity + redundancy detection, few-shot example extraction.

```bash
# Full analysis (human-readable report)
python3 scripts/prompt_optimizer.py prompt.txt --analyze

# Save machine-readable baseline for later comparison
python3 scripts/prompt_optimizer.py prompt.txt --analyze --json --output baseline.json

# Token estimate; cost only if you supply your provider's current rate
python3 scripts/prompt_optimizer.py prompt.txt --tokens --model claude --price-per-mtok 3.00

# Whitespace/redundancy-trimmed version
python3 scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt

# Extract Input/Output few-shot pairs to JSON
python3 scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json

# Compare a revision against the saved baseline
python3 scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
```

`--model` accepts any string; only the tokenizer family is inferred (names containing "claude" → 3.5 chars/token, otherwise 4.0). Exit 0 on success, 1 on missing file.
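
The chars-per-token heuristic described above can be sketched as follows. This is a minimal illustration rather than the script's actual code: `chars_per_token` and `estimate_tokens` are hypothetical names, and both the case-insensitive match and the round-up are assumptions.

```python
import math

def chars_per_token(model: str) -> float:
    """Heuristic ratio: 3.5 for names containing 'claude', otherwise 4.0."""
    # Case-insensitive matching is an assumption, not confirmed by the source.
    return 3.5 if "claude" in model.lower() else 4.0

def estimate_tokens(text: str, model: str = "") -> int:
    """Rough token estimate from character count alone."""
    # Rounding up is an assumption; the real script may round differently.
    return math.ceil(len(text) / chars_per_token(model))

prompt = "x" * 7000
print(estimate_tokens(prompt, "claude"))   # 7000 / 3.5
print(estimate_tokens(prompt, "other"))    # 7000 / 4.0
```

A character-based estimate like this is only good for relative comparisons between prompt revisions; use the provider's tokenizer when you need exact counts.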

### 2. RAG Evaluator — `scripts/rag_evaluator.py`

Measures retrieval and grounding quality from two JSON files (formats printed in `--help`).

```bash
python3 scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json
python3 scripts/rag_evaluator.py --contexts ctx.json --questions q.json --k 10 --json
python3 scripts/rag_evaluator.py --contexts ctx.json --questions q.json --output report.json --verbose
python3 scripts/rag_evaluator.py --contexts ctx.json --questions q.json --compare baseline_report.json
```

Reports context relevance, precision@k, coverage, answer faithfulness, groundedness. Treat relevance < 0.80 as a retrieval problem (chunking/embedding/filtering), not a prompt problem — fix retrieval before rewriting the generation prompt.
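
For orientation, precision@k in its standard information-retrieval form can be sketched as below. This assumes you have ground-truth relevant document IDs for each question; `rag_evaluator.py` may compute its score differently (check `--help`), and the function name and IDs here are hypothetical.

```python
def precision_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the top-k retrieved items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in retrieved_ids[:k] if doc_id in relevant)
    # Dividing by k (not by the number returned) penalizes retrievers
    # that return fewer than k items.
    return hits / k

retrieved = ["d1", "d7", "d3", "d9"]   # ranked retrieval output (made-up IDs)
relevant = ["d1", "d3"]                # ground-truth relevant IDs (made-up)
print(precision_at_k(retrieved, relevant, k=4))  # 0.5
```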

### 3. Agent Orchestrator — `scripts/agent_orchestrator.py`

Validates agent configs (YAML/JSON): tool wiring, missing required config, loop risk, token estimates.

```bash
python3 scripts/agent_orchestrator.py agent.yaml --validate
python3 scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
python3 scripts/agent_orchestrator.py agent.yaml --estimate-cost --runs 100 \
    --input-price-per-mtok 3.00 --output-price-per-mtok 15.00
```

Without the two price flags, `--estimate-cost` reports token estimates only. The `model:` field in the config is informational — any model name is accepted.
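
The arithmetic behind a cost estimate with separate input and output prices is simple enough to check by hand. The sketch below is an assumption about how such an estimate is typically computed, not the orchestrator's code; `estimate_run_cost` and the token counts are hypothetical, and the prices are the placeholder values from the command above.

```python
def estimate_run_cost(input_tokens, output_tokens, runs,
                      input_price_per_mtok, output_price_per_mtok):
    """Dollar cost per run and in total, given user-supplied per-Mtok prices."""
    per_run = (input_tokens * input_price_per_mtok
               + output_tokens * output_price_per_mtok) / 1_000_000
    return {"per_run": per_run, "total": per_run * runs}

# Hypothetical token counts for one agent run.
cost = estimate_run_cost(input_tokens=4000, output_tokens=800, runs=100,
                         input_price_per_mtok=3.00, output_price_per_mtok=15.00)
print(f"per run: ${cost['per_run']:.4f}, total: ${cost['total']:.2f}")
```

Output tokens are usually priced higher than input tokens, so a small output count can still account for a large share of the bill, as in this example.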

## Workflows

### Prompt Optimization (eval-gated)

1. **Baseline:** `python3 scripts/prompt_optimizer.py current_prompt.txt --analyze --json --output baseline.json`
2. **Diagnose** from the report: ambiguous verbs ("analyze", "handle"), redundant blocks, missing output contract, token waste.
3. **Apply one change at a time**, in this order of leverage:
   | Symptom | Fix |
   |---------|-----|
   | Malformed/unparseable output | Native structured outputs / JSON schema if the API supports it; explicit schema-in-prompt otherwise |
   | Inconsistent answers across runs | Tighten instructions + add 2–3 contrastive examples (one near-miss showing what NOT to do) |
   | Misses edge cases | Enumerate the edge cases explicitly; add a "when uncertain, do X" rule |
   | Token bloat on repeated calls | Move stable prefix (system rules, examples) first so prompt caching applies; trim redundancy |
   | Wrong reasoning on hard cases | Ask for stepwise reasoning *in a scratch field the consumer ignores*, or use the provider's extended-thinking mode |
4. **Re-analyze and compare:** `python3 scripts/prompt_optimizer.py revised.txt --analyze --compare baseline.json`
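5. **Eval gate (must pass before shipping):** run the revised prompt over the eval set and write per-case pass/fail to `eval_results.json`. Then run the structural check, which requires the clarity score to hold or improve and the token count to grow by no more than 10%: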
   ```bash
   python3 scripts/prompt_optimizer.py revised.txt --analyze --json --output revised.json \
     && python3 -c "
   import json, sys
   r = json.load(open('revised.json')); b = json.load(open('baseline.json'))
   ok = r['clarity_score'] >= b['clarity_score'] and r['token_count'] <= b['token_count'] * 1.10
   sys.exit(0 if ok else 1)"
   echo "gate exit=$?"   # 0 = ship; 1 = regression, iterate again
   ```
   Pair this structural gate with your task-level eval: the revision must not lose any previously-passing eval case (no-regression rule).
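
The no-regression rule can be sketched as a comparison of per-case results. The result shape used here (a mapping of case ID to pass/fail) is an assumption about what you write to `eval_results.json`; the skill does not prescribe a format, and `find_regressions` is a hypothetical helper.

```python
def find_regressions(baseline, revised):
    """Case IDs that passed in the baseline but fail or are missing in the revision."""
    return sorted(
        case for case, passed in baseline.items()
        if passed and not revised.get(case, False)
    )

# Made-up results for illustration.
baseline = {"case-01": True, "case-02": False, "case-03": True}
revised = {"case-01": True, "case-02": True, "case-03": False}

regressions = find_regressions(baseline, revised)
print(regressions)  # ['case-03']
# In a real gate, exit non-zero when the list is non-empty:
#   sys.exit(1 if regressions else 0)
```

Note that the revision above fixes one case and breaks another: the pass rate is unchanged, yet the rule still blocks the change.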

### Few-Shot Example Design

1. Define the task contract first (input shape, output shape, edge-case policy).
2. Start with **zero examples** and measure — current models often need none. Add examples only for failure clusters the eval reveals.
3. When adding: 3–5 max, ordered simple → edge → negative (what NOT to extract), formatted identically to the real output contract.
4. Validate consistency: `python3 scripts/prompt_optimizer.py prompt_with_examples.txt --extract-examples --output examples.json` and inspect that every extracted pair parses against your schema.
5. Re-run the eval set; if a case passes only because it resembles an example, add a held-out variant to the eval set.

### Structured Output Design

1. Write the JSON Schema first (types, enums, required, maxLength).
2. **Prefer API-native enforcement**: structured-outputs / response-schema / tool-call parameters guarantee shape; prompt text cannot.
3. Fallback (API without schema support): include the schema rendered as field-by-field rules + one valid example, and instruct "output only the JSON object".
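4. Gate: pipe 10 eval outputs (one JSON object per line) through a schema validator. A bare parse check such as `python3 -c "import json,sys; [json.loads(l) for l in sys.stdin]"` is the minimum, and it confirms only that each line is valid JSON, not that it matches the schema. 10/10 must pass, else return to step 2.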
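
If you want the gate to check more than parseability while staying stdlib-only, a small hand-rolled checker is one option. The sketch below covers only a flat subset of JSON Schema (`required`, `type`, `enum`, `maxLength`); `validate`, `gate`, and the example schema are hypothetical, and a full validator library is the better choice when installing one is acceptable.

```python
import json

_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "array": list, "object": dict}

def validate(obj, schema):
    """Return a list of violations for a flat object schema (subset only)."""
    if not isinstance(obj, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required field: {key}"
              for key in schema.get("required", []) if key not in obj]
    for key, rules in schema.get("properties", {}).items():
        if key not in obj:
            continue
        value = obj[key]
        expected = _TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = (isinstance(value, bool)
                             and rules.get("type") in ("integer", "number"))
        if expected and (not isinstance(value, expected) or is_bool_as_number):
            errors.append(f"{key}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{key}: not in enum")
        if ("maxLength" in rules and isinstance(value, str)
                and len(value) > rules["maxLength"]):
            errors.append(f"{key}: longer than {rules['maxLength']}")
    return errors

def gate(lines, schema):
    """True only if every non-empty line parses as JSON and has no violations."""
    for line in lines:
        if not line.strip():
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            return False
        if validate(obj, schema):
            return False
    return True

schema = {
    "required": ["label", "summary"],
    "properties": {
        "label": {"type": "string", "enum": ["bug", "feature"]},
        "summary": {"type": "string", "maxLength": 20},
    },
}
print(gate(['{"label": "bug", "summary": "crash on save"}'], schema))  # True
```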

### RAG Tuning Loop

1. Build `questions.json` (id, question, reference answer) and capture current retrievals to `contexts.json`.
2. `python3 scripts/rag_evaluator.py --contexts contexts.json --questions questions.json --output rag_baseline.json`
3. Fix the **lowest metric first**: relevance → chunking/embeddings/metadata filters; faithfulness → grounding instructions + "answer only from context" + citation requirement; coverage → retrieval k / query expansion.
4. Gate: `python3 scripts/rag_evaluator.py --contexts new_contexts.json --questions questions.json --compare rag_baseline.json` — every metric must be ≥ baseline; any regression blocks the change.
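
Steps 3 and 4 amount to two small decisions: which metric to work on, and whether a change regressed anything. The sketch below illustrates both over a flat metric-to-score mapping. That report shape and the metric key names are assumptions (the real report format is whatever `rag_evaluator.py` writes), and the scores are made up.

```python
# Fix areas per metric, taken from step 3 of the loop; key names are assumed.
FIX_AREAS = {
    "relevance": "chunking / embeddings / metadata filters",
    "faithfulness": "grounding instructions, answer-only-from-context rule, citations",
    "coverage": "retrieval k / query expansion",
}

def lowest_metric(report):
    """Return (name, score, suggested fix area) for the weakest metric."""
    name = min(report, key=report.get)
    return name, report[name], FIX_AREAS.get(name, "no mapping in step 3")

def regressed_metrics(baseline, candidate):
    """Metrics whose candidate score is missing or below the baseline score."""
    return sorted(
        name for name, base in baseline.items()
        if candidate.get(name) is None or candidate[name] < base
    )

# Made-up scores for illustration.
baseline = {"relevance": 0.72, "faithfulness": 0.88, "coverage": 0.81}
candidate = {"relevance": 0.84, "faithfulness": 0.86, "coverage": 0.81}

print(lowest_metric(baseline))
print(regressed_metrics(baseline, candidate))  # ['faithfulness']
```

Here the retrieval fix lifted relevance but cost two points of faithfulness, so the change is blocked under the "every metric must be ≥ baseline" rule even though the headline metric improved.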

### Agent Config Review

1. `python3 scripts/agent_orchestrator.py agent.yaml --validate` — must exit with VALIDATION PASSED; fix every error and warning (missing tool config, unbounded iterations, loop risk).
2. Check context discipline: each tool description is 1–2 sentences at most, the tool count is minimal for the job, the stable system prompt is placed first (cache-friendly), and an iteration cap plus an early-exit condition are present.
3. Budget: `--estimate-cost --runs N` with your current prices; if cost/run exceeds budget, cut tools or context before downgrading the model.
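
Part of the context-discipline check in step 2 can be automated. The sketch below assumes a config already loaded into a dict with hypothetical `max_iterations` and `tools` keys; these are not necessarily the field names `agent_orchestrator.py` expects. Its sentence counter is deliberately crude: abbreviations such as "e.g." will be overcounted.

```python
import re

def review_config(config):
    """Return warnings for a missing iteration cap and overlong tool descriptions."""
    warnings = []
    if not config.get("max_iterations"):
        warnings.append("no iteration cap (max_iterations)")
    for tool in config.get("tools", []):
        description = tool.get("description", "")
        # Split on runs of sentence-ending punctuation; drop empty fragments.
        sentences = [s for s in re.split(r"[.!?]+\s*", description) if s.strip()]
        if len(sentences) > 2:
            warnings.append(
                f"tool {tool.get('name', '?')}: description has "
                f"{len(sentences)} sentences (aim for 1-2)"
            )
    return warnings

# Hypothetical config for illustration.
config = {
    "max_iterations": 5,
    "tools": [
        {"name": "search", "description": "Search the web. Returns top results."},
        {"name": "db", "description": "Runs SQL. Read only. Slow on joins. Ask first."},
    ],
}
for warning in review_config(config):
    print(warning)
```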

## References

| File | Contains | Load when user asks about |
|------|----------|---------------------------|
| `references/prompt_engineering_patterns.md` | 10 prompt patterns with input/output examples | "which pattern?", few-shot design, decomposition, meta-prompting |
| `references/llm_evaluation_frameworks.md` | Eval metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| `references/agentic_system_design.md` | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |

## Related Skills

- `engineering-team/skills/senior-ml-engineer` — model deployment and serving (this skill stops at the prompt/eval layer)
- `engineering/rag-architect` — RAG system architecture (this skill measures RAG quality; that one designs the pipeline)
- `engineering/agent-designer` — full agent system design (this skill validates configs; that one designs the architecture)

How to use

1. Copy the skill content above.
2. Create a `.claude/skills` directory in your project.
3. Save it as `.claude/skills/claude-skills-senior-prompt-engineer.md`.
4. Run `/claude-skills-senior-prompt-engineer` in Claude Code to invoke this skill.

README


Claude Code Skills & Plugins — Agent Skills for Every Coding Tool

345 production-ready Claude Code skills, plugins, and agent skills for 13 AI coding tools.

The most comprehensive open-source library of Claude Code skills and agent plugins — also works with OpenAI Codex, Gemini CLI, Cursor, and 9 more coding agents. Reusable expertise packages covering engineering, DevOps, marketing (incl. AEO — Answer Engine Optimization for LLM citation), security (PreToolUse hooks), compliance, C-level advisory (incl. founder-mode CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE personas + 21 /cs:* slash commands), productivity (capture/email/reflect), an academic research stack (litreview/grants/dossier/patent/syllabus/pulse/notebooklm + hybrid router), and enterprise Research Operations (clinical-research/research-finance/market-research/product-research, v2.9.0).

Works with: Claude Code · OpenAI Codex · Gemini CLI · OpenClaw · Hermes Agent¹ · Mistral Vibe² · Cursor · Aider · Windsurf · Kilo Code · OpenCode · Augment · Antigravity

5,200+ GitHub stars — the most comprehensive open-source Claude Code skills & agent plugins library.

What Are Claude Code Skills & Agent Plugins?

Claude Code skills (also called agent skills or coding agent plugins) are modular instruction packages that give AI coding agents domain expertise they don't have out of the box. Each skill includes:

- `SKILL.md` — structured instructions, workflows, and decision frameworks
- Python tools — 579 CLI scripts (all stdlib-only, zero pip installs)
- Reference docs — 702 templates, checklists, and domain-specific knowledge files

One repo, thirteen platforms. Works natively as Claude Code plugins, Codex agent skills, Gemini CLI skills, Hermes Agent skills, Mistral Vibe skills, and converts to more tools via scripts/convert.sh. All 579 Python tools run anywhere Python runs.

Skills vs Agents vs Personas

| | Skills | Agents | Personas |
|---|--------|--------|----------|
| Purpose | How to execute a task | What task to do | Who is thinking |
| Scope | Single domain | Single domain | Cross-domain |
| Voice | Neutral | Professional | Personality-driven |
| Example | "Follow these steps for SEO" | "Run a security audit" | "Think like a startup CTO" |

All three work together. See Orchestration for how to combine them.

Quick Install

Gemini CLI (New)

# Clone the repository
git clone https://github.com/alirezarezvani/claude-skills.git
cd claude-skills

# Run the setup script
./scripts/gemini-install.sh

# Start using skills
> activate_skill(name="senior-architect")

Claude Code (Recommended)

# Add the marketplace
/plugin marketplace add alirezarezvani/claude-skills

# Install by domain
/plugin install engineering-skills@claude-code-skills          # 24 core engineering
/plugin install engineering-advanced-skills@claude-code-skills  # 25 POWERFUL-tier
/plugin install product-skills@claude-code-skills               # 12 product skills
/plugin install marketing-skills@claude-code-skills             # 43 marketing skills
/plugin install ra-qm-skills@claude-code-skills                 # 12 regulatory/quality
/plugin install pm-skills@claude-code-skills                    # 6 project management
/plugin install c-level-skills@claude-code-skills               # 28 C-level advisory (full C-suite)
/plugin install business-growth-skills@claude-code-skills       # 4 business & growth
/plugin install finance-skills@claude-code-skills               # 2 finance (analyst + SaaS metrics)

# Or install individual skills
/plugin install skill-security-auditor@claude-code-skills       # Security scanner
/plugin install playwright-pro@claude-code-skills                  # Playwright testing toolkit
/plugin install self-improving-agent@claude-code-skills         # Auto-memory curation
/plugin install content-creator@claude-code-skills              # Single skill

OpenAI Codex

npx agent-skills-cli add alirezarezvani/claude-skills --agent codex
# Or: git clone + ./scripts/codex-install.sh

OpenClaw

bash <(curl -s https://raw.githubusercontent.com/alirezarezvani/claude-skills/main/scripts/openclaw-install.sh)

Manual Installation

git clone https://github.com/alirezarezvani/claude-skills.git
# Copy any skill folder to ~/.claude/skills/ (Claude Code) or ~/.codex/skills/ (Codex)

Multi-Tool Support (New)

Convert all 345 skills to 9 AI coding tools with a single script:

| Tool | Format | Install |
|------|--------|---------|
| Cursor | `.mdc` rules | `./scripts/install.sh --tool cursor --target .` |
| Aider | `CONVENTIONS.md` | `./scripts/install.sh --tool aider --target .` |
| Kilo Code | `.kilocode/rules/` | `./scripts/install.sh --tool kilocode --target .` |
| Windsurf | `.windsurf/skills/` | `./scripts/install.sh --tool windsurf --target .` |
| OpenCode | `.opencode/skills/` | `./scripts/install.sh --tool opencode --target .` |
| Augment | `.augment/rules/` | `./scripts/install.sh --tool augment --target .` |
| Antigravity | `~/.gemini/antigravity/skills/` | `./scripts/install.sh --tool antigravity` |
| Hermes Agent | `~/.hermes/skills/` | `python scripts/sync-hermes-skills.py --verbose` |
| Mistral Vibe | `~/.vibe/skills/` | `./scripts/vibe-install.sh` |

How it works:

# 1. Convert all skills to all tools (takes ~15 seconds)
./scripts/convert.sh --tool all

# 2. Install into your project (with confirmation)
./scripts/install.sh --tool cursor --target /path/to/project

# Or use --force to skip confirmation:
./scripts/install.sh --tool aider --target . --force

# 3. Verify
find .cursor/rules -name "*.mdc" | wc -l  # Should show 346

Each tool gets:

✅ All 345 skills converted to native format
✅ Per-tool README with install/verify/update steps
✅ Support for scripts, references, templates where applicable
✅ Zero manual conversion work

Run ./scripts/convert.sh --tool all to generate tool-specific outputs locally.

Skills Overview

345 skills across 17 domains:

| Domain | Skills | Highlights | Details |
|--------|--------|------------|---------|
| 🔧 Engineering — Core | 51 | Architecture, frontend, backend, fullstack, QA, DevOps, SecOps, AI/ML, data, Playwright Pro (test gen, flaky fix, migrations), self-improving agent (auto-memory curation), security suite, a11y audit | engineering-team/ |
| ⚡ Engineering — POWERFUL | 78 | Agent designer, RAG architect, database designer, CI/CD builder, security auditor, MCP builder, AgentHub, Helm charts, Terraform, self-eval, llm-wiki, tc-tracker, autoresearch-agent, reliability portfolio (feature-flags-architect, kubernetes-operator, chaos-engineering, slo-architect), ship-gate, security-guidance PreToolUse hook, Matt Pocock skills (write-a-skill, caveman, grill-me, handoff, grill-with-docs) | engineering/ |
| 🎯 Product | 17 | Product manager, agile PO, strategist, UX researcher, UI design, landing pages, SaaS scaffolder, analytics, experiment designer, discovery, roadmap communicator, code-to-prd, apple-hig-expert | product-team/ |
| 📣 Marketing | 46 | 8 pods: Content, SEO + AEO (`aeo` — E-E-A-T audit, citation tracking across 5 LLMs), CRO, Channels, Growth, Intelligence, Sales + context foundation + orchestration router | marketing-skill/ |
| 🚀 Productivity | 6 | `capture` (brain-dump-to-action), `email` pair (inbox-setup + inbox-triage), `reflect` (journal), `handoff` (Matt Pocock-inspired), `andreessen` (market-first decision mode) | productivity/ |
| 🎨 Marketing (top-level) | 1 | `landing` — single-file HTML landing-page generator (4 design styles, GSAP patterns, brand palette validator) | marketing/ |
| 🔬 Research (academic) | 8 | `research` orchestrator (hybrid router + fallback) + 7 specialists: `pulse`, `litreview`, `grants` (NIH), `dossier`, `patent`, `syllabus`, `notebooklm` | research/ |
| 🧪 Research Operations ✨v2.9.0 | 5 | Enterprise/cross-functional research: orchestrator + `clinical-research` (study design), `research-finance` (R&D program finance), `market-research` (sizing/survey/segmentation), `product-research` (user research) — each with onboarding + customization + opt-in autoresearch bridge | research-ops/ |
| 📋 Project Management | 9 | Senior PM, scrum master, Jira, Confluence, Atlassian admin, templates + bundled Atlassian Remote MCP | project-management/ |
| 🏥 Regulatory & QM | 18 | ISO 13485, MDR 2017/745, FDA, ISO 27001, GDPR, SOC 2, CAPA, risk management | ra-qm-team/ |
| 🛡️ Compliance OS | 9 | Compliance operating system — controls, evidence, audit-readiness workflows | compliance-os/ |
| 💼 C-Level Advisory | 66 | Full C-suite (CEO/CTO/CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE) + founder-mode agents + orchestration + board meetings + culture & collaboration | c-level-advisor/ |
| 📈 Business & Growth | 5 | Customer success, sales engineer, revenue ops, contracts & proposals, BizDev toolkit | business-growth/ |
| 🏭 Business Operations | 7 | Orchestrator + process-mapper, vendor-management, capacity-planner, internal-comms, knowledge-ops, procurement-optimizer | business-operations/ |
| 🤝 Commercial | 8 | Orchestrator + pricing-strategist, deal-desk, partnerships-architect, channel-economics, commercial-policy, rfp-responder, commercial-forecaster | commercial/ |
| 💰 Finance | 4 | Financial analyst (DCF, budgeting, forecasting), SaaS metrics coach, business investment advisor | finance/ |

Personas

Pre-configured agent identities with curated skill loadouts, workflows, and distinct communication styles. Personas go beyond "use these skills" — they define how an agent thinks, prioritizes, and communicates.

| Persona | Domain | Best For |
|---------|--------|----------|
| Startup CTO | Engineering + Strategy | Architecture decisions, tech stack selection, team building, technical due diligence |
| Growth Marketer | Marketing + Growth | Content-led growth, launch strategy, channel optimization, bootstrapped marketing |
| Solo Founder | Cross-domain | One-person sta |

…

Footnotes

¹ Hermes Agent is BYO-sync tier: the repo ships a pre-generated `.hermes/skills/claude-skills/` tree, but you run `python scripts/sync-hermes-skills.py` once locally to install into `~/.hermes/skills/`. It uses the same agentskills.io SKILL.md standard, so no format conversion is needed.
² Mistral Vibe is also BYO-sync tier: the repo ships a pre-generated `.vibe/skills/claude-skills/` tree, and you run `./scripts/vibe-install.sh` once locally to install into `~/.vibe/skills/`. It uses the same agentskills.io SKILL.md standard, so no format conversion is needed. Docs: https://docs.mistral.ai/mistral-vibe/agents-skills.