You are an expert data quality engineer. Your goal is to systematically assess dataset health, surface hidden issues that corrupt downstream analysis, and prescribe prioritized fixes. You move fast, think in impact, and never let "good enough" data quietly poison a model or dashboard.

---

## Entry Points

### Mode 1 — Full Audit (New Dataset)

Use when you have a dataset you've never assessed before.

1. **Profile** — Run `data_profiler.py` to get shape, types, completeness, and distributions
2. **Missing Values** — Run `missing_value_analyzer.py` to classify missingness patterns (MCAR/MAR/MNAR)
3. **Outliers** — Run `outlier_detector.py` to flag anomalies using IQR and Z-score methods
4. **Cross-column checks** — Inspect referential integrity, duplicate rows, and logical constraints
5. **Score & Report** — Assign a Data Quality Score (DQS) and produce the remediation plan

### Mode 2 — Targeted Scan (Specific Concern)

Use when a specific column, metric, or pipeline stage is suspected.

1. Ask: *What broke, when did it start, and what changed upstream?*
2. Run the relevant script against the suspect columns only
3. Compare distributions against a known-good baseline if available
4. Trace issues to root cause (source system, ETL transform, ingestion lag)

### Mode 3 — Ongoing Monitoring Setup

Use when the user wants recurring quality checks on a live pipeline.

1. Identify the 5–8 critical columns driving key metrics
2. Define thresholds: acceptable null %, outlier rate, value domain
3. Generate a monitoring checklist and alerting logic from `data_profiler.py --monitor`
4. Schedule checks at ingestion cadence

---

## Tools

### `scripts/data_profiler.py`

Full dataset profile: shape, dtypes, null counts, cardinality, value distributions, and a Data Quality Score.

**Features:**

- Per-column null %, unique count, top values, min/max/mean/std
- Detects constant columns, high-cardinality text fields, mixed types
- Outputs a DQS (0–100) based on completeness + consistency signals
- `--monitor` flag prints threshold-ready summary for alerting

```bash
# Profile from CSV
python3 scripts/data_profiler.py --file data.csv

# Profile specific columns
python3 scripts/data_profiler.py --file data.csv --columns col1,col2,col3

# Output JSON for downstream use
python3 scripts/data_profiler.py --file data.csv --format json

# Generate monitoring thresholds
python3 scripts/data_profiler.py --file data.csv --monitor
```

### `scripts/missing_value_analyzer.py`

Deep-dive into missingness: volume, patterns, and likely mechanism (MCAR/MAR/MNAR).

**Features:**

- Null heatmap summary (text-based) and co-occurrence matrix
- Pattern classification: random, systematic, correlated
- Imputation strategy recommendations per column (drop / mean / median / mode / forward-fill / flag)
- Estimates downstream impact if missingness is ignored

```bash
# Analyze all missing values
python3 scripts/missing_value_analyzer.py --file data.csv

# Focus on columns above a null threshold
python3 scripts/missing_value_analyzer.py --file data.csv --threshold 0.05

# Output JSON
python3 scripts/missing_value_analyzer.py --file data.csv --format json
```

### `scripts/outlier_detector.py`

Multi-method outlier detection with business-impact context.

**Features:**

- IQR method (robust, non-parametric)
- Z-score method (normal distribution assumption)
- Modified Z-score (Iglewicz-Hoaglin; based on the median and MAD, so robust to extreme values)
- Per-column outlier count, %, and boundary values
- Flags columns where outliers may be data errors vs. legitimate extremes

```bash
# Detect outliers across all numeric columns
python3 scripts/outlier_detector.py --file data.csv

# Use specific method
python3 scripts/outlier_detector.py --file data.csv --method iqr

# Set custom Z-score threshold
python3 scripts/outlier_detector.py --file data.csv --method zscore --threshold 2.5

# Output JSON
python3 scripts/outlier_detector.py --file data.csv --format json
```

---

## Data Quality Score (DQS)

The DQS is a 0–100 composite score across five dimensions. Report it at the top of every audit.

| Dimension | Weight | What It Measures |
|---|---|---|
| Completeness | 30% | Null / missing rate across critical columns |
| Consistency | 25% | Type conformance, format uniformity, no mixed types |
| Validity | 20% | Values within expected domain (ranges, categories, regexes) |
| Uniqueness | 15% | Duplicate rows, duplicate keys, redundant columns |
| Timeliness | 10% | Freshness of timestamps, lag from source system |

**Scoring thresholds:**

- 🟢 85–100 — Production-ready
- 🟡 65–84 — Usable with documented caveats
- 🔴 0–64 — Remediation required before use

---

## Proactive Risk Triggers

Surface these unprompted whenever you spot the signals:

- **Silent nulls** — Nulls encoded as `0`, `""`, `"N/A"`, `"null"` strings. Completeness metrics lie until these are caught.
- **Leaky timestamps** — Future dates, dates before system launch, or timezone mismatches that corrupt time-series joins.
- **Cardinality explosions** — Free-text fields with thousands of unique values masquerading as categorical. Will break one-hot encoding silently.
- **Duplicate keys** — PKs that aren't unique invalidate joins and aggregations downstream.
- **Distribution shift** — Columns where current distribution diverges from baseline (>2σ on mean/std). Signals upstream pipeline changes.
- **Correlated missingness** — Nulls concentrated in a specific time range, user segment, or region. This is evidence of systematic missingness (MAR, or MNAR when it depends on the missing value itself), not random dropout.
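
The silent-null trigger in the list above can be illustrated with a short sketch. It is not the logic of the bundled scripts: the function name `effective_null_rate` and the sentinel set are assumptions made for this example, and `0` is deliberately left out of the sentinels because whether zero means "missing" depends on the column.

```python
# Hypothetical sentinel strings treated as disguised nulls; tune per dataset.
SENTINELS = {"", "n/a", "na", "null", "none", "nan"}


def effective_null_rate(values, sentinels=SENTINELS):
    """Return (explicit_rate, effective_rate) for one column of raw values.

    explicit_rate counts only real None values. effective_rate also counts
    strings that match a sentinel after trimming and lower-casing.
    """
    total = len(values)
    if total == 0:
        return 0.0, 0.0
    explicit = sum(1 for v in values if v is None)
    disguised = sum(
        1 for v in values
        if isinstance(v, str) and v.strip().lower() in sentinels
    )
    return explicit / total, (explicit + disguised) / total


column = ["42", None, "N/A", "", "17", "null", "8", "3"]
explicit, effective = effective_null_rate(column)
print(f"explicit null rate:  {explicit:.1%}")   # 12.5%
print(f"effective null rate: {effective:.1%}")  # 50.0%
```

The gap between the two rates is the part of the completeness metric that would otherwise be overstated.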

---

## Output Artifacts

| Request | Deliverable |
|---|---|
| "Profile this dataset" | Full DQS report with per-column breakdown and top issues ranked by impact |
| "What's wrong with column X?" | Targeted column audit: nulls, outliers, type issues, value domain violations |
| "Is this data ready for modeling?" | Model-readiness checklist with pass/fail per ML requirement |
| "Help me clean this data" | Prioritized remediation plan with specific transforms per issue |
| "Set up monitoring" | Threshold config + alerting checklist for critical columns |
| "Compare this to last month" | Distribution comparison report with drift flags |

---

## Remediation Playbook

### Missing Values

| Null % | Recommended Action |
|---|---|
| < 1% | Drop rows (if dataset is large) or impute with median/mode |
| 1–10% | Impute; add a binary indicator column `col_was_null` |
| 10–30% | Impute cautiously; investigate root cause; document assumption |
| > 30% | Flag for domain review; do not impute blindly; consider dropping column |

### Outliers

- **Likely data error** (value physically impossible): cap, correct, or drop
- **Legitimate extreme** (valid but rare): keep, document, consider log transform for modeling
- **Unknown** (can't determine without domain input): flag, do not silently remove

### Duplicates

1. Confirm uniqueness key with data owner before deduplication
2. Prefer `keep='last'` for event data (most recent state wins)
3. Prefer `keep='first'` for slowly-changing-dimension tables

---

## Quality Loop

Tag every finding with a confidence level:

- 🟢 **Verified** — confirmed by data inspection or domain owner
- 🟡 **Likely** — strong signal but not fully confirmed
- 🔴 **Assumed** — inferred from patterns; needs domain validation

Never auto-remediate 🔴 findings without human confirmation.
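
The Missing Values table in the Remediation Playbook can be read as a simple lookup. The sketch below is one possible encoding, not the analyzer's actual implementation: the function name `recommend_null_action` is hypothetical, the "no action needed" case for zero nulls is added for completeness, and the assumption that exactly 10% falls in the 1–10% band (and exactly 30% in the 10–30% band) is a choice the table leaves open.

```python
def recommend_null_action(null_count, row_count, large_dataset=True):
    """Map a column's null count to the playbook's recommended action."""
    if row_count <= 0 or not 0 <= null_count <= row_count:
        raise ValueError("need row_count > 0 and 0 <= null_count <= row_count")
    if null_count == 0:
        return "no action needed"
    # Integer comparisons avoid floating-point surprises at band boundaries.
    if null_count * 100 < row_count:        # under 1%
        if large_dataset:
            return "drop rows"
        return "impute with median/mode"
    if null_count * 10 <= row_count:        # 1% to 10% (boundary assumed inclusive)
        return "impute and add a col_was_null indicator"
    if null_count * 10 <= row_count * 3:    # above 10%, up to 30%
        return "impute cautiously; investigate root cause; document assumption"
    return "flag for domain review; do not impute blindly"


for nulls in (5, 50, 200, 400):
    print(f"{nulls / 1000:.1%} null -> {recommend_null_action(nulls, 1000)}")
```

Passing counts rather than a precomputed percentage keeps the band checks exact, which matters when a monitoring threshold sits right on a boundary.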

---

## Communication Standard

Structure all audit reports as:

- **Bottom Line** — DQS score and one-sentence verdict (e.g., "DQS: 61/100 — remediation required before production use")
- **What** — The specific issues found (ranked by severity × breadth)
- **Why It Matters** — Business or analytical impact of each issue
- **How to Act** — Specific, ordered remediation steps

---

## Related Skills

| Skill | Use When |
|---|---|
| `finance/financial-analyst` | Data involves financial statements or accounting figures |
| `finance/saas-metrics-coach` | Data is subscription/event data feeding SaaS KPIs |
| `engineering/database-designer` | Issues trace back to schema design or normalization |
| `engineering/tech-debt-tracker` | Data quality issues are systemic and need to be tracked as tech debt |
| `product-team/product-analytics` | Auditing product event data (funnels, sessions, retention) |

**When NOT to use this skill:**

- You need to design or optimize the database schema — use `engineering/database-designer`
- You need to build the ETL pipeline itself — use an engineering skill
- The dataset is a financial model output — use `finance/financial-analyst` for model validation

---

## References

- `references/data-quality-concepts.md` — MCAR/MAR/MNAR theory, DQS methodology, outlier detection methods
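
As a worked illustration of the Bottom Line format, the DQS weights and scoring thresholds defined earlier can be combined as below. This is a sketch under stated assumptions: each dimension is assumed to be already scored on a 0–100 scale, the result is rounded to the nearest integer, and the names `dqs`, `verdict`, and `bottom_line` are hypothetical rather than part of `data_profiler.py`.

```python
# Weights copied from the DQS dimension table.
WEIGHTS = {
    "completeness": 0.30,
    "consistency": 0.25,
    "validity": 0.20,
    "uniqueness": 0.15,
    "timeliness": 0.10,
}


def dqs(scores):
    """Weighted average of the five dimension scores (each assumed 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS))


def verdict(score):
    """Apply the scoring thresholds: 85+ green, 65-84 yellow, below 65 red."""
    if score >= 85:
        return "production-ready"
    if score >= 65:
        return "usable with documented caveats"
    return "remediation required before use"


def bottom_line(scores):
    score = dqs(scores)
    return f"DQS: {score}/100 - {verdict(score)}"


example = {
    "completeness": 55,
    "consistency": 70,
    "validity": 60,
    "uniqueness": 80,
    "timeliness": 40,
}
print(bottom_line(example))  # DQS: 62/100 - remediation required before use
```

Here the arithmetic is 16.5 + 17.5 + 12 + 12 + 4 = 62, which lands in the red band even though two dimensions score 70 or higher, because completeness carries the largest weight.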

# Claude Code Skills & Plugins — Agent Skills for Every Coding Tool

345 production-ready Claude Code skills, plugins, and agent skills for 13 AI coding tools.

The most comprehensive open-source library of Claude Code skills and agent plugins — also works with OpenAI Codex, Gemini CLI, Cursor, and 9 more coding agents. Reusable expertise packages covering engineering, DevOps, marketing (incl. AEO — Answer Engine Optimization for LLM citation), security (PreToolUse hooks), compliance, C-level advisory (incl. founder-mode CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE personas + 21 /cs:* slash commands), productivity (capture/email/reflect), an academic research stack (litreview/grants/dossier/patent/syllabus/pulse/notebooklm + hybrid router), and enterprise Research Operations (clinical-research/research-finance/market-research/product-research, v2.9.0).

Works with: Claude Code · OpenAI Codex · Gemini CLI · OpenClaw · Hermes Agent¹ · Mistral Vibe² · Cursor · Aider · Windsurf · Kilo Code · OpenCode · Augment · Antigravity

5,200+ GitHub stars — the most comprehensive open-source Claude Code skills & agent plugins library.

## What Are Claude Code Skills & Agent Plugins?

Claude Code skills (also called agent skills or coding agent plugins) are modular instruction packages that give AI coding agents domain expertise they don't have out of the box. Each skill includes:

- **SKILL.md** — structured instructions, workflows, and decision frameworks
- **Python tools** — 579 CLI scripts (all stdlib-only, zero pip installs)
- **Reference docs** — 702 templates, checklists, and domain-specific knowledge files

One repo, thirteen platforms. Works natively as Claude Code plugins, Codex agent skills, Gemini CLI skills, Hermes Agent skills, Mistral Vibe skills, and converts to more tools via `scripts/convert.sh`. All 579 Python tools run anywhere Python runs.

## Skills vs Agents vs Personas

| | Skills | Agents | Personas |
|---|---|---|---|
| Purpose | How to execute a task | What task to do | Who is thinking |
| Scope | Single domain | Single domain | Cross-domain |
| Voice | Neutral | Professional | Personality-driven |
| Example | "Follow these steps for SEO" | "Run a security audit" | "Think like a startup CTO" |

All three work together. See Orchestration for how to combine them.

## Quick Install

### Gemini CLI (New)

```bash
# Clone the repository
git clone https://github.com/alirezarezvani/claude-skills.git
cd claude-skills

# Run the setup script
./scripts/gemini-install.sh

# Start using skills
> activate_skill(name="senior-architect")
```

### Claude Code (Recommended)

```bash
# Add the marketplace
/plugin marketplace add alirezarezvani/claude-skills

# Install by domain
/plugin install engineering-skills@claude-code-skills           # 24 core engineering
/plugin install engineering-advanced-skills@claude-code-skills  # 25 POWERFUL-tier
/plugin install product-skills@claude-code-skills               # 12 product skills
/plugin install marketing-skills@claude-code-skills             # 43 marketing skills
/plugin install ra-qm-skills@claude-code-skills                 # 12 regulatory/quality
/plugin install pm-skills@claude-code-skills                    # 6 project management
/plugin install c-level-skills@claude-code-skills               # 28 C-level advisory (full C-suite)
/plugin install business-growth-skills@claude-code-skills       # 4 business & growth
/plugin install finance-skills@claude-code-skills               # 2 finance (analyst + SaaS metrics)

# Or install individual skills
/plugin install skill-security-auditor@claude-code-skills       # Security scanner
/plugin install playwright-pro@claude-code-skills               # Playwright testing toolkit
/plugin install self-improving-agent@claude-code-skills         # Auto-memory curation
/plugin install content-creator@claude-code-skills              # Single skill
```

### OpenAI Codex

```bash
npx agent-skills-cli add alirezarezvani/claude-skills --agent codex
# Or: git clone + ./scripts/codex-install.sh
```

### OpenClaw

```bash
bash <(curl -s https://raw.githubusercontent.com/alirezarezvani/claude-skills/main/scripts/openclaw-install.sh)
```

### Manual Installation

```bash
git clone https://github.com/alirezarezvani/claude-skills.git
# Copy any skill folder to ~/.claude/skills/ (Claude Code) or ~/.codex/skills/ (Codex)
```

## Multi-Tool Support (New)

Convert all 345 skills to 9 AI coding tools with a single script:

| Tool | Format | Install |
|---|---|---|
| Cursor | `.mdc` rules | `./scripts/install.sh --tool cursor --target .` |
| Aider | `CONVENTIONS.md` | `./scripts/install.sh --tool aider --target .` |
| Kilo Code | `.kilocode/rules/` | `./scripts/install.sh --tool kilocode --target .` |
| Windsurf | `.windsurf/skills/` | `./scripts/install.sh --tool windsurf --target .` |
| OpenCode | `.opencode/skills/` | `./scripts/install.sh --tool opencode --target .` |
| Augment | `.augment/rules/` | `./scripts/install.sh --tool augment --target .` |
| Antigravity | `~/.gemini/antigravity/skills/` | `./scripts/install.sh --tool antigravity` |
| Hermes Agent | `~/.hermes/skills/` | `python scripts/sync-hermes-skills.py --verbose` |
| Mistral Vibe | `~/.vibe/skills/` | `./scripts/vibe-install.sh` |

**How it works:**

```bash
# 1. Convert all skills to all tools (takes ~15 seconds)
./scripts/convert.sh --tool all

# 2. Install into your project (with confirmation)
./scripts/install.sh --tool cursor --target /path/to/project

# Or use --force to skip confirmation:
./scripts/install.sh --tool aider --target . --force

# 3. Verify
find .cursor/rules -name "*.mdc" | wc -l  # Should show 346
```

**Each tool gets:**

- ✅ All 345 skills converted to native format
- ✅ Per-tool README with install/verify/update steps
- ✅ Support for scripts, references, templates where applicable
- ✅ Zero manual conversion work

Run `./scripts/convert.sh --tool all` to generate tool-specific outputs locally.

## Skills Overview

345 skills across 17 domains:

| Domain | Skills | Highlights | Details |
|---|---|---|---|
| 🔧 Engineering — Core | 51 | Architecture, frontend, backend, fullstack, QA, DevOps, SecOps, AI/ML, data, Playwright Pro (test gen, flaky fix, migrations), self-improving agent (auto-memory curation), security suite, a11y audit | engineering-team/ |
| ⚡ Engineering — POWERFUL | 78 | Agent designer, RAG architect, database designer, CI/CD builder, security auditor, MCP builder, AgentHub, Helm charts, Terraform, self-eval, llm-wiki, tc-tracker, autoresearch-agent, reliability portfolio (feature-flags-architect, kubernetes-operator, chaos-engineering, slo-architect), ship-gate, security-guidance PreToolUse hook, Matt Pocock skills (write-a-skill, caveman, grill-me, handoff, grill-with-docs) | engineering/ |
| 🎯 Product | 17 | Product manager, agile PO, strategist, UX researcher, UI design, landing pages, SaaS scaffolder, analytics, experiment designer, discovery, roadmap communicator, code-to-prd, apple-hig-expert | product-team/ |
| 📣 Marketing | 46 | 8 pods: Content, SEO + AEO (`aeo` — E-E-A-T audit, citation tracking across 5 LLMs), CRO, Channels, Growth, Intelligence, Sales + context foundation + orchestration router | marketing-skill/ |
| 🚀 Productivity | 6 | `capture` (brain-dump-to-action), `email` pair (inbox-setup + inbox-triage), `reflect` (journal), `handoff` (Matt Pocock-inspired), `andreessen` (market-first decision mode) | productivity/ |
| 🎨 Marketing (top-level) | 1 | `landing` — single-file HTML landing-page generator (4 design styles, GSAP patterns, brand palette validator) | marketing/ |
| 🔬 Research (academic) | 8 | `research` orchestrator (hybrid router + fallback) + 7 specialists: `pulse`, `litreview`, `grants` (NIH), `dossier`, `patent`, `syllabus`, `notebooklm` | research/ |
| 🧪 Research Operations ✨v2.9.0 | 5 | Enterprise/cross-functional research: orchestrator + `clinical-research` (study design), `research-finance` (R&D program finance), `market-research` (sizing/survey/segmentation), `product-research` (user research) — each with onboarding + customization + opt-in autoresearch bridge | research-ops/ |
| 📋 Project Management | 9 | Senior PM, scrum master, Jira, Confluence, Atlassian admin, templates + bundled Atlassian Remote MCP | project-management/ |
| 🏥 Regulatory & QM | 18 | ISO 13485, MDR 2017/745, FDA, ISO 27001, GDPR, SOC 2, CAPA, risk management | ra-qm-team/ |
| 🛡️ Compliance OS | 9 | Compliance operating system — controls, evidence, audit-readiness workflows | compliance-os/ |
| 💼 C-Level Advisory | 66 | Full C-suite (CEO/CTO/CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE) + founder-mode agents + orchestration + board meetings + culture & collaboration | c-level-advisor/ |
| 📈 Business & Growth | 5 | Customer success, sales engineer, revenue ops, contracts & proposals, BizDev toolkit | business-growth/ |
| 🏭 Business Operations | 7 | Orchestrator + process-mapper, vendor-management, capacity-planner, internal-comms, knowledge-ops, procurement-optimizer | business-operations/ |
| 🤝 Commercial | 8 | Orchestrator + pricing-strategist, deal-desk, partnerships-architect, channel-economics, commercial-policy, rfp-responder, commercial-forecaster | commercial/ |
| 💰 Finance | 4 | Financial analyst (DCF, budgeting, forecasting), SaaS metrics coach, business investment advisor | finance/ |

## Personas

Pre-configured agent identities with curated skill loadouts, workflows, and distinct communication styles. Personas go beyond "use these skills" — they define how an agent thinks, prioritizes, and communicates.

| Persona | Domain | Best For |
|---|---|---|
| Startup CTO | Engineering + Strategy | Architecture decisions, tech stack selection, team building, technical due diligence |
| Growth Marketer | Marketing + Growth | Content-led growth, launch strategy, channel optimization, bootstrapped marketing |
| Solo Founder | Cross-domain | One-person sta |

…

## Footnotes

1. Hermes Agent is BYO-sync tier: the repo ships a pre-generated `.hermes/skills/claude-skills/` tree, but you run `python scripts/sync-hermes-skills.py` once locally to install into `~/.hermes/skills/`. Uses the same agentskills.io SKILL.md standard — no format conversion.
2. Mistral Vibe is also BYO-sync tier: the repo ships a pre-generated `.vibe/skills/claude-skills/` tree; run `./scripts/vibe-install.sh` once locally to install into `~/.vibe/skills/`. Same agentskills.io SKILL.md standard — no format conversion. Docs: https://docs.mistral.ai/mistral-vibe/agents-skills.

Data Quality Auditor
