datahub-skills
Agent skills for working with DataHub — plan and review connectors, search the catalog, enrich metadata, trace lineage, manage data quality, and set up connections. Works with Claude Code, Cortex Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, and other Agent Skills-compatible tools.
What's in here
Catalog interaction skills
Search
Search the DataHub catalog, discover entities, and answer ad-hoc questions about your data. Supports keyword search, filtered browse, column-name search, structured property queries, and multi-step question answering.
> Find revenue tables in Snowflake
> Who owns the customer pipeline?
> /datahub-search datasets tagged PII
Enrich
Add or update metadata in DataHub — descriptions, tags, glossary terms, ownership, and deprecation. Shows a before/after plan and asks for approval before making changes.
> Add a description to the orders table
> Tag these columns as PII
> /datahub-enrich set owner of revenue_daily to @jdoe
Lineage
Explore data lineage, trace upstream sources and downstream consumers, perform impact analysis, and map cross-platform data flows.
> What feeds into the revenue dashboard?
> Impact analysis for changing the orders table
> /datahub-lineage trace the customer pipeline
Quality
Manage data quality — create and run assertions (freshness, volume, SQL, field, schema), set up smart AI-inferred assertions, raise and resolve incidents, and configure notification subscriptions. Separates Open Source (diagnostic) from Cloud (full management) capabilities.
> Find datasets with failing assertions
> Create a freshness assertion on the orders table
> /datahub-quality raise an incident on the customer pipeline
> Subscribe me to assertion failures via Slack
Setup
Install the DataHub CLI, configure authentication, verify connectivity, and set up default scopes and profiles for the other interaction skills.
> Set up my DataHub connection
> /datahub-setup focus on Snowflake in the Finance domain
> Create a profile for the data-eng team
Connector development skills
Connector planning
Walks you through building a new DataHub connector in four steps: classify the source system type, research it (using a dedicated agent or inline), generate a _PLANNING.md with entity mapping and architecture, and get your sign-off before anyone writes code.
> Plan a connector for ClickHouse
> /connector-planning duckdb
Connector review
Checks connector code against the 22 standards (see below). On Claude Code it runs five agents in parallel — silent failures, test coverage, type design, simplification, comment resolution. On other platforms it does the same checks one at a time.
> Review my connector
> /connector-review postgres
> Review PR #1234
If you're on Claude Code and want the parallel review, also install pr-review-toolkit:
claude plugin install pr-review-toolkit@claude-plugins-official
Load standards
Loads all 22 connector standards into context. Run this before starting connector work so the agent actually knows what it's checking against.
> Load the DataHub standards
> What are the connector standards?
Installation
Quick install (any agent)
The Skills CLI detects your installed agents and sets things up:
npx skills add datahub-project/datahub-skills
Works with most agents, including Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Cline, and Roo Code.
Platform-specific
Claude Code
# Option A: Plugin install (gets you hooks, slash commands, multi-agent dispatch)
claude plugin install datahub-skills
# Also install pr-review-toolkit for multi-agent reviews:
claude plugin install pr-review-toolkit@claude-plugins-official
# Option B: Skills CLI (project-level, installs to .claude/skills/)
npx skills add datahub-project/datahub-skills -a claude-code
Then:
> Search for revenue tables in Snowflake
> /datahub-search who owns the customer pipeline?
> /datahub-enrich add description to orders table
> /datahub-lineage what feeds into the revenue dashboard?
> /datahub-quality find datasets with failing assertions
> /datahub-setup verify my connection
> /connector-review snowflake
> /connector-planning duckdb
Cursor
npx skills add datahub-project/datahub-skills -a cursor
# Installs to .agents/skills/
Cursor picks up skills from .agents/skills/ automatically:
> Search DataHub for customer tables
> Review my DataHub connector
> Plan a connector for ClickHouse
GitHub Copilot
npx skills add datahub-project/datahub-skills -a github-copilot
# Installs to .agents/skills/
Use in Copilot Chat:
> Search the DataHub catalog for revenue data
> Review my DataHub connector code
> Help me plan a new connector for DuckDB
OpenAI Codex
npx skills add datahub-project/datahub-skills -a codex
# Installs to .agents/skills/
> Find datasets owned by the data-eng team
> Review the postgres connector against DataHub standards
> Plan a connector for Snowflake
Gemini CLI
npx skills add datahub-project/datahub-skills -a gemini-cli
# Installs to .agents/skills/
Verify with /skills list, then:
> Who owns the revenue pipeline?
> Review my DataHub connector
> Plan a new connector for BigQuery
Windsurf
npx skills add datahub-project/datahub-skills -a windsurf
# Installs to .windsurf/skills/
> Explore lineage for the orders table
> Review my DataHub connector implementation
> Plan a connector for Redshift
Manual install
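git clone https://github.com/datahub-project/datahub-skills.git
# Create the target directory first; cp fails if it does not exist
mkdir -p your-project/.agents/skills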
# Catalog interaction skills
cp -r datahub-skills/skills/datahub-search your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-enrich your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-lineage your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-quality your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-setup your-project/.agents/skills/
cp -r datahub-skills/skills/shared-references your-project/.agents/skills/
cp -r datahub-skills/skills/using-datahub your-project/.agents/skills/
# Connector development skills
cp -r datahub-skills/skills/datahub-connector-planning your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-connector-pr-review your-project/.agents/skills/
cp -r datahub-skills/skills/load-standards your-project/.agents/skills/
Each skill directory is self-contained. The connector skills reach the standards through symlinks, which must be dereferenced into real files on copy so everything travels together. Some cp implementations do this for -r, but GNU cp -r keeps symlinks as links, so add -L there. The catalog interaction skills reference shared-references/ for CLI and MCP tool documentation.
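The manual copy can also be scripted. The following is a minimal sketch, assuming a POSIX shell; install_skills is a hypothetical helper, not something shipped in the repository. It always passes -L to cp so that symlinks are followed whatever the platform default is.

```shell
# install_skills SRC DEST SKILL...
# Hypothetical helper (not part of datahub-skills): copies the named skill
# directories from a cloned checkout into a project's skills directory.
install_skills() {
  src=$1    # path to the cloned datahub-skills checkout
  dest=$2   # target directory, e.g. your-project/.agents/skills
  shift 2
  mkdir -p "$dest" || return 1
  for skill in "$@"; do
    if [ ! -d "$src/skills/$skill" ]; then
      echo "missing skill: $skill" >&2
      return 1
    fi
    # -R recurses; -L follows symlinks, so a link such as
    # standards -> ../../standards is copied as real files
    cp -RL "$src/skills/$skill" "$dest/" || return 1
  done
}

# Example call (paths are placeholders):
# install_skills datahub-skills your-project/.agents/skills \
#   datahub-search shared-references load-standards
```

After copying, `find your-project/.agents/skills -type l` should print nothing if every symlink was dereferenced.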
What works where
| Feature | Claude Code | Cursor / Copilot / Codex / Gemini CLI / Windsurf |
|---|---|---|
| Catalog search | Yes | Yes |
| Metadata enrichment | Yes | Yes |
| Lineage exploration | Yes | Yes |
| Data quality management | Yes | Yes |
| Connection setup | Yes | Yes |
| Planning workflow | Yes | Yes |
| Load standards | Yes | Yes |
| Review against standards | Yes | Yes |
| Parallel multi-agent review | Yes (5 sub-agents) | No (runs sequentially) |
| Research agent delegation | Yes (dedicated agent) | No (inline fallback) |
| Slash commands | Yes | No (use natural language instead) |
| SessionStart hooks | Yes (via plugin) | No |
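To check what actually got installed for a given agent, the install locations listed under Platform-specific can be restated as a small lookup. This is only a sketch: skills_dir_for and list_installed_skills are hypothetical helpers, and the mapping covers just the agents and paths named in this README.

```shell
# skills_dir_for AGENT
# Hypothetical helper: maps the -a value passed to `npx skills add` to the
# install directory this README lists for that agent.
skills_dir_for() {
  case $1 in
    claude-code) echo ".claude/skills" ;;
    cursor|github-copilot|codex|gemini-cli) echo ".agents/skills" ;;
    windsurf) echo ".windsurf/skills" ;;
    *) echo "unknown agent: $1" >&2; return 1 ;;
  esac
}

# list_installed_skills PROJECT AGENT
# Prints the name of every subdirectory that contains a SKILL.md file.
list_installed_skills() {
  project=$1
  agent=$2
  dir=$(skills_dir_for "$agent") || return 1
  for skill in "$project/$dir"/*/; do
    # An unmatched glob stays literal and simply fails this test
    if [ -f "${skill}SKILL.md" ]; then
      basename "$skill"
    fi
  done
}

# Example call: list_installed_skills your-project cursor
```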
Commands (Claude Code only)
Other platforms do the same things through natural language.
Catalog interaction
| Command | What it does |
|---|---|
| /catalog-search [query] | Search the catalog and answer questions |
| /catalog-enrich [entity] | Add or update metadata |
| /catalog-lineage [entity] | Explore lineage and trace dependencies |
| /catalog-quality [entity] | Manage assertions, incidents, and subscriptions |
| /catalog-setup [task] | Set up connection and configure defaults |
Connector development
| Command | What it does |
|---|---|
| /connector-planning [source] | Plan a new connector |
| /connector-review [connector] | Review connector code against standards |
| /load-standards | Load all 22 standards into context |
Agents
| Agent | What it does |
|---|---|
| metadata-searcher | Fast sub-agent for executing catalog queries (Claude Code) |
| connector-researcher | Researches source systems before you write a connector |
| connector-validator | Runs validation scripts and reports results |
| comment-resolution-checker | Checks whether PR review comments were actually addressed |
Standards
22 standards live in standards/, split into two groups:
Core (11): main, api, sql, code_style, containers, lineage, patterns, performance, platform_registration, registration, testing
Source-type (11): bi_tools, data_lakes, data_warehouses, identity_platforms, ml_platforms, nosql_databases, orchestration_tools, product_analytics, query_engines, sql_databases, streaming_platforms
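A quick sanity check after a manual copy is to count the standards that arrived. The sketch below assumes each standard is a single Markdown file sitting directly in the standards directory. That file layout is an assumption, since this README does not state it, and count_standards and check_standards are hypothetical helpers.

```shell
# count_standards DIR
# Hypothetical helper. ASSUMPTION: one .md file per standard, directly in DIR.
count_standards() {
  dir=$1
  n=0
  for f in "$dir"/*.md; do
    # An unmatched glob stays literal and is skipped by the -f test
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# check_standards DIR
# Fails unless DIR holds the 22 standards (11 core + 11 source-type).
check_standards() {
  expected=22
  actual=$(count_standards "$1")
  if [ "$actual" -ne "$expected" ]; then
    echo "expected $expected standards, found $actual" >&2
    return 1
  fi
}

# Example call (path is a placeholder):
# check_standards your-project/.agents/skills/load-standards/standards
```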
Repo layout
datahub-skills/
├── .claude-plugin/
│ ├── plugin.json
│ └── marketplace.json
├── skills/
│ ├── datahub-search/ # Catalog search and discovery
│ │ ├── SKILL.md
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-enrich/ # Metadata enrichment
│ │ ├── SKILL.md
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-lineage/ # Lineage exploration
│ │ ├── SKILL.md
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-quality/ # Data quality management
│ │ ├── SKILL.md
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-setup/ # Connection setup and config
│ │ ├── SKILL.md
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-connector-planning/ # Connector planning
│ │ ├── SKILL.md
│ │ ├── standards -> ../../standards
│ │ ├── references/
│ │ └── templates/
│ ├── datahub-connector-pr-review/ # Connector review
│ │ ├── SKILL.md
│ │ ├── standards -> ../../standards
│ │ ├── commands/
│ │ ├── references/
│ │ ├── scripts/
│ │ └── templates/
│
…