Back to Plugins

Datahub Skills

DataHub development and interaction toolkit with connector planning, PR review, catalog search, metadata enrichment, lineage tracing, data quality management, and connection setup skills

database
By DataHub
275Updated 1 week agoShellApache-2.0

Installation

/plugin install datahub-skills@claude-plugins-official

How to install

  1. Open Claude Code in your terminal
  2. Run the installation command above
  3. The plugin will be enabled automatically
  4. Use the plugin's features in your Claude Code sessions

datahub-skills

Agent skills for working with DataHub — plan and review connectors, search the catalog, enrich metadata, trace lineage, manage data quality, and set up connections. Works with Claude Code, Cortex Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, and other Agent Skills-compatible tools.

What's in here

Catalog interaction skills

Search

Search the DataHub catalog, discover entities, and answer ad-hoc questions about your data. Supports keyword search, filtered browse, column-name search, structured property queries, and multi-step question answering.

> Find revenue tables in Snowflake
> Who owns the customer pipeline?
> /datahub-search datasets tagged PII

Enrich

Add or update metadata in DataHub — descriptions, tags, glossary terms, ownership, and deprecation. Shows a before/after plan and asks for approval before making changes.

> Add a description to the orders table
> Tag these columns as PII
> /datahub-enrich set owner of revenue_daily to @jdoe

Lineage

Explore data lineage, trace upstream sources and downstream consumers, perform impact analysis, and map cross-platform data flows.

> What feeds into the revenue dashboard?
> Impact analysis for changing the orders table
> /datahub-lineage trace the customer pipeline

Quality

Manage data quality — create and run assertions (freshness, volume, SQL, field, schema), set up smart AI-inferred assertions, raise and resolve incidents, and configure notification subscriptions. Separates Open Source (diagnostic) from Cloud (full management) capabilities.

> Find datasets with failing assertions
> Create a freshness assertion on the orders table
> /datahub-quality raise an incident on the customer pipeline
> Subscribe me to assertion failures via Slack

Setup

Install the DataHub CLI, configure authentication, verify connectivity, and set up default scopes and profiles for the other interaction skills.

> Set up my DataHub connection
> /datahub-setup focus on Snowflake in the Finance domain
> Create a profile for the data-eng team

Connector development skills

Connector planning

Walks you through building a new DataHub connector in four steps: classify the source system type, research it (using a dedicated agent or inline), generate a _PLANNING.md with entity mapping and architecture, and get your sign-off before anyone writes code.

> Plan a connector for ClickHouse
> /connector-planning duckdb

Connector review

Checks connector code against the 22 standards (see below). On Claude Code it runs five agents in parallel — silent failures, test coverage, type design, simplification, comment resolution. On other platforms it does the same checks one at a time.

> Review my connector
> /connector-review postgres
> Review PR #1234

If you're on Claude Code and want the parallel review, also install pr-review-toolkit:

claude plugin install pr-review-toolkit@claude-plugins-official

Load standards

Loads all 22 connector standards into context. Run this before starting connector work so the agent actually knows what it's checking against.

> Load the DataHub standards
> What are the connector standards?

Installation

Quick install (any agent)

The Skills CLI detects your installed agents and sets things up:

npx skills add datahub-project/datahub-skills

Works with most agents including Claude Code, Cursor, Codex, Copilot, Gemini CLI, Windsurf, Cline, and Roo Code.

Platform-specific

Claude Code

# Option A: Plugin install (gets you hooks, slash commands, multi-agent dispatch)
claude plugin install datahub-skills

# Also install pr-review-toolkit for multi-agent reviews:
claude plugin install pr-review-toolkit@claude-plugins-official
# Option B: Skills CLI (project-level, installs to .claude/skills/)
npx skills add datahub-project/datahub-skills -a claude-code

Then:

> Search for revenue tables in Snowflake
> /datahub-search who owns the customer pipeline?
> /datahub-enrich add description to orders table
> /datahub-lineage what feeds into the revenue dashboard?
> /datahub-quality find datasets with failing assertions
> /datahub-setup verify my connection
> /connector-review snowflake
> /connector-planning duckdb

Cursor

npx skills add datahub-project/datahub-skills -a cursor
# Installs to .agents/skills/

Cursor picks up skills from .agents/skills/ automatically:

> Search DataHub for customer tables
> Review my DataHub connector
> Plan a connector for ClickHouse

GitHub Copilot

npx skills add datahub-project/datahub-skills -a github-copilot
# Installs to .agents/skills/

Use in Copilot Chat:

> Search the DataHub catalog for revenue data
> Review my DataHub connector code
> Help me plan a new connector for DuckDB

OpenAI Codex

npx skills add datahub-project/datahub-skills -a codex
# Installs to .agents/skills/
> Find datasets owned by the data-eng team
> Review the postgres connector against DataHub standards
> Plan a connector for Snowflake

Gemini CLI

npx skills add datahub-project/datahub-skills -a gemini-cli
# Installs to .agents/skills/

Verify with /skills list, then:

> Who owns the revenue pipeline?
> Review my DataHub connector
> Plan a new connector for BigQuery

Windsurf

npx skills add datahub-project/datahub-skills -a windsurf
# Installs to .windsurf/skills/
> Explore lineage for the orders table
> Review my DataHub connector implementation
> Plan a connector for Redshift

Manual install

git clone https://github.com/datahub-project/datahub-skills.git

# Catalog interaction skills
cp -r datahub-skills/skills/datahub-search          your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-enrich           your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-lineage          your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-quality          your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-setup            your-project/.agents/skills/
cp -r datahub-skills/skills/shared-references        your-project/.agents/skills/
cp -r datahub-skills/skills/using-datahub            your-project/.agents/skills/

# Connector development skills
cp -r datahub-skills/skills/datahub-connector-planning   your-project/.agents/skills/
cp -r datahub-skills/skills/datahub-connector-pr-review  your-project/.agents/skills/
cp -r datahub-skills/skills/load-standards               your-project/.agents/skills/

Each skill directory is self-contained. The standards symlinks get dereferenced into real files on copy, so everything travels together. The catalog interaction skills reference shared-references/ for CLI and MCP tool documentation.

What works where

FeatureClaude CodeCursor / Copilot / Codex / Gemini CLI / Windsurf
Catalog searchYesYes
Metadata enrichmentYesYes
Lineage explorationYesYes
Data quality managementYesYes
Connection setupYesYes
Planning workflowYesYes
Load standardsYesYes
Review against standardsYesYes
Parallel multi-agent reviewYes (5 sub-agents)No (runs sequentially)
Research agent delegationYes (dedicated agent)No (inline fallback)
Slash commandsYesNo (use natural language instead)
SessionStart hooksYes (via plugin)No

Commands (Claude Code only)

Other platforms do the same things through natural language.

Catalog interaction

CommandWhat it does
/catalog-search [query]Search the catalog and answer questions
/catalog-enrich [entity]Add or update metadata
/catalog-lineage [entity]Explore lineage and trace dependencies
/catalog-quality [entity]Manage assertions, incidents, and subscriptions
/catalog-setup [task]Set up connection and configure defaults

Connector development

CommandWhat it does
/connector-planning [source]Plan a new connector
/connector-review [connector]Review connector code against standards
/load-standardsLoad all 22 standards into context

Agents

AgentWhat it does
metadata-searcherFast sub-agent for executing catalog queries (Claude Code)
connector-researcherResearches source systems before you write a connector
connector-validatorRuns validation scripts and reports results
comment-resolution-checkerChecks whether PR review comments were actually addressed

Standards

22 standards live in standards/, split into two groups:

Core (11): main, api, sql, code_style, containers, lineage, patterns, performance, platform_registration, registration, testing

Source-type (11): bi_tools, data_lakes, data_warehouses, identity_platforms, ml_platforms, nosql_databases, orchestration_tools, product_analytics, query_engines, sql_databases, streaming_platforms

Repo layout

datahub-skills/
├── .claude-plugin/
│   ├── plugin.json
│   └── marketplace.json
├── skills/
│   ├── datahub-search/              # Catalog search and discovery
│   │   ├── SKILL.md
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-enrich/              # Metadata enrichment
│   │   ├── SKILL.md
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-lineage/             # Lineage exploration
│   │   ├── SKILL.md
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-quality/             # Data quality management
│   │   ├── SKILL.md
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-setup/               # Connection setup and config
│   │   ├── SKILL.md
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-connector-planning/  # Connector planning
│   │   ├── SKILL.md
│   │   ├── standards -> ../../standards
│   │   ├── references/
│   │   └── templates/
│   ├── datahub-connector-pr-review/ # Connector review
│   │   ├── SKILL.md
│   │   ├── standards -> ../../standards
│   │   ├── commands/
│   │   ├── references/
│   │   ├── scripts/
│   │   └── templates/
│  

…
View source on GitHub