Chief Data Officer Advisor

Name: Chief Data Officer Advisor
Author: alirezarezvani

Chief Data Officer advisory for startups: AI training data rights and consent provenance, data product strategy (warehouse vs lakehouse vs mesh, build-vs-buy), B2B customer-data-as-asset valuation and M&A readiness, data team org evolution. Use when deciding whether to train mod…


17k · 2.4k · Updated 3 days ago · Python · MIT

Skill Content

# Chief Data Officer Advisor

Strategic data leadership for startup CDOs and founders without one. **Four decisions, no surveys:**

1. **Can we train our model on this data?** — origin × consent × use-case matrix
2. **Warehouse, lakehouse, or mesh — and what do we build vs buy?** — stage-driven architecture
3. **What is our customer data worth?** — strategic value + M&A multiplier + productization paths
4. **What data role do we hire next?** — stage-to-role map, centralize-vs-embed trigger

This skill does **not** cover tactical data engineering. For schema design, observability, query optimization, RAG, or ML platform implementation, see `engineering/database-designer/`, `engineering/observability-designer/`, `engineering/data-quality-auditor/`, `engineering/sql-database-assistant/`, `engineering/rag-architect/`, `engineering/llm-cost-optimizer/`.

## Keywords

CDO, chief data officer, AI training data, consent provenance, training rights, GDPR Article 6 lawful basis, GDPR Article 22, EU AI Act high-risk, ePrivacy, copyright fair use, hiQ v. LinkedIn, scraped data, synthetic data, data product, data mesh, lakehouse, medallion architecture, dbt, Snowflake, BigQuery, Databricks, Fivetran, Airbyte, reverse ETL, feature store, customer data as asset, data monetization, data productization, anonymization, k-anonymity, differential privacy, M&A data diligence, data org, analytics engineer, data engineer, data scientist, data product manager, centralize vs embed, hub and spoke

## Quick Start

```bash
# Audit data sources for AI training eligibility
python scripts/ai_training_data_audit.py                              # uses embedded sample
python scripts/ai_training_data_audit.py path/to/sources.json

# Pick data architecture + build-vs-buy + sequencing
python scripts/data_product_strategy_picker.py                        # uses embedded Series A SaaS
python scripts/data_product_strategy_picker.py path/to/profile.json

# Value the customer data corpus + productization viability
python scripts/data_asset_valuator.py                                 # uses embedded B2B sample
python scripts/data_asset_valuator.py path/to/corpus.json
```

## Key Questions (ask these first)

- **What decision does this data drive?** (If none, why are we collecting it?)
- **What's the consent provenance of every source we want to train on?** (TOS-only is not the same as explicit opt-in.)
- **Who are the internal data consumers, and how many distinct domains do they span?** (Drives centralize-vs-embed and warehouse-vs-mesh.)
- **In an M&A scenario, is our data a moat or a liability?** (Customer carve-outs in MSAs can flip the answer.)
- **Are we hiring an analytics engineer or a data scientist next?** (They solve different problems; founders confuse them.)
- **Have we run an anonymization audit before any external sharing?** (k-anonymity ≥ 5 is the floor, not the ceiling.)
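The k-anonymity floor in the last question can be checked mechanically before anything leaves the building. The following is a minimal sketch, assuming each record is a dict and that the quasi-identifier column names (`industry`, `region`) are hypothetical stand-ins for whatever columns could re-identify a customer in your corpus.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers) for record in records
    )
    return min(groups.values())


def passes_floor(records, quasi_identifiers, k_floor=5):
    """True when every quasi-identifier combination covers at least k_floor records."""
    return k_anonymity(records, quasi_identifiers) >= k_floor


# Hypothetical sample: five fintech/EU rows and a single healthtech/US row.
sample = [{"industry": "fintech", "region": "EU"}] * 5 + [
    {"industry": "healthtech", "region": "US"}
]
print(k_anonymity(sample, ["industry", "region"]))   # smallest group size
print(passes_floor(sample, ["industry", "region"]))  # does the k >= 5 floor hold?
```

A single rare combination is enough to fail the floor, which is why the check reports the smallest group rather than an average.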

## Core Responsibilities

### 1. AI Training Data Rights

The 2026 question every startup is facing: **can we use customer data to train our model?**

The answer is rarely binary. It depends on three independent dimensions:

| Dimension | Values |
|---|---|
| **Origin** | 1st-party-explicit-opt-in / 1st-party-TOS-only / partner-licensed / scraped / synthetic |
| **Data class** | Anonymous aggregate / behavioral / PII / 3rd-party content / regulated (PHI, PCI, kids) |
| **Use case** | In-product personalization / fine-tune our model / train foundation model / external sharing |

Each combination produces GO / MITIGATE / NO-GO. **Run** `ai_training_data_audit.py` on a JSON inventory of sources.

See `references/ai_training_data_rights.md` for the full matrix + GDPR Art. 6 lawful basis decision tree + EU AI Act high-risk triggers.
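The three dimensions compose into a lookup. Below is a minimal sketch, assuming a first-match rule list with a conservative default. The rules and the hyphenated value spellings are placeholders for illustration only: they are not the matrix in `references/ai_training_data_rights.md`, and they are not the logic of `ai_training_data_audit.py`.

```python
# Placeholder rules: (origin, data_class, use_case) -> verdict.
# "*" matches any value; the first matching rule wins.
RULES = [
    (("*", "regulated", "external-sharing"), "NO-GO"),
    (("synthetic", "*", "*"), "GO"),
    (("1st-party-explicit-opt-in", "behavioral", "in-product-personalization"), "GO"),
    (("1st-party-TOS-only", "PII", "fine-tune"), "MITIGATE"),
]

# Anything not covered by a rule is treated as needing review (an assumption).
DEFAULT_VERDICT = "MITIGATE"


def classify(origin, data_class, use_case):
    """Return GO / MITIGATE / NO-GO for one origin x data class x use case combination."""
    values = (origin, data_class, use_case)
    for pattern, verdict in RULES:
        if all(expected in ("*", actual) for expected, actual in zip(pattern, values)):
            return verdict
    return DEFAULT_VERDICT


print(classify("scraped", "regulated", "external-sharing"))
print(classify("1st-party-TOS-only", "PII", "fine-tune"))
```

Defaulting to MITIGATE rather than GO keeps an incomplete rule list from silently approving a combination nobody has reviewed.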

### 2. Data Product Strategy

**Architecture choice (warehouse vs lakehouse vs mesh) is stage-driven, not preference-driven:**

- **Warehouse only** (Snowflake / BigQuery / Postgres): ≤5 data consumers, <2TB, no ML use cases
- **Lakehouse** (warehouse + object storage, often Databricks or Snowflake-with-Iceberg): 5–25 data consumers, 2TB–1PB, 1–3 ML use cases
- **Data mesh**: 25+ data consumers across 4+ domains, federated ownership culture in place
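The thresholds above can be read as a small decision function. This is a sketch under stated assumptions, not the logic of `data_product_strategy_picker.py`: the function name and parameters are hypothetical, and any profile that fits neither the warehouse tier nor the mesh tier is treated as a lakehouse.

```python
def pick_architecture(consumers, storage_tb, ml_use_cases, domains=1,
                      federated_ownership=False):
    """Map a company profile to warehouse / lakehouse / data mesh."""
    # Mesh needs all three signals: scale, domain spread, and ownership culture.
    if consumers >= 25 and domains >= 4 and federated_ownership:
        return "data mesh"
    # Warehouse only holds while every dimension is still small.
    if consumers <= 5 and storage_tb < 2 and ml_use_cases == 0:
        return "warehouse"
    # Everything in between, including mixed signals, lands on lakehouse (assumption).
    return "lakehouse"


print(pick_architecture(consumers=4, storage_tb=0.5, ml_use_cases=0))
print(pick_architecture(consumers=12, storage_tb=40, ml_use_cases=2))
print(pick_architecture(consumers=40, storage_tb=300, ml_use_cases=5,
                        domains=5, federated_ownership=True))
```

Note that a 40-consumer company without a federated ownership culture still returns `lakehouse` here, reflecting the bullet's requirement that the culture be in place before adopting mesh.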

**Build vs buy is decided per layer:**

| Layer | Default | Exception / next step |
|---|---|---|
| Storage / warehouse | Buy; never build | Build only if you’re a data infra company |
| ELT / ingest | Buy; never build | Build only if the source isn’t supported by Fivetran/Airbyte |
| Modeling (dbt) | Always build | This is your IP |
| BI / dashboards | Buy at <100 consumers | Build only for embedded analytics for customers |
| Feature store | Defer until 3+ prod models | Then build OR buy Tecton/Hopsworks |
| ML platform | Defer until 5+ prod models | Then buy SageMaker/Vertex/Databricks |

**Run** `data_product_strategy_picker.py` for a stage-specific recommendation. See `references/data_product_strategy.md` for kill criteria per architecture and the build-vs-buy decision tree.
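The per-layer table can be encoded as a function of a few counts. A minimal sketch, assuming hypothetical parameter names; the `review` outcome for BI at 100 or more consumers is an assumption, since the table only states the rule below that threshold.

```python
def layer_decisions(prod_models, bi_consumers, unsupported_sources=False,
                    embedded_analytics=False):
    """Return a build / buy / defer call for each layer of the data stack."""
    if embedded_analytics:
        bi = "build"
    elif bi_consumers < 100:
        bi = "buy"
    else:
        bi = "review"  # not specified in the table; flagged for a human decision

    return {
        "storage": "buy",
        "elt": "build" if unsupported_sources else "buy",
        "modeling": "build",
        "bi": bi,
        "feature_store": "defer" if prod_models < 3 else "build or buy",
        "ml_platform": "defer" if prod_models < 5 else "buy",
    }


# Hypothetical Series A profile: 2 production models, 40 dashboard consumers.
for layer, call in layer_decisions(prod_models=2, bi_consumers=40).items():
    print(f"{layer}: {call}")
```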

### 3. B2B Customer-Data-as-Asset

**The shift:** at Series B+, customer data is no longer just operational — it’s an asset that can be:
- A defensibility moat (replicating it requires years of customer cohort data)
- An M&A multiplier (1.2x–2x ARR uplift for strategic buyers)
- A direct revenue stream (anonymized industry benchmarks, embedding endpoints, licensing)

But it can also be a **liability**:
- MSA carve-outs across a meaningful share of customers (for example, 47 of 380) can make productization legally infeasible
- Anonymization audits often reveal re-identification risk above tolerable thresholds
- Regulatory exposure grows with productization (under GDPR, a shift from an Art. 28 processor role toward an Art. 26 joint-controller role)

**Run** `data_asset_valuator.py` with corpus characteristics to get strategic value score + productization paths + risk-adjusted value.

See `references/customer_data_as_asset.md` for the valuation framework, M&A diligence prep checklist, and contractual constraint audit pattern.
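Two numbers from this section lend themselves to quick arithmetic: the 1.2x–2x strategic-buyer range and the share of customers with MSA carve-outs. The sketch below assumes the range is applied as a multiplier to an ARR figure; how `data_asset_valuator.py` actually weights it is not shown here, and the $10M ARR input is hypothetical.

```python
def strategic_value_range(arr, low_multiplier=1.2, high_multiplier=2.0):
    """Apply the strategic-buyer multiplier range to an ARR figure."""
    return arr * low_multiplier, arr * high_multiplier


def carve_out_share(customers_with_carve_outs, total_customers):
    """Fraction of customers whose MSAs restrict data productization."""
    if total_customers <= 0:
        raise ValueError("total_customers must be positive")
    return customers_with_carve_outs / total_customers


low, high = strategic_value_range(10_000_000)  # hypothetical $10M ARR
share = carve_out_share(47, 380)               # the 47/380 example from the list above

print(f"Strategic range: ${low:,.0f} to ${high:,.0f}")
print(f"Carve-out share: {share:.1%}")
```

The carve-out share is the figure to hand to counsel first, since it bounds how much of the corpus can be productized without re-papering contracts.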

### 4. Data Team Org Evolution

**The wrong question:** "Should we hire a data scientist?"
**The right question:** "What’s the next decision we can’t make because we lack data, and what role unblocks that?"

Stage-to-role map (B2B SaaS baseline):

| Stage | First hire | Then | Then |
|---|---|---|---|
| Pre-seed / seed | Founder-as-analyst (SQL + spreadsheets) | — | — |
| Series A | Analyst | Analytics engineer (dbt) | — |
| Series B | Data engineer | Senior analyst (embedded in GTM) | Data PM (if 3+ teams need data) |
| Growth | Manager of analytics | ML engineer (if model is core) | Head of Data |
| Late-stage | Head of Data → CDO | Specialized: BI, MLE, DPO | Federated owners per domain (mesh) |

**Centralize-vs-embed trigger:** when 3+ functional areas (sales, marketing, product, ops, CS) need bespoke data weekly, the central team becomes the bottleneck. Move to hub-and-spoke (central platform + embedded analysts) before that becomes a hiring crisis.
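The trigger can be tracked as a simple count. A minimal sketch, assuming you log bespoke data requests per functional area per week; the function name and the "at least one request per week" reading of "weekly" are assumptions.

```python
FUNCTIONAL_AREAS = ("sales", "marketing", "product", "ops", "cs")


def hub_and_spoke_due(weekly_bespoke_requests, area_threshold=3):
    """Return (trigger_fired, areas) given bespoke requests per area per week."""
    needy = sorted(
        area for area in FUNCTIONAL_AREAS
        if weekly_bespoke_requests.get(area, 0) >= 1
    )
    return len(needy) >= area_threshold, needy


# Hypothetical week: three areas each asked for bespoke data at least once.
fired, areas = hub_and_spoke_due({"sales": 2, "marketing": 1, "product": 4})
print(fired, areas)
```

Reviewing this count monthly gives an early warning, so the move to hub-and-spoke can be planned before the central team is already the bottleneck.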

See `references/data_team_org_evolution.md`.

## Workflows

### Workflow 1: AI Training Decision (1 hour)
**Goal:** Decide whether a specific data source can train a specific use case.

```bash
# 1. Build sources.json with one entry per data source
# 2. Run the audit
python scripts/ai_training_data_audit.py sources.json
# 3. For each MITIGATE: assign owner + remediation
# 4. For each NO-GO: document the kill reason for the legal log
# 5. Cross-check with cs-general-counsel-advisor on top-3 mitigation items
# 6. Log via /cs:decide
```

### Workflow 2: Architecture Decision (1 day)
**Goal:** Pick warehouse / lakehouse / mesh and the build-vs-buy split for the next 12 months.

```bash
python scripts/data_product_strategy_picker.py profile.json
# Cross-check with cs-cto-advisor on engineering capacity
# Cross-check with cs-cfo-advisor on 3-year TCO
# Log via /cs:decide; consider /cs:freeze 90 if signing a multi-year SaaS contract
```

### Workflow 3: Data Asset Valuation for M&A Prep (3 days)
**Goal:** Value the data corpus and prepare for due diligence.

1. Inventory the corpus: size, freshness, exclusivity, customer overlap, contractual restrictions
2. Run `data_asset_valuator.py`
3. Run the M&A diligence prep checklist in `customer_data_as_asset.md`
4. Surface contractual carve-outs to cs-general-counsel-advisor for re-papering plan
5. Decide productization path (benchmark report / embedding endpoint / direct license)
6. Log via /cs:decide

### Workflow 4: Data Team Roadmap (1 week)
**Goal:** Build the next 18 months of data hires aligned to business decisions.

1. List the top 5 decisions the business can’t make today due to missing data or analysis
2. Map each decision to the role that unblocks it
3. Sequence hires (one role at a time, ramp before next)
4. Cross-check with cs-chro-advisor on comp bands and leveling
5. Identify the centralize-vs-embed trigger date

## Output Standards (when invoked via cs-cdo-advisor)

```
**Bottom Line:** [one sentence — decision and rationale]
**The Decision:** [one of the 4 framings]
**The Evidence:** [numbers, not adjectives]
**How to Act:** [3 concrete next steps]
**Your Decision:** [the call only the founder can make]
```
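If the five-part format is produced programmatically, a small container keeps every section present and enforces the three-step rule. This is a sketch only; the `Advice` class and its field names are hypothetical, and the sample content is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Advice:
    bottom_line: str
    decision: str
    evidence: str
    actions: List[str]
    founder_call: str

    def render(self) -> str:
        """Render the five-line output format, requiring exactly three next steps."""
        if len(self.actions) != 3:
            raise ValueError("How to Act expects exactly 3 concrete next steps")
        steps = " ".join(
            f"({i}) {step}" for i, step in enumerate(self.actions, start=1)
        )
        return "\n".join([
            f"**Bottom Line:** {self.bottom_line}",
            f"**The Decision:** {self.decision}",
            f"**The Evidence:** {self.evidence}",
            f"**How to Act:** {steps}",
            f"**Your Decision:** {self.founder_call}",
        ])


example = Advice(
    bottom_line="Stay on a warehouse for the next 12 months.",
    decision="Warehouse, lakehouse, or mesh",
    evidence="4 data consumers, 0.5 TB, no ML use cases",
    actions=["Buy ELT", "Build dbt models", "Revisit at 2 TB"],
    founder_call="Whether to sign a multi-year warehouse contract",
)
print(example.render())
```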

## Adjacent Skills

- `../cto-advisor/` — architecture capacity, scaling cliffs
- `../ciso-advisor/` — data security, threat modeling for productized data
- `../general-counsel-advisor/` — contractual constraints, DPA, training-data rights
- `../cfo-advisor/` — build-vs-buy TCO, M&A valuation math
- `../chro-advisor/` — data team hiring, leveling, comp
- `../../../engineering/database-designer/` — tactical schema design
- `../../../engineering/rag-architect/` — tactical AI/RAG implementation
- `../../../engineering/llm-cost-optimizer/` — model cost management

## References

- [ai_training_data_rights.md](references/ai_training_data_rights.md) — The training-rights matrix + GDPR Art. 6 / EU AI Act decision tree
- [data_product_strategy.md](references/data_product_strategy.md) — Warehouse / lakehouse / mesh kill criteria + build-vs-buy decision tree
- [customer_data_as_asset.md](references/customer_data_as_asset.md) — Valuation framework + M&A diligence prep + productization paths
- [data_team_org_evolution.md](references/data_team_org_evolution.md) — Stage-to-role map + centralize-vs-embed trigger

---

**Version:** 1.0.0
**Status:** Production Ready
**Disclaimer:** Decisions touching training data rights, data productization, or M&A data diligence should involve qualified counsel. This skill surfaces decisions and tradeoffs — it does not replace legal review.

How to use

1. Copy the skill content above.
2. Create a `.claude/skills` directory in your project.
3. Save it as `.claude/skills/claude-skills-chief-data-officer-advisor.md`.
4. Use `/claude-skills-chief-data-officer-advisor` in Claude Code to invoke this skill.

README


Claude Code Skills & Plugins — Agent Skills for Every Coding Tool

338 production-ready Claude Code skills, plugins, and agent skills for 13 AI coding tools.

The most comprehensive open-source library of Claude Code skills and agent plugins — also works with OpenAI Codex, Gemini CLI, Cursor, and 9 more coding agents. Reusable expertise packages covering engineering, DevOps, marketing (incl. AEO — Answer Engine Optimization for LLM citation), security (PreToolUse hooks), compliance, C-level advisory (incl. founder-mode CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE personas + 21 /cs:* slash commands), productivity (capture/email/reflect), an academic research stack (litreview/grants/dossier/patent/syllabus/pulse/notebooklm + hybrid router), and enterprise Research Operations (clinical-research/research-finance/market-research/product-research, v2.9.0).

Works with: Claude Code · OpenAI Codex · Gemini CLI · OpenClaw · Hermes Agent¹ · Mistral Vibe² · Cursor · Aider · Windsurf · Kilo Code · OpenCode · Augment · Antigravity

5,200+ GitHub stars — the most comprehensive open-source Claude Code skills & agent plugins library.

What Are Claude Code Skills & Agent Plugins?

Claude Code skills (also called agent skills or coding agent plugins) are modular instruction packages that give AI coding agents domain expertise they don't have out of the box. Each skill includes:

SKILL.md — structured instructions, workflows, and decision frameworks
Python tools — 533 CLI scripts (all stdlib-only, zero pip installs)
Reference docs — 676 templates, checklists, and domain-specific knowledge files

One repo, thirteen platforms. Works natively as Claude Code plugins, Codex agent skills, Gemini CLI skills, Hermes Agent skills, Mistral Vibe skills, and converts to more tools via scripts/convert.sh. All 533 Python tools run anywhere Python runs.

Skills vs Agents vs Personas

| | Skills | Agents | Personas |
|---|---|---|---|
| Purpose | How to execute a task | What task to do | Who is thinking |
| Scope | Single domain | Single domain | Cross-domain |
| Voice | Neutral | Professional | Personality-driven |
| Example | "Follow these steps for SEO" | "Run a security audit" | "Think like a startup CTO" |

All three work together. See Orchestration for how to combine them.

Quick Install

Gemini CLI (New)

# Clone the repository
git clone https://github.com/alirezarezvani/claude-skills.git
cd claude-skills

# Run the setup script
./scripts/gemini-install.sh

# Start using skills
> activate_skill(name="senior-architect")

Claude Code (Recommended)

# Add the marketplace
/plugin marketplace add alirezarezvani/claude-skills

# Install by domain
/plugin install engineering-skills@claude-code-skills          # 24 core engineering
/plugin install engineering-advanced-skills@claude-code-skills  # 25 POWERFUL-tier
/plugin install product-skills@claude-code-skills               # 12 product skills
/plugin install marketing-skills@claude-code-skills             # 43 marketing skills
/plugin install ra-qm-skills@claude-code-skills                 # 12 regulatory/quality
/plugin install pm-skills@claude-code-skills                    # 6 project management
/plugin install c-level-skills@claude-code-skills               # 28 C-level advisory (full C-suite)
/plugin install business-growth-skills@claude-code-skills       # 4 business & growth
/plugin install finance-skills@claude-code-skills               # 2 finance (analyst + SaaS metrics)

# Or install individual skills
/plugin install skill-security-auditor@claude-code-skills       # Security scanner
/plugin install playwright-pro@claude-code-skills                  # Playwright testing toolkit
/plugin install self-improving-agent@claude-code-skills         # Auto-memory curation
/plugin install content-creator@claude-code-skills              # Single skill

OpenAI Codex

npx agent-skills-cli add alirezarezvani/claude-skills --agent codex
# Or: git clone + ./scripts/codex-install.sh

OpenClaw

bash <(curl -s https://raw.githubusercontent.com/alirezarezvani/claude-skills/main/scripts/openclaw-install.sh)

Manual Installation

git clone https://github.com/alirezarezvani/claude-skills.git
# Copy any skill folder to ~/.claude/skills/ (Claude Code) or ~/.codex/skills/ (Codex)

Multi-Tool Support (New)

Convert all 338 skills to 9 AI coding tools with a single script:

| Tool | Format | Install |
|---|---|---|
| Cursor | `.mdc` rules | `./scripts/install.sh --tool cursor --target .` |
| Aider | `CONVENTIONS.md` | `./scripts/install.sh --tool aider --target .` |
| Kilo Code | `.kilocode/rules/` | `./scripts/install.sh --tool kilocode --target .` |
| Windsurf | `.windsurf/skills/` | `./scripts/install.sh --tool windsurf --target .` |
| OpenCode | `.opencode/skills/` | `./scripts/install.sh --tool opencode --target .` |
| Augment | `.augment/rules/` | `./scripts/install.sh --tool augment --target .` |
| Antigravity | `~/.gemini/antigravity/skills/` | `./scripts/install.sh --tool antigravity` |
| Hermes Agent | `~/.hermes/skills/` | `python scripts/sync-hermes-skills.py --verbose` |
| Mistral Vibe | `~/.vibe/skills/` | `./scripts/vibe-install.sh` |

How it works:

# 1. Convert all skills to all tools (takes ~15 seconds)
./scripts/convert.sh --tool all

# 2. Install into your project (with confirmation)
./scripts/install.sh --tool cursor --target /path/to/project

# Or use --force to skip confirmation:
./scripts/install.sh --tool aider --target . --force

# 3. Verify
find .cursor/rules -name "*.mdc" | wc -l  # Should show 338

Each tool gets:

✅ All 338 skills converted to native format
✅ Per-tool README with install/verify/update steps
✅ Support for scripts, references, templates where applicable
✅ Zero manual conversion work

Run ./scripts/convert.sh --tool all to generate tool-specific outputs locally.

Skills Overview

338 skills across 16 domains:

| Domain | Skills | Highlights | Details |
|---|---|---|---|
| 🔧 Engineering — Core | 51 | Architecture, frontend, backend, fullstack, QA, DevOps, SecOps, AI/ML, data, Playwright Pro (test gen, flaky fix, migrations), self-improving agent (auto-memory curation), security suite, a11y audit | engineering-team/ |
| ⚡ Engineering — POWERFUL | 78 | Agent designer, RAG architect, database designer, CI/CD builder, security auditor, MCP builder, AgentHub, Helm charts, Terraform, self-eval, llm-wiki, tc-tracker, autoresearch-agent, reliability portfolio (feature-flags-architect, kubernetes-operator, chaos-engineering, slo-architect), ship-gate, security-guidance PreToolUse hook, Matt Pocock skills (write-a-skill, caveman, grill-me, handoff, grill-with-docs) | engineering/ |
| 🎯 Product | 17 | Product manager, agile PO, strategist, UX researcher, UI design, landing pages, SaaS scaffolder, analytics, experiment designer, discovery, roadmap communicator, code-to-prd, apple-hig-expert | product-team/ |
| 📣 Marketing | 46 | 8 pods: Content, SEO + AEO (`aeo` — E-E-A-T audit, citation tracking across 5 LLMs), CRO, Channels, Growth, Intelligence, Sales + context foundation + orchestration router | marketing-skill/ |
| 🚀 Productivity | 6 | `capture` (brain-dump-to-action), `email` pair (inbox-setup + inbox-triage), `reflect` (journal), `handoff` (Matt Pocock-inspired), `andreessen` (market-first decision mode) | productivity/ |
| 🎨 Marketing (top-level) | 1 | `landing` — single-file HTML landing-page generator (4 design styles, GSAP patterns, brand palette validator) | marketing/ |
| 🔬 Research (academic) | 8 | `research` orchestrator (hybrid router + fallback) + 7 specialists: `pulse`, `litreview`, `grants` (NIH), `dossier`, `patent`, `syllabus`, `notebooklm` | research/ |
| 🧪 Research Operations ✨v2.9.0 | 5 | Enterprise/cross-functional research: orchestrator + `clinical-research` (study design), `research-finance` (R&D program finance), `market-research` (sizing/survey/segmentation), `product-research` (user research) — each with onboarding + customization + opt-in autoresearch bridge | research-ops/ |
| 📋 Project Management | 9 | Senior PM, scrum master, Jira, Confluence, Atlassian admin, templates + bundled Atlassian Remote MCP | project-management/ |
| 🏥 Regulatory & QM | 18 | ISO 13485, MDR 2017/745, FDA, ISO 27001, GDPR, SOC 2, CAPA, risk management | ra-qm-team/ |
| 🛡️ Compliance OS | 9 | Compliance operating system — controls, evidence, audit-readiness workflows | compliance-os/ |
| 💼 C-Level Advisory | 66 | Full C-suite (CEO/CTO/CFO/CMO/CRO/CPO/COO/CHRO/CISO/GC/CDO/CAIO/CCO/VPE) + founder-mode agents + orchestration + board meetings + culture & collaboration | c-level-advisor/ |
| 📈 Business & Growth | 5 | Customer success, sales engineer, revenue ops, contracts & proposals, BizDev toolkit | business-growth/ |
| 🏭 Business Operations | 7 | Orchestrator + process-mapper, vendor-management, capacity-planner, internal-comms, knowledge-ops, procurement-optimizer | business-operations/ |
| 🤝 Commercial | 8 | Orchestrator + pricing-strategist, deal-desk, partnerships-architect, channel-economics, commercial-policy, rfp-responder, commercial-forecaster | commercial/ |
| 💰 Finance | 4 | Financial analyst (DCF, budgeting, forecasting), SaaS metrics coach, business investment advisor | finance/ |

Personas

Pre-configured agent identities with curated skill loadouts, workflows, and distinct communication styles. Personas go beyond "use these skills" — they define how an agent thinks, prioritizes, and communicates.

| Persona | Domain | Best For |
|---|---|---|
| Startup CTO | Engineering + Strategy | Architecture decisions, tech stack selection, team building, technical due diligence |
| Growth Marketer | Marketing + Growth | Content-led growth, launch strategy, channel optimization, bootstrapped marketing |
| Solo Founder | Cross-domain | One-person s |

…

Footnotes

¹ Hermes Agent is BYO-sync tier: the repo ships a pre-generated .hermes/skills/claude-skills/ tree, but you run python scripts/sync-hermes-skills.py once locally to install into ~/.hermes/skills/. It uses the same agentskills.io SKILL.md standard, so no format conversion is needed.
² Mistral Vibe is also BYO-sync tier: the repo ships a pre-generated .vibe/skills/claude-skills/ tree; run ./scripts/vibe-install.sh once locally to install into ~/.vibe/skills/. It uses the same agentskills.io SKILL.md standard, so no format conversion is needed. Docs: https://docs.mistral.ai/mistral-vibe/agents-skills.