Back to Skills

Cost

AI Build Cost Tracker — track how much AI is costing you per feature. Use when user wants to track AI spending, understand cost per feature, optimize AI usage, or budget their Claude/GPT costs.

ai
By Houseofmvps
10913Updated 1 day agoJavaScriptMIT

Skill Content

# AI Build Cost Tracker

Know exactly what each feature costs to build with AI. Budget smarter, spend less.

## Process

### Phase 1: Show Current State

```bash
node ${CLAUDE_PLUGIN_ROOT}/tools/cost-tracker.mjs <project-directory> show
```

Parse the JSON output for cost history and summary.

### Phase 2: Cost Dashboard

Present spending overview:

**Total Spend:**
- All-time total cost
- This week / this month breakdown
- Average cost per task

**By Task/Feature:**
- Ranked list of features by cost (most expensive first)
- Flag any outliers (tasks costing 3x+ the average)

**By Model:**
- Cost breakdown by AI model used
- Potential savings from model switching

**Daily Trend:**
- Last 30 days of daily spending
- Identify high-spend days

### Phase 3: Cost Insights

Generate actionable insights:

1. **Where money goes** — what types of tasks cost the most (debugging? features? refactoring?)
2. **Optimization opportunities** — tasks that could use a cheaper model
3. **Budget projection** — at current rate, monthly spend estimate
4. **Cost per line of code** — rough estimate based on git diff

### Phase 4: Logging New Costs

To log a new cost entry:
```bash
node ${CLAUDE_PLUGIN_ROOT}/tools/cost-tracker.mjs <project-directory> log "<label>" <input_tokens> <output_tokens> --model=claude-opus-4-6
```

Labels should be descriptive kebab-case: `auth-feature`, `bug-fix-login`, `refactor-api`, `seo-optimization`

### Phase 5: Recommendations

Based on spending patterns:

**If debugging > 30% of spend:**
- "Consider investing in more tests upfront — debugging is your biggest cost driver"
- Suggest running `/test-driven-development` skill

**If a single task > 3x average:**
- "The [task] cost $X — consider breaking large tasks into smaller chunks"

**Model optimization:**
- "Routine refactoring with Sonnet instead of Opus would save ~$X/month"
- "Code review tasks can use Haiku — potential savings of ~$X/month"

## Logging Guide

Help the user understand when and how to log costs:

- Log at the END of each task/feature (not during)
- Use consistent labels across sessions
- Include ALL tokens (input + output) from the conversation
- Estimate if exact numbers aren't available (Claude Code shows token usage in the UI)

### Phase 6: Build vs. Buy Framework

When reviewing cost data, help the user think about the economics of building with AI:

**Worth building with AI (high ROI):**
- Features that would take 2+ days manually but 2 hours with AI
- Boilerplate-heavy work (CRUD, migrations, test suites, API endpoints)
- Exploration and prototyping (try 3 approaches, keep the best one)

**Consider alternatives:**
- If a feature costs >$100 in AI tokens, check if a library or SaaS already does it
- If debugging the same issue costs >$20 in AI tokens, the root cause is architectural — fix the architecture, not the symptom
- If AI-generated code needs significant manual correction, the prompt needs work, not more tokens

**Cost benchmarks for solo founders:**
| Task Type | Expected Cost | If Higher, Investigate |
|---|---|---|
| New API endpoint | $1-5 | Complex business logic or unclear spec |
| Bug fix | $0.50-3 | Missing tests or hard-to-reproduce issue |
| Full feature (frontend + backend) | $5-20 | Large scope or frequent rework |
| Refactoring | $2-10 | Unclear boundaries or missing tests |
| Content/copy generation | $0.25-1 | Too many revision cycles |

## Key Principles

- **AI is an investment, not a cost.** The goal isn't to minimize spend — it's to maximize ROI. A $50 feature that generates $5K MRR is a great investment. But a $50 bug fix that should have cost $5 with better tests is waste.
- **Track to learn, not to punish.** Cost data reveals where your process is inefficient. High debugging costs mean insufficient tests. High rework costs mean unclear specs.
- **Optimize the expensive tasks, not the cheap ones.** Switching from Opus to Haiku for a $0.10 task saves nothing. Reducing a $50 debugging session by improving test coverage saves real money.

How to use

  1. Copy the skill content above
  2. Create a .claude/skills directory in your project
  3. Save as .claude/skills/ultraship-cost.md
  4. Use /ultraship-cost in Claude Code to invoke this skill
<div align="center"> <img src="assets/hero-banner.jpg" alt="Ultraship — Claude Code Plugin" width="100%"/>

Claude Code plugin. 43 expert-level skills for building, shipping, and scaling production software. 37 audit tools (accessibility, vibe-coding security, AI evals, pentest, code quality, bundle size, SEO + AI Readiness check) plus a blocking ship-gate close the loop before deploy. A built-in Currency Guard keeps Claude on current docs, not stale training data.

npm version npm downloads npm total GitHub stars License: MIT CI Sponsor


Follow @kaileskkhumar LinkedIn houseofmvps.com kailxlabs.co

Built by Kaileskkhumar, founder of HouseofMVPs and Kailxlabs

</div>
0 dependencies · 274 tests · Node.js ESM · MIT

Install

# Claude Code plugin
claude plugin marketplace add Houseofmvps/ultraship
claude plugin install ultraship

# Or standalone via npx
npx ultraship ship .
npx ultraship seo .
npx ultraship security .

How It Works

flowchart LR
    U["You type a<br/>slash command"] --> S["Skill<br/>(markdown instructions)"]
    S --> A["Agent<br/>(dispatched worker)"]
    S --> T["Tools<br/>(Node.js scripts)"]
    A --> T
    T --> O["JSON Results"]
    O --> R["Scorecard / Report /<br/>Actionable Fixes"]

    style U fill:#f59e0b,stroke:#d97706,color:#000
    style S fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style A fill:#3b82f6,stroke:#2563eb,color:#fff
    style T fill:#10b981,stroke:#059669,color:#000
    style R fill:#ef4444,stroke:#dc2626,color:#fff
flowchart TD
    subgraph Lifecycle["Full Lifecycle Coverage"]
        direction LR
        I["Idea<br/>/brainstorm"] --> B["Build<br/>/sprint"]
        B --> AU["Audit<br/>/ship /seo /secure"]
        AU --> D["Ship<br/>/deploy"]
        D --> L["Launch<br/>/launch /compete"]
        L --> G["Grow<br/>/grow /cost"]
        G --> RE["Rescue<br/>/rescue /canary"]
    end

    style I fill:#8b5cf6,stroke:#7c3aed,color:#fff
    style B fill:#3b82f6,stroke:#2563eb,color:#fff
    style AU fill:#f59e0b,stroke:#d97706,color:#000
    style D fill:#10b981,stroke:#059669,color:#000
    style L fill:#06b6d4,stroke:#0891b2,color:#000
    style G fill:#84cc16,stroke:#65a30d,color:#000
    style RE fill:#ef4444,stroke:#dc2626,color:#fff

What /ship Does

/ship runs 6 tools in parallel and outputs a scorecard:

flowchart LR
    SHIP["/ship"] --> SEO["seo-scanner<br/>63 rules"]
    SHIP --> A11Y["a11y-scanner<br/>WCAG 2.2"]
    SHIP --> SEC["secret-scanner<br/>+ npm audit"]
    SHIP --> CODE["code-profiler<br/>N+1, leaks, ReDoS"]
    SHIP --> BUNDLE["bundle-tracker<br/>JS/CSS/images"]
    SHIP --> ENV["env-validator<br/>+ migration-checker"]

    SEO --> SC["Scorecard<br/>READY TO SHIP"]
    A11Y --> SC
    SEC --> SC
    CODE --> SC
    BUNDLE --> SC
    ENV --> SC

    style SHIP fill:#f59e0b,stroke:#d97706,color:#000
    style SC fill:#10b981,stroke:#059669,color:#000
    style SEO fill:#3b82f6,stroke:#2563eb,color:#fff
    style SEC fill:#3b82f6,stroke:#2563eb,color:#fff
    style CODE fill:#3b82f6,stroke:#2563eb,color:#fff
    style BUNDLE fill:#3b82f6,stroke:#2563eb,color:#fff
    style ENV fill:#3b82f6,stroke:#2563eb,color:#fff
+===========================================+
|      U L T R A S H I P   S C O R E       |
+===========================================+
|  SEO + AI Vis.  92/100  ############-    |
|  Security        95/100  ############-    |
|  Code Quality    88/100  ###########--    |
|  Bundle Size     97/100  ############-    |
+===========================================+
|   OVERALL         90/100                  |
|   STATUS          READY TO SHIP           |
+===========================================+
<details> <summary>Demo</summary> <img src="assets/demo.gif" alt="Ultraship — SEO audit, secret scanning, scorecard" width="100%"/> </details>

Tools (40)

Each tool is a standalone Node.js script (node tools/<name>.mjs). JSON output. Exit 0 always. No build step.

Auditing

ToolWhat it checks
seo-scanner63 rules: 39 SEO (meta tags, canonicals, headings, OG tags, structured data, sitemap, cross-page duplicate/orphan detection), 20 GEO (AI bot access in robots.txt, snippet restrictions, llms.txt, structured data for AI extraction), 4 AEO (FAQPage/HowTo/speakable schema)
a11y-scannerWCAG 2.2 A/AA static checks: missing alt text, unlabeled form controls, icon-only buttons, missing lang/title/main, heading order, positive tabindex, zoom disabled, duplicate ids, broken aria references. Zero false positives.
ship-gateBlocking quality gate — scores all auditors (shared math with /ship), compares to .ultraship/ship-gate.json thresholds, hard-fails on leaked secrets / critical findings, exits 1 on fail. Generates a pre-push hook + GitHub Actions workflow.
secret-scannerAWS keys, Stripe keys, JWT secrets, database URLs, private keys. Redacts values in output.
vibe-security-scannerVibe-Coding Security Sentinel — context secret-scanner misses: server-only secrets behind a NEXT_PUBLIC_/VITE_ prefix, a decoded Supabase service_role key exposed to the client, service_role in a "use client" file, Supabase tables with no RLS. Zero false positives.
eval-scannerLocates every LLM call site (Anthropic, OpenAI, Gemini, Mistral, Cohere, Ollama, Vercel AI SDK, LangChain) by provider + model id, detects the test runner and whether an eval suite exists. Flags AI features shipping with no evals. Seeds /evals. Zero false positives.
code-profilerN+1 queries, sync I/O in handlers, unbounded queries, missing indexes, memory leaks, sequential awaits, ReDoS risk
bundle-trackerJS/CSS/image sizes in build output. Detects heavy deps (momentdayjs, lodash→native). History for before/after. Monorepo-aware.
dep-doctorUnused dependencies via import graph analysis (not just grep). Dead wrapper files. Outdated packages.
content-scorerFlesch-Kincaid readability, keyword density, thin content detection, GEO heading analysis
lighthouse-runnerLighthouse via headless Chrome. Core Web Vitals, render-blocking resources, diagnostics.

Validation

ToolWhat it checks
health-checkHTTP status, response time, SSL certificate (issuer, expiry), 6 security headers
env-validatorCompares .env.example against actual .env. Catches missing/empty/placeholder vars.
migration-checkerPending DB migrations for Drizzle, Prisma, Knex
og-validatorOpen Graph tags, image reachability, size validation
redirect-checkerRedirect chains, loops, mixed HTTP/HTTPS. Sitemap-based bulk check.
api-smoke-testHit API endpoints, check status codes, response times, CORS headers

Generators

ToolWhat it creates
sitemap-generatorsitemap.xml from HTML files and routes
robots-generatorAI-friendly robots.txt (allows GPTBot, PerplexityBot, ClaudeBot)
llms-txt-generatorllms.txt for AI assistant discoverability
structured-data-generatorJSON-LD schema markup

Competitive & Launch

ToolWhat it does
compete-analyzerCompares two URLs: tech stack, SEO score, security headers, response time. ASCII comparison card.
launch-prepReads project, generates PH/Twitter/LinkedIn/HN copy, 14-item checklist, press kit
demo-prepFinds console.logs, TODOs, placeholder text, missing favicons. Scores demo readiness.

Operations

ToolWhat it does
incident-commanderHealth check + git culprit analysis + error patterns + rollback commands + post-mortem template
growth-trackerUptime, git velocity, SEO trajectory, dep health. Stores snapshots for week-over-week comparison.
cost-trackerLog AI token usage per feature/model. Built-in pricing for Claude, GPT-4o, Gemini. Daily trends.
pentest-scannerAutomated penetration testing: XSS, SQLi, SSTI, command injection, path traversal, CORS, JWT, GraphQL introspection, prototype pollution, race conditions, request smuggling. Zero false positives, every finding has proof-of-concept.
canary-monitorPost-deploy canary monitoring: HTTP status, response time, error patterns, baseline regression detection. Auto-saves baselines for future comparison.
retro-analyzerSprint retrospective: git velocity, commit patterns (features vs fixes), test health, hot files, shipping cadence. Generates insights and recommendations.
learnings-managerProject learnings CRUD: save, search, list, prune, export. Structured knowledge that compounds across sessions.

Project Analysis

ToolWhat it does
onboard-generatorAuto-generates developer guide: stack, directory tree, routes, schema, env vars, Mermaid diagram
architecture-mapper4 Mermaid diagrams: system overview, route tree, DB ER, data flow. Circular dependency + orphan detection.
pattern-analyzerAnalyzes testing, error handling, TypeScript usage, CI/CD, git practices. Cross-repo comparison.
audit-historySaves/compares audit scores over time

Integrations (optional)

ToolWhat it does
gsc-clientGoogle Search Console: submit sitemaps, inspect URLs, query rankings (requires ULTRASHIP_GSC_CREDENTIALS)
bing-webmasterBing Webmaster: submit sitemaps/URLs, IndexNow instant push, keyword research, backlinks, site-scan, URL inspection (requires ULTRASHIP_BING_KEY). Powers ChatGPT Search + Microsoft Copilot.
ga4-clientGoogle Analytics 4: overview, top-pages, landing-pages, traffic-sources, conversions, user-journey, devices, realtime, ai-traffic (ChatGPT/Perplexity/Copilot tracking), organic (search-only). --organic flag.
keyword-intelligence12-command keyword engine: analyze, quick-wins, cannibalization, content-gaps, intent-map, trending, high-intent, page-keywords, content-decay, difficulty, anomalies (CTR anomalies), cross-reference (GSC↔GA4). --brand flag for non-brand filtering.
index-doctorIndex diagnosis: inspect URLs via GSC URL Inspection API, diagnose 15+ coverage states, auto-fix and submit to Bing.

View source on GitHub