When Claude Code builds AI features, it skips the quality checks. This pack makes those checks required.
Drop one folder into your project. Claude Code now must test every AI feature before shipping, keep a history of every prompt change, plan what happens when the AI fails, and check for security holes — or it cannot mark the task done.
These are not edge cases. This is what happens by default when Claude Code builds an AI feature with no quality rules.
The feature launched. Nobody measured whether it actually worked. Three weeks later, a prompt change made quality drop from 89% to 72% and nobody noticed until users started complaining — because there was no test to catch it.
The instruction sent to the AI was edited directly in the server settings. No history, no way to undo. When something broke, nobody could tell which of the three recent edits caused it.
The AI service went down. The product threw an error. Users saw a blank screen. The developer added a fallback handler at 3am. It should have been designed before the feature ever shipped.
The AI had no protection against manipulation. A user found a way to make it ignore its instructions on day two and posted it publicly. The fix took three days. A two-hour review would have caught it.
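The third scenario above (service down, blank screen) is the kind of failure a fallback designed up front would absorb. A minimal sketch, assuming a hypothetical `with_fallback` wrapper and a stand-in `flaky_model` function in place of a real AI service call; neither name comes from this pack:

```python
from typing import Callable


def with_fallback(primary: Callable[[str], str], fallback_text: str) -> Callable[[str], str]:
    """Wrap a model call so that any failure returns a designed fallback message."""
    def call(user_input: str) -> str:
        try:
            return primary(user_input)
        except Exception:
            # A real service would also log the error here before degrading.
            return fallback_text
    return call


def flaky_model(user_input: str) -> str:
    # Hypothetical stand-in for a real AI service call; it always fails here.
    raise ConnectionError("AI service unavailable")


FALLBACK_TEXT = "Summaries are temporarily unavailable. Please try again shortly."
summarize = with_fallback(flaky_model, FALLBACK_TEXT)
print(summarize("long document text"))
```

The point of the sketch is only that the user-facing message is decided before launch, so an outage produces a sentence instead of a blank screen.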
Each rule has a required output format that must contain real values. If Claude skips testing the feature, it cannot produce a real pass rate. If it skips the safety check, it cannot produce a real safety report. There is no "it looks good" path.
Claude cannot write "Pass rate: 91.7%" and "Baseline: v1.0.0 @ 88.3%" without actually running the test. That is the enforcement.
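To make the idea of a required output format concrete, here is a rough sketch of a checker that rejects a completion statement lacking the two values shown above. The function name, the regular expressions, and the returned fields are assumptions for illustration, not the pack's actual format definition. Note that a check like this only confirms the values are present and parseable; it cannot prove they came from a real test run.

```python
import re

# Hypothetical field patterns, modeled on the two example lines above.
PASS_RATE = re.compile(r"Pass rate:\s*(\d+(?:\.\d+)?)%")
BASELINE = re.compile(r"Baseline:\s*(v\d+\.\d+\.\d+)\s*@\s*(\d+(?:\.\d+)?)%")


def check_completion_statement(text: str) -> dict:
    """Return the parsed values, or raise ValueError if a required value is missing."""
    pass_match = PASS_RATE.search(text)
    base_match = BASELINE.search(text)
    if pass_match is None:
        raise ValueError("missing 'Pass rate: <number>%'")
    if base_match is None:
        raise ValueError("missing 'Baseline: <version> @ <number>%'")
    pass_rate = float(pass_match.group(1))
    baseline_rate = float(base_match.group(2))
    return {
        "pass_rate": pass_rate,
        "baseline_version": base_match.group(1),
        "baseline_rate": baseline_rate,
        # Flag a regression when the new rate falls below the recorded baseline.
        "regressed": pass_rate < baseline_rate,
    }
```

A statement such as "it looks good" raises `ValueError`, while one carrying both values parses into numbers that can be compared against the baseline.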
Three mandatory checks that block shipping if not satisfied. Five workflow rules that enforce how AI development work should be done.
Each specialist has a narrow focus and explicit rules for what it won't do. Each produces output at a real file path — not a chat message.
Writes and improves AI instructions. Saves every version. Never claims quality without running a test first.
Builds test suites for AI features. Defines what 'working' means with real metrics. Writes the tests that the prompt-engineer must pass.
Designs AI-powered search and retrieval features. Always checks your data first. Measures how accurate the results are before the feature goes live.
Runs a structured comparison of AI models for your specific task. Never picks based on name recognition or general benchmarks.
Reviews every AI feature before it reaches users. Checks for 4 risk categories. Every feature gets PASS or BLOCK — nothing in between.
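The prompt-engineer's rule of saving every version can be pictured as one file per version, with a single setting that selects the active one. The sketch below assumes a `prompts/<feature>/<version>.md` layout, matching the `prompts/classifier/v1.2.0.md` path that appears in this document; the `ACTIVE_VERSION` constant and both function names are hypothetical.

```python
from pathlib import Path

# Hypothetical single source of truth for the active prompt version.
# Rolling back means changing this one line, for example to "v1.1.0".
ACTIVE_VERSION = "v1.2.0"


def prompt_path(root: Path, feature: str, version: str = ACTIVE_VERSION) -> Path:
    """Build the path of a versioned prompt file under the project root."""
    return root / "prompts" / feature / f"{version}.md"


def load_prompt(root: Path, feature: str, version: str = ACTIVE_VERSION) -> str:
    """Read the prompt text for a feature, failing loudly if the version is missing."""
    path = prompt_path(root, feature, version)
    if not path.is_file():
        raise FileNotFoundError(f"no prompt file at {path}")
    return path.read_text(encoding="utf-8")
```

Because each version is an ordinary file tracked in git, the history of edits and the ability to undo them come from version control rather than from server settings.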
The difference is work done before the PR was opened, not problems caught after the feature shipped.
prompts/classifier/v1.2.0.md in git. Rollback: change one line.
Sparse-clone installs only the skills and agents directories. Nothing else touches your project.
Copies skills/ and .claude/agents/ into your project using cp -n (non-destructive: it won't overwrite existing files).
No global changes, no dependencies installed, no configuration required.
Each pack enforces one domain. All work standalone. All share the same enforcement model — Completion Statement Formats that require real values.
25 pre-configured engineering specialists, a lead orchestrator, 18 workflow skills, and a Pipeline Auditor. The base layer.
AI engineering quality gates. Eval, prompt versioning, fallback design, RAG pipelines, safety review.
AI UI design enforcement. States, streaming UI, prompt UX, accessibility, design tokens — the design layer AI features need.
Product quality gates. PRD, feature scoping, metric definition, research synthesis, A/B test design, AI feature validation.
Growth quality gates. Positioning, copy, funnel analysis, experiments, retention design, AI messaging review.