Migrate Agentic QE projects from v2 to v3 with zero data loss
Multi-agent autonomous startup system for Claude Code. Triggers on "Loki Mode". Orchestrates 100+ specialized agents across engineering, QA, DevOps, security, data/ML, business operations, marketing, HR, and customer success. Takes a PRD to a fully deployed, revenue-generating product with zero human intervention. Features the Task tool for subagent dispatch, parallel code review with 3 specialized reviewers, severity-based issue triage, a distributed task queue with dead letter handling, automatic deployment to cloud providers, A/B testing, customer feedback loops, incident response, circuit breakers, and self-healing. Handles rate limits via distributed state checkpoints and auto-resume with exponential backoff. Requires the --dangerously-skip-permissions flag.
Write focused pytest tests as standalone functions (one test per function), avoiding test classes.
Measure and improve code coverage in the Duroxide durable execution runtime. Use when asked about coverage, testing coverage, running llvm-cov, or improving test coverage percentages.
Template and guide for creating skills. Demonstrates the standard skill structure with resources, docs, examples, and templates directories. Use this as a reference when building new protocol integrations.
This skill should be used when the user asks to "test my site", "test the site", "run site tests", "check if site is working", "verify site", "smoke test", "test pages", "check api calls", "test web api", "verify deployment works", or wants to test a deployed, activated Power Pages site at runtime using browser-based navigation, page crawling, and API request verification.
Practical Python scripts for debugging awf - parse logs, diagnose issues, inspect containers, test domains
Test a published Copilot Studio agent — send test messages, run batch test suites, or analyze evaluation results.
Send a message to a bot via DirectLine v3 REST API and get the full response. Use when the user has a DirectLine secret or Copilot Studio token endpoint URL. Supports auth/sign-in flows via OAuthCard detection.
Runs automated tests to validate plugin integrity across 14 categories. Use before creating PRs, after making changes to skills or templates, or to verify plugin health.
Run unit tests that require the Spanner emulator. Use this skill when the user wants to run tests in packages like satellite/metabase, satellite/metainfo, or any other tests that interact with Spanner. Automatically handles checking for and configuring the Spanner emulator environment.
Create comprehensive bidirectional requirements traceability matrix mapping acceptance criteria → implementation → tests with gap analysis, severity ratings, and coverage assessment. Maps each AC to implementation evidence (files, functions, code snippets) and test coverage (test files, scenarios, priorities). Use during quality review or for compliance audits to verify complete requirements coverage.
Automatic cache invalidation system with Laravel Observers and Next.js On-Demand Revalidation. Automatically syncs data in real time between backend and frontend when an admin makes an update. USE WHEN you need to set up cache management, frontend-backend sync, or an API cache strategy, or when users complain that they "have to press Ctrl+F5 to see new data".
AI-powered code generation toolkit (UV scripts migrated to builder-skill-uvscript)
Centralized JSON validation for AGENT_SUCCESS_CRITERIA with defensive parsing and injection attack prevention (CVSS 8.2)
Unified agent management from selection through completion - spawning, execution, output processing. Use when selecting agents for tasks, spawning agents with dependency validation, processing agent outputs, or tracking agent lifecycle events with audit trails.
Validates BAZINGA completion claims with independent verification. Spawned ONLY when PM sends BAZINGA. Acts as final quality gate - verifies test failures, coverage, evidence, and criteria independently. Returns ACCEPT or REJECT verdict.
Design, configure, launch, and analyze ablation sweeps for GRPO training. Use for hypothesis testing, hyperparameter experiments, and systematic comparisons.
Study skills for more effective learning and personal development.
Review checkpoint specs and tests to identify tests that encode ambiguous interpretations rather than explicit requirements. Use when asked to check checkpoint_N.md against test_checkpoint_N.py, when auditing tests for ambiguity, or when reviewing snapshot eval failures for interpretive issues.