Skill Tester
Quality assurance testing for AI agent skills.
Real Conversations
See how the Skill Tester skill responds to actual requests. These are representative examples of what your AI agent will produce.
I just finished building a new skill called log-analyzer — it has SKILL.md, README.md, two Python scripts in scripts/, and a references/ directory. I want to make sure it meets POWERFUL tier requirements before I open a PR. How do I run the full quality check?
This skill includes executable scripts — I'll need your permission before running them.
Here's the full three-tool validation sequence for POWERFUL tier:
# Step 1 — Structural validation
python3 scripts/skill_validator.py skills/log-analyzer --tier POWERFUL
# Step 2 — Script testing (syntax, imports, runtime, output format)
python3 scripts/script_tester.py skills/log-analyzer --timeout 30 --sample-data
# Step 3 — Quality scoring
python3 scripts/quality_scorer.py skills/log-analyzer --detailed --recommendations
What POWERFUL tier requires (checklist before running):
- SKILL.md: minimum 300 lines
- 2-3 Python scripts, each 500-800 lines
- --json and human-readable output modes on all scripts
- Comprehensive argparse with multiple modes/subcommands
- references/ directory with documentation
- expected_outputs/ directory for regression testing
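Before running the three tools, the file-system items in this checklist can be sanity-checked locally. The following is a minimal sketch, not the logic of skill_validator.py: the function name `preflight` and the result keys are hypothetical, and only the thresholds come from the checklist. It does not check the --json or argparse requirements.

```python
from pathlib import Path

# Thresholds copied from the checklist; all names below are hypothetical.
MIN_SKILL_MD_LINES = 300
SCRIPT_COUNT_RANGE = (2, 3)
SCRIPT_LINE_RANGE = (500, 800)


def count_lines(path: Path) -> int:
    """Return the number of lines in a text file."""
    with path.open(encoding="utf-8", errors="replace") as fh:
        return sum(1 for _ in fh)


def preflight(skill_dir: str) -> dict:
    """Map each file-system checklist item to True/False for a skill directory."""
    root = Path(skill_dir)
    skill_md = root / "SKILL.md"
    scripts_dir = root / "scripts"
    scripts = sorted(scripts_dir.glob("*.py")) if scripts_dir.is_dir() else []
    lo, hi = SCRIPT_LINE_RANGE
    min_count, max_count = SCRIPT_COUNT_RANGE
    return {
        "skill_md_min_lines": skill_md.is_file()
        and count_lines(skill_md) >= MIN_SKILL_MD_LINES,
        "script_count": min_count <= len(scripts) <= max_count,
        "script_lengths": bool(scripts)
        and all(lo <= count_lines(s) <= hi for s in scripts),
        "references_dir": (root / "references").is_dir(),
        "expected_outputs_dir": (root / "expected_outputs").is_dir(),
    }
```

Calling `preflight("skills/log-analyzer")` returns a dictionary of booleans, so any `False` entry points at a checklist item to fix before running the validator.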
Interpreting results:
=== SKILL VALIDATION REPORT ===
Skill: skills/log-analyzer
Tier: POWERFUL
Overall Score: 82/100 (B)
Structure Validation: ✓ PASS
├─ SKILL.md: ✓ EXISTS (312 lines)
├─ scripts/: ✓ EXISTS (2 files)
├─ references/: ✓ EXISTS
└─ expected_outputs/: ⚠ MISSING ← add this
Script Testing:
├─ log_scanner.py: ✓ PASS (syntax, imports, --help, --json)
└─ log_summarizer.py: ⚠ WARN — no --json output mode
Recommendations:
• Add expected_outputs/ directory with sample runs
• Add --json flag to log_summarizer.py
• Add --format text/json to argument parser
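The last two recommendations could be addressed with an argument parser along these lines. This is a sketch under assumptions: the real log_summarizer.py is not shown here, so the positional `logfile` argument, the `render` helper, and the placeholder summary are all hypothetical. Only the `--json` and `--format text/json` flags come from the report.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser offering both --format text/json and a --json shorthand."""
    parser = argparse.ArgumentParser(description="Summarize a log file (illustrative).")
    parser.add_argument("logfile", help="path to the log file (hypothetical argument)")
    parser.add_argument("--format", choices=["text", "json"], default="text",
                        help="output format (default: text)")
    parser.add_argument("--json", action="store_true",
                        help="shorthand for --format json")
    return parser


def render(summary: dict, fmt: str) -> str:
    """Render a summary as JSON or as simple 'key: value' lines."""
    if fmt == "json":
        return json.dumps(summary, indent=2, sort_keys=True)
    return "\n".join(f"{key}: {value}" for key, value in summary.items())


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # --json takes precedence over --format when both are given.
    fmt = "json" if args.json else args.format
    # Placeholder values; a real script would compute these from args.logfile.
    summary = {"file": args.logfile, "lines": 0, "errors": 0}
    print(render(summary, fmt))
    return 0
```

In a real script, `main()` would be called under the usual `if __name__ == "__main__":` guard. Keeping the rendering in one function means the text and JSON modes always report the same fields.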
The 82/100 in this report is a B (80+), which is acceptable for POWERFUL tier, but fix the warnings before opening the PR. Any score below 75 will be flagged in CI.
# Run with --json for CI-parseable output
python3 scripts/quality_scorer.py skills/log-analyzer --json | jq '.overall_score'
# Must be >= 75
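If jq is not available in the CI image, the same gate can be sketched in Python. This assumes only what the jq filter implies: that the JSON report has a top-level `overall_score` field. The function name `passes_gate` is hypothetical, and the rest of the report schema is not assumed.

```python
import json

CI_THRESHOLD = 75  # scores below this are flagged in CI


def passes_gate(report_json: str, threshold: float = CI_THRESHOLD) -> bool:
    """Return True when the report's overall_score meets the threshold."""
    report = json.loads(report_json)
    # Field name taken from the jq filter '.overall_score'.
    return float(report["overall_score"]) >= threshold
```

A small wrapper script could read the scorer's output from standard input and exit with `0 if passes_gate(sys.stdin.read()) else 1`, so that the CI step fails on a low score.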
Get Started
How to install and use this skill in your preferred environment.
Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.
Models & Context
Which AI models and context windows work best with this skill.
Recommended Models
Larger models produce more detailed, production-ready outputs.
Context Window
This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.
All current frontier models support 100K+ context.
Pro tips for best results
Be specific
Include numbers — users, budget, RPS — so the skill can size the architecture.
Share constraints
Compliance needs, team size, and existing stack all improve the output.
Iterate
Start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.
Combine skills
Pair with companion skills below for end-to-end coverage.
Ready to try Skill Tester?
Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.