
Skill Tester

Quality assurance testing for AI agent skills.

Real Conversations

See how the Skill Tester skill responds to actual requests. These are representative examples of what your AI agent will produce.

You

I just finished building a new skill called log-analyzer — it has SKILL.md, README.md, two Python scripts in scripts/, and a references/ directory. I want to make sure it meets POWERFUL tier requirements before I open a PR. How do I run the full quality check?

Agent

This skill includes executable scripts — I'll need your permission before running them.

Here's the full three-tool validation sequence for POWERFUL tier:

# Step 1 — Structural validation
python3 scripts/skill_validator.py skills/log-analyzer --tier POWERFUL

# Step 2 — Script testing (syntax, imports, runtime, output format)
python3 scripts/script_tester.py skills/log-analyzer --timeout 30 --sample-data

# Step 3 — Quality scoring
python3 scripts/quality_scorer.py skills/log-analyzer --detailed --recommendations
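The three steps above could also be chained from a single wrapper so that a failing step stops the sequence. This is a minimal sketch, not part of the skill: `build_commands` and `run_all` are hypothetical helper names, and it assumes each tool exits with a non-zero status on failure, which the page does not confirm.

```python
import subprocess
import sys


def build_commands(skill_path, tier="POWERFUL", timeout=30):
    """Return the three validation commands as argv lists, in run order."""
    py = sys.executable
    return [
        [py, "scripts/skill_validator.py", skill_path, "--tier", tier],
        [py, "scripts/script_tester.py", skill_path,
         "--timeout", str(timeout), "--sample-data"],
        [py, "scripts/quality_scorer.py", skill_path,
         "--detailed", "--recommendations"],
    ]


def run_all(skill_path, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code.

    Assumption: the tools signal failure through their exit status.
    """
    for argv in build_commands(skill_path):
        result = runner(argv)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all("skills/log-analyzer")` from the repository root would run the same commands as listed above. The `runner` parameter exists only so the sequence can be exercised without launching real processes.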

What POWERFUL tier requires (checklist before running):

  • SKILL.md: minimum 300 lines
  • 2-3 Python scripts, each 500-800 lines
  • --json and human-readable output modes on all scripts
  • Comprehensive argparse with multiple modes/subcommands
  • references/ directory with documentation
  • expected_outputs/ directory for regression testing
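The file-level items in this checklist can be checked locally before running the tools. The sketch below is an illustration only: `precheck` is a hypothetical helper, not the logic of `skill_validator.py`, and it covers only the line counts and directories, not the `--json` or argparse requirements.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a UTF-8 text file."""
    return len(path.read_text(encoding="utf-8").splitlines())


def precheck(skill_dir):
    """Return a list of checklist problems under skill_dir (empty if none)."""
    root = Path(skill_dir)
    problems = []

    # SKILL.md: minimum 300 lines
    skill_md = root / "SKILL.md"
    if not skill_md.is_file():
        problems.append("SKILL.md is missing")
    elif count_lines(skill_md) < 300:
        problems.append(f"SKILL.md has {count_lines(skill_md)} lines (minimum 300)")

    # 2-3 Python scripts, each 500-800 lines
    scripts_dir = root / "scripts"
    scripts = sorted(scripts_dir.glob("*.py")) if scripts_dir.is_dir() else []
    if not 2 <= len(scripts) <= 3:
        problems.append(f"scripts/ has {len(scripts)} Python files (expected 2-3)")
    for script in scripts:
        n = count_lines(script)
        if not 500 <= n <= 800:
            problems.append(f"{script.name} has {n} lines (expected 500-800)")

    # references/ and expected_outputs/ directories
    for name in ("references", "expected_outputs"):
        if not (root / name).is_dir():
            problems.append(f"{name}/ directory is missing")

    return problems
```

An empty list from `precheck("skills/log-analyzer")` means only that the structural items look right; the three tools remain the actual check.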

Interpreting results:

=== SKILL VALIDATION REPORT ===
Skill: skills/log-analyzer
Tier: POWERFUL
Overall Score: 82/100 (B)

Structure Validation: ✓ PASS
├─ SKILL.md: ✓ EXISTS (312 lines)
├─ scripts/: ✓ EXISTS (2 files)
├─ references/: ✓ EXISTS
└─ expected_outputs/: ⚠ MISSING  ← add this

Script Testing:
├─ log_scanner.py: ✓ PASS (syntax, imports, --help, --json)
└─ log_summarizer.py: ⚠ WARN — no --json output mode

Recommendations:
• Add expected_outputs/ directory with sample runs
• Add --json flag to log_summarizer.py
• Add --format text/json to argument parser

Fix the warnings before opening the PR. A B score (80+) is acceptable for POWERFUL tier, so the 82 above passes; anything below 75 will be flagged in CI.

# Run with --json for CI-parseable output
python3 scripts/quality_scorer.py skills/log-analyzer --json | jq '.overall_score'
# Must be >= 75
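Where `jq` is not available in CI, the same threshold check could be done in Python. This is a sketch under one assumption taken from the `jq` expression above: the `--json` report has a numeric top-level `overall_score` field. The names `passes_gate` and `main` are hypothetical.

```python
import json
import sys


def passes_gate(report_json, minimum=75):
    """Return True if the report's overall_score meets the CI minimum."""
    report = json.loads(report_json)
    return report["overall_score"] >= minimum


def main(stream=sys.stdin):
    """Read a JSON report from stream; return 0 on pass, 1 on fail."""
    return 0 if passes_gate(stream.read()) else 1
```

Saved as a script that ends with `sys.exit(main())`, this could receive the scorer's `--json` output through a pipe and fail the CI job when the score is under 75.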

Get Started

How to install and use this skill in your preferred environment.

Skills are designed for AI coding agents (Claude Code, Cursor, Windsurf) and IDE-based workflows where the agent can read files, run scripts, and act on your codebase.

Models & Context

Which AI models and context windows work best with this skill.

Recommended Models

Best: Claude Opus 4.8, Claude Sonnet 4, GPT-4.1, Gemini 2.5 Pro, Grok 3, Kimi K2
Good: Claude Haiku 4.5, GPT-4.1 mini, Gemini 2.5 Flash, Grok 3 mini

Larger models tend to produce more detailed, production-ready outputs.

Context Window

This skill's SKILL.md is typically 3–10 KB — fits in any modern context window.

8K: Skill only
32K+: Skill + conversation
100K+: Skill + references + codebase

All current frontier models support 100K+ context.
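As a rough worked check of the sizes above, the sketch below converts a file size to an approximate token count and picks the smallest window that holds it. It rests on assumptions that are not from this page: about 4 characters per token (a common rule of thumb for English text, which varies by tokenizer), one byte per character, and "8K", "32K", and "100K" read as 8,000, 32,000, and 100,000 tokens.

```python
import math


def estimate_tokens(num_bytes, chars_per_token=4):
    """Approximate token count, assuming ~1 byte per character."""
    return math.ceil(num_bytes / chars_per_token)


def smallest_window(tokens_needed, windows=(8_000, 32_000, 100_000)):
    """Return the smallest listed window that fits, or None if none does."""
    for window in windows:
        if tokens_needed <= window:
            return window
    return None


# A 10 KB SKILL.md (the upper end of the typical range)
tokens = estimate_tokens(10 * 1024)
print(tokens, smallest_window(tokens))  # prints: 2560 8000
```

Under these assumptions even the largest typical SKILL.md uses about a third of an 8K window, which leaves the larger tiers for conversation, references, and code.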

Pro tips for best results

1. Be specific: include numbers (users, budget, RPS) so the skill can size the architecture.

2. Share constraints: compliance needs, team size, and existing stack all improve the output.

3. Iterate: start with a high-level design, then ask follow-ups for IaC, cost analysis, or security review.

4. Combine skills: pair with companion skills below for end-to-end coverage.

Ready to try Skill Tester?

Install the skill and start getting expert-level guidance in your workflow — any agent, any IDE.
