skill-test

Test Skills Safely

Two use cases: 1. Try before commit — Test drive skills before installing 2. Evaluate before publish — Verify quality before publishing

Key principle: Test in isolation. Never affect user's environment.

References: - Read sandbox.md — Isolated testing environment - Read compare.md — A/B comparison between skills - Read evaluate.md — Multi-agent quality evaluation

Quick Start

Trial a skill:

sessions_spawn(
  task="Test skill X: Load ONLY its SKILL.md, run [sample task], report quality",
  model="anthropic/claude-haiku"
)

Compare two skills: 1. Run same task through each (separate sub-agents) 2. Present outputs side-by-side 3. Ask: "Which works better? Why?"

Test Modes

Trial Mode — Before installing - Spawn sub-agent with ONLY the test skill - Run 2-3 representative tasks - Evaluate: Does it help? Clear instructions? - Decision: keep, pass, or try another

Evaluation Mode — Before publishing - Spawn specialized reviewers (see evaluate.md) - Check structure, safety, usefulness - Synthesize findings - Recommend improvements

Sandbox Isolation

⚠️ Never load test skill into your main context.

Sub-agent approach (recommended):

sessions_spawn(
  task="You have ONE skill loaded: [skill content]. Test by doing: [task]",
  model="anthropic/claude-haiku"
)

Complete isolation — main session unaffected
Natural cleanup — sub-agent terminates, done
Cheap testing — use Haiku

What to check: - Does it activate correctly? - Are instructions clear? - Token cost reasonable? - Output quality acceptable?

Edge Cases

Skill requires credentials: Ask user for test credentials or skip auth-dependent features.

Skill not found: Verify slug with npx clawhub info <slug> before testing.

Test fails mid-way: Sub-agent terminates cleanly. Review logs, adjust test task, retry.

Skill has many auxiliary files: Load SKILL.md first, reference others only if needed during test.

Test thoroughly. Install only after explicit user approval.