agent-cost-strategy
v1.2.1Tiered model selection and cost optimization for multi-agent AI workflows. Use this skill whenever you are choosing a model for a task, spinning up a sub-agent, setting up cron jobs or heartbeats, or trying to reduce API spend. Also use when the user says "save costs", "which model should I use", "o...
Installation
Agent Cost Strategy
Use the cheapest model that can reliably do the job. Most tasks don't need your most powerful model.
The Three Tiers
| Tier | When to Use | Examples |
|---|---|---|
| Fast/Cheap | Sub-agents, background tasks, automated fixes, simple lookups, short replies | Claude Haiku, GPT-4o-mini, Gemini Flash |
| Mid-tier | Main session dialogue, moderate reasoning, multi-step tasks | Claude Sonnet, GPT-4o, Gemini Pro |
| Powerful | Architecture decisions, deep reviews, hard problems, after cheaper models fail twice | Claude Opus, GPT-4.5, Gemini Ultra |
Task → Tier Routing
Fix failing tests → Fast/Cheap
Write boilerplate → Fast/Cheap
Research / search → Fast/Cheap
Cron / scheduled tasks → Fast/Cheap (always)
Short replies (hi, ok) → Fast/Cheap (always)
Background monitoring → Fast/Cheap (always)
Build new feature → Mid-tier
Review a PR → Mid-tier
Main assistant dialogue → Mid-tier (default)
Architecture decisions → Powerful
Deep code review → Powerful
Stuck after 2 attempts → Escalate one tier up
Heartbeat / Cron Model Rule
Always specify the cheapest model for scheduled and background tasks — they run frequently and costs add up fast. Check your platform's config for how to set a model per cron/heartbeat job.
For heartbeat intervals: set them just under your provider's cache TTL to keep the prompt cache warm and pay cache-read rates instead of full input rates. Check your provider's docs for the exact TTL.
Communication Pattern Rule
One-word and short conversational messages (hi, thanks, ok, sure, yes, no) should always route to Fast/Cheap. Never burn a mid-tier or powerful model on an acknowledgment.
Cache Optimization
Prompt caching cuts costs 50-90% on repeated context. See references/cache-optimization.md for patterns.
Signs You're Over-Spending
- Running powerful models on tasks Fast/Cheap can handle
- No caching on repeated system prompts
- Heartbeat/cron jobs using the default (expensive) model
- Spawning sub-agents without specifying a model tier