SkillHub

semantic-model-router

v1.0.3

Smart LLM router: routes every query to the cheapest capable model. Supports 17 models across Anthropic, OpenAI, Google, DeepSeek, and xAI (Grok), and uses a pre-trained ML classifier. The router itself needs no extra API keys.

Sourced from ClawHub. Authored by Ray.

Installation

Please help me install the skill `semantic-model-router` from the SkillHub official store.

npx skills add rayray1218/semantic-model-router

Semantic Model Router

A smart LLM router that cuts inference costs by up to ~99% versus a $10/M baseline, by routing each request to the cheapest model that can handle it. It is powered by a pre-trained ML classifier and semantic embeddings: classification makes no external calls and needs no API keys of its own.

Install

openclaw plugins install @rayray1218/semantic-model-router

Quick Start

from scripts.model_router import ModelRouter

router = ModelRouter()
res = router.route("Design a distributed caching layer for a fintech platform.")
print(res["report"])
# [ClawRouter] anthropic/claude-sonnet-4-6 (ELITE, ml, conf=0.97)
#              Cost: $3.0/M | Baseline: $10.0/M | Saved: 70.0%

How Routing Works

Each query is assigned to one of three tiers by a three-stage pipeline, where each later stage serves as a fallback for the one before it:

  1. ML Classifier (primary): a logistic regression model trained on 6,000+ labeled queries. It runs in under 1 ms from weights embedded in scripts/model_weights.py.
  2. Semantic Embeddings (fallback): cosine similarity between the query and per-tier intent vectors, computed with sentence-transformers.
  3. Keyword Rules (last resort): dependency-free pattern matching.
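
The fallback order above can be sketched as a cascade in which each stage either returns a tier or defers to the next one. This is a minimal illustration under stated assumptions, not the plugin's code: the stage functions, the keyword lists, the `embed` callable, the intent vectors, and the 0.6 similarity threshold are all hypothetical. In a real setup, `embed` could wrap the `encode` method of a sentence-transformers model.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# (tier, confidence, stage name); None means "defer to the next stage"
Result = Optional[Tuple[str, float, str]]
Stage = Callable[[str], Result]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def make_embedding_stage(embed: Callable[[str], List[float]],
                         intents: Dict[str, List[float]],
                         min_sim: float = 0.6) -> Stage:
    """Hypothetical stage 2: pick the tier whose intent vector is most similar."""
    def stage(query: str) -> Result:
        vec = embed(query)
        tier, sim = max(((t, cosine(vec, v)) for t, v in intents.items()),
                        key=lambda pair: pair[1])
        # Defer to the next stage when even the best match is weak.
        return (tier, sim, "semantic") if sim >= min_sim else None
    return stage


def keyword_stage(query: str) -> Result:
    """Hypothetical stage 3: substring rules with no dependencies; never defers."""
    q = query.lower()
    if any(k in q for k in ("implement", "architecture", "security")):
        return ("ELITE", 0.5, "keyword")
    if any(k in q for k in ("summarize", "translate", "explain")):
        return ("BALANCED", 0.5, "keyword")
    return ("BASIC", 0.5, "keyword")


def route(query: str, stages: Sequence[Stage]) -> Tuple[str, float, str]:
    """Try each stage in order; the first non-None answer wins."""
    for stage in stages:
        result = stage(query)
        if result is not None:
            return result
    raise RuntimeError("no stage produced a tier")


# Example: stage 1 unavailable, toy 2-D "embeddings" standing in for a real model.
def ml_unavailable(query: str) -> Result:
    return None  # e.g. the classifier weights failed to load


def toy_embed(query: str) -> List[float]:
    return [1.0, 0.0] if "cache" in query.lower() else [0.0, 1.0]


INTENTS = {"ELITE": [1.0, 0.1], "BASIC": [0.0, 1.0]}
stages = [ml_unavailable, make_embedding_stage(toy_embed, INTENTS), keyword_stage]
print(route("Implement a thread-safe LRU cache", stages))
```

Because the keyword stage always answers, the cascade is guaranteed to return a tier as long as that stage is last.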

Tier      Default Model                 Typical Workload                        Cost/1M  vs Baseline
BASIC     deepseek/deepseek-chat        Greetings, simple Q&A, chit-chat        $0.14    99% saved
BALANCED  openai/gpt-4o-mini            Summaries, translations, explanations   $0.15    99% saved
ELITE     anthropic/claude-sonnet-4-6   Complex coding, architecture, security  $3.00    70% saved
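
The "vs Baseline" column compares each tier's per-1M-token cost with the $10/M baseline shown in the Quick Start report. A minimal sketch of that arithmetic, assuming savings are computed as 1 - cost/baseline (the helper name is hypothetical):

```python
BASELINE_PER_M = 10.0  # baseline $/1M tokens shown in the Quick Start report


def savings_pct(cost_per_m: float, baseline_per_m: float = BASELINE_PER_M) -> float:
    """Percentage saved relative to the baseline, rounded to one decimal place."""
    return round((1 - cost_per_m / baseline_per_m) * 100, 1)


for tier, cost in [("BASIC", 0.14), ("BALANCED", 0.15), ("ELITE", 3.00)]:
    print(f"{tier:<9} ${cost:.2f}/M  saved {savings_pct(cost)}%")
```

These values match the savings figures in the Example Output section.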

Supported Models (17 total, verified Feb 2026)

Anthropic

Model                              Input $/1M   Output $/1M
anthropic/claude-sonnet-4-6        $3.00        $15.00        ★ ELITE default
anthropic/claude-opus-4-5          $5.00        $25.00
anthropic/claude-haiku-4-5         $0.80        $4.00

OpenAI

Model                              Input $/1M   Output $/1M
openai/gpt-5                       $1.25        $10.00
openai/gpt-4o                      $2.50        $10.00
openai/gpt-4o-mini                 $0.15        $0.60         ★ BALANCED default
openai/o3                          $2.00        $8.00
openai/o4-mini                     $1.10        $4.40

Google

Model                              Input $/1M   Output $/1M
google/gemini-3.0-pro              $1.25        $10.00
google/gemini-2.5-pro              $1.25        $10.00
google/gemini-2.5-flash            $0.30        $2.50
google/gemini-2.5-flash-lite       $0.10        $0.40

DeepSeek

Model                              Input $/1M   Output $/1M
deepseek/deepseek-chat (V3.2)      $0.28        $0.42         ★ BASIC default
deepseek/deepseek-reasoner (V3.2)  $0.28        $0.42

xAI (Grok)

Model                              Input $/1M   Output $/1M
xai/grok-3                         $3.00        $15.00
xai/grok-3-mini                    $0.30        $0.50

Pricing source: each provider's official API documentation (USD per 1M tokens), verified Feb 2026.
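
Because input and output tokens are billed at different rates, the cost of a single request depends on both columns. A minimal sketch using three rows from the pricing tables; the helper name and the 2,000-input/500-output token mix are illustrative assumptions:

```python
# Input/output prices in USD per 1M tokens, copied from the tables above.
PRICES = {
    "anthropic/claude-sonnet-4-6": (3.00, 15.00),
    "openai/gpt-4o-mini": (0.15, 0.60),
    "deepseek/deepseek-chat": (0.28, 0.42),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: each token count times its per-1M price, over 1M."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Illustrative request: 2,000 prompt tokens and 500 completion tokens.
for model in PRICES:
    print(f"{model:<30} ${request_cost(model, 2_000, 500):.5f}")
```

For this token mix, the ELITE default comes to $0.0135 per request versus $0.0006 for the BALANCED default.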

Override Models at Runtime

from scripts.model_router import ModelRouter

# GPT-5.2 for ELITE, Gemini 2.5 Flash for BALANCED, Gemini 2.5 Flash-Lite for BASIC
router = ModelRouter(
    elite_model="openai/gpt-5.2",
    balanced_model="google/gemini-2.5-flash",
    basic_model="google/gemini-2.5-flash-lite",
)
# Swap a tier's model without recreating the router
router.set_model("ELITE", "anthropic/claude-opus-4-5")
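
Conceptually, the override API amounts to a tier-to-model table that the constructor fills from keyword arguments and `set_model` updates in place. The sketch below is hypothetical (`TierModels` and `model_for` are not part of the plugin) and only illustrates that idea, including rejecting unknown tier names:

```python
from typing import Dict

TIERS = ("BASIC", "BALANCED", "ELITE")
DEFAULTS = {
    "BASIC": "deepseek/deepseek-chat",
    "BALANCED": "openai/gpt-4o-mini",
    "ELITE": "anthropic/claude-sonnet-4-6",
}


class TierModels:
    """Hypothetical tier -> model mapping with per-tier overrides."""

    def __init__(self, **overrides: str) -> None:
        self._models: Dict[str, str] = dict(DEFAULTS)
        # elite_model=... -> "ELITE" (str.removesuffix needs Python 3.9+)
        for key, model in overrides.items():
            self.set_model(key.removesuffix("_model").upper(), model)

    def set_model(self, tier: str, model: str) -> None:
        if tier not in TIERS:
            raise ValueError(f"unknown tier {tier!r}; expected one of {TIERS}")
        self._models[tier] = model

    def model_for(self, tier: str) -> str:
        return self._models[tier]


models = TierModels(elite_model="openai/gpt-5.2",
                    basic_model="google/gemini-2.5-flash-lite")
models.set_model("ELITE", "anthropic/claude-opus-4-5")
print(models.model_for("ELITE"))  # anthropic/claude-opus-4-5
```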

List All Available Models (CLI)

python3 scripts/model_router.py --list-models

CLI Usage

# Route a single query
python3 scripts/model_router.py "Implement AES encryption from scratch"

# Override ELITE model
python3 scripts/model_router.py --elite openai/gpt-5.2 "Write a compiler"

# Run the full smoke test (no query argument)
python3 scripts/model_router.py
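
The three invocations above fit a small argparse front end: an optional positional query, an `--elite` override, and a `--list-models` switch, with no query meaning "run the smoke test". The parser below is a hypothetical sketch, not the script's actual code, and `dispatch` returns placeholder strings instead of doing real work:

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(description="Route a query to a model tier.")
    parser.add_argument("query", nargs="?",
                        help="query to route; omit it to run the smoke test")
    parser.add_argument("--elite", metavar="MODEL",
                        help="override the ELITE tier model")
    parser.add_argument("--list-models", action="store_true",
                        help="print the supported models and exit")
    return parser


def dispatch(argv: Optional[List[str]] = None) -> str:
    """Return which action the arguments select (placeholder strings only)."""
    args = build_parser().parse_args(argv)
    if args.list_models:
        return "list-models"
    if args.query is None:
        return "smoke-test"
    suffix = f" (elite={args.elite})" if args.elite else ""
    return f"route:{args.query}{suffix}"


print(dispatch(["--elite", "openai/gpt-5.2", "Write a compiler"]))
```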

Dynamic Keyword Expansion

router.add_keywords("ELITE", ["cryptographic proof", "zero-knowledge"])
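
`add_keywords` presumably extends the rules consulted by the keyword stage. How the plugin stores those rules is not documented here, so the sketch below assumes a hypothetical `KeywordRules` table that keeps one keyword list per tier, checks ELITE first, and matches whole words or phrases:

```python
import re
from typing import Dict, Iterable, List, Optional


class KeywordRules:
    """Hypothetical tier -> keyword table; higher tiers are checked first."""

    ORDER = ("ELITE", "BALANCED", "BASIC")

    def __init__(self) -> None:
        self.keywords: Dict[str, List[str]] = {tier: [] for tier in self.ORDER}

    def add_keywords(self, tier: str, words: Iterable[str]) -> None:
        if tier not in self.keywords:
            raise ValueError(f"unknown tier {tier!r}")
        for word in words:
            w = word.lower()
            if w not in self.keywords[tier]:  # skip duplicates
                self.keywords[tier].append(w)

    def match(self, query: str) -> Optional[str]:
        q = query.lower()
        for tier in self.ORDER:
            for word in self.keywords[tier]:
                # Word boundaries, so a keyword like "proof" would not fire inside "proofread".
                if re.search(rf"\b{re.escape(word)}\b", q):
                    return tier
        return None  # no rule fired


rules = KeywordRules()
rules.add_keywords("ELITE", ["cryptographic proof", "zero-knowledge"])
print(rules.match("Sketch a zero-knowledge login flow"))  # ELITE
print(rules.match("Please proofread this email"))  # None
```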

Example Output

Query                                              Predicted  Expected   ✓  Cost Info
────────────────────────────────────────────────────────────────────────────────────
How are you doing today?                           BASIC      BASIC      ✓  $0.14/M  saved 98.6%
Summarize this article in three bullet points.     BALANCED   BALANCED   ✓  $0.15/M  saved 98.5%
Implement a thread-safe LRU cache in Python.       ELITE      ELITE      ✓  $3.0/M   saved 70.0%

Security & Privacy

  • Zero external calls: All classification runs locally.
  • No API keys: The router itself needs none.
  • Transparent weights: All model parameters live in scripts/model_weights.py — fully auditable.
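
Shipping a logistic regression as plain Python literals is what keeps the router auditable: inference is one weighted sum per tier followed by a softmax. The sketch below assumes a hypothetical layout (`WEIGHTS`, `BIAS`, and a toy three-value feature vector); the real contents and feature extraction of `scripts/model_weights.py` may differ:

```python
import math
from typing import List, Tuple

# Hypothetical embedded parameters: one weight row and one bias per tier.
TIERS = ["BASIC", "BALANCED", "ELITE"]
WEIGHTS: List[List[float]] = [
    [2.0, -1.0, -2.0],   # BASIC
    [-0.5, 2.0, -0.5],   # BALANCED
    [-2.0, -0.5, 2.5],   # ELITE
]
BIAS: List[float] = [0.5, 0.0, -0.5]


def predict(features: List[float]) -> Tuple[str, float]:
    """Multinomial logistic regression: linear score per tier, then softmax."""
    scores = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(WEIGHTS, BIAS)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(TIERS)), key=probs.__getitem__)
    return TIERS[best], probs[best]


print(predict([0.0, 0.0, 1.0]))  # ('ELITE', 0.899...)
```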

Save costs, route smarter. Built for the OpenClaw community.