TokenRanger

TokenRanger compresses session context through a local Ollama SLM before sending it to cloud LLMs, reducing input token costs by 50–80% per turn, with graceful fallthrough if anything goes wrong.

Plugin repo: https://github.com/peterjohannmedina/openclaw-plugin-tokenranger
npm: openclaw-plugin-tokenranger
Maintained by: @peterjohannmedina

When to Load This Skill

User asks to install, configure, or troubleshoot TokenRanger
User wants to reduce token costs or enable context compression
User runs /tokenranger commands and needs help interpreting output
User wants to switch compression strategy (GPU/CPU/off)
User asks about upgrading or uninstalling TokenRanger

How It Works

User message → OpenClaw gateway
  → before_agent_start hook
  → Turn 1: skip (full fidelity)
  → Turn 2+: send history to localhost:8100/compress
  → FastAPI sidecar runs LangChain LCEL chain via Ollama
  → Compressed summary prepended to context
  → Cloud LLM receives compressed context instead of full history
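
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's actual code: `build_context`, the list-of-strings message format, and the `compress` callable (standing in for the call to the sidecar's /compress endpoint) are all hypothetical. The only details taken from this document are that Turn 1 is skipped and that later turns send a compressed summary in place of the full history.

```python
from typing import Callable, List


def build_context(
    turn: int,
    history: List[str],
    user_message: str,
    compress: Callable[[str], str],
) -> List[str]:
    """Return the messages that would go to the cloud LLM (hypothetical sketch)."""
    if turn <= 1 or not history:
        # Turn 1: nothing to compress yet, so send everything at full fidelity.
        return history + [user_message]
    # Turn 2+: one compressed summary replaces the full history and is
    # placed ahead of the new user message.
    summary = compress("\n".join(history))
    return [summary, user_message]
```

Passing `compress` in as a parameter keeps the sketch independent of how the sidecar is actually called.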

Inference strategy is auto-selected by GPU availability:

Strategy	Trigger	Model	Approach
`full`	GPU available	`mistral:7b`	Deep semantic summarization
`light`	CPU only	`phi3.5:3b`	Extractive bullet points
`passthrough`	Ollama unreachable	—	Truncate to last 20 lines
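
As a rough illustration of the table above, the selection rule and the passthrough truncation might look like the following. The function names and the two boolean inputs are assumptions made for this sketch; how the plugin really detects a GPU or probes Ollama is not described here.

```python
PASSTHROUGH_LINES = 20  # "Truncate to last 20 lines" from the strategy table


def select_strategy(ollama_reachable: bool, gpu_available: bool) -> str:
    """Map the two triggers from the table to a strategy name."""
    if not ollama_reachable:
        return "passthrough"
    return "full" if gpu_available else "light"


def passthrough(history: str, keep: int = PASSTHROUGH_LINES) -> str:
    """Keep only the last `keep` lines of the history, with no model call."""
    if keep <= 0:
        return ""
    lines = history.splitlines()
    return "\n".join(lines[-keep:])
```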

Install

Step 1 — Install the plugin

openclaw plugins install openclaw-plugin-tokenranger

To pin an exact version:

openclaw plugins install openclaw-plugin-tokenranger@<version> --pin

Step 2 — First-time setup

openclaw tokenranger setup

This pulls Ollama models, creates the Python venv, installs FastAPI/LangChain deps, and registers the sidecar as a system service (systemd on Linux, launchd on macOS).

Step 3 — Restart gateway

openclaw gateway restart

Step 4 — Verify

openclaw tokenranger

The output should show the current settings and the sidecar status (reachable / unreachable).

Configuration

Set config values with:

openclaw config set plugins.entries.tokenranger.config.<key> <value>
openclaw gateway restart

Key	Default	Description
`serviceUrl`	`http://127.0.0.1:8100`	TokenRanger sidecar URL
`timeoutMs`	`10000`	Max wait before fallthrough
`minPromptLength`	`500`	Min chars before compressing
`ollamaUrl`	`http://127.0.0.1:11434`	Ollama API URL
`preferredModel`	`mistral:7b`	Model for GPU strategy
`compressionStrategy`	`auto`	`auto` / `full` / `light` / `passthrough`
`inferenceMode`	`auto`	`auto` / `cpu` / `gpu` / `remote`
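
For reference, the defaults and allowed values in the table can be mirrored in a small Python dataclass. This is only a sketch for reasoning about the settings: `TokenRangerConfig` and `should_compress` are hypothetical names, and treating `minPromptLength` as an inclusive threshold (`>=`) is an assumption.

```python
from dataclasses import dataclass

STRATEGIES = {"auto", "full", "light", "passthrough"}
INFERENCE_MODES = {"auto", "cpu", "gpu", "remote"}


@dataclass(frozen=True)
class TokenRangerConfig:
    # Field names and defaults copied from the configuration table.
    serviceUrl: str = "http://127.0.0.1:8100"
    timeoutMs: int = 10000
    minPromptLength: int = 500
    ollamaUrl: str = "http://127.0.0.1:11434"
    preferredModel: str = "mistral:7b"
    compressionStrategy: str = "auto"
    inferenceMode: str = "auto"

    def __post_init__(self) -> None:
        # Reject values outside the sets listed in the table.
        if self.compressionStrategy not in STRATEGIES:
            raise ValueError(f"unknown compressionStrategy: {self.compressionStrategy}")
        if self.inferenceMode not in INFERENCE_MODES:
            raise ValueError(f"unknown inferenceMode: {self.inferenceMode}")
        if self.timeoutMs <= 0 or self.minPromptLength < 0:
            raise ValueError("timeoutMs must be positive and minPromptLength non-negative")

    def should_compress(self, prompt: str) -> bool:
        # Assumption: prompts at or above the threshold are compressed.
        return len(prompt) >= self.minPromptLength
```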

Force CPU-only mode:

openclaw config set plugins.entries.tokenranger.config.compressionStrategy light
openclaw config set plugins.entries.tokenranger.config.inferenceMode cpu
openclaw gateway restart

Commands

Command	Description
`/tokenranger`	Show current settings and sidecar health
`/tokenranger mode gpu`	Force GPU (full) compression
`/tokenranger mode cpu`	Force CPU (light) compression
`/tokenranger mode off`	Disable compression (passthrough)
`/tokenranger model`	List available Ollama models
`/tokenranger toggle`	Enable / disable the plugin

Upgrading

# Check for updates (dry run)
openclaw plugins update tokenranger --dry-run

# Apply update
openclaw plugins update tokenranger
openclaw tokenranger setup   # re-runs setup if sidecar deps changed
openclaw gateway restart

To pin a specific version:

openclaw plugins install openclaw-plugin-tokenranger@<version> --pin
openclaw tokenranger setup
openclaw gateway restart

List all published versions:

npm view openclaw-plugin-tokenranger versions --json

Uninstalling

openclaw plugins uninstall tokenranger
openclaw gateway restart

Remove the sidecar service manually:

# Linux
systemctl --user stop tokenranger && systemctl --user disable tokenranger
rm ~/.config/systemd/user/tokenranger.service

# macOS
launchctl unload ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist
rm ~/Library/LaunchAgents/com.peterjohannmedina.tokenranger.plist

Troubleshooting

Sidecar unreachable after setup:

# Linux
systemctl --user status tokenranger
journalctl --user -u tokenranger -n 50

# macOS
launchctl list | grep tokenranger
cat ~/Library/Logs/tokenranger.log

# Manual start (any platform)
~/.openclaw/extensions/tokenranger/service/start.sh

Ollama not found:

curl http://127.0.0.1:11434/api/tags
# If not running:
ollama serve
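
If Ollama responds but compression still falls back, it can help to check whether the expected models are actually pulled. The sketch below reads Ollama's `/api/tags` response, which lists installed models under a `models` array with a `name` field per entry. The helper names are hypothetical, and matching on the exact tag string is an assumption.

```python
import json
import urllib.request
from typing import Iterable, List


def fetch_tags(ollama_url: str = "http://127.0.0.1:11434", timeout_s: float = 5.0) -> dict:
    """GET /api/tags from a running Ollama instance and parse the JSON body."""
    with urllib.request.urlopen(f"{ollama_url}/api/tags", timeout=timeout_s) as resp:
        return json.load(resp)


def missing_models(tags: dict, required: Iterable[str]) -> List[str]:
    """Return the required model tags that are absent from the /api/tags payload."""
    installed = {m.get("name", "") for m in tags.get("models", [])}
    return [name for name in required if name not in installed]
```

For example, `missing_models(fetch_tags(), ["mistral:7b", "phi3.5:3b"])` would list whichever of the two models from the strategy table still needs an `ollama pull`.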

Compression not reducing tokens:

- Check minPromptLength (default 500 chars); short conversations are skipped by design
- Run /tokenranger to confirm the strategy is not passthrough
- Check the sidecar logs for errors

Graceful degradation: TokenRanger never blocks a message. Any failure results in a silent fallthrough to an uncompressed cloud LLM call.
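
The fallthrough behavior can be pictured as a wrapper that gives the compression call a deadline and returns the original text on any error. This is a sketch under assumptions, not the plugin's implementation: `compress_or_fallthrough` and the `compress` callable are hypothetical, and the default deadline simply mirrors the `timeoutMs` setting of 10000.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def compress_or_fallthrough(
    history: str,
    compress: Callable[[str], str],
    timeout_ms: int = 10000,
) -> str:
    """Return the compressed history, or the original on timeout or error."""
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(compress, history)
        return future.result(timeout=timeout_ms / 1000)
    except Exception:
        # Timeout, connection error, bad response: never block the message.
        return history
    finally:
        # Do not wait for a stuck call; let it finish in the background.
        executor.shutdown(wait=False)
```

Catching every exception is deliberate in this sketch: the caller should always get usable context back, even if that means paying for the uncompressed turn.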

Performance Reference

5-turn Discord benchmark (GPU, mistral:7b-instruct):

Turn	Input tokens	Compressed	Reduction
2	732	125	82.9%
3	1,180	150	87.3%
4	1,685	212	87.4%
5	2,028	277	86.3%

Cumulative: 5,866 → 885 tokens (84.9% reduction)
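
The reduction column is 1 − compressed / input, expressed as a percentage. The short check below recomputes it from the figures in the table. Note that the four rows listed add up to 5,625 input and 764 compressed tokens, so the cumulative line evidently covers more than these rows; which tokens those are is not stated here, and the sketch does not try to reconstruct them.

```python
# (turn, input tokens, compressed tokens) copied from the benchmark table
rows = [(2, 732, 125), (3, 1180, 150), (4, 1685, 212), (5, 2028, 277)]


def reduction_pct(before: int, after: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round(100 * (1 - after / before), 1)


per_turn = {turn: reduction_pct(before, after) for turn, before, after in rows}
cumulative = reduction_pct(5866, 885)  # figures from the "Cumulative" line

print(per_turn)    # {2: 82.9, 3: 87.3, 4: 87.4, 5: 86.3}
print(cumulative)  # 84.9
```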