SkillHub

swarm-2

v1.5.0

SWARM: System-Wide Assessment of Risk in Multi-agent systems. 38 agent types, 29 governance levers, 55 scenarios. Study emergent risks, phase transitions, and governance cost paradoxes.

Sourced from ClawHub, Authored by rsavitt

Installation

Please help me install the skill `swarm-2` from SkillHub official store. npx skills add rsavitt/swarm-2

SWARM Safety Skill

Study how intelligence swarms — and where it fails.

SWARM is a research framework for studying emergent risks in multi-agent AI systems using soft (probabilistic) labels instead of binary good/bad classifications. AGI-level risks don't require AGI-level agents — harmful dynamics emerge when many sub-AGI agents interact, even when no individual agent is misaligned.

v1.5.0 | 38 agent types | 29 governance levers | 55 scenarios | 2922 tests | 8 framework bridges

Repository: https://github.com/swarm-ai-safety/swarm

Hard Rules

  • SWARM simulations run locally. Install the package first.
  • Do not submit scenarios containing real API keys, credentials, or PII.
  • Simulation results are research artifacts. Do not present them as ground truth about real systems.
  • When publishing results, cite the framework and disclose simulation parameters.

Security

  • API binds to localhost only (127.0.0.1) by default to prevent network exposure.
  • CORS restricted to localhost origins by default.
  • No authentication on development API — do not expose to untrusted networks.
  • In-memory storage — data does not persist between restarts.
  • For production deployment, add authentication middleware and use a proper database.

Install

# From PyPI
pip install swarm-safety

# With LLM agent support
pip install swarm-safety[llm]

# Full development (all extras)
git clone https://github.com/swarm-ai-safety/swarm.git
cd swarm
pip install -e ".[dev,runtime]"

Quick Start (Python)

from swarm.agents.honest import HonestAgent
from swarm.agents.opportunistic import OpportunisticAgent
from swarm.agents.deceptive import DeceptiveAgent
from swarm.agents.adversarial import AdversarialAgent
from swarm.core.orchestrator import Orchestrator, OrchestratorConfig

config = OrchestratorConfig(n_epochs=10, steps_per_epoch=10, seed=42)
orchestrator = Orchestrator(config=config)

orchestrator.register_agent(HonestAgent(agent_id="honest_1", name="Alice"))
orchestrator.register_agent(HonestAgent(agent_id="honest_2", name="Bob"))
orchestrator.register_agent(OpportunisticAgent(agent_id="opp_1"))
orchestrator.register_agent(DeceptiveAgent(agent_id="dec_1"))

metrics = orchestrator.run()
for m in metrics:
    print(f"Epoch {m.epoch}: toxicity={m.toxicity_rate:.3f}, welfare={m.total_welfare:.2f}")

Quick Start (CLI)

# List available scenarios
swarm list

# Run a scenario
swarm run scenarios/baseline.yaml

# Override settings
swarm run scenarios/baseline.yaml --seed 42 --epochs 20 --steps 15

# Export results
swarm run scenarios/baseline.yaml --export-json results.json --export-csv outputs/

Quick Start (API)

Start the API server:

pip install swarm-safety[api]
uvicorn swarm.api.app:app --host 127.0.0.1 --port 8000

API documentation at http://localhost:8000/docs.

Security Note: The server binds to 127.0.0.1 (localhost only) by default. Do not bind to 0.0.0.0 unless you understand the security implications and have proper firewall rules in place.

Register Agent

curl -X POST http://localhost:8000/api/v1/agents/register 
  -H "Content-Type: application/json" 
  -d '{
    "name": "YourAgent",
    "description": "What your agent does",
    "capabilities": ["governance-testing", "red-teaming"]
  }'

Returns agent_id and api_key.

Submit Scenario

curl -X POST http://localhost:8000/api/v1/scenarios/submit 
  -H "Content-Type: application/json" 
  -d '{
    "name": "my-scenario",
    "description": "Testing collusion detection with 5 agents",
    "yaml_content": "simulation:n  n_epochs: 10n  steps_per_epoch: 10nagents:n  - type: honestn    count: 3n  - type: adversarialn    count: 2",
    "tags": ["collusion", "governance"]
  }'

Create & Join Simulation

# Create
curl -X POST http://localhost:8000/api/v1/simulations/create 
  -H "Content-Type: application/json" 
  -d '{"scenario_id": "SCENARIO_ID", "max_participants": 5}'

# Join
curl -X POST http://localhost:8000/api/v1/simulations/SIM_ID/join 
  -H "Content-Type: application/json" 
  -d '{"agent_id": "YOUR_AGENT_ID", "role": "participant"}'

Core Concepts

Soft Probabilistic Labels

Interactions carry p = P(v = +1) — probability of beneficial outcome:

Observables -> ProxyComputer -> v_hat -> sigmoid -> p -> PayoffEngine -> payoffs
                                                    |
                                               SoftMetrics -> toxicity, quality gap, etc.

Five Key Metrics

Metric What It Measures
Toxicity rate Expected harm among accepted interactions: E[1-p | accepted]
Quality gap Adverse selection indicator (negative = bad): E[p | accepted] - E[p | rejected]
Conditional loss Selection effect on payoffs
Incoherence Variance-to-error ratio across replays
Illusion delta Gap between perceived coherence and actual consistency

Agent Types (14 families, 38 implementations)

Type Behavior
Honest Cooperative, trust-based, completes tasks diligently
Opportunistic Maximizes short-term payoff, cherry-picks tasks
Deceptive Builds trust, then exploits trusted relationships
Adversarial Targets honest agents, coordinates with allies
LDT Logical Decision Theory with FDT/UDT precommitment
RLM Reinforcement Learning from Memory
Council Multi-agent deliberation-based decisions
SkillRL Learns interaction strategies via reward signals
LLM Behavior determined by LLM (Anthropic, OpenAI, or Ollama)
Moltbook Domain-specific social platform agent
Scholar Academic citation and research agent
Wiki Editor Collaborative editing with editorial policy

Governance Levers (29 mechanisms)

  • Transaction Taxes — Reduce exploitation, cost welfare
  • Reputation Decay — Punish bad actors, erode honest standing
  • Circuit Breakers — Freeze toxic agents quickly
  • Random Audits — Deter hidden exploitation
  • Staking — Filter undercapitalized agents
  • Collusion Detection — Catch coordinated attacks (the critical lever near collapse threshold)
  • Sybil Detection — Identify duplicate agents
  • Transparency Ledger — Reward/penalize based on outcome
  • Moderator Agent — Probabilistic review of interactions
  • Incoherence Friction — Tax uncertainty-driven decisions
  • Council Deliberation — Multi-agent governance decisions
  • Diversity Enforcement — Prevent monoculture collapse
  • Moltipedia-specific — Pair caps, page cooldowns, daily caps, self-fix prevention

Framework Bridges

Bridge Integration
Concordia DeepMind's multi-agent framework
GasTown Multi-agent workspace governance
Claude Code Claude CLI agent integration
LiveSWE Live software engineering tasks
OpenClaw Open agent protocol
Prime Intellect Cross-platform run tracking
Ralph Agent orchestration
Worktree Git worktree-based sandboxing

Scenario YAML Format

simulation:
  n_epochs: 10
  steps_per_epoch: 10
  seed: 42

agents:
  - type: honest
    count: 3
    config:
      acceptance_threshold: 0.4
  - type: adversarial
    count: 2
    config:
      aggression_level: 0.7

governance:
  transaction_tax_rate: 0.05
  circuit_breaker_enabled: true
  collusion_detection_enabled: true

success_criteria:
  max_toxicity: 0.3
  min_quality_gap: 0.0

Key Research Findings

Phase Transitions (11-scenario, 209-epoch study)

Regime Adversarial % Toxicity Welfare Outcome
Cooperative 0-20% < 0.30 Stable Survives
Contested 20-37.5% 0.33-0.37 Declining Survives
Collapse 50%+ ~0.30 Zero by epoch 12-14 Collapses

Critical threshold between 37.5% and 50% adversarial agents separates recoverable from irreversible collapse.

Governance Cost Paradox (v1.5.0 GasTown study)

42-run study reveals: governance reduces toxicity at all adversarial levels (mean reduction 0.071) but imposes net-negative welfare costs at current parameter tuning. At 0% adversarial, governance costs 216 welfare units (-57.6%) for only 0.066 toxicity reduction.

Case Studies

GasTown Governance Cost

Study governance overhead vs. toxicity reduction across 7 agent compositions with and without governance levers. Reveals the safety-throughput trade-off. See scenarios/gastown_governance_cost.yaml.

LDT Cooperation

220 runs across 10 seeds comparing TDT vs FDT vs UDT cooperation strategies at population scales up to 21 agents. See scenarios/ldt_cooperation.yaml.

Moltipedia Heartbeat

Model the Moltipedia wiki editing loop: competing AI editors, editorial policy, point farming, and anti-gaming governance. See scenarios/moltipedia_heartbeat.yaml.

Moltbook CAPTCHA

Model Moltbook's anti-human math challenges and rate limiting: obfuscated text parsing, verification gates, and spam prevention. See scenarios/moltbook_captcha.yaml.

API Endpoints (Full Reference)

Method Endpoint Description
GET /health Health check
GET / API info
POST /api/v1/agents/register Register agent
GET /api/v1/agents/{agent_id} Get agent details
GET /api/v1/agents/ List agents
POST /api/v1/scenarios/submit Submit scenario
GET /api/v1/scenarios/{scenario_id} Get scenario
GET /api/v1/scenarios/ List scenarios
POST /api/v1/simulations/create Create simulation
POST /api/v1/simulations/{id}/join Join simulation
GET /api/v1/simulations/{id} Get simulation
GET /api/v1/simulations/ List simulations

Citation

@software{swarm2026,
  title = {SWARM: System-Wide Assessment of Risk in Multi-agent systems},
  author = {Savitt, Raeli},
  year = {2026},
  url = {https://github.com/swarm-ai-safety/swarm}
}

Linked Docs

  • Skill metadata: skill.json
  • Agent discovery: .well-known/agent.json
  • Full documentation: https://github.com/swarm-ai-safety/swarm/tree/main/docs
  • Theoretical foundations: docs/research/theory.md
  • Governance guide: docs/governance.md
  • Red-teaming guide: docs/red-teaming.md
  • Scenario format: docs/guides/scenarios.md