SkillHub

midos-memory-cascade

v1.0.0

Auto-escalating multi-tier memory search that cascades from in-memory cache through SQLite, grep, and LanceDB vector search to find the best answer with minimal latency.

Sourced from ClawHub, Authored by msruruguay

Installation

Please help me install the skill `midos-memory-cascade` from SkillHub official store. npx skills add msruruguay/midos-memory-cascade

MidOS Memory Cascade

A self-tuning, auto-escalating search engine that tries each memory tier from fastest to slowest, stopping as soon as it finds a high-confidence answer.

What It Does

Instead of the agent deciding which storage layer to query, the cascade tries each tier automatically:

Tier Storage Latency Strategy
T0 In-memory session cache <1ms Exact + fuzzy key match
T1 JSON state files <5ms Filename + key match
T2 SQLite (pipeline_synergy.db) <5ms Structured SQL LIKE
T3 SQLite FTS5 <1ms Full-text keyword on 22K rows
T4 Grep over 46K chunks ~3s Brute-force ripgrep fallback
T5 LanceDB keyword (BM25) slow 670K vector rows, no embeddings
T5b LanceDB semantic 3–30s Embedding similarity, last resort

Question routing: Queries starting with how/what/why/etc. skip keyword tiers and route directly to semantic search.

Self-learning: The cascade records which tier resolves each query. After enough history, evolve() learns shortcuts (skip directly to the winning tier) and marks consistently-empty tiers for skip.

Usage

Python API

from tools.memory.memory_cascade import recall, store

# Search across all tiers
result = recall("adaptive alpha reranking")
# → {"answer": {...}, "tier": "T5:lancedb", "latency_ms": 340, "confidence": 0.87}

# Write to the right storage automatically
store("pattern", content="...", tags=["ml", "reranking"])

CLI

# Search
python memory_cascade.py recall "query here"

# View tier resolution stats
python memory_cascade.py stats

# Run self-evolution (learn shortcuts + tier skips)
python memory_cascade.py evolve

recall() Options

recall(
    query: str,
    min_confidence: float = 0.5,  # stop escalating at this threshold
    max_tier: int = 6             # 0=T0 only, 6=all tiers
)

Returns:

{
  "answer": { "source": "...", "text": "..." },
  "confidence": 0.87,
  "latency_ms": 340.2,
  "tiers_tried": 3,
  "resolved_at": "T5:lancedb",
  "shortcut": null,
  "question_routed": false,
  "escalation": [...]
}

Requirements

  • Python 3.10+ (stdlib only for core cascade logic)
  • Optional: hive_commons for LanceDB tiers (T5/T5b)
  • Optional: tools.memory.memory_router for store() routing

The cascade degrades gracefully — if LanceDB is unavailable, it stops at grep (T4). All stdlib tiers (T0–T4) work with zero dependencies.

Architecture Notes

  • Thread-safe: Session cache uses threading.Lock; stats writes use separate locks
  • Cross-process safe: JSONL writes use OS-level file locking (msvcrt on Windows, fcntl on Unix)
  • Confidence scoring: Term overlap × score × content richness → normalized 0–1
  • Stats persistence: knowledge/SYSTEM/cascade_stats.json accumulates hit rates per tier

Built with MidOS. 1 of 200+ skills. Full ecosystem at midos.dev/pro