ml-model-eval-benchmark
v0.1.0Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions.
Installation
Please help me install the skill `ml-model-eval-benchmark` from SkillHub official store.
npx skills add 0x-Professor/ml-model-eval-benchmark
ML Model Eval Benchmark
Overview
Produce consistent model ranking outputs from metric-weighted evaluation inputs.
Workflow
- Define metric weights and accepted metric ranges.
- Ingest model metrics for each candidate.
- Compute weighted score and ranking.
- Export leaderboard and promotion recommendation.
Use Bundled Resources
- Run
scripts/benchmark_models.pyto generate benchmark outputs. - Read
references/benchmarking-guide.mdfor weighting and tie-break guidance.
Guardrails
- Keep metric names and scales consistent across candidates.
- Record weighting assumptions in the output.