Evaluation

3 items

Projects

matchspec

Eval framework. Define what correct means, test against it, get results.

21 Go
Go AI Evaluation Mist-stack

evaldriven.org

Ship evals before you ship features.

7 Markdown
AI Evaluation Methodology

mcpbr (supermodeltools)

Benchmark runner for Model Context Protocol servers. Runs paired comparison experiments on SWE-bench.

4 Python
Python AI Evaluation MCP Methodology

All tags

AI (10) Aws (1) Cloud Computing (1) Compiler (1) Evaluation (3) Go (6) Inference (1) MCP (2) Methodology (2) Mist-stack (5) Observability (3) Python (2) TypeScript (4)