Evaluation

Eval framework. Define what counts as correct, test against it, get results.

21 • Go

Go AI Evaluation Mist-stack

Ship evals before you ship features.

7 • Markdown

AI Evaluation Methodology

Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench.

4 • Python

Python AI Evaluation MCP Methodology

Projects