mcpbr (supermodeltools): Benchmark runner for Model Context Protocol servers. Paired comparison experiments on SWE-bench. 4 • Python • Python AI Evaluation MCP Methodology