Fast fixture runtime used for the one-click TwinBench demo path.
Use this page as the canonical public result URL for quoting, screenshots, or side-by-side comparison.
Strongest dimensions: Scale & Cost Efficiency, Latency Profile, Autonomy Control
Main limitation: Integration Breadth
Why it matters: This result proves a new user can run TwinBench end to end before pointing it at a real assistant.
Use the coverage-adjusted verified score for public comparison, the verified raw score for direct measurement strength, and measured coverage to understand how much of the benchmark was actually exercised.