TwinBench FAQ

Is this a chatbot benchmark?

No. TwinBench measures long-lived assistant behavior, not just single-turn chat quality.

Is this only for Nullalis?

No. Nullalis is the current reference runtime because it produced the first strong public artifact.

Can I run TwinBench quickly?

Yes. Use the demo path in the repo, or run the benchmark against a native runtime with a single command.

Why can some dimensions be unavailable?

Some systems do not expose the runtime surfaces needed for a fair, direct measurement. TwinBench reports those dimensions as unavailable rather than hiding the gap.

Why does coverage matter?

Coverage shows how much of the benchmark was actually exercised. A flattering score with weak coverage should not outrank a strong, deeply measured artifact.

What if my assistant only supports part of the benchmark?