TwinBench is a benchmark family for personal AI assistants. It reports verified evidence, a projected score, measured coverage, and dimension-level reason codes so that weak coverage stays visible.
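As a sketch of what one reported record might carry, the illustrative Python dataclass below mirrors those fields; the names and types are assumptions for exposition, not the actual TwinBench schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DimensionResult:
    """One benchmark dimension: what was proved, what is projected, how much was measured."""
    dimension: str
    verified_score: Optional[float]      # directly proved by the run; None if not measured
    projected_score: float               # broader estimate, stated with its assumptions
    measured_coverage: float             # fraction of the dimension directly exercised, in [0, 1]
    reason_codes: list[str] = field(default_factory=list)  # why coverage is incomplete, if it is
```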
TwinBench is intended to become a public leaderboard, but every result must keep its class, coverage, and evidence basis.
The headline ranking number is the coverage-adjusted verified score, not the most flattering number in the artifact.
Unsupported surfaces, missing bootstrap, and partial measurement are reported explicitly instead of being flattened into a false failure.
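Those conditions can be carried as explicit reason codes, so a report never confuses "could not be measured" with "was measured and failed". The identifiers below are illustrative, not the actual TwinBench code set.

```python
from enum import Enum


class ReasonCode(Enum):
    """Illustrative dimension-level reason codes (names are assumptions)."""
    UNSUPPORTED_SURFACE = "unsupported_surface"    # system does not expose the needed runtime surface
    MISSING_BOOTSTRAP = "missing_bootstrap"        # required setup or seeding step was unavailable
    PARTIAL_MEASUREMENT = "partial_measurement"    # only part of the dimension was directly exercised
```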
Verified is what the run directly proved. Projected is the broader estimate, made under explicit assumptions. Measured coverage tells you how much of the benchmark was directly exercised.
TwinBench uses the coverage-adjusted verified score for public ranking because it rewards both strength and honest measurement.
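The exact adjustment is not spelled out here, but a minimal sketch, assuming the verified score is simply weighted by per-dimension measured coverage and reusing the hypothetical DimensionResult record above, might look like this:

```python
def coverage_adjusted_verified(results: list[DimensionResult]) -> float:
    """One plausible adjustment (an assumption, not the official formula):
    average verified score weighted by how much of each dimension was actually
    measured, with unmeasured mass earning nothing.
    """
    if not results:
        return 0.0
    total = 0.0
    for r in results:
        verified = r.verified_score if r.verified_score is not None else 0.0
        # Credit only what was directly exercised; projections contribute nothing here.
        total += verified * r.measured_coverage
    return total / len(results)
```

Under this sketch, a system can only raise its headline number by proving more, or by exposing more of itself to direct measurement.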
Some systems do not expose the runtime surfaces needed for a fair direct measurement. TwinBench records that gap explicitly instead of pretending the system cleanly failed a dimension.
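As a usage sketch built on the hypothetical types above, such a dimension would carry a reason code and no verified score, rather than a zero that reads like a clean failure; the names and values are illustrative only.

```python
# A system that does not expose the surface needed for this dimension,
# recorded explicitly rather than scored as a failure (illustrative values).
unmeasured_dim = DimensionResult(
    dimension="long_term_memory",
    verified_score=None,                  # nothing was directly proved
    projected_score=0.62,                 # estimate, reported with its assumptions
    measured_coverage=0.0,                # none of the dimension was directly exercised
    reason_codes=[ReasonCode.UNSUPPORTED_SURFACE.value],
)
```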