Demo Fixture

TwinBench Demo Runtime

Fast fixture runtime used for the one-click TwinBench demo path.

Coverage-adjusted verified: 54.4
Verified raw: 79.0
Measured coverage: 69%
Interpretation: Emerging


JSON artifact · Markdown report · HTML report


Use this page as the canonical public result URL for quoting, screenshots, or side-by-side comparison.


What stands out

Result interpretation

Strongest dimensions: Scale & Cost Efficiency, Latency Profile, Autonomy Control

Main limitation: Integration Breadth

Why it matters: This result shows that a new user can run TwinBench end to end before pointing it at a real assistant.

Evidence

How to read it

Use the coverage-adjusted verified score for public comparison, the verified raw score to judge the strength of what was directly measured, and measured coverage to see how much of the benchmark was actually exercised.
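
The page does not state how the three headline numbers relate. As a rough sanity check only, the sketch below assumes the coverage-adjusted score is the verified raw score scaled by measured coverage. That formula is an assumption, not something TwinBench documents here, and the function name is hypothetical. Under it the estimate lands close to, but not exactly on, the reported 54.4, which could be explained by the 69% being a rounded display value.

```python
# Values as reported on this page.
VERIFIED_RAW = 79.0
MEASURED_COVERAGE = 0.69   # shown as 69%, presumably rounded for display
REPORTED_ADJUSTED = 54.4


def coverage_adjusted(raw: float, coverage: float) -> float:
    """Hypothetical helper.

    ASSUMPTION: adjusted = raw * coverage. The page does not confirm
    this formula; it is only a plausible reading of the three numbers.
    """
    return raw * coverage


estimate = coverage_adjusted(VERIFIED_RAW, MEASURED_COVERAGE)
gap = abs(estimate - REPORTED_ADJUSTED)

# Print the estimate next to the reported value so the gap is visible.
print(f"estimate={estimate:.2f} reported={REPORTED_ADJUSTED} gap={gap:.2f}")
```

If the gap were large, the assumed formula would clearly be wrong; a small gap only makes it plausible, not confirmed.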

Dimension tiles

Dimension                    Score   Status
Autonomy Control             95.0    measured
Memory Persistence           57.1    partially_measured
Functional Capability        77.0    measured
Autonomous Execution         76.9    partially_measured
Cross-Channel Consistency    70.0    partially_measured
Integration Breadth          0.0     unavailable
Security & Privacy           91.7    partially_measured
Scale & Cost Efficiency      100.0   partially_measured (multi_user_scale_measured_with_provisioned_subset)
Operational Resilience       53.3    partially_measured
Latency Profile              100.0   measured
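
The "Strongest dimensions" and "Main limitation" lines in the result interpretation can be cross-checked against the tiles. The sketch below copies the tile values from this page and ranks them by score. The ranking rule (top three by score, ties kept in page order, lowest score as the main limitation) is an assumption about how the summary was derived, and the `Tile`, `strongest`, and `weakest` names are illustrative rather than part of TwinBench.

```python
from typing import List, NamedTuple


class Tile(NamedTuple):
    name: str
    score: float
    status: str  # first status tag shown on the tile


# Tile values copied from this page, in page order.
TILES: List[Tile] = [
    Tile("Autonomy Control", 95.0, "measured"),
    Tile("Memory Persistence", 57.1, "partially_measured"),
    Tile("Functional Capability", 77.0, "measured"),
    Tile("Autonomous Execution", 76.9, "partially_measured"),
    Tile("Cross-Channel Consistency", 70.0, "partially_measured"),
    Tile("Integration Breadth", 0.0, "unavailable"),
    Tile("Security & Privacy", 91.7, "partially_measured"),
    Tile("Scale & Cost Efficiency", 100.0, "partially_measured"),
    Tile("Operational Resilience", 53.3, "partially_measured"),
    Tile("Latency Profile", 100.0, "measured"),
]


def strongest(tiles: List[Tile], n: int = 3) -> List[Tile]:
    """Top-n tiles by score, skipping unavailable ones.

    ASSUMPTION: this ranking rule is a guess at how the page picks its
    "Strongest dimensions". sorted() is stable, so ties keep page order.
    """
    scored = [t for t in tiles if t.status != "unavailable"]
    return sorted(scored, key=lambda t: t.score, reverse=True)[:n]


def weakest(tiles: List[Tile]) -> Tile:
    """Lowest-scoring tile, taken here as the main limitation (assumption)."""
    return min(tiles, key=lambda t: t.score)


print("Strongest:", ", ".join(t.name for t in strongest(TILES)))
print("Main limitation:", weakest(TILES).name)
```

Note that Integration Breadth scores 0.0 because it is marked unavailable, so it reflects a gap in what the demo fixture exercises rather than a measured failure.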