Methodology
These public-facing capability pillars are plain-language summaries built on a deeper coverage map spanning reasoning, learning, truthfulness, self-monitoring, social competence, multimodal understanding, safety, and robustness. Each granular question is meant to be supported by evidence from benchmarks, controlled studies, audits, red-team exercises, longitudinal trials, or expert-blind review.
In progress · Low confidence · Tool use
AI can use digital tools and external systems reliably
An AI system can choose appropriate tools, carry out tasks that span multiple systems, recover from failures, and use documentation to operate unfamiliar interfaces.
Progress: 70%
Updated: Mar 12, 2026
Evidence items: 5
Sub-questions: 5