Benchmark suite: Task-goal inference from messy instructions
Open Capability Benchmark Consortium
Hidden-answer prompt suites show strong recovery of user intent from indirect, noisy, and incomplete requests.
Linked to: AI can understand what a task is really asking
Can it infer the real goal from messy instructions?
Mar 6, 2026