Anthropic fires back – AI reasoning works, Apple’s reasoning doesn’t

Anthropic has slammed Apple’s AI tests as flawed, arguing that leading reasoning models did not fail to reason but were wrongly judged on formatting, output length, and impossible tasks. The real problem is bad benchmarks, it says. AI research at loggerheads – Anthropic argues that recent tests claiming “reasoning collapse” in AI models actually […]