ScoutariAI Builder Intel · decision desk

Wed, July 1, 02:46 · Model/API

Anthropic's New Claude Sonnet 5 Narrows Gap with Opus Series

Decision Brief

What changed: Anthropic released Claude Sonnet 5, which outperforms the previous Sonnet 4.6 on all benchmarks and slightly surpasses Opus 4.8 on the GDPval-AA v2 knowledge-work test.
Why it matters: Model performance comparisons shape how AI builders choose models and weigh capability against cost.
Who should care: Teams building on model APIs
Affected stack: Claude
Builder action: Evaluate
Source confidence: Medium · Reliable media or first-hand reporting

Anthropic introduced Claude Sonnet 5, which beats the previous Sonnet 4.6 on all benchmarks and scores 1,618 points on the GDPval-AA v2 knowledge-work test, slightly ahead of the larger Opus 4.8. Anthropic also noted that the model scores much lower on cybersecurity tasks than the models currently blocked by the US government, likely a deliberate signal in the ongoing debate over such restrictions.

Summary basis: official / RSS source. Unless it says "full article read", this summary is based only on publicly available content; it never pretends to have read restricted originals.
