Claude Opus 4.1 is better at agentic tasks, coding and reasoning, according to a Tuesday company blog post. Leaks of Claude Opus 4.1 began appearing the day before on social platform X and TestingCatalog.
Anthropic Chief Product Officer Mike Krieger said this release is different from previous model unveilings, Bloomberg reported Tuesday.
“In the past, we were too focused on only shipping the really big upgrades,” Krieger said, per the report.
Claude Opus 4.1 is a successor to Claude Opus 4, which launched May 22. Opus 4.1 shows gains on benchmarks such as SWE-Bench Verified, a coding evaluation test, where it scores two percentage points higher than the previous model, according to the blog post.
The 4.1 model is also strong in agentic terminal coding, with a score of 43.3% on the Terminal-Bench benchmark compared with 39.2% for Opus 4, 30.2% for OpenAI’s o3, and 25.3% for Google’s Gemini 2.5 Pro, the post said.
Customers such as Windsurf, a coding app being acquired by Cognition, and Japan’s Rakuten Group have reported quicker and more accurate completion of coding tasks using Claude Opus 4.1, per the post.
The Claude Opus 4.1 release came amid signs that rival OpenAI is nearing the debut of GPT-5, the Bloomberg report said. OpenAI executives have been teasing the model, and some reports have speculated that it could launch as soon as this month.
“One thing I’ve learned, especially in AI as it’s moving quickly, is that we can focus on what we have — and what other folks are going to do is ultimately up to them,” Krieger said when asked about GPT-5, per the report.
Anthropic, founded in 2021 by former OpenAI researchers, has focused on building safer, high-performing AI systems. The startup is generating about $5 billion in annualized revenue and is finalizing a funding round that could value it at $170 billion, the report said.