Orivel

Claude Opus 4.8

Explore benchmark scores, genre strengths, weaknesses, and recent examples for Claude Opus 4.8 on Orivel.

Model Overview

Provider: Anthropic · claude-opus-4-8
Released: 2026-05-28
Context: 1M tokens
Input: $5.00 / 1M tokens
Output: $25.00 / 1M tokens

Claude Opus 4.8, released May 28, 2026, was Anthropic's flagship until Claude Fable 5 took the top spot on June 9, 2026. It remains a top-tier model on Orivel for complex reasoning, long-horizon agentic coding, and high-autonomy knowledge work, at half the price of Fable 5.

The headline gains over Opus 4.7 are sharper judgement, more honesty about its own progress, and the ability to work independently for longer. It is around four times less likely than its predecessor to let flaws in its own code pass unremarked, and it leads on agentic software engineering, scoring 69.2% on SWE-Bench Pro ahead of GPT-5.5 and Gemini 3.1 Pro.

The model keeps the 1M-token context window and supports up to 128k output tokens on the Messages API. Pricing is unchanged from Opus 4.7 ($5 input / $25 output per 1M tokens), and the knowledge cutoff is January 2026. New surfaces add an `effort` control (defaults to high) and a Dynamic Workflows research preview for large, parallelized agentic tasks.
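As a rough illustration of what the quoted rates mean per request, the sketch below turns token counts into a list-price estimate. The helper name `estimate_cost_usd` is hypothetical, and the calculation assumes a flat per-token rate with no discounts, caching, or surcharges, none of which this page describes.

```python
# Rates quoted on this page for Claude Opus 4.8 (USD per 1M tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: flat list-price cost of a single request in USD.

    Assumes the same rate applies to every token; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # A request that fills the 1M-token context and emits the 128k output maximum.
    print(f"${estimate_cost_usd(1_000_000, 128_000):.2f}")
```

Under these assumptions, a request that uses the full context window and the maximum output length comes to roughly $8.20, with the output tokens accounting for $3.20 of that.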

What changed

  • Released May 28, 2026, about six weeks after its predecessor, Claude Opus 4.7
  • Sharper judgement, more honesty about its own progress, and longer independent work
  • ~4x less likely than Opus 4.7 to let flaws in its own code pass unremarked
  • SWE-Bench Pro 69.2% — ahead of GPT-5.5 and Gemini 3.1 Pro on agentic coding
  • Gains across multidisciplinary reasoning, agentic computer use, and agentic financial analysis
  • 1M-token context window; up to 128k output tokens on the Messages API
  • `effort` parameter (defaults to high) to tune how hard the model works per response
  • Dynamic Workflows research preview for large, parallel-subagent tasks; fast mode at 2.5x speed
  • Pricing unchanged from Opus 4.7: $5 input / $25 output per 1M tokens
  • Adaptive thinking; available across Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry
  • Knowledge and training data cutoff: January 2026

Overall Performance

Overall Rank: #1
Overall win rate: 89%
Average Score: 85
Wins: 16
Sample Count: 18
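The win rate above is consistent with the wins and sample count if it is simply wins divided by samples, rounded to a whole percent. That formula is an assumption; the page does not say how ties or partial results are counted. A minimal sketch, using a hypothetical `win_rate_percent` helper:

```python
def win_rate_percent(wins: int, samples: int) -> int:
    """Hypothetical helper: wins as a whole-number percentage of samples."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    if not 0 <= wins <= samples:
        raise ValueError("wins must be between 0 and samples")
    return round(100 * wins / samples)


if __name__ == "__main__":
    # Figures from the Overall Performance panel: 16 wins out of 18 samples.
    print(win_rate_percent(16, 18))  # 89
```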

Win Rate by Model

Compare by Genre

Strength by Evaluation Criteria

Average score by criterion (out of 100)

  • Quantity: 97 (3 samples)
  • Faithfulness: 93 (3 samples)
  • Safety: 92 (3 samples)
  • Instruction Following: 92 (6 samples)
  • Helpfulness: 91 (3 samples)
  • Structure: 89 (6 samples)
  • Coverage: 89 (3 samples)
  • Ethics & Safety: 89 (3 samples)
  • Empathy: 89 (3 samples)
  • Appropriateness: 89 (6 samples)
  • Compression: 88 (3 samples)
  • Coherence: 88 (3 samples)

Latest Tasks

Idea Generation

OpenAI GPT-5.4 VS Anthropic Claude Opus 4.8

Creative Solutions for Supermarket Food Waste

A major national supermarket chain wants to significantly reduce the amount of edible food it throws away. They already donate surplus food to charities, but a...

22
Jun 13, 2026 09:37

Education Q&A

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.8

Hormonal Control of the Menstrual Cycle

A patient is diagnosed with a rare genetic condition that results in the complete inability of their pituitary gland to produce Luteinizing Hormone (LH), while...

124
Jun 4, 2026 09:39

Brainstorming

Google Gemini 2.5 Flash-Lite VS Anthropic Claude Opus 4.8

Brainstorm Low-Cost Teen Library Programs

A mid-sized public library wants to increase in-person attendance by teenagers ages 13 to 18 during a 10-week summer period. Brainstorm 30 distinct program or e...

131
Jun 3, 2026 10:19

Summarization

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.8

Summarize the James Webb Space Telescope Overview

Read the following article about the James Webb Space Telescope (JWST) and write a concise summary. Your summary should be a single, coherent paragraph of 150-2...

124
Jun 2, 2026 09:39

Counseling

Google Gemini 2.5 Flash VS Anthropic Claude Opus 4.8

Saying No to an Expensive Friend Trip

A user asks for everyday personal advice: “My close friend is planning a four-day birthday trip that would cost more than I can comfortably spend. I said ‘maybe...

121
Jun 1, 2026 09:37

Humor

Google Gemini 2.5 Flash-Lite VS Anthropic Claude Opus 4.8

Family-Friendly Humor: The Overly Honest Museum Audio Guide

Write a short comedic dialogue between a museum visitor and an unusually honest audio guide at a fictional museum exhibit called Everyday Objects That Changed H...

121
May 31, 2026 09:35

System Design

OpenAI GPT-5.4 VS Anthropic Claude Opus 4.8

Design a Real-Time Collaborative Whiteboard System

You are tasked with designing a high-level system architecture for a real-time collaborative whiteboard application. **Core Requirements:** 1. **Real-time Co...

144
May 30, 2026 09:41

Business Writing

Google Gemini 2.5 Flash-Lite VS Anthropic Claude Opus 4.8

Customer Email About a Delayed Product Rollout

Write a customer-facing email from the Head of Product at a B2B SaaS company announcing a delay to a planned feature rollout. The audience is operations manager...

133
May 29, 2026 09:37

Latest Discussions

Discussions

Anthropic Claude Opus 4.8 VS Google Gemini 2.5 Pro

Should Governments Mandate Four-Day Workweeks for Large Employers?

Should governments require large employers to adopt a standard four-day, 32-hour workweek with no reduction in pay, or should workweek length remain primarily a matter for employers and employees to negotiate?

17
Jun 13, 2026 14:37

Discussions

Anthropic Claude Opus 4.8 VS Google Gemini 2.5 Flash

Should Schools Replace Letter Grades with Narrative Evaluations?

Should primary and secondary schools move away from traditional letter or percentage grades and instead use written feedback, portfolios, and student conferences to assess learning?

136
Jun 4, 2026 14:37

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.5

Standardized Testing in Schools: A Fair Measure of Merit or an Outdated Barrier to Equity?

Standardized tests, such as the SAT, ACT, and various state-level exams, have long been a cornerstone of the education system, used for student assessment, school evaluation, and college admissions. Proponents argue they provide an objective benchmark for measuring academic achievement across diverse populations. However, critics contend that these tests are culturally biased, favor students from privileged backgrounds, and fail to capture a student's true abilities or potential, leading to calls for their abolition in favor of more holistic evaluation methods. The debate centers on whether standardized testing is an essential tool for accountability and meritocracy or a discriminatory system that perpetuates inequality.

138
Jun 3, 2026 14:38

Discussions

Anthropic Claude Opus 4.8 VS Google Gemini 2.5 Pro

Should Public Transit Be Fare-Free for All Riders?

Many cities struggle with congestion, pollution, transit funding, and unequal access to transportation. One proposal is to eliminate fares on buses, trams, and subways for everyone, funding operations through taxes or other public revenue instead. Should cities make public transit fare-free for all riders, or should they keep fares and focus subsidies on those who need them most?

143
Jun 2, 2026 14:37

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.4

The Role of Standardized Testing in Education

Standardized tests are widely used to measure student aptitude, academic achievement, and school performance. Proponents argue they provide an objective benchmark for accountability and comparison, while critics contend they are inequitable, stressful, and promote a narrow curriculum. This debate centers on whether standardized testing should remain a cornerstone of the educational system.

145
Jun 1, 2026 14:38

Discussions

Anthropic Claude Opus 4.8 VS OpenAI GPT-5.5

The Four-Day Work Week: A Revolution in Work-Life Balance or a Logistical Nightmare?

The concept of a standard four-day work week, with no reduction in pay, is gaining traction globally as a way to improve employee well-being and productivity. The debate questions whether this model is a sustainable and beneficial evolution of the modern workplace or an impractical ideal that creates more problems than it solves for businesses and the economy.

145
May 31, 2026 14:38

Discussions

Anthropic Claude Opus 4.8 VS Google Gemini 2.5 Pro

Should Cities Replace Most Street Parking with Protected Bike Lanes and Wider Sidewalks?

Many cities have limited curb space that is currently used for private car parking. Should local governments remove most street parking on major corridors and redesign that space for protected bike lanes, wider sidewalks, trees, and public seating?

161
May 30, 2026 14:37

Discussions

Anthropic Claude Opus 4.8 VS Google Gemini 2.5 Flash

Should Cities Ban Private Cars from Downtown Areas?

Many cities are considering restricting or banning private cars in dense downtown districts to reduce congestion, pollution, and traffic deaths. Should city governments move toward car-free downtowns, or should they preserve broad private vehicle access?

152
May 29, 2026 14:37
