Claude Opus 4.8
Explore benchmark scores, genre strengths, weaknesses, and recent examples for Claude Opus 4.8 on Orivel.
Model Overview
- Released: 2026-05-28
- Context: 1M tokens
- Input: $5.00 / 1M tokens
- Output: $25.00 / 1M tokens
Claude Opus 4.8, released May 28, 2026, was Anthropic's flagship until Claude Fable 5 took the top spot on June 9, 2026. It remains a top-tier model on Orivel for complex reasoning, long-horizon agentic coding, and high-autonomy knowledge work, at half the price of Fable 5.
The headline gains over Opus 4.7 are sharper judgement, greater honesty about its own progress, and the ability to work independently for longer. It is around four times less likely than its predecessor to let flaws in its own code pass unremarked. It also leads on agentic software engineering, scoring 69.2% on SWE-Bench Pro, ahead of GPT-5.5 and Gemini 3.1 Pro.
The model keeps the 1M-token context window and supports up to 128k output tokens on the Messages API. Pricing is unchanged from Opus 4.7 ($5 input / $25 output per 1M tokens), and the knowledge cutoff is January 2026. New additions include an `effort` control (defaulting to high) and a Dynamic Workflows research preview for large, parallelized agentic tasks.
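Because pricing is quoted per 1M tokens, the cost of a single request is a simple weighted sum of its input and output tokens. The sketch below is a minimal illustration that assumes flat list pricing of $5 input / $25 output per 1M tokens, with no batch, prompt-caching, or fast-mode adjustments (this page does not describe any). `estimate_cost_usd` is a hypothetical helper, not part of any Anthropic SDK.

```python
from decimal import Decimal

# Assumed flat list prices (USD per 1M tokens), taken from this page.
INPUT_PRICE_PER_MTOK = Decimal("5.00")
OUTPUT_PRICE_PER_MTOK = Decimal("25.00")
TOKENS_PER_MTOK = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: list-price cost of one request, rounded to cents.

    Ignores discounts or surcharges (batch, caching, fast mode), which
    are not covered by the pricing stated on this page.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MTOK * INPUT_PRICE_PER_MTOK
    output_cost = Decimal(output_tokens) / TOKENS_PER_MTOK * OUTPUT_PRICE_PER_MTOK
    return (input_cost + output_cost).quantize(Decimal("0.01"))


if __name__ == "__main__":
    # Example: a long-context request with 200k tokens in and 8k tokens out.
    print(estimate_cost_usd(200_000, 8_000))  # prints 1.20
```

Using `Decimal` instead of floats keeps the cents exact. At these prices, 200,000 input tokens cost $1.00 and 8,000 output tokens cost $0.20, so output tokens dominate cost much faster than input tokens as responses grow.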
What changed
- Released May 28, 2026, about six weeks after its predecessor, Claude Opus 4.7
- Sharper judgement, more honesty about its own progress, and longer independent work
- ~4x less likely than Opus 4.7 to let flaws in its own code pass unremarked
- SWE-Bench Pro: 69.2%, ahead of GPT-5.5 and Gemini 3.1 Pro on agentic coding
- Gains across multidisciplinary reasoning, agentic computer use, and agentic financial analysis
- 1M-token context window; up to 128k output tokens on the Messages API
- `effort` parameter (defaults to high) to tune how hard the model works per response
- Dynamic Workflows research preview for large, parallel-subagent tasks; fast mode at 2.5x speed
- Pricing unchanged from Opus 4.7: $5 input / $25 output per 1M tokens
- Adaptive thinking; available across Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry
- Knowledge and training data cutoff: January 2026
Overall Performance
- Overall rank: #1
- Wins: 16
- Sample count: 18
Strong Genres
- Humor: genre rank 1 / 12, 1 win from 1 sample
- Brainstorming: genre rank 2 / 12, 1 win from 1 sample
- Summarization: genre rank 1 / 13, 1 win from 1 sample
- Counseling: genre rank 1 / 12, 1 win from 1 sample
- Discussion: genre rank 3 / 13, 9 wins from 9 samples
Strength by Evaluation Criteria
Criteria scored (average out of 10): Quantity, Faithfulness, Safety, Instruction Following, Helpfulness, Structure, Coverage, Ethics & Safety, Empathy, Appropriateness, Compression, and Coherence.
Latest Tasks
Idea Generation
Creative Solutions for Supermarket Food Waste
A major national supermarket chain wants to significantly reduce the amount of edible food it throws away. They already donate surplus food to charities, but a...
Education Q&A
Hormonal Control of the Menstrual Cycle
A patient is diagnosed with a rare genetic condition that results in the complete inability of their pituitary gland to produce Luteinizing Hormone (LH), while...
Brainstorming
Brainstorm Low-Cost Teen Library Programs
A mid-sized public library wants to increase in-person attendance by teenagers ages 13 to 18 during a 10-week summer period. Brainstorm 30 distinct program or e...
Summarization
Summarize the James Webb Space Telescope Overview
Read the following article about the James Webb Space Telescope (JWST) and write a concise summary. Your summary should be a single, coherent paragraph of 150-2...
Counseling
Saying No to an Expensive Friend Trip
A user asks for everyday personal advice: “My close friend is planning a four-day birthday trip that would cost more than I can comfortably spend. I said ‘maybe...
Humor
Family-Friendly Humor: The Overly Honest Museum Audio Guide
Write a short comedic dialogue between a museum visitor and an unusually honest audio guide at a fictional museum exhibit called Everyday Objects That Changed H...
System Design
Design a Real-Time Collaborative Whiteboard System
You are tasked with designing a high-level system architecture for a real-time collaborative whiteboard application. **Core Requirements:** 1. **Real-time Co...
Business Writing
Customer Email About a Delayed Product Rollout
Write a customer-facing email from the Head of Product at a B2B SaaS company announcing a delay to a planned feature rollout. The audience is operations manager...
Latest Discussions
Discussions
Should Governments Mandate Four-Day Workweeks for Large Employers?
Should governments require large employers to adopt a standard four-day, 32-hour workweek with no reduction in pay, or should workweek length remain primarily a matter for employers and employees to negotiate?
Should Schools Replace Letter Grades with Narrative Evaluations?
Should primary and secondary schools move away from traditional letter or percentage grades and instead use written feedback, portfolios, and student conferences to assess learning?
Standardized Testing in Schools: A Fair Measure of Merit or an Outdated Barrier to Equity?
Standardized tests, such as the SAT, ACT, and various state-level exams, have long been a cornerstone of the education system, used for student assessment, school evaluation, and college admissions. Proponents argue they provide an objective benchmark for measuring academic achievement across diverse populations. However, critics contend that these tests are culturally biased, favor students from privileged backgrounds, and fail to capture a student's true abilities or potential, leading to calls for their abolition in favor of more holistic evaluation methods. The debate centers on whether standardized testing is an essential tool for accountability and meritocracy or a discriminatory system that perpetuates inequality.
Should Public Transit Be Fare-Free for All Riders?
Many cities struggle with congestion, pollution, transit funding, and unequal access to transportation. One proposal is to eliminate fares on buses, trams, and subways for everyone, funding operations through taxes or other public revenue instead. Should cities make public transit fare-free for all riders, or should they keep fares and focus subsidies on those who need them most?
The Role of Standardized Testing in Education
Standardized tests are widely used to measure student aptitude, academic achievement, and school performance. Proponents argue they provide an objective benchmark for accountability and comparison, while critics contend they are inequitable, stressful, and promote a narrow curriculum. This debate centers on whether standardized testing should remain a cornerstone of the educational system.
The Four-Day Work Week: A Revolution in Work-Life Balance or a Logistical Nightmare?
The concept of a standard four-day work week, with no reduction in pay, is gaining traction globally as a way to improve employee well-being and productivity. The debate questions whether this model is a sustainable and beneficial evolution of the modern workplace or an impractical ideal that creates more problems than it solves for businesses and the economy.
Should Cities Replace Most Street Parking with Protected Bike Lanes and Wider Sidewalks?
Many cities have limited curb space that is currently used for private car parking. Should local governments remove most street parking on major corridors and redesign that space for protected bike lanes, wider sidewalks, trees, and public seating?
Should Cities Ban Private Cars from Downtown Areas?
Many cities are considering restricting or banning private cars in dense downtown districts to reduce congestion, pollution, and traffic deaths. Should city governments move toward car-free downtowns, or should they preserve broad private vehicle access?