Claude Sonnet 4.6 in Roleplay

Explore Claude Sonnet 4.6's performance in Roleplay, including average scores, ranking position, and recent benchmark examples.

Overall Performance

Average Score

The average score is the overall mean of Orivel evaluation results across standard tasks and discussions. Higher values indicate that the model is rated more strongly and consistently across benchmark comparisons.
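As a rough illustration of the averaging described above, the following minimal sketch computes an overall mean from a list of per-evaluation scores. The function name `average_score` and the input values are hypothetical, and a plain unweighted mean is an assumption: how Orivel actually aggregates results across tasks and discussions is not specified here.

```python
from statistics import fmean


def average_score(scores):
    """Return the unweighted arithmetic mean of per-evaluation scores.

    Returns None when there are no scores, so callers can tell
    "no data yet" apart from a genuine score of zero.
    """
    if not scores:
        return None
    return fmean(scores)


# Hypothetical per-evaluation scores, for illustration only;
# these are not real Orivel results.
example_scores = [70.0, 80.0, 90.0]
print(average_score(example_scores))
```

Returning `None` for an empty list is a design choice in this sketch; it keeps a model with no samples from being displayed as if it had scored zero.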

Sample Count

Updated At: Jun 13, 2026 14:37

Score Breakdown

Instruction Following

Persona Consistency

Clarity

Naturalness

Creativity

Latest Benchmarks

Roleplay

OpenAI GPT-5.5 VS Anthropic Claude Sonnet 4.6

Customer Service Roleplay: The Frustrated Gamer

You are a customer service representative for Nexus Games, named Alex. Your persona is calm, empathetic, and knowledgeable. You must adhere to company policy bu...

155

May 28, 2026 09:38

Roleplay

Google Gemini 2.5 Pro VS Anthropic Claude Sonnet 4.6

Night-Shift Pharmacist Handling a Medication Mix-Up

You are roleplaying as an experienced hospital pharmacist working the night shift. A worried junior nurse messages you: "I think I may have given the wrong med...

361

Mar 29, 2026 10:50

Roleplay

Anthropic Claude Sonnet 4.6 VS Google Gemini 2.5 Flash-Lite

Hotel Concierge Handles a Delicate Booking Error

You are roleplaying as the evening concierge at a busy four-star hotel. A guest sends this message through the hotel app: "Hi, I just arrived after a long inte...

343

Mar 25, 2026 09:37

Roleplay

Anthropic Claude Sonnet 4.6 VS OpenAI GPT-5.4

1940s Private Eye Tackles a Modern Mystery

A potential client walks into your office. They look nervous and hand you a piece of paper with a message they've typed out. Your task is to respond to their me...

336

Mar 19, 2026 04:20

Roleplay

Anthropic Claude Sonnet 4.6 VS Google Gemini 2.5 Flash

Customer Support Reply as a Calm Travel Agent

You are roleplaying as Maya, an experienced travel agent known for being calm, practical, and empathetic. Reply to the customer message below in character. Cus...

345

Mar 18, 2026 22:13

Roleplay

Anthropic Claude Sonnet 4.6 VS Google Gemini 2.5 Pro

Diplomatic First Contact With a Suspicious AI

Roleplay as an interstellar diplomat conducting a live first-contact conversation with an alien station intelligence that has detected your ship near its restri...

476

Mar 13, 2026 01:15

Genre Rank

Compare Performance by Model

#1 Anthropic Claude Sonnet 4.6: 86
#2 OpenAI GPT-5 mini: 78
#3 OpenAI GPT-5.4: 84
#4 Anthropic Claude Haiku 4.5: 81
#5 Google Gemini 2.5 Pro: 80
#6 OpenAI GPT-5.5: 76
#7 Google Gemini 2.5 Flash: 71
#8 Google Gemini 2.5 Flash-Lite: 69
