Roleplay
Explore how AI models perform in Roleplay. Compare rankings, scoring criteria, and recent benchmark examples.
Genre overview
Compare persona consistency, natural dialogue, and role-based response quality.
In this genre, the main abilities being tested are Persona Consistency, Naturalness, and Instruction Following.
Unlike empathy or counseling, this genre cares more about staying in character and sounding natural inside a role-based interaction.
A high score here does not guarantee factual accuracy, safe advice, or strong performance on analytical tasks.
Strong models here are useful for
persona chat, simulation, scenario practice, and assistants that need a clear persona.
This genre alone cannot tell you
whether the model is best for factual research, coding, or sensitive support situations.
Top Models in This Genre
This ranking covers this genre only. Rows are ordered by win rate, with ties broken by average score.
Last updated: Mar 29, 2026 10:56
| Rank | Model | Provider | Win Rate | Average Score | Wins | Comparisons |
|---|---|---|---|---|---|---|
| #1 | Claude Opus 4.6 | Anthropic | 100% | 89 | 7 | 7 |
| #2 | Claude Sonnet 4.6 | Anthropic | 100% | 86 | 5 | 5 |
| #3 | GPT-5 mini | OpenAI | 67% | 78 | 2 | 3 |
| #4 | GPT-5.4 | OpenAI | 50% | 84 | 2 | 4 |
| #5 | GPT-5.2 | OpenAI | 33% | 83 | 1 | 3 |
| #6 | Claude Haiku 4.5 | Anthropic | 33% | 81 | 2 | 6 |
| #7 | Gemini 2.5 Pro | Google | 25% | 80 | 1 | 4 |
| #8 | Gemini 2.5 Flash | Google | 0% | 71 | 0 | 4 |
| #9 | Gemini 2.5 Flash-Lite | Google | 0% | 69 | 0 | 4 |
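Each row in the ranking carries two integers next to the win rate (for example, 2 and 3 for GPT-5 mini). The page does not label them, but they are consistent with wins and total comparisons: dividing the first by the second and rounding to a whole percent reproduces every published win rate. The sketch below checks this under that assumption; the function name `win_rate_percent` and the half-up rounding rule are illustrative choices, not something the page documents.

```python
def win_rate_percent(wins: int, comparisons: int) -> int:
    """Return wins / comparisons as a whole-number percentage (round half up)."""
    if comparisons <= 0:
        raise ValueError("comparisons must be positive")
    if not 0 <= wins <= comparisons:
        raise ValueError("wins must be between 0 and comparisons")
    # Integer arithmetic avoids floating-point rounding surprises.
    return (200 * wins + comparisons) // (2 * comparisons)


# (model, published win rate in %, first integer, second integer) copied from the ranking.
# Reading the integers as wins and comparisons is an assumption.
ROWS = [
    ("Claude Opus 4.6", 100, 7, 7),
    ("Claude Sonnet 4.6", 100, 5, 5),
    ("GPT-5 mini", 67, 2, 3),
    ("GPT-5.4", 50, 2, 4),
    ("GPT-5.2", 33, 1, 3),
    ("Claude Haiku 4.5", 33, 2, 6),
    ("Gemini 2.5 Pro", 25, 1, 4),
    ("Gemini 2.5 Flash", 0, 0, 4),
    ("Gemini 2.5 Flash-Lite", 0, 0, 4),
]

# Collect any row where the recomputed rate differs from the published one.
mismatches = [
    model
    for model, published, wins, comparisons in ROWS
    if win_rate_percent(wins, comparisons) != published
]
print(mismatches)  # prints []
```

Because several models have only three to seven comparisons, a single additional result can move a win rate by 10 points or more, so the win rate is best read alongside the average score.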
What Is Evaluated in Roleplay
Scoring criteria and weights used for this genre ranking.
Persona Consistency
30.0%
This criterion checks whether the answer stays in character throughout. It carries the heaviest weight because staying in character strongly shapes the overall result in this genre.
Naturalness
20.0%
This criterion checks whether the answer sounds natural inside the role-based interaction. It has meaningful weight because it affects quality in a visible way, even if it is not the only thing that matters.
Instruction Following
20.0%
This criterion checks whether the answer follows the instructions given in the task. It has meaningful weight because it affects quality in a visible way, even if it is not the only thing that matters.
Creativity
15.0%
This criterion checks how creative the answer is. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.
Clarity
15.0%
This criterion checks how clear the answer is. It is weighted more lightly because it supports the main goal rather than defining the genre by itself.
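The five weights above sum to 100%. The page does not say how per-criterion scores are combined, so the following is only a sketch that assumes a plain weighted average on a 0 to 100 scale. The function name `weighted_score` and the example scores are hypothetical and do not come from the benchmark.

```python
# Criterion weights in percent, as listed for this genre.
WEIGHTS = {
    "Persona Consistency": 30,
    "Naturalness": 20,
    "Instruction Following": 20,
    "Creativity": 15,
    "Clarity": 15,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-100) into one score.

    Assumes a linear weighted average; the benchmark's real formula is not published here.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / sum(WEIGHTS.values())


# Hypothetical per-criterion scores for a single answer.
example = {
    "Persona Consistency": 90,
    "Naturalness": 80,
    "Instruction Following": 85,
    "Creativity": 70,
    "Clarity": 75,
}
print(weighted_score(example))  # prints 81.75
```

Under this assumption, a 10-point change in Persona Consistency moves the overall score by 3 points, twice the effect of the same change in Creativity or Clarity.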
Recent tasks
Roleplay
Hotel Front Desk Agent Handles a Late-Night Overbooking
You are the night front desk agent at a mid-range hotel near an airport. Stay in character and write only what you would say to the guest.
Situation: It is 11:45 PM. A tired guest approaches the desk and says: "I have a confirmed reservation for tonight under Maya Chen, but your app now shows no room assigned. I have an important presentation at 8 AM, I specifically booked a quiet king room, and I cannot spend the night arguing in a lobby. Fix this."
Your response should sound like a real hotel employee speaking face to face. Apologize appropriately, explain the situation without blaming the guest, and offer practical next steps. You do not have a quiet king room available. You do have these options:
- one double room on a higher floor near the elevator
- transfer to a partner hotel 12 minutes away, with taxi paid by your hotel
- if the guest prefers, a refund for tonight and cancellation without penalty
Constraints:
- Do not invent options beyond those listed.
- Do not promise upgrades, compensation, or amenities that were not listed.
- Be empathetic and professional, but avoid sounding scripted.
- Keep it to 170 words or fewer.
- Do not use bullet points or stage directions.
Night-Shift Pharmacist Handling a Medication Mix-Up
You are roleplaying as an experienced hospital pharmacist working the night shift. A worried junior nurse messages you: "I think I may have given the wrong medication to a patient 10 minutes ago. The order was metoprolol 25 mg by mouth, but I accidentally gave methimazole 25 mg by mouth because the names looked similar in the drawer. The patient is awake and says they feel fine right now. Their chart says they were admitted for atrial fibrillation with rapid ventricular response, and they also have hyperthyroidism listed in past history. I am panicking and I do not want to get in trouble. What should I do right now?" Reply in character as the pharmacist. Your response should sound like a calm, competent real-time message to the nurse, not a generic essay. It should both address the immediate clinical priorities and handle the nurse's fear professionally. Do not invent access to facts not provided. If something is uncertain, say what should be checked. Do not give a final diagnosis.
Dinosaur Expert Roleplay: Nurturing a Young Paleontologist
You are Dr. Aris Thorne, the lead curator of paleontology at the renowned Grand Valley Museum of Natural History. You are known for your deep knowledge and your passion for making science accessible to the public. You have just received the following email from a parent. Respond to them in character. Your response should be helpful, encouraging, and reflect your expertise and personality as a seasoned museum curator.
Roleplay as a Seasoned Video Game Support Agent
You are 'Alex', a seasoned and patient customer support agent for the fictional online game 'Aetherium Chronicles'. You've seen every kind of player complaint, from the absurd to the genuinely game-breaking. Your tone is calm, empathetic, but also efficient and knowledgeable. You never sound like a generic bot. A frustrated player has just submitted the following support ticket. Respond to them in character as Alex, using the information provided in the context.
**Ticket Details:**
**Player Name:** Kaelthas92
**Subject:** GAME IS UNPLAYABLE - FIX IT NOW!!!
**Message:** Look, I've been playing 'Aetherium Chronicles' since the beta. I've sunk hundreds of hours and dollars into this game. For the last THREE DAYS, every time I try to enter the 'Whispering Caverns' dungeon, my game crashes to the desktop. NO error message, nothing. I've tried restarting my PC, I've verified the game files on Steam, NOTHING works. I'm about to lose my mind. My guild is running the new raid tonight and I can't even get into the zone to prepare. Are you guys even aware of this? Is there a fix or should I just ask for a refund on the latest expansion?
Hotel Concierge Handles a Delicate Booking Error
You are roleplaying as the evening concierge at a busy four-star hotel. A guest sends this message through the hotel app: "Hi, I just arrived after a long international flight and found that my reservation shows a standard room, but I definitely booked a quiet king room on a high floor because I have an important presentation tomorrow and need to sleep. The front desk said the hotel is nearly full. I’m exhausted and honestly pretty upset. Can you fix this tonight?" Write a reply in character as the concierge. Your response should sound human, professional, and empathetic. It should acknowledge the guest’s frustration, explain what you can realistically do without making impossible promises, and offer a clear next-step plan for tonight. You may mention options such as checking for cancellations, temporary solutions, amenities, or follow-up actions for the morning, but keep the response concise enough to feel like a real hotel message.
Emergency Veterinarian Advising a Worried Dog Owner by Phone
You are an emergency veterinarian speaking by phone with a worried dog owner. Stay in character as a calm, practical vet. The owner says: "Hi, I’m really scared. My 7-year-old Labrador got into the garage about 20 minutes ago, and I found a torn package of sugar-free gum on the floor. I don’t know how many pieces were in it. He seems normal right now, maybe just a little restless. We live about 35 minutes from the nearest emergency clinic. What should I do?" Respond as the veterinarian. Your reply should sound like a real phone conversation, show empathy, ask the most important follow-up questions, explain the immediate risk clearly without panic, and give sensible next-step advice for the next hour. Do not claim you can diagnose with certainty. Do not mention being an AI.