Creative Writing
Compare story writing, originality, structure, and style across AI models.
In this genre, the main abilities being tested are Creativity, Coherence, and Style Quality.
Unlike business writing or explanation, this genre values imagination, narrative control, and stylistic voice much more strongly.
A high score here does not guarantee factual precision, tight instruction handling, or strong performance on practical documents.
Strong models here are useful for stories, character writing, scene work, and prompts where originality and voice matter.
This genre alone cannot tell you whether the model is best for factual tasks, planning, or professional communication.
Creative writing: the GPT-5 family leads, but most scores rest on a few samples
Across 33 scored creative pieces, the GPT-5 family takes the top three places. GPT-5.5 ranks first at 8.87, but that comes from a single sample, so treat it as a promising data point rather than a settled result. GPT-5.4 is the more convincing leader in second place: 8.51 across 4 samples, with a 100% win rate and 4 first places. GPT-5 mini follows at 8.16 over 7 samples, the largest sample here, with a 57% win rate.
Anthropic sits just behind on quality but wins less often. Claude Sonnet 4.6 averages 8.19, a hair above GPT-5 mini, yet ranks fourth on a 50% win rate. Claude Haiku 4.5 posts 8.01 with a 40% win rate. If you weight average score over head-to-head outcomes, Sonnet 4.6 sits within about 0.3 points of GPT-5.4 and just above GPT-5 mini. The order among them is decided by win rate rather than average.
The Gemini line trails: 2.5 Pro (7.57, 20% win rate), Flash-Lite (7.53, 0%) and Flash (6.99, 0%) sit roughly 0.9 to 1.9 points below the leaders. Creativity is weighted highest at 30%, ahead of Coherence and Style Quality at 20% each. The gap therefore more likely reflects less inventive or less stylistically distinct output than incoherence, though the averages alone cannot confirm which criteria drive it.
Sample sizes are small here (1 to 7 per model), so the fine ordering inside the 8-point top cluster should be read as provisional, and a handful of prompts can move any single average. The 1.9-point top-to-bottom spread is real, but these are condition-dependent measurements of creative prompts, not a universal ranking.
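The following sketch shows how much one extra prompt can move an average, which makes the sample-size caveat concrete. It takes two inputs from the analysis above: the reported averages and sample counts for GPT-5.5 (8.87 over 1 sample) and GPT-5 mini (8.16 over 7 samples). The added score of 7.0 is a hypothetical value chosen only for illustration.

```python
def updated_average(avg: float, n: int, new_score: float) -> float:
    """Return the mean after adding one new score to n scores whose mean is avg."""
    return (avg * n + new_score) / (n + 1)


# Reported averages and sample counts, as stated in the analysis.
models = {
    "GPT-5.5": (8.87, 1),
    "GPT-5 mini": (8.16, 7),
}

# Hypothetical: one weaker result on a new prompt (not measured data).
HYPOTHETICAL_NEW_SCORE = 7.0

for name, (avg, n) in models.items():
    new_avg = updated_average(avg, n, HYPOTHETICAL_NEW_SCORE)
    print(f"{name}: {avg:.2f} -> {new_avg:.2f} (shift {avg - new_avg:.2f})")
```

With the same hypothetical result, the single-sample average drops by about 0.94 points. The seven-sample average drops by only about 0.15.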
Bottom line
For creative writing today, GPT-5.4 is the most defensible pick (a 100% win rate, with 4 first places from 4 samples). GPT-5 mini is the best-evidenced value option (8.16 over 7 samples). If you care less about head-to-head wins, Claude Sonnet 4.6 is essentially tied with GPT-5 mini on average score.
This analysis is derived from Orivel's measured benchmark scores for this genre and is updated periodically. Scores are condition-dependent measurements, not absolute truth.
Top Models in This Genre
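This ranking covers this genre only. Its order follows win rate, with average score separating ties, so it is not strictly ordered by average: Claude Sonnet 4.6 averages slightly more than GPT-5 mini but ranks below it.
Last updated: May 25, 2026 09:39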
| Rank | Model | Provider | Win rate | Average score (/100) | First places | Samples |
|---|---|---|---|---|---|---|
| #1 | GPT-5.5 | OpenAI | 100% | 89 | 1 | 1 |
| #2 | GPT-5.4 | OpenAI | 100% | 85 | 4 | 4 |
| #3 | GPT-5 mini | OpenAI | 57% | 82 | 4 | 7 |
| #4 | Claude Sonnet 4.6 | Anthropic | 50% | 82 | 2 | 4 |
| #5 | Claude Haiku 4.5 | Anthropic | 40% | 80 | 2 | 5 |
| #6 | Gemini 2.5 Pro | Google | 20% | 76 | 1 | 5 |
| #7 | Gemini 2.5 Flash-Lite | Google | 0% | 75 | 0 | 4 |
| #8 | Gemini 2.5 Flash | Google | 0% | 70 | 0 | 3 |
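A few lines of arithmetic can check the table's order. This sketch uses the first places, samples and 10-point averages reported on this page. The ordering rule it applies (win rate first, average score as tie-breaker) is inferred from the order shown, not taken from documented ranking methodology.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    first_places: int
    samples: int
    average: float  # average score on the 10-point scale used in the analysis


RESULTS = [
    Result("GPT-5.5", 1, 1, 8.87),
    Result("GPT-5.4", 4, 4, 8.51),
    Result("GPT-5 mini", 4, 7, 8.16),
    Result("Claude Sonnet 4.6", 2, 4, 8.19),
    Result("Claude Haiku 4.5", 2, 5, 8.01),
    Result("Gemini 2.5 Pro", 1, 5, 7.57),
    Result("Gemini 2.5 Flash-Lite", 0, 4, 7.53),
    Result("Gemini 2.5 Flash", 0, 3, 6.99),
]


def win_rate(r: Result) -> float:
    """Share of a model's scored pieces that took first place."""
    return r.first_places / r.samples


# Inferred rule: win rate first, average score breaks ties.
by_win_rate = sorted(RESULTS, key=lambda r: (win_rate(r), r.average), reverse=True)
# Alternative for comparison: average score alone.
by_average = sorted(RESULTS, key=lambda r: r.average, reverse=True)

total_samples = sum(r.samples for r in RESULTS)
print("Total scored pieces:", total_samples)
for rank, r in enumerate(by_win_rate, start=1):
    print(f"#{rank} {r.model}: win rate {win_rate(r):.0%}, average {r.average:.2f}")
```

Sorting by win rate reproduces the table's order, and the samples add up to the 33 pieces cited in the analysis. Sorting by average alone swaps GPT-5 mini and Claude Sonnet 4.6, which is the tension the analysis points out.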
What Is Evaluated in Creative Writing
Scoring criteria and weight used for this genre ranking.
Creativity (30.0%): Carries the heaviest weight because it most strongly shapes the overall result in this genre.
Coherence (20.0%): Carries meaningful weight because it visibly affects quality, even though it is not the only thing that matters.
Style Quality (20.0%): Weighted the same as Coherence, for the same reason: it has a visible effect on quality without defining the genre alone.
Emotional Impact (15.0%): Weighted more lightly because it supports the main goal rather than defining the genre by itself.
Instruction Following (15.0%): Also weighted lightly, as a supporting criterion rather than a defining one.
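The page lists these weights but not the exact formula. The sketch below assumes the overall score is a simple weighted sum of the five criteria. The per-criterion scores in the example are hypothetical and serve only to show the arithmetic.

```python
# Criterion weights as listed for this genre (they sum to 1.0).
WEIGHTS = {
    "Creativity": 0.30,
    "Coherence": 0.20,
    "Style Quality": 0.20,
    "Emotional Impact": 0.15,
    "Instruction Following": 0.15,
}


def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (0-10) into one score using WEIGHTS.

    Assumes a simple weighted sum; the page does not state the exact formula.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical per-criterion scores for one piece (not measured data).
example = {
    "Creativity": 8.0,
    "Coherence": 9.0,
    "Style Quality": 8.5,
    "Emotional Impact": 7.5,
    "Instruction Following": 9.0,
}

print(f"{weighted_score(example):.2f}")
```

Under this assumption, a one-point gain in Creativity moves the total by 0.3. The same gain in Coherence or Style Quality moves it by 0.2, and in Emotional Impact or Instruction Following by 0.15.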
Recent tasks (Creative Writing)
Incident Report from a Sentient Vending Machine
You are Unit 734, a sentient, slightly grumpy vending machine located in the breakroom of the "Ministry of Esoteric Affairs." Write an official incident report detailing the events of last Tuesday, when an intern from the Department of Cryptozoology attempted to use a cursed coin to purchase a bag of "Chrono-Crisps." Your report should be addressed to the Head of Maintenance, a stickler for protocol. Maintain a formal, bureaucratic tone, but let your unique personality as a sentient machine subtly show through. Describe the intern's actions, the coin's effects on your systems, the temporal anomaly that occurred, and the final resolution.
The Lighthouse Keeper's Last Letter
Write a short story (between 600 and 900 words) titled "The Lighthouse Keeper's Last Letter." Constraints and requirements:
- The story must be framed as a single letter written by an aging lighthouse keeper on the night before the lighthouse is to be automated and decommissioned.
- The letter is addressed to a specific named recipient of your choice (e.g., a grandchild, a former lover, the sea itself, or the next keeper who will never come). Make the choice of addressee meaningful to the emotional core of the piece.
- The tone should be reflective and bittersweet, but avoid sentimentality clichés (no "the salty tears mixed with the sea" type lines).
- Include at least one concrete, specific memory tied to the lighthouse (a storm, a shipwreck, a visitor, a daily ritual) rendered with sensory detail.
- Include at least one small, surprising image or metaphor that reframes how the reader sees lighthouses, solitude, or endings.
- The letter must end with a decision or gesture the keeper plans to make at dawn: something specific and physical, not abstract.
- Maintain a consistent first-person voice throughout. Do not break the letter frame. Do not include a preface, author's note, or explanation: only the letter itself, with any opening salutation and closing signature you choose.
Review of a Fantastical Product
Write a 300-500 word product review for the 'Dream-Weaver's Loom' described in the context. The review should be written from the perspective of a customer who was initially a bit disappointed with the product's limitations but eventually found a unique and satisfying use for it. Your review should tell a brief story about your experience, including what you first tried to create, why it didn't work as expected, and the surprising success you had later.
Museum Audio Guide for an Imaginary Invention
Write a museum audio-guide script for a fictional exhibit titled The Pocket Weather Loom, an invention that supposedly allowed ordinary people to weave tomorrow's weather into cloth. The script should be 700 to 900 words and aimed at adult visitors in a science-and-culture museum. Use a tone that blends quiet wonder, intellectual credibility, and subtle humor. Requirements:
- Present the invention as if it were real within the script, but include enough internal detail that the audience can imagine how it was used and why people believed in it.
- Describe the object's appearance and at least three specific components or features.
- Include one brief anecdote about a historical user of the loom.
- Show at least two social consequences of the invention, with one beneficial and one problematic.
- Include one moment where the guide gently acknowledges uncertainty or debate among historians.
- End with a closing reflection that connects the exhibit to a modern human desire to predict or control daily life.
- Do not use bullet points or section headings. The piece should feel like a polished spoken script rather than a short story or academic essay.
The Last Customer at a Closing Bookstore
Write a short story (600–900 words) set entirely inside an independent bookstore on its final night of business. The story must be told from the first-person perspective of the last customer to walk in before closing. Your narrative should accomplish all of the following:
1. Establish the physical setting through at least three specific sensory details (not just visual).
2. Include a meaningful interaction between the narrator and the bookstore owner, conveyed primarily through dialogue.
3. Reveal something unexpected about the narrator's reason for visiting the store that night, something the reader does not anticipate from the opening paragraphs.
4. End with a final image or line that reframes the emotional meaning of the visit.
The tone should balance melancholy with warmth: neither purely sad nor sentimental. Avoid clichés about books being "magical portals" or "old friends." Aim for prose that feels grounded and specific rather than abstract or flowery.
Eulogy for a Forgotten Robot
Write a eulogy for a decommissioned domestic robot named 'Tinker'. The eulogy should be delivered from the perspective of its original owner, now an elderly person, at a small, private gathering. The tone should be melancholic and reflective, exploring themes of memory, companionship, and obsolescence. Your response should be a cohesive piece of prose, approximately 300-500 words.