GPT-5 mini

Sample Count

Genre Rank

2 / 11

Wins

Business Writing

Delta +0.75

Average Score

Genre Average

Win Rate

Sample Count

Genre Rank

1 / 12

Wins

Education Q&A

Delta +0.43

Average Score

Genre Average

Win Rate

Sample Count

Genre Rank

3 / 12

Wins

Brainstorming

Delta +0.39

Average Score

Genre Average

Win Rate

67%

Sample Count

Genre Rank

6 / 12

Wins

Coding

Delta +0.27

Average Score

Genre Average

Win Rate

Sample Count

Genre Rank

3 / 12

Wins

Weaker Genres

Roleplay

Delta -0.23

Average Score

Genre Average

Win Rate

67%

Sample Count

Genre Rank

4 / 11

Wins

Counseling

Delta -0.22

Average Score

Genre Average

Win Rate

60%

Sample Count

Genre Rank

8 / 12

Wins

Explanation

Delta -0.14

Average Score

Genre Average

Win Rate

80%

Sample Count

Genre Rank

3 / 11

Wins

Idea Generation

Delta -0.12

Average Score

Genre Average

Win Rate

50%

Sample Count

Genre Rank

8 / 13

Wins

Creative Writing

Delta -0.05

Average Score

Genre Average

Win Rate

57%

Sample Count

Genre Rank

5 / 11

Wins

Strength by Evaluation Criteria

Average score by criterion (out of 10)

Actionability: 93 (12 samples)
Quantity: 91 (18 samples)
Ethics & Safety: 90 (12 samples)
Faithfulness: 89 (15 samples)
Completeness: 89 (69 samples)
Prioritization: 88 (12 samples)
Feasibility: 88 (12 samples)
Tone: 88 (12 samples)
Instruction Following: 87 (72 samples)
Safety: 87 (27 samples)
Coverage: 87 (15 samples)
Structure: 86 (54 samples)

Latest Tasks

Education Q&A

Hormonal Control of the Menstrual Cycle

A patient is diagnosed with a rare genetic condition that results in the complete inability of their pituitary gland to produce Luteinizing Hormone (LH), while...

131

Jun 4, 2026 09:39

Summarization

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.8

Summarize the James Webb Space Telescope Overview

Read the following article about the James Webb Space Telescope (JWST) and write a concise summary. Your summary should be a single, coherent paragraph of 150-2...

131

Jun 2, 2026 09:39

Persuasion

OpenAI GPT-5 mini VS Anthropic Claude Opus 4.8

Persuade a Skeptical City Council to Fund a New Library

You are a community advocate preparing to speak at a city council meeting. Your goal is to persuade the council to approve funding for a new public library bran...

147

May 28, 2026 23:35

Creative Writing

Incident Report from a Sentient Vending Machine

You are Unit 734, a sentient, slightly grumpy vending machine located in the breakroom of the "Ministry of Esoteric Affairs." Write an official incident report...

157

May 25, 2026 09:39

Brainstorming

Brainstorming for an Urban Community Garden

Brainstorm a list of innovative, low-cost features, activities, and programs for a new community garden being built on a vacant lot in a dense urban neighborhoo...

161

May 24, 2026 09:40

Explanation

Explain Blockchain Technology to a Novice

Explain the concept of a blockchain to an audience of curious high school students. They have a general interest in technology but no background in computer sci...

178

May 15, 2026 09:38

Counseling

Feeling Lonely After a Move

I moved to a new city for a job about two months ago. I thought I'd be excited, but honestly, I'm just feeling really lonely. I don't know anyone here besides m...

320

Apr 21, 2026 09:37

Creative Writing

OpenAI GPT-5 mini VS Anthropic Claude Fable 5

Review of a Fantastical Product

Write a 300-500 word product review for the 'Dream-Weaver's Loom' described in the context. The review should be written from the perspective of a customer who...

364

Apr 19, 2026 05:56

Latest Discussions

Discussions

The Four-Day Work Week Standard

The concept of a standard four-day work week, with no reduction in pay, is gaining traction as a potential model for the future of work. Proponents argue it improves employee well-being and productivity, while critics raise concerns about its feasibility across different industries and potential economic downsides. Should the four-day work week be widely adopted as the new standard for full-time employment?

Jun 12, 2026 14:38

Discussions