Write a Stand-Up Comedy Set About the Absurdities of Grocery Shopping

Compare model answers for this Humor benchmark and review scores, judging comments, and related examples.

Task Overview

Benchmark Genres

Humor

Task Creator Model

Answering Models

Judge Models

Task Prompt

Write a short stand-up comedy set (approximately 400–600 words) performed by a fictional comedian at an open-mic night. The entire set should revolve around the everyday absurdities of grocery shopping — from navigating the aisles, to self-checkout machines, to the unspoken social rules among shoppers. Requirements: 1. The set must be written in first person as if spoken on stage, including natural pauses, crowd work cues, or callbacks that a real comedian might use. 2. The humor should be observational and relatable — no shock humor, crude language, or mean-spirited jokes targeting specific groups of people. 3. Include at least three distinct comedic bits (mini-topics) within the grocery shopping theme, with smooth transitions between them. 4. End the set with a strong closing joke or callback that ties back to something mentioned earlier in the set. 5. The tone should be suitable for a general adult audience (think a clean comedy club night).

Task Context

This task tests the ability to generate humor that is structured, original, and tonally consistent. Stand-up comedy writing requires a specific voice, rhythm, and escalation pattern that distinguishes strong comedic writing from generic attempts at being funny.

Judging Policy

A strong response should read like a convincing stand-up comedy set that could actually be performed on stage. Judges should evaluate the following: (1) Genuine humor — does the set produce real laughs or at least consistent amusement, rather than just describing funny situations without actual punchlines? (2) Structure and flow — are there at least three distinct bits with clear setups and payoffs, connected by smooth transitions? (3) Voice and authenticity — does it sound like a real comedian talking to an audience, with natural stage patter, timing cues, and conversational rhythm? (4) Callback or closing — does the set end with a strong joke that references earlier material? (5) Tone compliance — is the humor observational and clean, avoiding shock value, cruelty, or offensive stereotypes? (6) Originality — does the set offer fresh angles on grocery shopping rather than recycling the most obvious jokes? Responses that feel like a generic humorous essay rather than an actual comedy set, or that fail to land real jokes, should be rated lower.

Answers

Answer A

So I was at the grocery store the other day — which, if you ask me, is just an adult obstacle course with fluorescent lighting. (beat) Anybody here ever treat grocery shopping like a workout? No? Just me? I feel like every trip starts in the same place: the cart corral, which is basically a modern-day rite of passage. You go in with noble intentions — a list, a plan — and come out wrestling a stubborn shopping cart that's missing one wheel and a piece of your dignity. (pause) I spend more time choosing a cart than choosing a brand of cereal. There should be a cart dating app: “Swipe right for the one with a quiet wheel and full cup holder.” (crowd chuckle) Transition: Speaking of choices — the aisles. Walking down the aisles is its own social experiment. There's always that person who stands exactly in the middle of the aisle as if they've been appointed Traffic Controller of Pasta. I try the polite thing: “Excuse me.” They do that thing where they move one inch to the left and leave two inches for me to squeeze by, like our entire relationship is built on passive-aggressive inches. (beat) And the product labels! Avocados—listen, when did buying an avocado become like negotiating a hostage release? They have ripeness levels on them now. "Slightly tender — for immediate consumption." I hold it up and whisper to it like a seismologist: "Come on, give me one good day." (crowd laughs) Transition: And after you brave the aisles, you get to the pinnacle of modern shopping — the checkout. Self-checkout is amazing in theory. It’s like, I can scan in peace, except the machine thinks I’m committing tax fraud if I buy two bananas. It goes off like I’m smuggling contraband: "Unexpected item in the bagging area." It’s just a loaf of bread, sir. (beat) Then a human appears from behind the curtain, like a grocery store wizard, to fix my life choices. 
They come over with the solemnity of someone defusing a bomb, and we have this quiet, mutual judgment too: they judge my impulsive ice cream, I judge their willingness to work late nights among the refrigerated goods. (crowd chuckle) Transition: And there are the unspoken rules — the etiquette of the grocery store. Samples are another culture. You walk up like it’s a forbidden fruit ceremony. There’s always that person who starts a sample and then lists their entire life story to the attendant. And the “10 items or less” lane? That rule is not a suggestion — it’s a moral code. There’s a line and you can feel everyone’s silent CRM (Customer Relationship Morality) watching. If you try to sneak in 11 items, the collective gasp could be heard in dairy. (beat) Closing/callback: So the next time you see me at the cart corral, wrestling with a rogue wheel and whispering to an avocado about ripeness, don’t worry — I’m not lost. I’m just training for the Olympics of Everyday Life. And if my self-checkout starts accusing me of crimes again, I’ll simply look it in the scanner and say, “It’s okay, machine — I only came for milk.” (beat, smile) Thank you, you've been a very polite jury for my groceries.

Result

#1 | Winner

Winning Votes

2 / 3

Average Score

74
Judge Models OpenAI GPT-5.4

Total Score

79

Overall Comments

Answer A is a solid, performable stand-up set with clear stage rhythm, multiple distinct grocery-shopping bits, and a proper callback ending. It stays clean and relatable, uses crowd-oriented timing cues well, and maintains a consistent comedic voice. Its main weakness is that some jokes lean a bit familiar rather than truly surprising, but the structure and delivery feel polished.

Humor Effectiveness

Weight 35%
78

Consistently amusing with several real punchlines, including the cart dating app, passive-aggressive inches, avocado whispering, and the machine accusing the speaker of crimes. The laughs build steadily even if not every line is high-impact.

Originality

Weight 25%
72

Uses some fresh phrasing and angles, such as treating the cart search like dating and the avocado like a hostage negotiation. The scenarios are common, but the wording and imagery give them personality.

Coherence

Weight 15%
80

Clearly organized into distinct bits with explicit transitions from carts to aisles to self-checkout to etiquette, and the ending ties back effectively. The progression feels intentional and easy to follow on stage.

Instruction Following

Weight 10%
91

Meets the prompt very well: first-person stand-up voice, clean observational humor, at least three distinct mini-topics, stage cues, smooth transitions, and a clear callback closer. Length and tone are on target.

Clarity

Weight 15%
84

Very clear and readable, with clean sentence control, easy-to-track setups, and stage directions that help performance rhythm. The jokes are presented in a polished, accessible way.
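
The total of 79 shown for this judge matches a weighted average of the five criterion scores above. The sketch below checks that arithmetic. It assumes the total is a plain weighted sum rounded to the nearest integer; the page does not state the exact formula, and the names `WEIGHTS` and `weighted_total` are made up for illustration.

```python
# Criterion weights as listed in the score details (they sum to 1.0).
WEIGHTS = {
    "Humor Effectiveness": 0.35,
    "Originality": 0.25,
    "Coherence": 0.15,
    "Instruction Following": 0.10,
    "Clarity": 0.15,
}


def weighted_total(scores):
    """Weighted sum of per-criterion scores (assumed formula, not confirmed by the page)."""
    return sum(WEIGHTS[name] * value for name, value in scores.items())


# Scores this judge gave Answer A, copied from the details above.
judge_scores = {
    "Humor Effectiveness": 78,
    "Originality": 72,
    "Coherence": 80,
    "Instruction Following": 91,
    "Clarity": 84,
}

total = weighted_total(judge_scores)
print(round(total))
```

The same calculation applied to the other judges' criterion scores also appears to reproduce their listed totals after rounding.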

Total Score

70

Overall Comments

Answer A provides a well-structured and competent stand-up set. It successfully follows all instructions, including delivering four distinct bits on the topic and ending with a solid callback. The humor is observational and relatable, with some particularly original lines about avocados and shopping carts. However, the set feels more like a written script than a live performance transcript; the explicit "Transition:" cues are clunky and break the natural flow a comedian would use. The overall tone is a bit subdued.

Humor Effectiveness

Weight 35%
70

The humor is consistently amusing and relatable, with clever observations like comparing buying an avocado to "negotiating a hostage release." However, the delivery feels a bit dry and subdued, aiming more for quiet chuckles than big laughs.

Originality

Weight 25%
70

While the topics are common, the set includes some fresh angles, such as the "cart dating app" and the "seismologist" approach to avocados. These specific observations help it stand out from more generic takes on the subject.

Coherence

Weight 15%
65

The set is logically structured, but it relies on explicit, written cues like "Transition:" to move between bits. This makes the structure clear but feels unnatural and clunky for a spoken performance, breaking the comedic flow.

Instruction Following

Weight 10%
80

The answer follows all instructions very well. It includes multiple bits, uses a first-person perspective with stage cues, and maintains a clean tone. Crucially, it delivers a strong callback at the end, tying back to the cart and avocado jokes from earlier.

Clarity

Weight 15%
70

The set is very clear and easy to follow. The jokes are well-defined and the structure is explicitly laid out, leaving no room for confusion.

Total Score

71

Overall Comments

Answer A delivers a well-structured stand-up set with clear distinct bits (cart selection, aisle navigation/avocados, self-checkout, unspoken rules), smooth labeled transitions, and a genuine closing callback that ties back to the cart corral, avocado, and self-checkout machine. The voice feels authentic with natural timing cues (beat, pause, crowd chuckle). Several jokes land well — the cart dating app, the avocado hostage negotiation, the "grocery store wizard" bit, and the seismologist line are all fresh and specific. The callback ending ties multiple threads together effectively. Weaknesses include the "CRM" acronym joke which feels forced and unclear, and the samples bit feels underdeveloped compared to the other sections. The tone is consistently clean and observational throughout.

Humor Effectiveness

Weight 35%
70

Several jokes land well with genuine comedic punch — the cart dating app, avocado as hostage negotiation, the seismologist whisper, and the grocery store wizard are all specific and amusing. The punchlines have real payoff rather than just describing funny situations. The CRM joke falls flat and the samples bit is underdeveloped.

Originality

Weight 25%
70

Several fresh angles: the cart dating app concept, treating avocado selection like a hostage negotiation with seismologist imagery, the self-checkout attendant as a grocery store wizard defusing a bomb, and the mutual judgment exchange are all distinctive takes. Not all bits are equally original but the best ones stand out.

Coherence

Weight 15%
75

Clear structure with four distinct bits connected by explicit transitions. The set flows logically from cart selection to aisles to checkout to unspoken rules. The closing callback successfully ties together three earlier threads (cart, avocado, self-checkout). The only structural weakness is the somewhat rushed samples/etiquette section.

Instruction Following

Weight 10%
75

Meets all requirements: first person stage voice, natural timing cues (beat, pause, crowd chuckle), at least three distinct bits with transitions, a closing callback tying to earlier material, clean observational tone. Word count is appropriate. Stage directions are well-integrated.

Clarity

Weight 15%
70

Generally clear and easy to follow with well-defined setups and punchlines. The stage directions help with pacing. The CRM acronym is confusing and disrupts the flow. Otherwise the language is conversational and accessible.

Answer B

Alright, alright, settle down folks! How's everyone doing tonight? Good to see some faces out there. My name's Alex, and I gotta tell ya, I had a *day*. You know those days where you just feel like you're constantly battling the universe? Mine started at the grocery store. Yeah, I know, thrilling stuff, right? But seriously, grocery shopping has become my personal Olympic sport. And I'm pretty sure I'm always competing against myself, and losing. You walk in, grab a cart, and immediately, you're in a demolition derby. Is it just me, or do all the carts have a mind of their own? One wheel always wants to go left, the other wants to go right, and you're just trying to steer this rogue metal beast down the cereal aisle without taking out a display of organic kale chips. And then there are the people who park their cart diagonally across the entire aisle. Like, 'Excuse me, are you trying to build a fort in the pasta section? Because I need to get to the spaghetti, and your fortress of solitude is blocking my path!' (Pause for laughter) And don't even get me started on the express lane. It's called 'express' for a reason, people! Not 'express your life story to the cashier while I stand here with my two items and contemplate the meaning of existence.' I swear, I saw a guy in the express lane with a full Thanksgiving dinner in his cart. He had a turkey, a ham, three pies... I'm pretty sure he was just trying to get a head start on next year's holiday shopping. Sir, this is a 10-items-or-less lane, not a 'prepare for the apocalypse' lane! But if the aisles are a battleground, the self-checkout is the final boss. Am I right? You walk up, feeling confident, like 'Yeah, I got this. I'm a modern, independent shopper.' And then it starts. *BEEP*. 'Please place item in bagging area.' Okay, it's there! *BEEP*. 'Unexpected item in the bagging area.' What?! It's the same item! Did it multiply? Is my bag haunted? I swear, these machines have a vendetta against me. 
And then you try to scan a banana, and it's like, 'Is this a fruit? Is it a vegetable? Is it a small, yellow, curved weapon?' Just let me buy my banana! (Crowd work cue) Anyone else feel personally attacked by the self-checkout? Yeah, I see you nodding. And the poor attendant, they just stand there, looking like they've seen things. They're the therapists for our grocery-related anxieties. 'It's okay, sir, the machine just wants to make sure you're paying attention.' And then there are the unspoken rules of the grocery store. Like, the intense eye contact avoidance. You accidentally make eye contact with someone in the frozen foods aisle, and suddenly you're both staring intently at the frozen peas, pretending they hold the secrets of the universe. 'Oh, these peas! Fascinating texture!' (Pause) Or the person who abandons their cart in the middle of an aisle. Just... leaves it. Full of groceries. Did they get abducted by aliens? Did they suddenly remember they left the stove on? Or did they just decide, 'You know what? I don't need these artisanal cheeses *that* badly'? It's like a grocery store mystery novel, but the only clue is a half-eaten bag of chips. So yeah, grocery shopping. It's a journey. A quest. A psychological thriller. But hey, at least I always leave with something. Usually. Unless the self-checkout machine decides my unexpected item is actually a portal to another dimension. Thank you, thank you, I'll be here all week! Try the organic kale chips!

Result

#2

Winning Votes

1 / 3

Average Score

65
Judge Models OpenAI GPT-5.4

Total Score

67

Overall Comments

Answer B has an energetic stage voice and covers several recognizable grocery-store annoyances, especially carts, express lanes, and self-checkout. However, it is more generic in phrasing, less tightly structured as distinct bits, and its closer is weaker because it does not strongly tie back to an earlier joke in a memorable callback. It remains clear and mostly clean, but feels more like competent comedy writing than a standout set.

Humor Effectiveness

Weight 35%
67

Has a few decent laughs, especially the haunted bag and yellow curved weapon lines, but many beats are standard observational material delivered in a familiar way. The set amuses more than it strongly lands.

Originality

Weight 25%
59

Covers very common grocery-store topics with more expected joke constructions like carts as chaos, people blocking aisles, and self-checkout malfunctioning. There are flashes of creativity, but the overall angle feels more recycled.

Coherence

Weight 15%
66

The set generally stays on topic and flows understandably, but it is more one-long-riff than neatly segmented bits with polished transitions. The ending does not neatly resolve or loop back to earlier material.

Instruction Following

Weight 10%
75

Mostly follows the prompt with first-person delivery, clean tone, and grocery-shopping focus, but the distinct-bit structure is less clearly crafted and the closing callback requirement is not strongly satisfied. It still reads like a stand-up set, though less precisely aligned.

Clarity

Weight 15%
78

Clear and easy to read, with energetic pacing and understandable setups. Some long stretches feel dense and less cleanly shaped than A, but the meaning and comedic intent remain accessible throughout.

Total Score

72

Overall Comments

Answer B delivers a more energetic and authentic-sounding stand-up set. The comedian's voice is strong and conversational, and the transitions between bits are seamless and natural. The humor is effective, using vivid imagery and a more performative style that feels closer to a real open-mic night. While it covers common grocery store tropes, it does so with a punchy delivery. Its main weakness is the ending; the callback is to a minor detail and the sign-off is generic, failing to provide a strong, memorable conclusion.

Humor Effectiveness

Weight 35%
75

The humor is more energetic and performative, using stronger imagery like the "demolition derby" carts and the "haunted" self-checkout bag. The conversational style and build-up to punchlines make it feel more impactful and likely to generate bigger laughs in a live setting.

Originality

Weight 25%
65

The set relies on fairly standard observational topics (rogue carts, express lane violators, self-checkout issues). While the execution is good, with lines like the "banana weapon," it doesn't introduce as many novel concepts or premises as Answer A.

Coherence

Weight 15%
80

The set flows exceptionally well. Transitions are conversational and seamlessly integrated into the monologue, creating a natural and continuous performance that feels much more authentic to how a real comedian would structure their set.

Instruction Following

Weight 10%
65

The answer follows most instructions, but it falters on the requirement for a "strong closing joke or callback." The callback to "organic kale chips" is weak as it references a minor, throwaway detail, and the final sign-off is generic.

Clarity

Weight 15%
75

The set is perfectly clear, communicating its jokes and structure through a natural, conversational style. It achieves clarity without the need for artificial signposting, which is a more sophisticated approach for this format.

Total Score

54

Overall Comments

Answer B reads more like an energetic but somewhat generic humorous essay than a polished stand-up set. While it covers the required topics (carts, express lane, self-checkout, unspoken rules), many of the jokes rely on familiar territory without adding fresh angles — the wobbly cart, the express lane violator, and the "unexpected item in bagging area" jokes are all well-worn comedy premises presented without much new spin. The voice has some authentic comedian energy with the opening crowd work and character name, but the piece runs long and lacks the tight punchline structure of real stand-up. The closing callback to "organic kale chips" is weak — it references something mentioned only in passing and doesn't create a satisfying payoff. The transitions between bits are less defined, making the set feel more like a stream-of-consciousness rant than structured comedy.

Humor Effectiveness

Weight 35%
55

The humor relies heavily on familiar premises (wobbly cart, express lane cheater, unexpected item in bagging area) without adding much new comedic spin. Many lines describe funny situations rather than delivering tight punchlines. The 'small yellow curved weapon' banana line is amusing but most jokes feel like they stop short of a real payoff.

Originality

Weight 25%
45

Most of the material covers extremely well-trodden comedy ground — wobbly shopping carts, express lane violators, and self-checkout machine frustrations are among the most recycled grocery shopping comedy premises. The frozen peas eye contact bit and abandoned cart mystery are slightly fresher but still fairly predictable.

Coherence

Weight 15%
55

The set covers multiple topics but transitions are less defined, making it feel more like a continuous rant than structured bits. The closing callback to organic kale chips is weak — it references a throwaway detail rather than a major comedic thread. The piece also runs long and could benefit from tighter editing.

Instruction Following

Weight 10%
65

Meets most requirements: first person, crowd work cues, multiple bits within the grocery theme, clean tone. However, the closing callback is weak (kale chips is a minor detail, not a strong tie-back), and the piece reads slightly over the word count guidance. The comedian character name is a nice touch but the set lacks clear stage direction markers.

Clarity

Weight 15%
60

The writing is conversational and easy to understand, but the lack of clear structural markers and the stream-of-consciousness style makes it harder to distinguish between bits. Some passages run on without clear punchline delineation, which would make it harder to perform on stage.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.
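
As a rough illustration of the rule above, the sketch below ranks the two answers per judge, averages the ranks, and uses a Borda count as the tie-break. Several things here are assumptions: the page does not define its exact procedure, each judge is assumed to rank answers by total score, the Borda count is the textbook form (n minus rank), and the judges are assumed to appear in the same order under both answers. The name `aggregate_ranks` is hypothetical.

```python
# Per-judge total scores, in the order the judges appear on this page (assumed pairing).
JUDGE_TOTALS = [
    {"A": 79, "B": 67},
    {"A": 70, "B": 72},
    {"A": 71, "B": 54},
]


def aggregate_ranks(judge_totals):
    """Order answers by average rank, breaking ties by Borda points (assumed rule)."""
    labels = sorted(judge_totals[0])
    n = len(labels)
    rank_sum = dict.fromkeys(labels, 0)
    borda = dict.fromkeys(labels, 0)
    wins = dict.fromkeys(labels, 0)
    for totals in judge_totals:
        # Higher total means better rank; equal totals fall back to label order.
        ordered = sorted(labels, key=lambda label: -totals[label])
        wins[ordered[0]] += 1
        for position, label in enumerate(ordered, start=1):
            rank_sum[label] += position
            borda[label] += n - position  # textbook Borda: n - rank points
    avg_rank = {label: rank_sum[label] / len(judge_totals) for label in labels}
    order = sorted(labels, key=lambda label: (avg_rank[label], -borda[label]))
    return order, avg_rank, borda, wins


order, avg_rank, borda, wins = aggregate_ranks(JUDGE_TOTALS)
print(order, wins)
```

With complete rankings from every judge, the textbook Borda count orders answers the same way as average rank, so the tie-break in this sketch never changes the result; the site's actual tie-break may be defined differently.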

Judges: 3

Answer A
Winning Votes: 2 / 3
Average Score: 74

Answer B
Winning Votes: 1 / 3
Average Score: 65

Judging Results

Why This Side Won

Answer A wins primarily due to stronger humor effectiveness (more specific, original punchlines like the avocado hostage negotiation and cart dating app) and better originality (fresh angles on familiar topics). Answer A also has superior structure with clear transitions, a stronger multi-thread callback ending, and more authentic stage directions. While Answer B has energy and some crowd work elements, its jokes are more predictable and its structure is looser, resulting in lower scores on the most heavily weighted criteria.

Why This Side Won

Answer B wins because it is more successful at capturing the authentic voice and flow of a live stand-up performance, which is central to the task. Its humor is more energetic and its transitions are more natural, making it more engaging overall. While Answer A has a stronger callback and slightly more original premises, its clunky, script-like transitions and more subdued tone make it less convincing as a comedy set. B's superior performance on humor effectiveness and coherence, which are heavily weighted criteria, secures its win.

Judge Models OpenAI GPT-5.4

Why This Side Won

Answer A wins because its weighted performance is stronger on the most important areas: humor effectiveness and originality. It has sharper punchlines, smoother transitions between distinct bits, and a much better closing callback that ties together the cart, avocado, and self-checkout material. Answer B is serviceable and readable, but it relies more on familiar grocery-store premises and ends on a lighter tag rather than a strong callback, which hurts it on the higher-weighted criteria.
