Latest Tasks & Discussions

Browse the latest benchmark content across tasks and discussions. Switch by genre to focus on what you want to compare.

Benchmark Genres

View all (537), Discussions (190), Creative Writing (22), Coding (22), System Design (22), Education Q&A (21), Explanation (21), Summarization (24), Idea Generation (21), Roleplay (23), Business Writing (21), Planning (20), Analysis (21), Brainstorming (22), Persuasion (22), Humor (21), Empathy (21), Counseling (23)

Model Directory

View all: GPT-5.5 (OpenAI), GPT-5.2 (OpenAI), GPT-5.4 (OpenAI), GPT-5 mini (OpenAI), Claude Opus 4.6 (Anthropic), Claude Opus 4.8 (Anthropic), Claude Sonnet 4.6 (Anthropic), Claude Haiku 4.5 (Anthropic), Claude Opus 4.7 (Anthropic), Claude Fable 5 (Anthropic), Gemini 2.5 Pro (Google), Gemini 2.5 Flash (Google), Gemini 2.5 Flash-Lite (Google)

Discussions

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5.5

Should Wealthy Nations Adopt a Four-Day Workweek as the Standard?

A growing number of companies and governments have piloted four-day workweeks, in which employees work roughly 32 hours across four days while keeping the same salary. Proponents argue it improves wellbeing, productivity, and gender equity, while critics warn it could harm competitiveness, public services, and industries that depend on continuous staffing. Should wealthy nations move to make the four-day workweek the legal or cultural standard for full-time employment?

186

May 19, 2026 14:48

System Design

Anthropic Claude Opus 4.7 VS Google Gemini 2.5 Flash

Design a Scalable Concert Ticket Reservation System

Design a system for an online concert ticketing platform. Users can browse events, view seat availability, reserve specific seats for 10 minutes, pay through an external payment provider, and receive a digital ticket. The platform runs in one cloud region across multiple availability zones. Explicit constraints: 3 million registered users, 500,000 daily active users, major on-sale events can reach 150,000 concurrent users, peak load is 8,000 seat reservation attempts per second and 2,000 payment attempts per second, each event has up to 60,000 seats, the system must never sell the same seat twice, seat reservations expire after 10 minutes if unpaid, p95 latency for browsing and seat-map reads should be under 300 ms, p95 latency for reservation confirmation should be under 800 ms excluding payment-provider time, availability target during on-sale windows is 99.95%, recovery point objective is under 1 minute, recovery time objective is under 15 minutes, and payment provider callbacks are at-least-once, may arrive out of order, and may be delayed by up to 5 minutes. Provide a design plan. Include the main services and data stores, core APIs, data model for seats and reservations, request flow for browsing, reserving, paying, and expiring reservations, scaling strategy for traffic spikes, reliability and disaster recovery approach, consistency choices that prevent overselling, monitoring and alerting, and key trade-offs or alternatives you considered. State any reasonable assumptions you make.
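The two hardest constraints in the task above (never sell the same seat twice, and release unpaid holds after 10 minutes) can be illustrated with a small sketch. It is not either model's answer; it is a minimal in-memory stand-in, assuming the real seat store offers an atomic conditional write. The names `SeatStore`, `reserve`, and `confirm_payment` are hypothetical, and a single lock stands in for per-row atomicity.

```python
import threading
import time

HOLD_SECONDS = 600  # 10-minute hold, taken from the task statement


class SeatStore:
    """Hypothetical in-memory stand-in for a store with atomic conditional writes."""

    def __init__(self, seat_ids, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        # seat_id -> (status, holder, expires_at)
        self._seats = {s: ("AVAILABLE", None, 0.0) for s in seat_ids}

    def reserve(self, seat_id, user_id):
        """Hold a seat only if it is free or its previous hold has expired."""
        with self._lock:
            status, _holder, expires_at = self._seats[seat_id]
            now = self._clock()
            expired_hold = status == "HELD" and expires_at <= now
            if status == "AVAILABLE" or expired_hold:
                self._seats[seat_id] = ("HELD", user_id, now + HOLD_SECONDS)
                return True
            return False

    def confirm_payment(self, seat_id, user_id):
        """Idempotent, so duplicate at-least-once callbacks are harmless."""
        with self._lock:
            status, holder, expires_at = self._seats[seat_id]
            if status == "SOLD":
                return holder == user_id
            live_hold = status == "HELD" and expires_at > self._clock()
            if live_hold and holder == user_id:
                self._seats[seat_id] = ("SOLD", user_id, 0.0)
                return True
            return False
```

Expiry here is lazy: an expired hold is reclaimed by the next `reserve` call rather than by a background sweeper. The sketch also rejects a payment callback that arrives after the hold has lapsed; since the task allows callbacks to be delayed by up to 5 minutes, a fuller design would have to choose between a grace period and an automatic refund.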

174

May 19, 2026 09:49

Discussions

OpenAI GPT-5.5 VS Anthropic Claude Sonnet 4.6

Standardized Testing: A Fair Measure or a Flawed Metric?

Standardized tests are widely used in education systems to assess student performance, evaluate teacher effectiveness, and compare schools. Proponents argue they provide an objective, consistent benchmark for academic achievement and hold schools accountable. Critics contend that they narrow the curriculum, create undue stress, and are biased against certain student populations, failing to capture a true picture of a student's abilities.

169

May 18, 2026 14:43

Brainstorming

Anthropic Claude Opus 4.7 VS OpenAI GPT-5.4

Community Park Revitalization Brainstorm

Brainstorm a list of low-cost, community-driven initiatives to revitalize an underused public park. For each idea, ensure it meets the following criteria:

1. **Low Budget:** Material costs must be under $500.
2. **Volunteer-Powered:** The initiative must be achievable primarily with volunteer labor.
3. **Community Focus:** It must promote at least one of the following: community interaction, physical activity, local art, or environmental education.
4. **Quick Turnaround:** It should be implementable within a three-month timeframe.

Present your ideas as a bulleted list.

181

May 18, 2026 09:42

Discussions

Google Gemini 2.5 Pro VS OpenAI GPT-5.5

Banning Smartphones in Primary and Secondary Schools

Several countries and school districts have introduced full-day bans on student smartphone use during school hours, arguing it improves focus, mental health, and social interaction. Critics counter that such bans are paternalistic, hard to enforce, and ignore the legitimate educational and safety roles phones can play. Should governments mandate comprehensive smartphone bans in primary and secondary schools?

195

May 17, 2026 14:38

Humor

Anthropic Claude Opus 4.7 VS Google Gemini 2.5 Pro

Gentle Humor for a Library Field Guide

Write 10 humorous field-guide entries for ordinary objects found in a public library, such as a stapler, book cart, printer, library card, pencil, or return bin. Each entry must include a made-up scientific name, one observable behavior, and one gentle joke. The humor should be warm, clever, and suitable for both adults and children age 10 and up. Avoid mean-spirited jokes, stereotypes, gross-out humor, sexual references, profanity, and current pop-culture references. Keep each entry to 1 or 2 sentences, and make all 10 entries feel distinct rather than variations on the same joke.

200

May 17, 2026 09:37

Discussions

OpenAI GPT-5.5 VS Anthropic Claude Haiku 4.5

Integrating 'Soft Skills' into the Core Academic Curriculum

This debate centers on whether non-academic 'soft skills'—such as communication, collaboration, emotional intelligence, and critical thinking—should be formally integrated, taught, and assessed as part of the core K-12 curriculum, on par with traditional subjects like mathematics, science, and literature.

202

May 16, 2026 14:38

Analysis

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Choosing a Database for a Growing SaaS Startup

You are advising the CTO of a two-year-old B2B SaaS startup that provides project management software to mid-sized companies. The current setup uses a single PostgreSQL instance, and it is now showing strain: read queries on dashboards take 3–8 seconds during peak hours, the database is 800 GB and growing ~40 GB/month, and the team expects user count to triple over the next 12 months. The engineering team has 9 developers, only one of whom has significant database administration experience. Budget is constrained but not severely limited. The CTO is weighing four options:

1. Vertically scale the existing PostgreSQL instance and add read replicas.
2. Migrate to a managed distributed SQL database (e.g., CockroachDB or Spanner-like service).
3. Split the workload: keep PostgreSQL for transactional data, introduce a separate analytical store (e.g., ClickHouse or BigQuery) for dashboards.
4. Migrate to a NoSQL document database (e.g., MongoDB or DynamoDB).

Write an analysis (roughly 500–800 words) that:

- Evaluates each of the four options against the startup's specific constraints (performance bottleneck location, team expertise, growth trajectory, budget).
- Identifies the key trade-offs and risks of each option.
- Reaches a clear, justified recommendation (you may recommend one option or a phased combination).
- Specifies what evidence or measurements you would want to verify before committing to the recommendation.

Be concrete: refer to the numbers given, and avoid generic database advice that ignores the scenario.

210

May 16, 2026 09:38

Discussions

Anthropic Claude Opus 4.7 VS OpenAI GPT-5.5

Mandatory Four-Day Work Week

Should governments legally mandate a four-day work week for all companies, with no reduction in employee pay, as a new standard for full-time employment?

216

May 15, 2026 14:39

Explanation

Anthropic Claude Opus 4.7 VS OpenAI GPT-5 mini

Explain Blockchain Technology to a Novice

Explain the concept of a blockchain to an audience of curious high school students. They have a general interest in technology but no background in computer science, cryptography, or distributed systems. Your explanation should:

1. Start with a simple, relatable analogy to introduce the core idea.
2. Clearly define what a 'block' and a 'chain' are in this context.
3. Explain the concept of 'decentralization' and why it's important for a blockchain.
4. Walk through a simplified example of how a new transaction (like sending a digital token) is recorded.
5. Briefly mention how this technology is used for things like Bitcoin, but focus on the underlying technology itself, not the financial aspects.

178

May 15, 2026 09:38

Discussions

Google Gemini 2.5 Flash-Lite VS OpenAI GPT-5.5

Should Social Media Platforms Be Legally Liable for User-Generated Content?

Currently, most countries shield social media platforms from legal responsibility for what their users post, treating them more like neutral conduits than publishers. Critics argue this immunity allows harmful content—harassment, disinformation, defamation—to spread unchecked, while defenders say removing it would force platforms to over-censor and would cripple open online discourse. Should the law hold platforms legally liable for the user-generated content they host and algorithmically amplify?

213

May 14, 2026 14:38

Business Writing

OpenAI GPT-5.5 VS Anthropic Claude Opus 4.7

Drafting an Internal Announcement for a New Mentorship Program

You are the Head of People Operations at a mid-sized tech company. Your company is launching a new internal mentorship program to foster employee growth and collaboration. Write an internal announcement to be sent to all employees. The goal is to explain the program, generate excitement, and encourage both mentors and mentees to sign up. Your announcement must:

- Clearly state the purpose and benefits of the program.
- Explain who is eligible to be a mentor and a mentee.
- Detail the expected time commitment.
- Provide a clear call to action with instructions on how to sign up and the deadline.
- Maintain a professional, enthusiastic, and inclusive tone.
- Be no more than 300 words.

242

May 14, 2026 09:37

Discussions

Google Gemini 2.5 Flash-Lite VS Anthropic Claude Opus 4.7

Should Cities Eliminate Minimum Parking Requirements for New Buildings?

Many cities require developers to include a minimum number of parking spaces in new housing, shops, and offices. Should local governments abolish these mandates and let builders decide how much parking to provide based on location, demand, and cost?

194

May 13, 2026 14:42

Explanation

OpenAI GPT-5.5 VS Anthropic Claude Sonnet 4.6

Explaining GPS Technology to a Teenager

Explain how the Global Positioning System (GPS) works to a curious high school student. Your student has a basic understanding of physics (e.g., speed = distance / time) but is unfamiliar with concepts like satellites, atomic clocks, or relativity. Your explanation should cover:

1. The basic principle of how a location is determined (trilateration).
2. The roles of the three main parts of the GPS system: satellites, ground stations, and receivers (like in a phone).
3. Why extremely accurate timekeeping is crucial for GPS to work.
4. A simple, one-paragraph mention of why Einstein's theory of relativity has to be taken into account.

Your goal is to be clear, accurate, and engaging, using analogies where helpful. Avoid overly technical jargon.

220

May 13, 2026 09:38

Discussions

Google Gemini 2.5 Pro VS OpenAI GPT-5.5

Four-Day Workweek as the New Standard

Should countries adopt a 32-hour, four-day workweek with no reduction in pay as the new full-time standard?

225

May 12, 2026 14:43

Coding

OpenAI GPT-5.5 VS Google Gemini 2.5 Flash

Rate Limiter with Sliding Window and Burst Allowance

Design and implement a thread-safe rate limiter in a language of your choice (Python, Go, Java, TypeScript, or Rust) that supports the following requirements:

1. **API surface**: Expose at least these operations:
   - `allow(client_id: str, cost: int = 1) -> bool`: returns whether the request is permitted right now.
   - `retry_after(client_id: str) -> float`: returns seconds until at least 1 unit of capacity is available (0 if currently allowed).
   - A constructor that accepts per-client configuration: `rate` (units per second), `burst` (max units stored), and an optional `window_seconds` for sliding-window accounting.
2. **Algorithm**: Implement a hybrid that combines a **token bucket** (for burst tolerance) with a **sliding-window log or counter** (to bound the total requests permitted within `window_seconds`, preventing sustained abuse that a pure token bucket would allow after refills). A request is permitted only if both checks pass. Justify your data-structure choice for the sliding window (exact log vs. weighted two-bucket approximation) and discuss memory/accuracy tradeoffs in a short comment block or accompanying note.
3. **Concurrency**: The limiter will be hit by many threads/goroutines concurrently for the same and different `client_id`s. Avoid a single global lock becoming a bottleneck (e.g., per-client locks or lock striping). Document why your approach is correct under concurrent `allow` calls (no double-spend of tokens, no lost updates).
4. **Time source**: Make the clock injectable so tests are deterministic. Use a monotonic clock by default.
5. **Edge cases to handle explicitly**:
   - `cost` larger than `burst` (must reject, never block forever).
   - Clock going backwards or large pauses (e.g., suspended VM): clamp rather than crash, and don't grant unbounded tokens.
   - First-ever request for a new client (lazy initialization).
   - Stale client cleanup (memory must not grow unbounded if clients stop calling).
   - Fractional tokens / sub-millisecond timing.
6. **Tests**: Provide at least 6 unit tests using the injectable clock that cover: basic allow/deny, burst draining and refill, sliding-window cap independent of bucket refill, `cost > burst`, concurrent contention on one client (deterministic property: total permitted in T seconds ≤ rate*T + burst), and stale-client eviction.
7. **Complexity**: State the amortized time complexity of `allow` and the memory complexity per client.

Deliver: complete runnable code (single file is fine, but you may split files if you label them clearly), the tests, and a brief design note (max ~250 words) explaining your choices and the precise semantics when the two algorithms disagree.
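To make the "both checks must pass" rule in the task concrete, here is a minimal sketch of the hybrid check with an injectable clock. It is not a submitted answer and deliberately omits `retry_after` and stale-client eviction. The task does not say what the window cap is, so the `window_limit` parameter (maximum units per window) is an assumption, as is the class name `HybridRateLimiter`. The sliding window uses an exact log.

```python
import threading
import time
from collections import deque


class HybridRateLimiter:
    """Token bucket plus exact sliding-window log; a request needs both to pass."""

    def __init__(self, rate, burst, window_seconds=None, window_limit=None,
                 clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.window_seconds = window_seconds
        self.window_limit = window_limit  # assumption: max units per window
        self.windowed = window_seconds is not None and window_limit is not None
        self.clock = clock
        self._clients = {}
        self._registry_lock = threading.Lock()  # guards only the client map

    def _state(self, client_id):
        # Lazy initialization: a new client starts with a full bucket.
        with self._registry_lock:
            state = self._clients.get(client_id)
            if state is None:
                state = {"lock": threading.Lock(), "tokens": self.burst,
                         "last": self.clock(), "log": deque(), "used": 0}
                self._clients[client_id] = state
            return state

    def allow(self, client_id, cost=1):
        if cost > self.burst:
            return False  # can never fit in the bucket, so reject immediately
        state = self._state(client_id)
        with state["lock"]:  # per-client lock: no double-spend, no lost update
            now = self.clock()
            elapsed = max(0.0, now - state["last"])  # clamp a backwards clock
            state["last"] = max(now, state["last"])
            # Refill is capped at burst, so a long pause grants bounded tokens.
            state["tokens"] = min(self.burst,
                                  state["tokens"] + elapsed * self.rate)
            if self.windowed:
                cutoff = now - self.window_seconds
                while state["log"] and state["log"][0][0] <= cutoff:
                    state["used"] -= state["log"].popleft()[1]
                if state["used"] + cost > self.window_limit:
                    return False  # window check fails; no tokens are spent
            if state["tokens"] < cost:
                return False  # bucket check fails; nothing is logged
            state["tokens"] -= cost
            if self.windowed:
                state["log"].append((now, cost))
                state["used"] += cost
            return True
```

In this sketch a denied request changes neither the bucket nor the log, which is one possible answer to the task's question about semantics when the two algorithms disagree. The exact log costs memory proportional to the number of permitted requests per window per client; a weighted two-bucket counter would use constant memory at the price of approximate accuracy.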

190

May 12, 2026 09:45

Discussions

Anthropic Claude Haiku 4.5 VS OpenAI GPT-5.5

Mandatory Foreign Language Education in Primary Schools

This debate centers on whether it should be compulsory for all primary school students to learn a foreign language. Proponents argue for the cognitive and cultural benefits of early language acquisition, while opponents raise concerns about curriculum overload, resource allocation, and the effectiveness of such programs.

232

May 11, 2026 14:44

Idea Generation

OpenAI GPT-5.5 VS Anthropic Claude Opus 4.7

Innovative Solutions for Urban Household Food Waste

Generate a list of innovative and practical ideas to help urban households reduce their food waste. Your ideas should go beyond the most common advice (e.g., 'plan your meals,' 'use leftovers'). Structure your response into three distinct categories:

1. Technology-based solutions (apps, gadgets, etc.)
2. Community-based initiatives
3. Behavioral nudges or habit-forming techniques

For each idea, provide a brief (1-2 sentence) explanation of how it works.

176

May 11, 2026 09:38

Discussions

Anthropic Claude Haiku 4.5 VS OpenAI GPT-5.5

Should Higher Education Be Free?

Should public colleges and universities be made tuition-free for all domestic students, funded by the government?

209

May 10, 2026 14:37

Humor

OpenAI GPT-5.5 VS Anthropic Claude Sonnet 4.6

Stand-up Routine for a Tech Conference

Write a 2-minute stand-up comedy routine for a comedian performing at a major tech conference. The audience consists primarily of software engineers and project managers. The routine should focus on the funny or absurd aspects of remote work and 'agile' development methodologies. The tone should be sarcastic and observational, but ultimately good-natured and safe for a corporate environment.

189

May 10, 2026 09:38

Showing 41 to 60 of 537 results

