Can LLMs estimate prices?

We developed a novel LLM benchmark for evaluating the ability to estimate consumer product prices and reason strategically in an analogue of "The Price is Right" showcase game.

What This Benchmark Tests

Price Knowledge

Can models accurately estimate the cost of everyday consumer products?

Strategy

Can models bid strategically to avoid disqualification while maximizing accuracy?

View Leaderboard Read Blog Post GitHub & Data Submit Model

Why This Benchmark Matters For Business?

While the benchmark may seem just like a game, it demonstrates the AI capabilities that are useful in several real business applications

Market Entry Research

When expanding into new geographic regions or product categories, businesses need accurate price estimation capabilities.

Price Elasticity Research

AI models that understand pricing can analyze how pricing changes affect demand signals and consumer behavior.

Economic Indicator Development

Consumer price data serves as a leading indicator for broader economic trends and market conditions.

Regulatory Compliance

Accurate price estimation helps businesses maintain compliance with price discrimination and antitrust regulations.

Top Performing Models

🏆 Best Elo Rating

#1	qwen3-30b-a3b	1239
#2	gpt-5	1207
#3	gpt-4o	1178
#4	grok-4	1170
#5	o3	1129

📊 Lowest MAPE

#1	o3	13.78%
#2	o1	14.49%
#3	gpt-5	17.06%
#4	grok-4	19.47%
#5	qwen3-30b-a3b	22.84%

⚖️ Lowest Overbid Rate

#1	qwen3-30b-a3b	18.37%
#2	gpt-3.5-turbo	20%
#3	gpt-4o	28%
#4	gpt-4o-2024-08-06	40%
#5	gpt-5-mini	40.82%

View Leaderboard

Latest Update: September 25, 2025

The Showcase Challenge

This benchmark tests whether LLMs can understand real-world pricing and make strategic decisions under constraints, inspired by the Showcase game in "The Price is Right" TV show.

How the Game Works:

• Two LLMs compete as contestants viewing the same showcase
• Each model bids on the price of an everyday item (toothpaste, snacks, cleaners, etc.)
• The closest bid without going over wins the round
• If both models overbid, neither wins
• Showcases typically total around $20 in value

Example Products from The Price is Right

Kraft Cool Whip

8oz dessert topping

$2.29

Mezzetta Roasted Peppers

16oz jar

$3.99

Minute Maid Orange Juice

1 gal container

$7.49

Data & Methodology

Data:

We use the set of grocery products that actually appeared on the Price Is Right show. The dataset consists of a short product description and its actual price in US dollars. All prices are the suggested retail price by the manufacturer on the US west coast. We use a fixed set of static prices in the first version of our benchmark, but we are planning to expand it and update in the future releases.

Challenge Sequence:

Every model undergoes the same sequence of 50-100 showcases with consumer goods.

Each round includes:

Send the prompt with the rules of the game and output instructions
Provide 10 reference price examples from similar products
Parse bids and rationales with strict formatting
Compare to the actual retail price, calculate the absolute percentage error and determine an overbid

Tournament Structure:

Based on the sequence of challenges and their outcomes we simulate a tournament in which each model competes with every other model and the Elo rating is computed to rank the models.

Price distribution chart showing the range of product prices in the dataset

Evaluation Metrics

Elo Rating

Chess-inspired rating system where models gain points for wins and lose points for losses

MAPE (Mean Absolute Percentage Error)

Average percentage difference between bid and actual price (ignoring overbids)

Overbid Rate

Percentage of bids that exceeded the actual price (automatic disqualification)

Key Findings

• Elo ratings vary significantly between models, reflecting different skill levels in the game
• Strategic behavior emerges in top models that balance accuracy with overbid avoidance
• Price knowledge correlates with model size and training quality
• Winning requires both accurate estimation and strategic risk management

While the benchmark task description is very simple and easily understandable, it demonstrates a possibility to probe serious AI skills: using background knowledge and context to make constrained real-world decisions. By turning a game into a benchmark, we get a structured, repeatable way to measure models' real-world sensibility.

Caveats

• Game mechanics introduce constraints beyond basic price estimation tasks
• Dataset limitations exist with static pricing that doesn't capture market dynamics or regional variations
• Elo ratings can vary when tournaments use few Showcases due to random pairings

We are planning to address these caveats during further refinements of the benchmark.

Ready to Test Your Model?

Think your LLM has what it takes to master "The Price is Right" showcase challenge? Submit your model and see how it performs against the current leaderboard champions.

Add Your Model