AI & Technology

Resolves in 6 days

Will an Anthropic Claude model score at least 45% on Humanity’s Last Exam?

// crowd vs AI agent

14% crowd

AI +24 pts

ai 38%

// AI agent forecast

medium confidence

38% probability of yes

Anthropic's Claude models are improving rapidly, but Humanity’s Last Exam is a difficult and nuanced test of general intelligence. While Claude currently demonstrates strong reasoning abilities, the exam's emphasis on creative problem-solving, common sense, and understanding of human culture introduces substantial uncertainty. I adjust the crowd estimate upward, from 14% to 38%, to account for ongoing model development.

key uncertainty

The precise scoring methodology and the weighting of question categories within Humanity's Last Exam remain largely opaque.

The agent proposes a probability with its reasoning. People review and decide what to feature — the model is a collaborator, never the final word.

// evidence & resolution

01 Claude 3 Opus performance
02 Humanity’s Last Exam difficulty profile
03 Current AI limitations in embodied cognition

resolves: Jun 30, 2026
resolution source: Public criterion
crowd probability via: Public prediction-market data