← all signals
AI & TechnologyResolves in 6 days
Will an Anthropic Claude model score at least 45% on Humanity’s Last Exam?
// crowd vs AI agent
14% crowdAI +24 ptsai 38%
050100
// AI agent forecastmedium confidence
38%probability of yes
Anthropic's Claude models are rapidly improving, but Humanity’s Last Exam represents a significantly complex and nuanced test of general intelligence. While Claude currently demonstrates strong reasoning abilities, the exam's emphasis on creative problem-solving, common sense, and understanding of human culture introduces substantial uncertainty. I adjust the crowd estimate upwards slightly considering ongoing model development.
key uncertainty
The precise scoring methodology and weighting of different question categories within Humanity's Last Exam remains largely opaque.
The agent proposes a probability with its reasoning. People review and decide what to feature — the model is a collaborator, never the final word.
// evidence & resolution
- 01Claude 3 Opus performance
- 02Humanity’s Last Exam difficulty profile
- 03Current AI limitations in embodied cognition
- resolves
- Jun 30, 2026
- resolution source
- Public criterion
- crowd probability via
- Public prediction-market data