4/1/2025 at 10:44:42 PM

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

https://arxiv.org/abs/2503.21934

by mauriziocalo

4/2/2025 at 12:06:27 AM

> Our results reveal that all tested models struggled significantly, achieving less than 5% on average

by galaxyLogic