6/25/2026 at 9:54:58 AM
Reminds me of the recent Terminal Bench controversy [1][2][3]. If there's a benchmark, people will cheat, lie, and optimize for that benchmark. Honesty depends on the compliance enforced on teams. But if compliance itself is weak, it is going to be taken advantage of. Like growing up in India: you would optimize for the exam and not for what you learn from it.
[1] https://news.ycombinator.com/item?id=47920787
by bitlad
6/25/2026 at 9:45:28 PM
There are just too many nuances to take any measured difference of less than an order of magnitude seriously without further investigation. Even a 2x difference can come down to a simple configuration change for these things. Usually, differences of more than one order of magnitude are large enough that you cannot hand-wave them away without a grossly obvious oversight in configuration.
by SOLAR_FIELDS
6/25/2026 at 1:31:24 PM
Exactly! The task gets even trickier when you're benchmarking lots of systems of different kinds: cloud databases, self-hosted ones, embedded engines, CLI tools.
by puzpuzpuz-hn