alt.hn

5/28/2026 at 1:10:39 AM

Finding Miscompiles for Fun, Not Profit

https://newsletter.semianalysis.com/p/finding-miscompiles-for-fun-not-profit

by tmoertel

5/29/2026 at 11:45:06 PM

Given the $10k price tag for tokens and high rate of bugs (several per minute) they mention, it'd be very interesting to see this experiment run with cheaper models too.

I wonder if we get to a world where a full repo sweep like this is a default GitHub Action after commit.

by mNovak

5/30/2026 at 1:37:48 AM

Most C/C++ projects I know don't even run tests with ASan/TSan/UBSan before each commit/merge.

by 1over137

5/30/2026 at 2:42:29 AM

And in the meantime, just a sweep of the committed code (or the to-be-committed code for lots of us) and the code it interacts with is increasingly catching lots of problems.

by Quarrel

5/30/2026 at 7:37:35 AM

Boy, I told DeepSeek V4 Flash Free to find compiler bugs, esp. with the gcc torture test suite, and it did find plenty. For free. Fixing them in my compiler cost about $40 or so. Corporate guys just have too much budget in their hands to throw at the rich.

by rurban

5/30/2026 at 12:47:33 AM

Author here; I'm happy to answer questions, take criticism, etc etc.

by jlebar

5/30/2026 at 3:06:57 AM

Thank you for posting this.

I had heard LLMs were finding a lot of bugs very quickly and now I can see what that looks like from a user perspective.

by ebiederm

5/30/2026 at 2:43:25 AM

> Codex and I collaboratively wrote a fuzzer.

Why are you using phrasing that equates AI and humans? You used Codex to write a fuzzer. It didn't decide to join you.

by eqvinox

5/30/2026 at 7:02:26 AM

Why are you using phrasing that equates AI and humans? Codex isn't in a position to decide whether to do work.

by derdi

5/30/2026 at 10:30:22 AM

I wonder how much damage all those countless bugs caused in real life.

Does low-level MRI code produce wrong images? Do unexpected HTTP connection quirks happen? Does (LL)M inference produce randomly wrong and non-reproducible output? Graphical artifacts in video games? Application crashes that happen once every billionth request? Security vulnerabilities? Race conditions?

by Traubenfuchs

5/30/2026 at 2:10:19 AM

[flagged]

by jalospinoso