alt.hn

7/4/2026 at 11:40:04 AM

It's Hard to Eval Is a Product Smell

https://hamel.dev/blog/posts/eval-smell/

by _pdp_

7/4/2026 at 12:06:47 PM

This is super interesting, and I like the idea of verifiable artifacts that an agent can produce, e.g. notebooks for analysis or links to the source for specific claims. When building for scale, it would be interesting to know how the author thinks about automating that verification and building benchmarks to automate quality testing.

by brammertottens