alt.hn

1/31/2026 at 3:48:37 PM

Browser Agent Benchmark: Comparing LLM models for web automation

by MagMueller

2/1/2026 at 5:45:36 AM

Since we're on this topic, can anyone suggest a good AI-based tool for exploratory (fuzzy?) web testing?

by wiradikusuma

2/1/2026 at 5:24:52 AM

The benchmark is missing the best model (Opus 4.5) tho.

by pixel_popping

2/1/2026 at 3:13:26 PM

Yeah, but then their own product might not score the highest.

by djohnston

2/2/2026 at 1:43:26 PM

That's exactly why I'm pointing it out. Leaving it out feels a bit corrupt, but understandable.

by pixel_popping

2/2/2026 at 3:31:24 PM

tbh I was a bit cranky yesterday. Even if they're #2 on a legit benchmark, that would be impressive.

by djohnston

1/31/2026 at 3:54:55 PM

[dead]

by MagMueller