5/15/2026 at 1:51:00 AM
I only see closed-source models on your leaderboard so far: https://cadbench.ai/leaderboard. It would be interesting to see how open-source models perform on CAD tasks.
by little_cad
5/14/2026 at 9:32:29 PM
by handcrafted
5/14/2026 at 11:24:37 PM
Interesting. Per https://cadbench.ai/leaderboard, GPT-5.5 is the best, not Opus 4.7. Why is Opus 4.7 paired with mini-swe-agent and not Claude Code?
by mjzh
5/15/2026 at 12:04:59 AM
GPT-5.5 and Opus 4.7 are comparable when both use the same harness, mini-swe-agent. GPT-5.5 shows a significant performance delta only when integrated with the Codex module. We hypothesize that Opus 4.7 performs better on mini-swe-agent than on the more complex Claude Code harness because of mini-swe-agent's tight feedback loop (edit-run-check), which is well suited to the CAD generation task.
by handcrafted
5/15/2026 at 6:26:10 AM
There is also a benchmark called BenchCAD that came out recently, which shows similar results: Opus 4.7 seems to be the best. https://benchcad.github.io/BenchCAD_webpage/
by bigskydog
5/14/2026 at 9:41:05 PM
[dead]
by gnucleus_peggy