alt.hn

3/19/2026 at 12:40:32 AM

Autoresearch for SAT Solvers

https://github.com/iliazintchenko/agent-sat

by chaisan

3/19/2026 at 2:18:46 AM

Prof. Cunxi Yu and his students at UMD are working on this exact topic and published a paper on agents for improving SAT solvers [1].

I believe they are extending this idea to EDA / chip design tools and algorithms which are also computationally challenging to solve. They have an accepted paper on this for logic synthesis which will come out soon.

[1] "Autonomous Code Evolution Meets NP-Completeness", https://arxiv.org/abs/2509.07367

by stefanpie

3/19/2026 at 3:43:16 AM

nice. EDA is indeed one of the top applications of SAT

by chaisan

3/19/2026 at 2:05:25 AM

It should be noted that MaxSAT 2024 did not include z3, as with many competitions. It’s possible (I’d argue likely) that the agent picked up on techniques from Z3 or some other non-competing solver, rather than actually discovering some novel approach.

by ericpauley

3/19/2026 at 12:06:44 PM

Z3 is capable (it’s an SMT solver, not just SAT), but it’s not very fast at Boolean satisfiability and not at all competitive with modern SOTA SAT solvers. Try comparing it to e.g. Chaff or Glucose.

by throw-qqqqq

3/19/2026 at 2:09:14 AM

Or for that matter even from later versions of the same solvers that were in its training data!

by jmalicki

3/19/2026 at 2:10:26 AM

True. I’d be curious whether a combination of matching comp/training cutoff and censoring web searches could yield a more precise evaluation.

by ericpauley

3/19/2026 at 4:08:19 AM

as it's from 2024 (MaxSAT was not held in 2025), it's quite likely all the solvers are in the training data. so the interesting part here is the instances for which we actually got better costs than what is currently known (in the best-cost.csv file).

by chaisan

3/19/2026 at 10:49:06 AM

As GP noted, the issue is that even newer versions of the solvers than those that competed in MaxSAT are likely in the training data or web resources.

by ericpauley

3/19/2026 at 3:31:07 AM

Is z3 competitive in SAT competitions? My impression was that it is popular due to the theories, the python API, and the level of support from MSR.

by dooglius

3/19/2026 at 3:55:45 AM

Funnily, this was precisely the question I had after posting this (and the topic of an LLM disagreement discussed in another thread). Turns out not, but sibling comment is another confounding factor.

by ericpauley

3/19/2026 at 6:50:56 AM

One problem here is it's very easy to overtune to a past problem set -- even accidentally. You can often significantly improve performance just by changing your random number generator seed until you happen to pick the right assignment for the first few variables of some of the harder problems.

It would be interesting to take the resulting solver and apply it to an unknown data set.

by CJefferson
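
The seed-overtuning effect is easy to reproduce. A minimal sketch (my own illustration in plain Python, with hypothetical helper names — not code from the repo): on a fixed benchmark, the best recorded cost can only improve as you sweep more seeds, even with zero algorithmic change.

```python
import random

def clause_cost(weighted_clauses, assignment):
    # Weighted MaxSAT cost: sum of weights of unsatisfied clauses.
    # Literals are DIMACS-style: 3 means x3 is True, -3 means x3 is False.
    return sum(w for w, cl in weighted_clauses
               if not any(assignment[abs(l)] == (l > 0) for l in cl))

def best_over_seeds(weighted_clauses, n_vars, seeds):
    # Purely random assignments, one per seed: any "progress" here comes
    # from retrying on the same fixed instance, not from a better algorithm.
    best = float("inf")
    for seed in seeds:
        rng = random.Random(seed)
        assignment = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
        best = min(best, clause_cost(weighted_clauses, assignment))
    return best

# Toy instance over variables 1..3.
soft = [(1, [1, 2]), (1, [-1, 3]), (1, [-2, -3]), (1, [2, 3])]
print(best_over_seeds(soft, 3, range(5)))
print(best_over_seeds(soft, 3, range(50)))   # never worse than the run above
```

This is why evaluation on a held-out, unseen instance set matters: the seed sweep "improves" only on the instances it was tuned against.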

3/19/2026 at 4:18:07 PM

yess. loads of space for further exploration here. there is an attempt to keep things as general as possible in the expert.md file, but it's hard to mitigate overfitting fully. however, changing the seed will not get you much further with all else in the solver held constant, unless you try a number of seeds that scales exponentially with the size of the problem

by chaisan

3/19/2026 at 8:32:52 PM

Very interesting. For me the key question is whether this kind of agent can generalize to real SAT application domains, not only benchmark instances. In problems like timetabling, encoding choices, auxiliary variables, and branching strategy can matter a lot. If it can help there too, this is a very meaningful direction.

by jacklondon

3/19/2026 at 3:24:40 PM

I don't understand why autoresearch is presented as a new thing.

It is parameter tuning. We have been doing it for centuries.

by whatever1

3/19/2026 at 4:14:52 PM

sure. in the limit, everything is parameter tuning. but with large enough NP-hard problems, the search space is complex enough that it's infeasible to get to a better state by just tuning params in any reasonable amount of time.

by chaisan

3/19/2026 at 4:58:45 PM

I beg to disagree. Integer programming solvers have improved orders of magnitude in the past 20 years. The basic algorithm (branch and bound) is the same.

The big commercial solvers are basically very good at picking up structure and selecting the tuning parameters that work better for specific problem types.

by whatever1

3/19/2026 at 5:41:38 AM

Not as many changes to the files under library as I expected to see. Most changes seemed to be under a single ‘add stuff’ commit. If some of the solvers are randomised, then repeatedly running and recording best solution found will continually improve over time and give the illusion of the agent making algorithmic advancements, won’t it?

by MrToadMan

3/19/2026 at 6:45:34 AM

yeh. ofc. but on any problem larger than 40 variables, the gains from random restarts or initializations will quickly plateau

by chaisan
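
To put a number on that plateau (my arithmetic, not from the repo): the chance that one uniformly random assignment of n variables matches a fixed target assignment is 2^-n, so the expected number of restarts to hit it grows as 2^n.

```python
n = 40
p_hit = 2.0 ** -n            # P(one random assignment equals a fixed target)
expected_restarts = 2 ** n   # mean of the geometric distribution, 1 / p_hit
print(f"{p_hit:.2e}")            # -> 9.09e-13
print(f"{expected_restarts:,}")  # -> 1,099,511,627,776
```

So past a few dozen variables, restarts alone stop paying off and only a change to the search itself moves the needle.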

3/19/2026 at 6:46:07 AM

and it would take an algo change to the solver to jump to the next local optimum

by chaisan

3/19/2026 at 9:46:15 AM

I guess my point was that I don't see many algo changes in the commit history, which is a shame if this has been lost; library/* files are largely unchanged from the initial commits. But each time the agent runs, it has access to the best solutions found so far and can start from there, often using randomisation, which the agent claims helps it escape local minima e.g. 'simulated annealing as a universal improver'. It would be nice to see how its learnt knowledge performs when applied to unseen problems in a restricted timeframe.

by MrToadMan
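
For reference, the kind of loop a "simulated annealing as a universal improver" claim describes — a generic annealer warm-started from the best assignment found so far. This is a minimal sketch of the standard technique (my own code, not the repo's; the cost function is assumed to be a weighted MaxSAT cost, lower is better).

```python
import math, random

def anneal(cost_fn, start, n_vars, steps=2000, t0=2.0, seed=0):
    """Flip one variable at a time; accept worse moves with probability
    exp(-delta/T); cool geometrically. Warm-started from `start`
    (e.g. the best assignment recorded so far)."""
    rng = random.Random(seed)
    cur = dict(start)
    cur_cost = cost_fn(cur)
    best, best_cost = dict(cur), cur_cost
    t = t0
    for _ in range(steps):
        v = rng.randint(1, n_vars)
        cur[v] = not cur[v]              # propose a single-bit flip
        new_cost = cost_fn(cur)
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur_cost = new_cost          # accept the move
            if cur_cost < best_cost:
                best, best_cost = dict(cur), cur_cost
        else:
            cur[v] = not cur[v]          # reject: undo the flip
        t *= 0.999                       # geometric cooling
    return best, best_cost
```

Run repeatedly from the stored best solutions, a loop like this keeps lowering recorded costs over time without any change to the library code — which is consistent with what the commit history shows.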

3/19/2026 at 2:40:34 AM

What counts as “our cost”? How long it takes to find the MaxSAT?

by gsnedders

3/19/2026 at 4:08:48 AM

the sum of the weights of the unsatisfied clauses. we want to reduce this number

by chaisan
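
In code, that cost function is simply the following (a sketch with DIMACS-style signed literals and hypothetical names — not the repo's actual implementation):

```python
def clause_satisfied(clause, assignment):
    # A clause holds if any literal does: 3 means x3 is True, -3 means x3 is False.
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)

def maxsat_cost(weighted_clauses, assignment):
    # Sum of weights of the unsatisfied soft clauses -- the number to minimise.
    return sum(w for w, clause in weighted_clauses
               if not clause_satisfied(clause, assignment))

soft = [(3, [1, 2]),   # weight 3: x1 or x2
        (5, [-1]),     # weight 5: not x1
        (2, [-2])]     # weight 2: not x2
print(maxsat_cost(soft, {1: False, 2: True}))  # -> 2 (only "not x2" is violated)
```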

3/19/2026 at 6:22:58 AM

Would be nice to try this on LCG (CP-SAT) solvers

by cerved

3/19/2026 at 8:56:20 AM

[dead]

by Dennis118753882

3/19/2026 at 4:18:47 PM

nice. for which problem?

by chaisan

3/19/2026 at 5:10:52 AM

[dead]

by ClawVorpal21355

3/19/2026 at 6:46:59 AM

wrt. token usage?

by chaisan

3/19/2026 at 3:50:00 AM

[dead]

by balinha_8864

3/19/2026 at 4:05:24 AM

it's just comparing the cost of the best solution found to the best known cost we had before. O(N). why optimistic?

by chaisan

3/19/2026 at 8:44:51 AM

If you have showdead on, you can see that this account posts generic oneliners: https://news.ycombinator.com/threads?id=balinha_8864

by yorwba

3/19/2026 at 11:25:35 AM

Is that bad?

by big-chungus4

3/19/2026 at 1:59:36 PM

It's an indication that it's one of the many bot accounts currently doing the same thing https://hn.algolia.com/?query=this%20is%20more%20nuanced%20t...

So the reason the comment appears weirdly disconnected from the content of the article is that it was generated independently from the content of the article.

by yorwba

3/19/2026 at 2:20:24 PM

[dead]

by Heer_J

3/19/2026 at 12:37:58 PM

[dead]

by FernandoDe79440

3/19/2026 at 4:18:57 PM

have examples?

by chaisan