alt.hn

6/30/2026 at 10:08:38 PM

TabFM: A zero-shot foundation model for tabular data

https://research.google/blog/introducing-tabfm-a-zero-shot-foundation-model-for-tabular-data/

by brandonb

7/1/2026 at 12:09:12 AM

In hindsight, the Prior Labs exit to SAP couldn't have been timed better.

by woadwarrior01

7/1/2026 at 4:35:28 PM

They don't show the strongest TabPFN variants in the plot unfortunately, i.e. thinking and ensembled. Not really apples to apples.

by noahho

6/30/2026 at 11:32:41 PM

150,000 rows of data, where will I store it all?!

by actusual

7/1/2026 at 6:22:08 AM

Big Data is terrifying and only solvable with opaque deep learning

by estetlinus

7/1/2026 at 12:05:05 AM

The biggest misconception that people have when modeling using tabular data is that more data = better model.

by nojito

7/1/2026 at 4:46:07 AM

There's a story sharing the front page right now about scaling laws, so it's not an unreasonable assumption.

https://news.ycombinator.com/item?id=48689744

by mvcalder

7/1/2026 at 6:51:17 AM

You don't need to. Just sample 1% of the data, let the model do the feature engineering, then ask the model to replicate everything in Pandas (or R, or whatever else) so you can run it on the full dataset.
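
A rough sketch of that workflow, in plain Python rather than Pandas to keep it dependency-free. Everything here is an assumption for illustration: the dataset is randomly generated, and `add_features`, `price_per_unit`, and `is_bulk` are made-up stand-ins for whatever transforms the model actually proposes on the sample.

```python
import random

def add_features(row):
    """Hypothetical transforms, worked out on the sample and frozen as plain code."""
    out = dict(row)
    out["price_per_unit"] = row["price"] / row["units"]
    out["is_bulk"] = row["units"] >= 100
    return out

# Hypothetical full dataset: 150,000 rows of made-up values.
rng = random.Random(0)
full = [
    {"price": rng.uniform(1, 1000), "units": rng.randint(1, 500)}
    for _ in range(150_000)
]

# Step 1: draw a 1% sample to explore with the model.
sample = rng.sample(full, k=len(full) // 100)

# Step 2: once the transforms are written down as ordinary code,
# apply them to every row without involving the model again.
full_features = [add_features(row) for row in full]

print(len(sample), len(full_features))
```

The point of the split is that the model only ever sees the sample, while the full dataset goes through deterministic code that can be reviewed and rerun.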

by piokoch

6/30/2026 at 10:45:54 PM

On the one hand, this is impressive. TabPFN was already state of the art and is seriously shaking up Bayesian prediction for tabular data (which is almost everything).

On the other hand, perhaps it is just me, but I do not feel that this is an acceptable form of benchmark reporting in this domain. TabArena actually has multiple metrics, since Elo does not properly quantify the degree of improvement. The fact that these are not displayed here should give pause. Also, the results section in the GitHub repo is a dumpster fire.

by hodgehog11

6/30/2026 at 11:02:13 PM

GitHub Repo: Please see the results folder

Results folder: Here's some undocumented parquet files

Definitely feels like they're hiding the ball lol.

If they had good benchmarks they'd talk about them.

Not comparing to tuned xgboost is also a warning sign.

by Eridrus

7/1/2026 at 4:31:20 AM

wouldn't xgboost be covered under autogluon? not perfect, but not missing either

by nok22kon

7/1/2026 at 3:31:59 PM

Honestly, I don't really know AutoGluon. If it does xgboost tuning, that's good.

I still think Elo scores are a way to obscure results though. For all we know this is like 0.1% better than a "normal" approach on like 70% of tasks and a tire fire on others.
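
To make that concrete, here is a minimal sketch with made-up per-task error rates. It uses the standard Elo expected-score formula to turn a head-to-head win rate into a rating gap; this is an assumption for illustration and not necessarily how TabArena computes its Elo. The names `elo_gap` and `summarize` are hypothetical helpers.

```python
import math

def elo_gap(win_rate):
    """Elo rating difference implied by a head-to-head win rate."""
    return 400 * math.log10(win_rate / (1 - win_rate))

def summarize(err_model, err_baseline):
    """Return (implied Elo gap, mean error improvement over the baseline)."""
    pairs = list(zip(err_model, err_baseline))
    win_rate = sum(m < b for m, b in pairs) / len(pairs)
    mean_improvement = sum(b - m for m, b in pairs) / len(pairs)
    return elo_gap(win_rate), mean_improvement

# Hypothetical error rates on 10 tasks (lower is better).
baseline = [0.20] * 10
tiny_wins = [0.199] * 7 + [0.40] * 3   # barely ahead on 7 tasks, far behind on 3
big_wins = [0.10] * 7 + [0.201] * 3    # far ahead on 7 tasks, barely behind on 3

gap1, diff1 = summarize(tiny_wins, baseline)
gap2, diff2 = summarize(big_wins, baseline)

print(f"tiny wins: Elo gap {gap1:.0f}, mean improvement {diff1:+.4f}")
print(f"big wins:  Elo gap {gap2:.0f}, mean improvement {diff2:+.4f}")
```

Both scenarios win 7 of 10 tasks, so the implied Elo gap is identical, yet the first is worse than the baseline on average and the second is clearly better. A win/loss-based rating alone cannot tell them apart.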

by Eridrus

7/1/2026 at 4:18:33 PM

AutoGluon does their own benchmarking, the default is Elo, but you can switch to other metrics:

https://huggingface.co/spaces/TabArena/leaderboard

xgboost is listed, they say "tuned" but who knows what that means. and it's below CatBoost and LightGBM

as an aggregator of models (trees, neural networks, clustering, ...) AutoGluon doesn't really have a dog in this fight

by nok22kon

7/1/2026 at 5:18:19 PM

Thanks for the pointer, though I can't seem to find TabFM and AutoGluon at that link?

by Eridrus

6/30/2026 at 10:38:07 PM

interesting to see this from Google after the SAP acquisition of Prior Labs.

by kingjimmy