alt.hn

3/2/2026 at 12:57:40 AM

Show HN: Timber – Ollama for classical ML models, 336x faster than Python

https://github.com/kossisoroyce/timber

by kossisoroyce

3/2/2026 at 3:41:03 AM

Since generative AI exploded, it's all anyone talks about. But traditional ML still covers a vast space in real-world production systems. I don't need this tool right now, but glad to see work in this area.

by tl2do

3/2/2026 at 7:35:00 AM

A nice way to use traditional ML models today is to do feature extraction with an LLM and classification on top with a trad ML model. Why? Because this way you can tune your own decision boundary and piggyback on features from a generic LLM to power the classifier.

For example, CV triage: you use an LLM with a rubric to extract features; choosing the features you are going to rely on does a lot of the work here. Then collect a few hundred examples, label them (accept/reject), and train your trad ML model on top; it will not have the LLM's biases.

You can probably use any LLM for feature preparation, and retrain the small model in seconds as new data is added. A coding agent can write its own small-model-as-a-tool on the fly and use it in the same session.
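The two-stage setup described above can be sketched in a few lines. Everything here is illustrative, not a real pipeline: the rubric fields are made up, and the toy `extract_features` heuristic stands in for what would really be an LLM call scoring a CV against the rubric.

```python
from sklearn.linear_model import LogisticRegression

RUBRIC = ["years_experience", "has_python", "has_ml", "education_level"]

def extract_features(cv_text):
    """Stand-in for an LLM call that scores a CV against the rubric.
    In practice you would prompt the LLM to emit one number per field."""
    low = cv_text.lower()
    return [
        float(low.count("year")),   # crude proxy for years of experience
        float("python" in low),
        float("ml" in low),
        2.0 * float("phd" in low) + float("msc" in low),
    ]

# A few hundred labeled examples in practice; two toy ones here.
cvs = ["5 year Python ML engineer, PhD", "Junior clerk, no coding"]
labels = [1, 0]  # accept / reject, per your own triage policy

# The small classical model learns the decision boundary on top of
# the LLM-extracted features.
X = [extract_features(cv) for cv in cvs]
clf = LogisticRegression().fit(X, labels)

print(clf.predict([extract_features("3 year python dev, MSc, ml pipelines")]))  # prints [1]
```

Retraining `clf` as new labels arrive takes milliseconds, which is what makes the "retrain the small model in seconds" loop practical.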

by visarga

3/2/2026 at 9:21:55 AM

What do you mean by "feature extraction with an LLM"? I can get this for text-based data, but would you do that on numeric data? It seems like there are better tools you could use for auto-ML in that sphere.

Unless by LLM feature extraction you mean something like "have Claude Code write some preprocessing pipeline"?

by benrutter

3/3/2026 at 4:43:40 PM

It's for unstructured inputs, text and images, where you need to extract specific features such as education level or experience with various technologies and tasks. The trick is to choose the features that actually matter for your company, and build a classifier on top so the decision is also calibrated by your own triage policy with a small training/test set. It works with few examples because it only needs to learn a small classifier with few parameters.

by visarga

3/2/2026 at 10:26:46 AM

Isn't the whole point for it to learn what features to extract?

by mirsadm

3/2/2026 at 3:28:02 AM

Ollama is quite a bad example here. Despite being popular, it's a simple wrapper, and it's increasingly being pushed aside by the app it wraps, llama.cpp.

I don't understand the parallel here.

by mehdibl

3/2/2026 at 9:44:36 AM

TBVH I didn't think too much about the naming. I defaulted to Ollama because of its perceived simplicity, and I wanted that same perceived simplicity to help adoption.

by kossisoroyce

3/2/2026 at 8:47:37 AM

This is the vLLM of classic ML, not Ollama.

by eleventyseven

3/2/2026 at 6:11:38 AM

I guess the parallel is "ollama serve", which provides you with a direct REST API to interact with an LLM.

by ekianjo

3/2/2026 at 8:47:46 AM

llama.cpp provides an API server as well via llama-server (and a competent web GUI too).

by sieve

3/2/2026 at 5:28:35 AM

Can you tell us more about the motivation for this project? I'm very curious if it was driven by a specific use case.

I know there are specialized trading firms that have implemented projects like this, but most industry workflows I know of still involve data pipelines with scientists doing intermediate data transformations before they feed them into these models. Even the C-backed libraries like numpy/pandas still explicitly depend on the CPython API and can't be compiled away, and this data-feed step tends to be the bottleneck in my experience.

That isn't to say this isn't a worthy project - I've explored similar initiatives myself - but my conclusion was that unless your data source is pre-configured to feed directly into your specific model without any intermediate transformation steps, optimizing the inference time has marginal benefit in the overall pipeline. I lament this as an engineer who loves making things go fast but has to work with scientists who love the convenience of Jupyter notebooks and the APIs of numpy/pandas.

by o10449366

3/2/2026 at 9:34:09 AM

The motivation was edge and latency-critical use cases on a product I consulted on. Feature vectors arrived pre-formed and a Python runtime in the hot path was a non-starter. You're right that for most pipelines the transformation step is the bottleneck, not inference, and Timber doesn't solve that (though the Pipeline Fusion pass compiles sklearn scalers away entirely if your preprocessing is that simple). Timber is explicitly a tool for deployments where you've already solved the data plumbing and the model call itself is what's left to optimize.

by kossisoroyce

3/2/2026 at 3:50:48 AM

"classical ML" models typically have a more narrow range of applicability. in my mind the value of ollama is that you can easily download and swap-out different models with the same API. many of the models will be roughly interchangeable with tradeoffs you can compute.

if you're working on a fraud problem an open-source fraud model will probably be useless (if it even could exist). and if you own the entire training to inference pipeline i'm not sure what this offers? i guess you can easily swap the backends? maybe for ensembling?

by brokensegue

3/2/2026 at 4:29:55 PM

> if you own the entire training to inference pipeline i'm not sure what this offers

336x faster than Python, and swapping backends in a production environment can be far from trivial

by eleventyseven

3/2/2026 at 7:35:23 AM

Wouldn't it be much more useful if the request received raw input (i.e. before feature extraction), and not the feature vector?

by theanonymousone

3/2/2026 at 9:01:09 AM

You can do that with ONNX. You can graft the preprocessing layers onto the actual model [1] and then serve that. Honestly, I'd assumed that ONNX (on CPU at least) was already low-level code and very optimized.

@Author - if you see this, is it possible to add comparisons (i.e. "vanilla" inference latencies vs Timber)?

[1] https://gist.github.com/msteiner-google/5f03534b0df58d32abcc... <-- A gist I put together in the past that goes from PyTorch to ONNX and grafts the preprocessing layers to the model, so you can pass the raw input.
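A minimal sketch of the grafting idea using a scikit-learn Pipeline: the served artifact takes raw, unscaled input because the preprocessing is baked into the model object. Exporting such a pipeline to ONNX (e.g. with skl2onnx) would then give a single graph that does scaling plus inference. All names and data here are made up for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic raw features with a "wrong" scale on purpose.
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=100.0, scale=25.0, size=(200, 4))
y = (X_raw[:, 0] > 100.0).astype(int)

# Preprocessing grafted onto the model: one artifact, raw input in.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_raw, y)

# Callers pass raw vectors; there is no separate preprocessing step
# in the request path.
print(model.predict(X_raw[:1]))
```

This is also the shape of pipeline that Timber's Pipeline Fusion pass (per the author's comment above) would try to compile away, since the scaler is just an affine transform.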

by marcyb5st

3/2/2026 at 9:38:42 AM

I'll check this out as soon as I am at my desk.

by kossisoroyce

3/2/2026 at 4:00:36 AM

If the focus is performance, why use a separate process and have to deal with data serialization overhead?

Why not a typical shared library that can be loaded in python, R, Julia, etc., and run on large data sets without even a memory copy?

by rudhdb773b

3/2/2026 at 6:15:58 AM

This lets you not even need Python, R, Julia, etc., and instead connect directly to your backend systems, which are presumably in a fast language. If Python is in your call stack, then you already don't care about absolute performance.

by bob001

3/2/2026 at 9:35:44 AM

I owe you a beer!

by kossisoroyce

3/2/2026 at 4:06:59 AM

Perhaps because the performance is good enough and this approach is much simpler and more portable across platforms than shared libraries.

by sriram_malhar

3/2/2026 at 9:36:51 AM

Exactly. The objective is to abstract that away completely. Shared libraries just add too much overhead.

by kossisoroyce

3/2/2026 at 3:22:04 AM

Can’t check it out yet, but the concept alone sounds great. Thank you for sharing.

by Dansvidania

3/2/2026 at 9:34:58 AM

You're welcome!

by kossisoroyce

3/2/2026 at 2:07:47 PM

Does this use something like xnnpack under the hood?

by deepsquirrelnet

3/2/2026 at 6:41:21 AM

Nice idea, I needed something like it.

by palashkulsh

3/3/2026 at 3:39:48 AM

Inference is not usually slow for classical ML (save LightGBM).

by bbstats

3/2/2026 at 2:40:27 AM

I have been waiting for this! Nice

by jnstrdm05

3/2/2026 at 9:21:39 AM

Glad you got it just in time!

by kossisoroyce

3/4/2026 at 10:20:30 PM

[dead]

by STARGA

3/2/2026 at 7:24:57 AM

It would be safer to use a Zig, Rust, or Nim target. C risks memory-unsafe behavior. The risk profile is even bigger for vibe-coded implementations.

by OutOfHere

3/2/2026 at 9:26:55 AM

Fair point in general, but the risk profile here is actually quite low. The generated C is purely computational, with no heap allocation, no pointer arithmetic, no user-controlled memory, no IO. It's essentially a deeply nested if/else tree over a fixed-size float array. The "unsafe" surface in C is largely a non-issue when the code is statically shaped at compile time from a deterministic compiler pass.

Rust/Zig/Nim would add toolchain complexity with minimal safety gain for this specific output shape. Those were my considerations.
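The "statically shaped" output described above can be illustrated (in Python for brevity; the real output is C per the author) as a decision tree flattened into nested if/else over a fixed-size feature vector. The thresholds and leaf values here are invented for the sketch.

```python
# Shape of the generated code: no heap allocation, no pointer
# arithmetic, no IO -- just branches over a fixed-size float array.
def predict(x):
    # x: fixed-size feature vector, len(x) == 4
    if x[2] <= 0.5:
        if x[0] <= 1.3:
            return 0.12
        return 0.87
    if x[3] <= -2.1:
        return 0.05
    return 0.93

print(predict([1.0, 0.0, 0.0, 0.0]))  # -> 0.12
```

Because every branch and array index is fixed at compile time, the classic C hazards (out-of-bounds writes, use-after-free) have no surface to attack, which is the argument being made here.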

by kossisoroyce

3/2/2026 at 2:43:24 PM

> Rust/Zig/Nim would add toolchain complexity

Fair response in general, but Zig is well known to lower toolchain complexity, not add it.

by OutOfHere

3/3/2026 at 6:36:18 AM

Unless you already have a working C compiler toolchain, but not Zig, which describes... most Unix systems I would guess.

by reverius42