alt.hn

6/28/2026 at 10:32:04 PM

Knowledge Distillation of Black-Box Large Language Models (2024)

https://arxiv.org/abs/2401.07013

by babelfish

6/29/2026 at 2:08:39 PM

probably more interesting (from 01/2026): https://arxiv.org/pdf/2511.10643 "Black-Box On-Policy Distillation of Large Language Models". They got a Qwen 2.5 14B model trained to GPT-5 level using the described technique, "Generative Adversarial Distillation (GAD)".

by potus_kushner

6/29/2026 at 9:13:44 AM

Considering the very small difference between SFT alone on the student model and SFT + DPO on a proxy, doesn't it make sense to concentrate on ensuring the SFT dataset is perfect rather than worry about DPO etc.? And just train directly on the student model?

by phantompeace

6/29/2026 at 12:53:05 AM

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

Related paper that's a good read: https://arxiv.org/abs/1908.08962

by dmezzetti

6/28/2026 at 11:02:48 PM

Why is this published again? Is this a reference to recent events?

by Alifatisk

6/29/2026 at 12:04:15 AM

I just saw some post about it on Threads and found it interesting so decided to share!

by babelfish

6/29/2026 at 4:25:56 AM

My best guess is this is a reference to the recent accusations from Anthropic of Chinese labs "distilling" on their models

by tough

6/29/2026 at 10:14:42 AM

And it’s a paper from Alibaba researchers, the company/lab that Anthropic called out by name.

by swingboy

6/29/2026 at 1:25:38 PM

I do not find the Anthropic allegations believable.

All the results presented in these distillation papers are for very small models.

To gain anything today, Alibaba or others would need to use the Anthropic models to improve LLMs at least one hundred times bigger than those tested in these papers.

I assume that the number of queries to the teacher LLM grows superlinearly with the size of the student model, which would mean that billions of queries would be needed. Even for a linear growth, at least hundreds of millions of queries would be needed.

I do not see how any Claude account could make so many queries without being detected. Even if the queries were distributed over thousands of accounts, it would still be easy for Anthropic to stop any such attempts.
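
A rough sketch of the scaling arithmetic above. The baseline query count and the superlinear exponent are hypothetical placeholders, not figures from the papers or from this thread; only the 100x size ratio comes from the comment.

```python
# Back-of-the-envelope estimate of teacher queries for a larger student.
# ASSUMPTIONS (hypothetical, for illustration only):
#   - BASELINE_QUERIES: queries used to distill into the small student
#   - the query count scales as size_ratio ** exponent
BASELINE_QUERIES = 1_000_000
SIZE_RATIO = 100  # "at least one hundred times bigger"


def queries_needed(baseline: int, size_ratio: float, exponent: float) -> int:
    """Scale the baseline query count by size_ratio ** exponent."""
    return round(baseline * size_ratio ** exponent)


# exponent 1.0 is linear growth; 1.5 is one arbitrary superlinear choice
linear = queries_needed(BASELINE_QUERIES, SIZE_RATIO, 1.0)
superlinear = queries_needed(BASELINE_QUERIES, SIZE_RATIO, 1.5)

print(f"linear growth:      {linear:,} queries")
print(f"superlinear growth: {superlinear:,} queries")
```

With a different assumed baseline or exponent the totals shift accordingly; the point is only how quickly the count grows with student size.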

by adrian_b

6/29/2026 at 10:34:38 AM

“Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”

by StreamCtx

6/29/2026 at 1:31:05 PM

That is easy when you can control the teacher model yourself and you want to transfer its capabilities to a smaller model.

If the teacher model is run by an external entity, e.g. Anthropic or OpenAI, then the number of queries to the black-box model required is so great that it should be easy for the owner of the teacher LLM to detect and stop any such attempts.

by adrian_b

6/28/2026 at 10:53:54 PM

The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro USA and anti China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers encouraged coders everywhere to skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

by duendefm

6/29/2026 at 1:41:06 PM

If we programmers united, we'd have a Claude Code alternative that didn't suck :-)

But we're not far.

My requirements:
- A terminal app without an advanced TUI, not written like "a browser running in a terminal" or a game. There is no need to overcomplicate it.
- Ability to manage prompts per model, compress context using alternate models, and minimise token costs better, like YouTuber Sentdex's Minion mini harness (in fact I'm building on top of his as we speak).
- Support for agent work fan-out.
- Support for MCP, but switchable on/off as needed (I use a single MCP aggregator anyway, so MCP tool use doesn't eat my context).
- Support for LSP/tree-sitter, again switchable when needed.
- Support for the OpenAI API, written simply enough that other providers like DeepInfra are easy to add.

Nice to have:
- Some sort of "prompt library" that stores tweaked versions of prompts for different models, so the harness adjusts as needed depending on which model we call.

That's it.
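
A minimal sketch of what the "prompt library" idea could look like, assuming a simple in-memory lookup with per-model overrides. The class name, method names, and model names are all hypothetical and not taken from any existing harness.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLibrary:
    """Hypothetical store: one default prompt per task, plus per-model tweaks."""
    defaults: dict[str, str] = field(default_factory=dict)
    overrides: dict[tuple[str, str], str] = field(default_factory=dict)

    def set_default(self, task: str, prompt: str) -> None:
        self.defaults[task] = prompt

    def set_override(self, model: str, task: str, prompt: str) -> None:
        self.overrides[(model, task)] = prompt

    def get(self, model: str, task: str) -> str:
        # Prefer the model-specific version, fall back to the default.
        return self.overrides.get((model, task), self.defaults[task])


library = PromptLibrary()
library.set_default("summarize", "Summarize the text.")
# "model-a" is a placeholder model name.
library.set_override("model-a", "summarize", "Summarize in three bullet points.")

print(library.get("model-a", "summarize"))
print(library.get("model-b", "summarize"))
```

Keying overrides on (model, task) keeps the fallback logic in one place, so adding a new model needs no harness changes until a tweak is actually required.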

by Roark66

6/29/2026 at 10:08:12 AM

The US government will do the job of destroying the American AI economy through their export controls.

by laichzeit0

6/29/2026 at 3:09:01 PM

"anti China", why so? have you lived there?

by addedGone

6/29/2026 at 8:21:53 AM

The US "product machine" is so strong. They really know how to do frictionless signup and vendor lock-in on the corporate side.

by anax32

6/28/2026 at 11:18:08 PM

> skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

I'm doing my part!

by nozzlegear

6/28/2026 at 11:07:01 PM

[flagged]

by cynicalsecurity

6/29/2026 at 12:03:41 AM

Dario, is that you? Is Anthropic’s next ploy to seek support via the culture wars?

by anon373839

6/29/2026 at 2:59:57 AM

Why would I care about Christian morals? In fact from what I can see of the US, you don’t have them either.

by girvo

6/28/2026 at 11:12:45 PM

Nvidia, Anthropic and OpenAI are controlling everything, and nothing is improving for everyone, quite the opposite. So I just hope they crash to the ground.

by duendefm

6/28/2026 at 11:11:52 PM

lol Christian Morals. Epstein and his best buddy running the show tells you all about this

by gmerc

6/29/2026 at 12:02:23 AM

What Epstein and buddies were doing was very... Christian...

Virgin Mary was very young in the events you know.

by big-and-small

6/28/2026 at 11:48:14 PM

"They don't have Christian morals" -- does that mean they don't commit genocide and fuck kids? Because that sounds like a point for them

by LNSY

6/28/2026 at 10:58:51 PM

Can we note that this is a 2024 paper in the title?

by linolevan

6/29/2026 at 1:15:00 PM

[dead]

by spacebacon

6/29/2026 at 2:59:50 AM

[dead]

by TimXare

6/29/2026 at 7:22:52 AM

[flagged]

by modgate