2/23/2026 at 7:03:14 PM
Just reading the headline, I say good.A) These models are trained by ignoring IP. It is hypocritical and absurd to then try to assert IP over them. And I am for the destruction of IP on all ends.
B) What this essentially means is that the Chinese labs are taking the work of these mega corporations into making it freely accessible to other labs and businesses, to serve inference, fine tune, and host privately on prem. That's clearly a good thing for competition in the market as a whole.
C) I don't see why we should have to duplicate the massive energy and infrastructure investment of building foundation models over and over forever just because we want to preserve the IP rights of a few companies. That seems a shame and it seems better to me for everything to learn from everything else for the whole ecosystem to get better by topping each other and building off each other; that's also why publishing research into the architecture and training of these models is so much better than what the proprietary labs do (keeping everything a secret), although tbf Anthropic's interpretability research is cool.
D) these Chinese models give 90% of the performance of frontier proprietary models at a 10th or 20th of the cost. That seems like a win for everyone. Not to mention the fact that this distilling also allows them to make much smaller local models that everyone can run. This is a win for actual democratization, decentralization, and accessibility for the little guy.
by logicprog
2/23/2026 at 7:22:13 PM
> And I am for the destruction of IP on all ends.While I'm not unsympathetic to the plight of creatives, and their need to eat, I feel like the pendulum has swung so far to the interests of the copyright holders and away from the needs of the public that the bargain is no longer one I support. To the extent that AI is helping to expose the absurdity of this system, I'm all for it.
I don't think "burn it all down" is the answer, but I'd love to see the pendulum swing back our way.
by spudlyo
2/23/2026 at 8:08:30 PM
Because copyright laws rarely serve small independent creatives, but rather corporations like Disney that are in the business of hoarding and monetizing culture.by paxys
2/23/2026 at 8:16:03 PM
Yeah, I would argue that, just systemically, intellectual property laws can't really do anything but overwhelmingly serve the interests of the wealthy and mega corporations. I also think they're ethically wrong and run counter to the kind of artistic and information culture that I would prefer, but those are arguments more people are likely to disagree on.by logicprog
2/23/2026 at 8:44:22 PM
I think most people would argue that dismantling intellectual property would mean the end of all new creative endeavors, as if humanity is only driven to create art for practical reasons.Schopenhauer, on the other hand, would argue that true art must serve absolutely no practical or utilitarian purpose, and that pecuniary concerns only corrupt artistic and intellectual labors leading to mediocrity and dishonesty.
by spudlyo
2/23/2026 at 8:56:45 PM
Copyright laws as we know them came into being sometime in the 18th century. The earliest recorded works of art produced by humans are from 40,000-45,000 BC. So it's hard to take the "we'll never have creative output without strict copyright!!" extremism seriously.by paxys
2/23/2026 at 9:42:20 PM
> Schopenhauer, on the other hand, would argue that true art must serve absolutely no practical or utilitarian purpose, and that pecuniary concerns only corrupt artistic and intellectual labors leading to mediocrity and dishonesty.As, similarly, would Bataille, one of the philosophers I'm interested in!
by logicprog
2/23/2026 at 10:21:35 PM
While true, 'rarely' ought not be conflated with 'never.' I am a small, independent creator, and I've used copyright laws many times over the years to stop larger entities from raiding my catalog for content. Of course now Anthropic et al. are gobbling up such catalogs for indirect misappropriation, with no sign of consequences, so perhaps copyright has truly shrunk to a one-way street favoring the major players.by DamnInteresting
2/23/2026 at 7:30:47 PM
They're trying to kidnap what Anthropic has rightfully stolen!Jokes and complete lack of sympathy aside, it does complicate the narrative that these small labs are always on the heels of the big labs for pennies on the dollar, if they rely on distilling the big labs models. That means there still has to be big bucks coming from somewhere.
by jsheard
2/23/2026 at 8:17:17 PM
I don't see Z.ai (GLM 5) in the list though. I consider Qwen/Kimi to have a close relationship so I might not be sure but Qwen might be using Kimi data (I have written another comment in more depth)I still prefer kimi fwiw. It's one of the best models I have witnessed open source and when I tried GLM 5, it really was lacklustre for me on its launchday but I will have to see it for myself now comparing the two maybe as I do see GLM 5 do some good things in benchmarks but we all know how benchmarks should be less trusted.
I still think that there is still some hope in chinese models even after this ie. they aren't completely dependent on the large models seeing GLM 5.
I am seeing an accusation of GLM 5 doing Distillation[0] but I am not seeing any hard evidence of it.
[0]: https://mtsoln.com/id/blog/wawasan-720/the-temu-fication-of-...
by Imustaskforhelp
2/23/2026 at 9:02:42 PM
It's greed, now that they have all the data and infrastructure they are pulling up the ladder.Why do you think not a single one of these labs have released an open source models distilled on their own SOTA model?
They are all preaching they want to provide AI to everyone, wouldn't this be the best way to do this? Use your SOTA model to produce a lesser but open source model?
by impulser_
2/23/2026 at 7:48:08 PM
A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.
C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's an issue of monoculture, like in cybersecurity more generally.
D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.
by ashertrockman
2/23/2026 at 10:29:07 PM
> Maybe because we want lots of different perspectives on building models, lots of independent innovation.That’s only really possible if the front runner don’t buy up all of the chips on the market.
by _aavaa_
2/23/2026 at 8:14:34 PM
Fair points, and worth responding to for a more nuanced discussion! I hope you take these responses in that light :)A) Well, sure, yes, it's different specific IP being distilled on versus what was trained on. But I don't see why the same principles should not apply to both. If companies ignore IP when training on material, then it should be okay for other companies to ignore IP when distilling on material — either IP is a thing we care about or it isn't. (I don't).
B) I'm really not sure how seriously I take the worries about safety and security RLing models. You can RLA amodel to refuse to hack something or make a bio weapon or whatever as much as you want, but ultimately, for one thing, the model won't be capable of helping a person who has no idea what they're doing. Do serious harm anyway. And for another thing, the internet already exists for finding information on that stuff. And finally, people are always going to build the jailbreak models anyway. I guess the only safety related concern I have with models is sychophancy, and from what I've seen, there's no clear trend where closed frontier models are less sychophantic than open source ones. In fact, quite the opposite, at least in the sense that the Kimi models are significantly less psychophantic than everyone else.
C) This is a pretty fair point. I definitely think that having more base frontier models in the world, trained separately based on independent innovations, would be a good thing. I'm definitely in favor of having more perspectives.
But it seems to me that there is not really much chance for diversity in perspectives when it comes to training a base frontier model anyway because they're all already using the maximum amount of information available. So that set is going to be basically identical.
And as for distilling the RL behaviors and so on of the models, this distillation process is still just a part of what the Chinese labs do — they've also all got their own extensive pre-training and RL systems, and especially RL with different focuses and model personalities, and so on.
They've also got diverse architectures and I suspect, in fact, very different architectures from what's going on under the hood from the big frontier labs, considering, for instance, we're seeing DSA and other hybrid attention systems make their way into the Chinese model mainstream and their stuff like high variation in size, and sparsity, and so on.
D) I find that for basically all the tasks that I perform, the open models, especially since K2T and now K2.5, are more than sufficient, and I'd say the kind of agentic coding, research, and writing review I do is both very broad and pretty representative. So I'd say that for 90% of tasks that you would use an AI for, the difference between the large frontier models and the best open weight models is indistinguishable just because they've saturated them, and so they're 90% equivalent even if they're not within 10% in terms of the capabilities on the very hardest tasks.
by logicprog
2/23/2026 at 9:03:45 PM
Yeah of course, I've been thinking about this a lot and I'm updating my beliefs all the time, so it's good to hear some more perspectivesA) I see what you mean. But I'm more so thinking: companies consider their models an asset because they took so much compute and internal R&D effort to train. Consequently, they'll take measures to protect that investment -- and then what do the downstream consequences look like for users and the AI ecosystem more broadly? That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives.
B) I don't really care about AI safety in the traditional sense either, i.e., can you get an LLM to tell you to do some thing that has been ordained to be dangerous. There's lots of attacks and it's basically an insoluble problem until you veer into outright censorship. But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me. I don't want my agent to read an HN post with a social-engineering-themed prompt injection attack and mail my passwords to someone. (If this sounds absurd, my Clawbot defaulted to storing passwords in a markdown file... which could possibly be on me, but was also the default behavior.)
C) This is a completely fair point, there's amazing work coming out of these smaller labs, and the incentives definitely work out for them to do a distillation step to ship faster and more cheaply. I think the small labs can iterate fast and make big changes in a way that the monolithic companies cannot, and it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying. Which is not to say they're doing none of that, GRPO for example is a fantastic idea.
One way you could have a change in perspective is not just in the architecture/data mix, but in the way you spend test-time compute. The current paradigm is chain-of-thought, and to my knowledge, this is what distillation attacks typically target. So at least, all models end up "reasoning" with the same sort of template, possibly just to interlock with the idea of distilling a frontier API.
D) Interesting to hear. In my research, I find these models to be quite a bit harder to work with, with significantly higher failure rates on simple instruction following. But my work also tends to be on the R&D side, so my usage patterns are likely in the long-tail of queries.
by ashertrockman
2/23/2026 at 9:50:39 PM
Thanks for the response!> it'd be nice to see that effort routed into creating new data-efficient RL algorithms or something that pick up all the slack that distillation is currently carrying
It seems to me like they're already doing that. Some of the most fun I've had actually is reading their papers on the different R.L. environments, especially Egentic ones they set up and the various new algorithms they use to do RL and training in general. Combine that with how much they are innovating with attention mechanisms and I feel like distillation doesn't seem to be really replacing research into these means as just supplementing it — and maybe even making it possible in the first place, because otherwise it would be simply too expensive to get a reasonably intelligent model to experiment with!
> But now that people are actually using LLMs as agents to _do things_, and interact with the open web, and interact with their personal data and sensitive information, the safety and security concerns make a lot more sense to me.
Ah, I see what you mean. Can you point me to any benchmarks or research on how good various models are out of waiting social engineering and prompt injection attacks? That would be extremely interesting to me. Fundamentally, though, I don't think that's really a soluble problem either, and the right approach is to surround an agent with a sufficiently good harness to prevent that. Perhaps with an approach like this:
https://simonwillison.net/2023/Apr/25/dual-llm-pattern/
Or this, which builds on it with more verifiable machinery, if you're less bitter-lesson pilled (like me):
https://simonwillison.net/2025/Apr/11/camel/
> That is, it's less about what's right and wrong by conventional wisdom, and more about what consequences are downstream of various incentives.
Ahhh, I see. Yeah, that could be negative. That's worth thinking about.
by logicprog