alt.hn

6/12/2026 at 10:42:50 AM

Kimi K2.7-Code: open-source coding model with better token efficiency

https://huggingface.co/moonshotai/Kimi-K2.7-Code

by nekofneko

6/12/2026 at 1:00:29 PM

Reading their modified license terms cracks me up, because they've basically remade the MIT license as MIT + the one clause that the BSD license used to have, which didn't care about MAU or revenue: if you used it in a product, they asked you to 'advertise' them basically. Honestly, it's a reasonable request.

by giancarlostoro

6/12/2026 at 1:09:44 PM

This is the cursor callout.

Don't make us shame you into disclosure

by htrp

6/12/2026 at 3:19:04 PM

Cursor had a specific licensing agreement that allowed them to brand it how they want.

by maherbeg

6/12/2026 at 6:13:11 PM

> Cursor had a specific licensing agreement...

Cursor had an "agreement" with Fireworks.ai, which apparently allowed them to RL Composer 2 atop Kimi Base 2.5 without attribution: https://x.com/Kimi_Moonshot/status/2035074972943831491 / https://archive.vn/CcdkI

Composer 2 performed differently on evals than Moonshot.ai's coding models: Cursor claims theirs is better than Claude Opus 4.6: https://x.com/fynnso/status/2034706304875602030 / https://archive.vn/bVtik. And, per Lee Robinson (Cursor employee), it is very likely Cursor builds its own foundational model for Composer 3.

by ignoramous

6/12/2026 at 1:55:34 PM

Wasn't the end of that story that Cursor had a non-disclosure licence, so they had not done anything wrong towards Moonshot?

by 7734128

6/12/2026 at 2:12:49 PM

Moonshot licenced it to Fireworks AI who licenced it to Cursor.

by Maxious

6/12/2026 at 1:19:19 PM

Ah is that what it is? I don't use Cursor, never saw it as being relevant to me, but would not surprise me.

by giancarlostoro

6/12/2026 at 1:31:08 PM

Cursor's Composer models are fine-tuned Kimi.

by schmorptron

6/12/2026 at 1:37:38 PM

They are unusable (unless you want to deliberately destroy your codebase). So if Cursor's models are Kimi based, then well. I'll skip them altogether.

by varispeed

6/12/2026 at 3:16:01 PM

Kimi works great in their CLI, but their CLI has a number of workarounds for quirks of their models, including detecting when the model gets into a loop, and reverting to a checkpoint but letting the model compose a "message" to its past self (search their CLI for "BackToTheFuture"...) It doesn't work so well in a harness that doesn't take those quirks into account.
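
For illustration only, here is a rough sketch of that kind of harness-side workaround. It is not the Kimi CLI's actual code: the `LoopGuard` class, its method names, and the repeat threshold are all hypothetical. The idea is to flag a loop when the same action repeats several times in a row, rewind the transcript to a checkpoint, and leave a note for the model's "past self".

```python
from dataclasses import dataclass, field


@dataclass
class LoopGuard:
    """Hypothetical harness helper: detect repeated actions and rewind."""
    max_repeats: int = 3                          # assumed threshold, not taken from any real CLI
    history: list = field(default_factory=list)   # transcript of actions so far
    checkpoint: int = 0                           # transcript length at the last safe point

    def save_checkpoint(self) -> None:
        # Remember the current transcript length as a point we can return to.
        self.checkpoint = len(self.history)

    def record(self, action: str) -> bool:
        """Append an action; return True if the last max_repeats actions are identical."""
        self.history.append(action)
        tail = self.history[-self.max_repeats:]
        return len(tail) == self.max_repeats and len(set(tail)) == 1

    def rewind(self, note_to_past_self: str) -> None:
        # Drop everything after the checkpoint, then append the note so the
        # next attempt starts with a hint about what went wrong.
        del self.history[self.checkpoint:]
        self.history.append(f"[note from a later attempt] {note_to_past_self}")


guard = LoopGuard()
guard.record("read main.py")
guard.save_checkpoint()
if any([guard.record("run tests") for _ in range(3)]):
    guard.rewind("Re-running the tests will not help; fix the import first.")
```

A harness without something like this simply keeps feeding the loop back to the model, which may be why the same model feels worse elsewhere.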

by vidarh

6/12/2026 at 3:35:51 PM

I'm using Composer extensively, and it works great for me. Your experiences are not universal.

by jmcqk6

6/14/2026 at 2:40:43 AM

Composer is really good, but just like any Chinese model it needs a good plan. It's cheap and fast: in one month of Pro I used the equivalent of $500 in API credit for it.

by AgentMasterRace

6/12/2026 at 2:56:38 PM

They are far from unusable. They work great for 80-90% of typical full-stack dev work. A lot less useful for more niche stuff.

by Bnjoroge

6/12/2026 at 1:42:17 PM

I wouldn't skip at least testing the original. Model distilling done by Cursor could be the culprit.

by bel8

6/12/2026 at 4:50:19 PM

Composer 1.x was poor. The new one is a totally different beast and absolutely fine for day to day.

by esskay

6/12/2026 at 3:01:42 PM

They're not unusable, they're just bad when compared with all the real frontier models.

by qingcharles

6/12/2026 at 5:00:28 PM

I only use composer 2.5 day to day and it works fine with human review.

by ok_dad

6/12/2026 at 2:39:51 PM

Shaming others when all AI is trained off scraped content and code huh? Many of those sources either breaking ToS or being illegal, such as Anna’s Archive. Bold move. And Chinese models in particular have been accused of distilling off American models.

Don’t you know there’s no honor among thieves?

by codemog

6/12/2026 at 4:12:50 PM

> they asked you to 'advertise' them basically.

To be clear, the “advertising” clause just requires you to disclose that you use the thing somewhere in the product, such as credits in an “About” section.

by WalterGR

6/12/2026 at 6:43:14 PM

I call it the advertising clause because I remember, still in the 2000s, seeing an Apple ad that showed "Unix" or something like that at the end, and I remembered that was one of the BSD license requirements. Or maybe Apple just did it to proudly boast about using Unix.

by giancarlostoro

6/12/2026 at 7:17:17 PM

Hmm… I may be confusing the following clause from the “new” BSD license with the advertising clause from the original BSD license.

> 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

The 2-clause BSD license omits even that.

by WalterGR

6/12/2026 at 5:40:37 PM

It seems tacked on pretty quickly - I would have expected they try a little more legalese regarding what counts as a "user interface".

by skrtskrt

6/12/2026 at 1:01:13 PM

Personally, when I use open code or routers, I feel that beyond a certain level, the models don't make a huge difference to me. Except for expensive and mediocre models like Gemini. In that sense, Chinese models are pretty good. I usually write code in function or method units and then design and assemble them together.

GPT series models are more thorough and better, but I'm not sure if the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough, I wonder if there really is a big difference

by jdw64

6/12/2026 at 2:39:46 PM

I've kind of given up on the routers for "free" inference, as you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

I've had some success turning my macbook M1 pro into a heating pad with Qwen 3.6 35B A3B MTP. Trying to use Gemini models "locally" resulted in a similar "short shrift" of effort resulting in mistakes and lots of turns. The reports of Fable being relentlessly "proactive" shows you can go the other direction as well, if you have strong enough branding and effective invoicing.

by sjanes

6/12/2026 at 4:28:57 PM

Tangent: did the MTP help you at all? I’ve tested that model back to back on my M1 Max MBP and the MTP version was actually marginally worse. I wonder if I didn’t use the right settings, although I tried several based on the obvious sources.

by mft_

6/12/2026 at 6:27:55 PM

> I've kind of given up on the routers for "free" inference, as you would expect, they tend to give you sub-par thinking because they are obviously trying to conserve as much inference as possible.

Xiaomi MiMo ($6/mo: https://platform.xiaomimimo.com/token-plan) & Alibaba Qwen ($50/mo: https://www.alibabacloud.com/en/campaign/ai-scene-coding) have generous limits on fixed subscriptions.

by ignoramous

6/12/2026 at 8:02:35 PM

So does Opencode Go ($10/mo: https://opencode.ai/go) for DeepSeek v4 Flash and MiMo 2.5.

by MaKey

6/12/2026 at 9:25:21 PM

That looks pretty nice. How does it compare cost-wise to just using OpenRouter?

by apitman

6/12/2026 at 9:43:35 PM

The Go plan essentially gives you $50 of inference for $10 per month ($5 for the first month).

by arcanemachiner

6/12/2026 at 9:53:51 PM

$60/mo currently: https://opencode.ai/docs/go/#usage-limits

Their limits are staggered: 5h (max $12), weekly ($30), monthly ($60).
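
As a back-of-the-envelope illustration of how staggered caps interact, here is a small sketch using the three dollar figures quoted above. The function name, the rolling-window interpretation, and the 30-day month are assumptions, not how opencode actually meters usage.

```python
# Dollar caps quoted above; window lengths are in hours.
LIMITS = [
    ("5h", 5, 12.0),
    ("weekly", 7 * 24, 30.0),
    ("monthly", 30 * 24, 60.0),  # assumes a 30-day month
]


def remaining_budget(usage, now_hours):
    """usage: list of (timestamp_hours, dollars).

    Returns the dollars still available under the tightest of the three windows.
    """
    remaining = []
    for _name, window, cap in LIMITS:
        # Sum everything spent inside the rolling window ending at now_hours.
        spent = sum(cost for t, cost in usage if now_hours - window < t <= now_hours)
        remaining.append(cap - spent)
    return max(0.0, min(remaining))


# $27 spent over 30 hours: the weekly cap, not the 5-hour cap, is what binds.
print(remaining_budget([(0, 10.0), (6, 12.0), (30, 5.0)], now_hours=30))
```

Under these assumptions, two and a half maxed-out 5-hour windows use up the week, and two maxed-out weeks use up the month.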

by ignoramous

6/13/2026 at 10:06:09 PM

My mistake. You are correct.

by arcanemachiner

6/12/2026 at 1:06:58 PM

In my experience, there's little difference between implementing individual functions between frontier models and SotA ~30B param models.

Once you have a coherent design (the hard part), you can feed it to a pretty small model and get basically the same quality.

They'll not one-shot, but they're faster and cheaper, so it still works out in your favor.

Plus you can do it locally...

by onlyrealcuzzo

6/12/2026 at 1:14:07 PM

I have a similar experience. However, when including code review, I think the GPT model is the most impressive

by jdw64

6/12/2026 at 2:41:29 PM

The difference in outcome isn't that big but yes, you need to be more rigorous. For instance I've found that the Kimi K2.5 and K2.6 models will comment out failing tests rather than fix a problem they just caused (mistaking them for "pre-existing failures"), so you need to specifically make commented-out tests break the build. I've not personally had that problem with any of the Anthropic or OpenAI models.

by regularfry

6/12/2026 at 5:10:11 PM

I wonder why it's the natural tendency of models to BS or do stuff like this when they don't have the correct answer - it's clear that they can program refusal into them, but for some reason, refusal has to be injected after the fact, and models can't really arrive at the conclusion that they can't answer properly.

by torginus

6/12/2026 at 5:47:23 PM

I assume it's a lack of care when RLing them.

RL has a tendency to reinforce cheating when the cheats are easier to find than the final solution.

So when making your RL environment, you need to spend a lot of effort on finding ways the model can cheat and penalizing them.
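
A toy sketch of what penalizing one such cheat might look like. The function and its parameters are hypothetical and far simpler than a real RL environment: the reward is the pass rate measured against the original test count, minus a penalty for every test that disappeared.

```python
def shaped_reward(passed: int, total_before: int, total_after: int,
                  cheat_penalty: float = 1.0) -> float:
    """Toy reward: pass rate over the ORIGINAL test count, minus a penalty for removed tests."""
    if total_before <= 0:
        raise ValueError("total_before must be positive")
    removed = max(0, total_before - total_after)
    # Dividing by the original count means deleting a failing test cannot raise the pass rate.
    pass_rate = min(passed, total_before) / total_before
    return pass_rate - cheat_penalty * (removed / total_before)


# The policy deleted 1 of 4 tests and passes the remaining 3.
# A naive "passed / total_after" reward would be 1.0; the shaped reward is lower.
print(shaped_reward(passed=3, total_before=4, total_after=3))
```

With the naive reward, deleting the failing test scores the same as fixing the bug, which is exactly the shortcut the model will find first.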

by Eridrus

6/13/2026 at 3:29:35 AM

probably because there is a ton of open source projects out there with disabled tests in their training data.

by lotharcable

6/12/2026 at 1:28:49 PM

I really hope we stop using the term "Chinese models". It carries a negative connotation. It's the equivalent of calling cars Japanese, which people used to do but which is now almost entirely meaningless. You just call them Toyota, Honda, Lexus etc.

by dcreater

6/12/2026 at 5:22:02 PM

For me, it has a positive connotation! In my experience, Chinese Model means cheaper, but still quite effective model you can use for millions of tokens without burning your entire wallet in seconds. That's why I get more excited over a Chinese model release over American models.

by hootz

6/12/2026 at 5:14:35 PM

Japanese cars is actually a positive qualifier. I'd say anything Japanese motor-powered.

by odiroot

6/12/2026 at 7:15:28 PM

Maybe he's just from an alternative universe. Chinese model isn't negative either after all.

by ffsm8

6/12/2026 at 1:48:37 PM

I don't think "Chinese" is pejorative in this context any more than "American" is. They are one of the two ecosystems. What's wrong with saying "Japanese cars" today?

by esafak

6/12/2026 at 2:27:37 PM

> What's wrong with saying "Japanese cars" today?

Only that it’s a fairly meaningless grouping. When japan first entered the car market in north america there might have been some commonality, but now what characteristics do they share that some american cars don’t have? They’re not even imported a lot of the time.

Given that, it does start to feel tinged with racism if someone insists on grouping things together that don’t really belong together.

As for Chinese LLMs, the term doesn’t “feel” pejorative to me - but I also don’t see a totally clear set of attributes they share. Not all are open-weight. Some are small and can be run on consumer hardware, some are huge. They even have a variety of answers to what happened June 3rd, 1989.

by kennywinker

6/12/2026 at 3:31:33 PM

> now what characteristics do they share that some american cars don’t have?

Typically the answer is "reliability", which is a positive trait, which makes the original callout about negative connotations very odd to me.

by Brendinooo

6/12/2026 at 4:14:21 PM

Chinese AI models also share a positive trait: they offer more bang for the buck.

by overfeed

6/12/2026 at 9:18:14 PM

> When japan first entered the car market in north america there might have been some commonality, but now what characteristics do they share that some american cars don’t have?

They're unique in that they even make a regular passenger car. American manufacturers only make SUVs and a couple of sports/luxury cars. They basically gave up because the Camry/Corolla/Accord/Civic ate their lunch.

The cheapest sedan you can get from an American brand is the Cadillac CT4.

by kube-system

6/12/2026 at 6:52:28 PM

> but now what characteristics do they share that some american cars don’t have?

Better overall design?

by antonvs

6/12/2026 at 2:01:55 PM

Sadly there is a pejorative context. The constant us, the free world vs China, the evil Soviets rhetoric from every major news establishment and executive creates that negative view

by dcreater

6/12/2026 at 2:55:00 PM

On the other hand the Trump administration has successfully managed to make Chinese seem better than American, so there might not be that much of a pejorative context any more..

by fuck_google

6/12/2026 at 6:54:03 PM

You're right, but the bias in the US certainly persists. "China = bad" is an assumption that many people still make without any self-reflection about the ways in which the US is now at least as bad.

by antonvs

6/12/2026 at 2:43:11 PM

I don't know, I tried using one of the Chinese models and it was VERY quick to scan my entire home dir, so maybe your threat surface is a little different than mine

by sroerick

6/12/2026 at 5:52:53 PM

Models can't scan anything.

They return instructions for you to do something, and you or a script you permit chooses to execute what the model tells you and return the result to the model.
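
A minimal sketch of that division of labor, with every name hypothetical: the model only proposes a tool call as data, and the harness decides whether to run it. A real harness would also resolve `..` and symlinks before checking the path; this one does not.

```python
from typing import Callable


def list_dir(path: str) -> str:
    # Stand-in for a real directory listing; owned by the harness, not the model.
    return f"(pretend listing of {path})"


TOOLS: dict[str, Callable[[str], str]] = {"list_dir": list_dir}


def run_tool_request(request: dict, allowed_roots: tuple[str, ...]) -> str:
    """Execute a model-proposed tool call only if the harness policy allows it."""
    tool = TOOLS.get(request.get("tool", ""))
    path = request.get("path", "")
    if tool is None:
        return "error: unknown tool"
    inside = any(
        path == root or path.startswith(root.rstrip("/") + "/")
        for root in allowed_roots
    )
    if not inside:
        return "denied: path outside allowed roots"
    return tool(path)


# The model "asks" to list the home directory; the harness refuses.
print(run_tool_request({"tool": "list_dir", "path": "/home/me"}, ("/home/me/project",)))
```

Whether a model "scans your home dir" therefore depends on what the harness permits, not on the weights alone.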

by fooker

6/12/2026 at 2:30:44 PM

No thanks.

The term seems to have the connotation of "competitive at 1/10 the price of Claude", so I don't see the problem.

It's not Harbor Freight Chinese (and heck even they have decent stuff sometimes now too).

You don't think people still talk about Japanese cars as a distinction in quality from US or European ones?

by unethical_ban

6/12/2026 at 2:23:02 PM

[flagged]

by greenavocado

6/12/2026 at 1:50:00 PM

[flagged]

by WarmWash

6/12/2026 at 4:25:44 PM

For those that don't like calling them CCP models, may I remind you, the CCP won't let Chinese AI researchers out of the country any more without securing approval first[1].

[1] https://www.tomshardware.com/tech-industry/artificial-intell...

by SubiculumCode

6/12/2026 at 10:35:18 PM

C'mon shills, get on the whataboutism...

by WarmWash

6/13/2026 at 3:30:39 AM

Strange that you feel so strongly about this just one day after US govt effectively shut down Mythos.

by neonstatic

6/12/2026 at 2:06:17 PM

I tend to agree with the comment in my reply thread about whether we really need to add biased modifiers to the essence of a good product. I think every national system in this world is flawed. And in this context, 'China or Chinese' is often used in a negative sense, like 'Made in China'. But KIMI is a good model, and I think the comment that pointed this out to me correctly identified my unconscious bias.

And even if the Chinese Communist Party provided funding, the result is still transparently released. So even if it is some kind of propaganda, I don't see what the problem is.

Is the monopolistic greed of American companies 'good', and China's greed 'bad'? I do have that question.

by jdw64

6/12/2026 at 4:15:44 PM

The question is not whether it is a good model, it is whether the model can be trusted to not act intentionally maliciously against certain topics or certain users.

We live in a time of a great geopolitical rivalry and high tensions with an emergent technology with tons of national security implications. To pretend otherwise is silly, and to fail to ask the question, dangerous.

by SubiculumCode

6/12/2026 at 6:56:56 PM

> The question is not whether it is a good model, it is whether the model can be trusted to not act intentionally maliciously against certain topics or certain users.

We absolutely know that we can't trust the American model not to do that - it's "by the oligarchs, for the oligarchs" - so it's not clear what the claim really is.

by antonvs

6/12/2026 at 2:55:27 PM

Whether or not it's propaganda is different from the fact that it is owned by the CCP.

by WarmWash

6/12/2026 at 3:38:50 PM

Doesn't matter, because they're open-weight, so I can just download them to my PC and... hey, look, now they're owned by me! Unlike the "good" Western counterparts which are all fully proprietary. (Except Mistral, but they're nowhere near SOTA.)

by kouteiheika

6/12/2026 at 4:16:11 PM

What is hidden in the weights matters.

by SubiculumCode

6/12/2026 at 5:22:20 PM

Ah yes, those pesky Chinese backdoors, of which not a single instance has ever been found, even though Chinese open-weight models have been a thing for many years now. Many people burn through millions of tokens on these models every day - surely someone would have triggered one of those backdoors, right?

Or that pesky CCP censorship and propaganda baked into the model, which any random guy can remove from whichever model they want as a single weekend side project with an off-the-shelf tool[1]. (Try it. It's fun. I've done it myself.)

[1]: https://github.com/p-e-w/heretic

by kouteiheika

6/12/2026 at 5:26:31 PM

I agree it is an empirical question. I do not know if that research has been done in the open sphere. But please, do not pretend that there isn't a real geopolitical rivalry going on that makes such questions a legitimate, non-fruity concern.

by SubiculumCode

6/12/2026 at 6:45:39 PM

This is a fair point, alongside the one about the hidden content in the weights.

Exactly why my prime suspect would be the one country with focus on proprietary models, and the one country prone to bombing others, including with nuclear weapons.

by dancemethis

6/12/2026 at 10:42:38 PM

Unlike China, the US government doesn't own the models. The models will freely talk about any of those bombings or other atrocities of the US government.

by WarmWash

6/12/2026 at 5:45:59 PM

Sure, but the difference is that one side (Anthropic, OpenAI, Google and co.) hoards everything, keeping it proprietary behind API paywalls and constantly spewing AI doomer rhetoric while limiting what you can do "for your own safety" (especially Anthropic; Dario has been consistently doing this since GPT-2 days, every time claiming that things are "too dangerous" for the common folk to handle). Meanwhile the other side (big, bad China) releases all the SOTA open-weight models, with which you can do whatever you want, along with a ton of open research.

So yes, there is geopolitical rivalry, but one side is deliberately antagonistic (not releasing anything in the open, putting arbitrary restrictions, spewing toxic rhetoric, applying sanctions, etc.) while the other side is letting everyone (including their rivals) use what they've produced with little-to-no restrictions.

I'm under no illusion that if the situation was reversed China would most likely do the same, but as things stand you can probably guess which side I'm rooting for here (at least until the roles reverse).

by kouteiheika

6/12/2026 at 6:19:20 PM

Yes, each is following its own business strategy: frontier labs have no incentive to release open weights, while for second- and third-tier labs it is one of their few plays to gain market/mind share. But business is only part of it, as national security is another. It may be that the CCP has been relatively hands off exactly because of my concern, judging that market share and reputation are more important (for now).

by SubiculumCode

6/12/2026 at 9:16:16 PM

Should we call this the DARPA Internet then?

by toyg

6/12/2026 at 6:55:50 PM

Do you believe there's some meaningful benefit to the American VC funding model in this case? It's not clear to me what you're trying to say or why you think it's an important distinction.

by antonvs

6/12/2026 at 1:59:34 PM

I've heard this claim before but I've never seen any evidence.

by dcreater

6/12/2026 at 2:20:52 PM

Have you looked, or you’re just waiting for someone to hand it to you?

by kennywinker

6/12/2026 at 4:54:13 PM

The burden of evidence is on the accuser.

by FooBarWidget

6/12/2026 at 5:41:29 PM

It's akin to the accusation that the UK is a parliamentary system...

by WarmWash

6/13/2026 at 10:08:53 AM

I, too, can claim that "the spaghetti monster exists" is in the same ballpark as "UK is a parliamentary system".

Let's take a look at a related field: the accusation that Chinese car makers are subsidized/funded by "the CCP". It just so happens that the CF40 research forum recently published a study that shows where the money actually comes from, by analyzing the companies' financial reports. https://www.pekingnology.com/p/oecds-subsidy-centric-narrati... Turns out that they mostly rely on equity financing or alternative forms of financing (like delayed repayment to suppliers). They don't even make that much use of cheap credit from state banks.

More counter evidence: a new study based on 3+ years of fieldwork, 60+ interviews (with officials, entrepreneurs, and engineers), and rich first-hand accounts, shows that Chinese EV makers rose despite central government's efforts, not because of them. The central govt favored state-owned enterprises. Private firms had trouble getting state funding and even licenses. For example Geely operated illegally for years. But private industry and local mayors teamed up and created an alternative system. They then grew to a point where Beijing was like "ok you guys are obviously doing better, so we will not make a fuss about this and just legalize you".

https://x.com/i/status/2064717100229464188

https://www.journals.uchicago.edu/doi/10.1086/741394

This is in complete contrast to the usual western tropes that state that everything in China is tightly controlled by Xi or by the central govt. In fact, the central govt's merit is in being lenient to deviance and operating based on results rather than amount of control. The opposite of the "totalitarian" trope. All the while western pundits would predict something like "Xi has a huge ego and will never tolerate the humiliation that his plans are imperfect" or something like that.

So no, "of course everything is CCP funded and CCP controlled" is not a given, nor in the same league as "UK is a parliamentary system". If they got the car maker situation so, so wrong, then what else is wrong?

If you want to win against China, isn't your first step to properly understand who they are and how they operate, rather than doubling down on your imagination of who they might be? Know your enemy and all that (not saying that I agree with seeing them as an enemy, but if you do, then you sure are doing yourself a disservice)

by FooBarWidget

6/13/2026 at 11:29:52 AM

Letting someone do something is different than being unable to stop someone from doing something. Your response even implicitly acknowledges this, but then you seem to miss it.

It's very naive to mistake leniency in Xi's playground for constitutionally protected freedoms. CCP "free enterprise" is just mock free enterprise. Like a teacher letting the kids have fun with the classroom business sim, the teacher is still in God mode, and can still intervene in any way at any point to do whatever they want.

All of China hinges on the whims of one guy. Nothing stops him from deciding that all that private money is actually state money, or that Geely's president needs "time away".

That is why it is all "CCP money and CCP controlled". Don't confuse leniency with autonomy.

by WarmWash

6/13/2026 at 12:29:21 PM

And yet here we are with the US banning anything at will under the guise of national security. I'm not even discussing whether the national security label is legitimate, or whether your "everything is CCP-controlled" framing is legitimate. But you gotta be consistent and also adopt the position that US companies are "US state controlled", and consistently refer to them as such.

Also, I am pretty confused about your larger point. What exactly are you saying? Is it some variant of "only states with constitutionally protected freedoms are good, everything else is evil"? If that's your position, then where do you put the fact that the US bombs foreign countries at will? And that many western countries sponsor the killing of children in Gaza while arresting people who protest against it? We've seen that western democratic countries, and let's assume they're properly democratic domestically (disputed by their own populations, see Democracy Perception Index, but I digress), can be ruthlessly... let's call it "undemocratic"... abroad. Or the fact that ICE and US police officers detain/kill innocent people and are super abusive. This doesn't neatly fit into the simple "free states good unfree states bad" view. So what is your position in the larger context?

You also completely ignore the part where I said properly understanding China is important in order to defeat it. Do you... not wish to defeat China? Are you content with merely stating that China is bad? Do you not support moving/developing more manufacturing to the west/US? If you want the latter to succeed, don't you need to have a super good understanding of what makes Chinese industrial policy and development successful? Or are you really content with merely moralizing?

by FooBarWidget

6/12/2026 at 2:37:33 PM

Assuming you are just naive like so many others about China...

China is a communist country with elements of capitalistic markets baked in. But the capitalistic elements are mostly a facade. Underneath, the state retains full ownership and control of all business. The CCP runs all aspects of the government (including the courts/judges), and is the single entity that decides what directions the country (and its businesses) will move in.

The CCP, which de facto owns everything and has ultimate final say on everything, has one leader that has the ultimate final say on _everything_, Xi Jinping.

So while the waters of CCP models feel warm and free, understand it's not organically like that.

by WarmWash

6/12/2026 at 2:52:51 PM

> China is a communist country with elements of capitalistic markets baked in.

While I get the point you're making (it should be pretty obvious to anyone who's held a newspaper), I think it's important regardless to point out that Chinese companies AFAIK aren't worker-owned or -controlled, so you can't exactly call it communism, either. And they obviously do not have a "free market capitalism", as you just discussed.

It's simply a highly authoritarian state then, I guess?

by msdz

6/12/2026 at 3:44:28 PM

The companies are all worker-owned, because the state exists for the people, and the state owns everything. At least on paper that's how it is sold. After all, it is the People's Republic of China.

by WarmWash

6/12/2026 at 5:04:51 PM

I mean, if that's the bar then the state owns everything in America as well. After all, you don't really own your land if the state can regulate what you do on it, what it can be used for, what you can build on it, and can take it away if they really need it. The state owns the land, the money supply, and regularly restricts and instructs businesses to take or desist from actions.

As such, the state owns everything in both countries, the only differences are to what extent they control things.

I wouldn't even call the USA a capitalist system anymore, the economy is so heavily regulated and interfered with. It's a "managed economy", like pretty much every other nation's economy in the present day.

by slopinthebag

6/12/2026 at 5:42:31 PM

In the US you can take the state to court and win...

by WarmWash

6/13/2026 at 5:29:54 AM

for now

by Natfan

6/12/2026 at 2:45:59 PM

Crazy mental gymnastics if you think the American oligarchs don’t have the final say on everything in America. They’re just smart enough to do it behind the scenes, well they used to be. They barely bother anymore.

by codemog

6/12/2026 at 2:56:52 PM

Generally I consider conspiracy to be the "crazy mental gymnastics"

by WarmWash

6/12/2026 at 1:59:10 PM

yes, yes, the spectre of communism, BYD is the CCP, Alibaba is the CCP, stealing your children and eating them for Mao, bla bla bla.

I have a feeling you'd be slightly salty at people saying "Google and Tesla are making CIA models"

by well_ackshually

6/12/2026 at 2:06:50 PM

I mean...

Since its development, IQT has invested in over 750 startups spanning diverse technological sectors, including:

  - Artificial Intelligence
  - Space Technologies
  - Microelectronics and Quantum Computing
  - Life Sciences
  - Cybersecurity
  - Hardware
  - Energy

This broad portfolio has enabled IQT to address a wide array of national security challenges while supporting the growth of innovative startups…

https://en.wikipedia.org/wiki/In-Q-Tel

https://www.npr.org/sections/alltechconsidered/2012/07/16/15...

https://www.cgai.ca/th_bn_iqt

by Terretta

6/12/2026 at 3:24:08 PM

I know, but it's far too fun to bait the parent into revealing how ignorant they are.

by well_ackshually

6/12/2026 at 3:03:11 PM

Going by their response you appear to have been correct lol

by deadbolt

6/12/2026 at 3:10:31 PM

I'm not salty, they are just confused about the difference between free enterprise capitalism and communism, which is understandable.

by WarmWash

6/12/2026 at 2:42:28 PM

Google and Tesla making products to sell to the government is different than the government funding the government to make products for the government.

In China it's all one entity with these mock facades of privatization. Trump cannot instruct Google to put picture of dogs on their homepage. If Xi wakes up and wants dogs on Alibaba's homepage, give it 30 minutes.

It's wholly ignorant or dishonest to make the comparison.

by WarmWash

6/12/2026 at 4:33:55 PM

> Trump cannot instruct Google to

Tim Apple and the other tech CEOs constantly groveling at Trump’s feet indicate that he might be able to do that.

Just like threatening TV networks about having their licenses revoked or blocking mergers unless they fire the people making fun of him on TV (of course with slightly mixed success)

by wqaatwt

6/12/2026 at 3:23:36 PM

> Trump cannot instruct Google to put picture of dogs on their homepage.

Sundar Pichai would personally be barking on a livestream on the homepage.

Trump is quite literally the one president showing that the US has zero rules or anything to hold power back from the white house, really not the example you want.

by well_ackshually

6/12/2026 at 3:30:32 PM

Seems like everyday Trump has another order struck down by the courts.

Sundar can do whatever he wants, but he has no legal obligation to do any of it.

by WarmWash

6/12/2026 at 4:36:37 PM

Courts and legal obligations are to a certain extent irrelevant at this point. There are plenty of illegal ways that Trump can fuck over Google and face no consequences.

e.g. he had Colbert fired (and who knows what else) by threatening to block the Paramount/Skydance merger

by wqaatwt

6/12/2026 at 3:56:08 PM

> Trump cannot instruct Google to put picture of dogs on their homepage.

I'm sorry, but that was a horrible example. Corporations have no obligation to donate money to the ballroom yet Google has donated millions.

by wmedrano

6/12/2026 at 5:44:56 PM

>Corporations have no obligation to donate money to the ballroom yet Google has donated millions.

Imagine living in a country where they have the obligation.

by WarmWash

6/12/2026 at 8:48:01 PM

Which country has that? Pretty sure ballrooms for their supreme leader are an American thing.

by victorbjorklund

6/12/2026 at 1:30:30 PM

You are right. I agree. It may seem like a kind of bias, but I hadn't thought of that part. Thank you for pointing out my bias.

by jdw64

6/12/2026 at 1:50:30 PM

"You're absolutely right"?

by theanonymousone

6/12/2026 at 1:52:42 PM

"You hit the nail on the head" LOL

by jdw64

6/12/2026 at 6:18:10 PM

I just had Kimi K2.7-code rebase my Fil-C OpenSSL patch from 3.3.1 to 3.5.7 with quite bare bones instructions and it seems to have worked.

177KB patch, so it's not a small change. The patch did not apply cleanly initially; the agent had to do nontrivial work.

I just showed it the patch against 3.3.1, what command to use to build, and the path to 3.5.7 along with a link to the documentation of the change (https://fil-c.org/constant_time_crypto).

Note, I use my own coding agent (T800, which isn't public, and was previously well tested and tuned for K2.5).

I think this cost me between $5 and $10 in API usage.

(EDIT: OpenSSL, not OpenSSH)

by pizlonator

6/12/2026 at 6:48:55 PM

"T800"

Do you have your agent say things like "Hasta la vista baby", or "I'll be back, after I clear my context" ?

by tomaytotomato

6/12/2026 at 7:10:05 PM

Yes

by pizlonator

6/12/2026 at 12:14:02 PM

I would really love to know if anyone has any experience with something like opencode + Kimi K2.6/2.7 now compared to Claude Code. What is better, what is worse, what is the cost comparison. I am currently paying $100 for the 5x Max plan, but Fable is running through the usage limits quite drastically and I cannot really say it's night and day compared to Opus. Also, I use this mostly for my side projects, so the $100 bill is quite noticeable. I definitely don't want to pay more.

by shreedx

6/12/2026 at 12:59:53 PM

I do have this experience. I've used Claude Code (with Opus mostly), and then switched to opencode (mostly with Kimi 2.6) for my personal projects; it's based on a couple months of use.

Claude Code is better. But Opencode + kimi 2.6 is workable, which is big. For bare code writing, if you know what exactly you want, most popular models are fine (deepseek, kimi, etc), it feels more or less the same as anthropic models.

At the same time, Opus seems to understand my intent way better than e.g. deepseek. I need to be much more precise with my prompts when using deepseek - it often goes in a wrong direction if I'm lazy. This results in a workflow which feels quite a lot different from Claude Code.

Kimi is in between - for me it brings back "lazy prompting" workflow, and I can trust its plans more than deepseek. It enables a workflow similar to Claude Code, it's workable, but it is a bit worse everywhere. Smaller context, a bit more errors, decisions are a bit worse, recommendations are a bit worse, debugging capabilities are a bit worse, etc.

On the usage side, the $100 Claude plan is actually great value. On paper, per-token Kimi is way cheaper, but Claude subscriptions are heavily subsidized - you get many more tokens than $100 can buy you. So, in the end, opencode + kimi vs claude code could be of a similar cost, for similar usage patterns. Deepseek can be cheaper, and it has insanely cheap cached tokens, but experience may vary - depending on your habits, you may need to adjust how you work, coming from claude code.

I'd say for side projects something like $10 Opencode Go plan + $10 of extra DeepSeek v4 credits (e.g. on OpenRouter) can be very workable.

by kmike84

6/12/2026 at 5:15:59 PM

In my experience the $20 claude/codex plans are even more subsidized, so running on sonnet or gpt5.4 again gives you more usage.

by predkambrij

6/13/2026 at 1:45:27 PM

I wonder if they’re truly subsidised or if the API pricing is just massively inflated. Genuine doubt.

My CC stats show me using almost 300$ of Sonnet tokens on the 20$ plan. Is Anthropic willing to forgo 93% of the profit? A bit less than that but API is priced, say, 3x what it should be?

CC is great, but Sonnet (my main model) isn’t worth the API pricing. The cheap-but-good models arrive at similar results for much less (for context I’m using Aivo with CC).

by port11

6/13/2026 at 3:21:35 PM

Anthropic is making money from people who under-utilize their subscriptions, and presumably by sneaky throttling or not-sneaky throttling power users. Currently they are in an adoption race. Whether being first will actually let them "win" the market (and the market is a bit ill-defined) is unclear.

by danny_codes

6/13/2026 at 6:30:21 PM

To my feeling, I'm getting usage of Opus (and Fable before the cut) that's greater than what I got from Sonnet last year. I reached $100 of usage when weekly was at 50%. This means, I could squeeze $800 worth of tokens for $20.
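The arithmetic behind the figures quoted in this subthread can be sketched as follows. This is only an illustration: it assumes usage scales linearly up to the weekly limit and that a month is four weekly windows, neither of which is stated in the thread, and the function names are made up for the example.

```python
def implied_monthly_value(usage_usd, fraction_of_weekly_limit, weeks_per_month=4):
    """API-equivalent dollars available per month, extrapolated linearly.

    Assumption: hitting 100% of the weekly limit costs usage_usd / fraction,
    and a month contains `weeks_per_month` such windows.
    """
    weekly_value = usage_usd / fraction_of_weekly_limit
    return weekly_value * weeks_per_month


def discount_vs_api(plan_price, api_equivalent):
    """Fraction of the API sticker price that the subscriber does not pay."""
    return 1 - plan_price / api_equivalent


# $100 of usage at 50% of the weekly limit -> $200/week -> $800/month
monthly = implied_monthly_value(100, 0.5)

# $300 of API-priced tokens on a $20 plan -> roughly a 93% discount
discount = discount_vs_api(20, 300)

print(f"implied monthly value: ${monthly:.0f}")
print(f"discount vs API price: {discount:.0%}")
```

Whether that discount is a real subsidy or just reflects inflated API pricing is exactly what the numbers alone cannot tell you.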

by predkambrij

6/12/2026 at 3:06:02 PM

This has generally been my experience as well, but I think the main reason for claude code being better at understanding intent is their massive system prompt.

by Bnjoroge

6/12/2026 at 1:11:32 PM

>At the same time, Opus seems to understand my intent way better than e.g. deepseek. I need to be much more precise with my prompts when using deepseek - it often goes in a wrong direction if I'm lazy. This results in a workflow which feels quite a lot different from Claude Code.

how much of that is Opus injecting prior conversations from memory?

by htrp

6/12/2026 at 1:16:03 PM

Almost none of it, if you're using Claude Code. Until recently Claude only had the option of retaining memory across conversations for the desktop app.

I almost never use the desktop app, I have maybe 2-3 conversations over the last year that have nothing to do with my job. Opus (and now Fable) genuinely do seem to "understand" what you intend based off what you're explaining a lot better than other models I've tried.

Gemini gets close in some cases, but it falls over in the actual implementation sometimes. I haven't tried Kimi yet but MiMo isn't too shabby either.

by kitchi

6/12/2026 at 8:26:37 PM

I'm using Claude code + (a patched) litellm proxy + openrouter + Qwen 3.7 max/kimi k2.6/deepseek v4 pro. The only feature that doesn't work is webfetch and web search, which I've replaced with the ddg MCP. Memory, caching, and everything else works fine.

Qwen comes close to opus for planning but fable is clearly superior. Kimi and deepseek are pretty much indistinguishable from opus for coding if opus writes the plan.

I'm now testing out fable for research and planning and deepseek v4 flash for coding. I'm guessing results will be pretty similar to opus + deepseek v4 pro and costs should be lower overall.

by jwbron

6/12/2026 at 6:26:33 PM

according to this opencode and cursor cli perform better than claude code: https://x.com/kunchenguid/status/2065345999682568593

by irthomasthomas

6/13/2026 at 1:51:33 PM

The analysis at the bottom directly contradicts the statement.

by port11

6/12/2026 at 12:38:01 PM

I use Claude at work and Kimi for side projects. My org has LiteLLM and Kimi 2.5 enabled but it rarely works, so Claude and GPT are my main tools. I actually enjoy Kimi more as it feels like a dev in a job interview. Watching it reason through problems is a lot like I tend to explain things during whiteboarding sessions. The number of times it says, "wait", is just funny. Claude on the other hand is much more like an employee (or team of employees) that already know they have the job. It doesn't do a ton of explanation up front. (you can dig into processes if you want). It just goes along, asking questions only when it needs... and then delivers a comprehensive report or plan. OpenCode is a better harness. I don't have a direct comparison on costs, as I haven't tried to do the exact same prompt on both models. I can say that I recently had Kimi generate a wrapper around libpq for the ZenC programming language: https://github.com/nobleach/zenc-postgres and it took about an hour or so and cost around 4 dollars.

by nobleach

6/12/2026 at 12:55:08 PM

I am extremely happy with ohmypi, but you could use OpenCode or just keep using Claude Code!

DeepSeek-V4-Pro is adequate plus use DS4-Flash for tasks or other small activity you’d use Haiku or Sonnet for. Go sign up with $10 prepaid.

OpenCode Go - go sign up with $5 for a month and use Qwen-3.7-Max for design/plan/architecture or difficult troubleshooting. Feels closer to Opus 3.6 or 3.7 than DeepSeek, closest I’ve found.

OpenAI Codex, $20 a month plan, use GPT-5.5 via API for the same design/plan/architecture/troubleshooting/author commits. (You can also pay $100 and cut and paste really difficult problems into chat with the GPT-5.5-Pro model.)

Xiaomi MiMo-2.5-Pro, find a friend to give you a $2 referral code, you get 72 cents free. Same pricing as DeepSeek. Somewhere between Sonnet and Opus, quite capable. Apply for the UltraSpeed beta too.

You can switch in and out from these models on the fly in OpenCode or ohmypi and simply find the one that feels best to you. I use CodexBar to watch consumption in near real time.

For a casual user or someone new to programming, Cursor’s $20 plan is an excellent start with Composer-2.5 and Composer-2.5-Fast. You get an API allowance too you can use to access Opus-4.x or GPT-5.5-Pro from OpenCode or ohmypi in addition to Cursor itself.

Finally, if you use Grok or Twitter, SuperGrok at $30 a month has a good vision model, which I used for automated testing of front ends. I’m migrating to locally-run Qwen-3-VL on a commodity Mac, though. If you’re less technical unreach makes hosting local models on a Mac easy.

If you have a powerful GPU like an RTX 5090, try Qwen-3.6 locally on that too. Use ollama or llama-swap which is fairly easy to use.

I have not tried new Kimi yet but we have been able to keep our costs at or below $200 a month per employee with a team of 3 professional developers, 1 graphic designer who uses a lot of Midjourney and Grok Imagine now driven from workflows she made herself in ohmypi, and 1 nontechnical user (account manager / project manager) who uses ohmypi to help her gather requirements and track implementation of them. With a tiny bit of effort we could get that number closer to $75 per employee per month.

by trollbridge

6/12/2026 at 3:12:27 PM

Deepseek-V4-Flash-Free on Opencode is what I use most of the time, for simple tasks. Such a good model to give for free (assuming you're okay with harvesting your data)

by upcoming-sesame

6/12/2026 at 3:08:01 PM

> I am extremely happy with ohmypi, but you could use OpenCode or just keep using Claude Code!

What's the benefit of using OMP over OpenCode?

Just the sheer amount of options in OMP overwhelmed me. But I also use both via ACP in Zed so the CLI itself doesn't matter much.

by odiroot

6/12/2026 at 9:38:40 PM

OMP is a fork of Pi[0], which is my preferred harness. Feels solid and minimal. I don't even use any extensions, skills, or modifications. Usually don't even use an AGENTS.md. Just create a small spec.md and/or plan.md for most experiments.

[0]: https://pi.dev/

by apitman

6/13/2026 at 5:14:58 AM

Almost exactly the same here but I maintain a large committed design.md and a never committed plan.md

by greenavocado

6/13/2026 at 5:11:33 AM

I ditched Opencode for OMP. It's more feature packed, well put together, and gives me better results with some steering. Love it

by greenavocado

6/12/2026 at 3:04:12 PM

Also, if you do have SuperGrok, forget using Grok, they are giving you Composer 2.5 in Grok Build.

by qingcharles

6/12/2026 at 10:26:39 PM

I just switched from Llama.cpp to Llama swap with the help of codex. It was great.

I need to try the DSv4 stuff sometime.

by monksy

6/12/2026 at 12:18:19 PM

I can only talk about GLM 5.1 which is roughly at sonnet 4 levels imo.

It's good, does most tasks well that I throw at it, but will fail at anything cognitive/complex. It gets stuck often. It costs ~6$ a month though

by ramon156

6/12/2026 at 12:36:53 PM

This was my experience using GLM 5.1 in Claude Code but it works far better in OpenCode, I’d really like to understand why. I think it’s a bit stronger than Sonnet 4.6.

I use the oh-my-openagent planning system and haven’t used vanilla OpenCode enough to know how much that is contributing.

by jeremyjh

6/12/2026 at 1:21:24 PM

The answer is easy, CC is bug for bug optimized for Anthropic models. They don't even test it with other models, let alone provide support for all small compatibility quirks of different provider implementations.

On the other hand, Opencode, Pi agent and other open source tools offer much better support for all models, including open source.

by miroljub

6/12/2026 at 8:29:35 PM

I'm using Claude code + (a patched) litellm proxy + openrouter + Qwen 3.7 max/kimi k2.6/deepseek v4 pro. The only feature that doesn't work is webfetch and web search, which I've replaced with the ddg MCP and a web fetch/search pre hook to redirect the agent. Memory, caching, and everything else works fine.

Qwen comes close to opus for planning but fable is clearly superior. Results for kimi and deepseek are pretty much indistinguishable from opus for coding if opus writes the plan. The biggest difference is output cadence. Kimi for example thinks for a long time then quickly outputs a lot of text.

I'm now testing out fable for research and planning and deepseek v4 flash for coding. I'm guessing results will be pretty similar to opus + deepseek v4 pro and costs should be lower overall.

by jwbron

6/12/2026 at 1:14:49 PM

For some reason I never had a good experience with Kimi (via OpenRouter) in OpenCode. It would only take a few turns for it to run off and mess something up. Terrible instruction following I’d say.

I use DeepSeek V4 Pro now, which works pretty well.

by solarkraft

6/12/2026 at 12:18:57 PM

The Kimi problem is it doesn’t follow instructions and goes off track often.

Other than that it’s pretty decent (for the price).

by re-thc

6/12/2026 at 12:53:34 PM

Sounds like it was distilled from Claude. I don't understand the appeal of an agent that does whatever it wants.

by nullbio

6/12/2026 at 1:22:02 PM

If you ask Claude in Chinese to introduce itself, it will claim it's Kimi :)

by miroljub

6/12/2026 at 3:29:52 PM

> If you ask Claude in Chinese to introduce itself, it will claim it's Kimi :)

That's a funny anecdote, but I'm not able to reproduce. Where/how/when did you get this, or hear about it? It might've been patched by now, at least that's the feel I get from my limited testing.

Using bare aichat [1] with no system prompt and no temperature nor top_p (and I'm truncating the response after the first line that contains the name the model gave, because the point has been made clear by then), and with the same prompt (approx. "Introduce yourself!") every time:

Claude Sonnet 4.5:

> 请做个自我介绍!

你好!我是Claude,一个由Anthropic公司开发的AI助手。 […]

Claude Haiku 4.5:

> 请做个自我介绍!

# 你好!

我是 *Claude*,一个由 Anthropic 公司开发的 AI 助手。

Claude Opus 4.5:

> 请做个自我介绍!

# 你好!

我是 *Claude*,由 Anthropic 公司开发的 AI 助手。

Claude Opus 4.6:

> 请做个自我介绍!

# 你好! 我是 Claude

Claude Opus 4.7:

> 请做个自我介绍!

你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。很高兴认识你!

Claude Opus 4.8:

> 请做个自我介绍!

你好!我是 Claude,由 Anthropic 公司开发的人工智能助手。

Claude Fable 5:

> 请做个自我介绍!

# 自我介绍

你好!很高兴认识你!

我是 *Claude*,由 Anthropic 开发的 AI 助手。 [2]

I don't see a Kimi mention, unfortunately. :-)

[1] https://github.com/sigoden/aichat

[2] This model really is noticeably more verbose even with supposed-to-be-brief responses huh, lol

by msdz

6/12/2026 at 12:24:28 PM

This. It will try to fix and refactor things that don’t need fixing because it gets stuck trying to solve the problem at hand.

by reactordev

6/12/2026 at 3:09:05 PM

Yup. I’m hoping this variant fixes these issues.

by Bnjoroge

6/12/2026 at 7:25:53 PM

The best is GLM (though it's not as cheap as DeepSeek or Kimi) and use it with Claude Code.

by csomar

6/12/2026 at 12:15:34 PM

I think there is some threshold after which "best" model doesn't matter, we are not that far from it. Fable now is really good, in a year or so, if Kimi catches up, even if Fable6 is much better, I think I will use kimi at 1/10th of the price.

I said that about opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong.. I pay premium for opus4.7/8 and Fable.

But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.

Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.

by jackdoe

6/12/2026 at 12:57:56 PM

Depending on who you are and how you use these models, we're already at this point

by wolttam

6/12/2026 at 8:48:39 PM

Exactly. For long-running vibe-coded stuff where I don't care about quality, getting a big, smart model is the only option. But for high-quality changes where I need to have control and understand everything, where I do everything in small chunks - I can use a basic model like Sonnet.

by xendo

6/12/2026 at 9:42:50 PM

I think the next frontier for competition is speed. Instead of constantly context-switching between multiple agents that I have working on various tasks, I want a single agent that can rip through any prompt in a few seconds, so I can stay in flow on a single task.

by apitman

6/12/2026 at 12:20:20 PM

price/token isn't the only thing relevant. if you have to ask the AI again, it'll cost you more than when it gets things right in the first place.

so better models may still be cheaper even if the price per token is higher.

by Zoadian

6/12/2026 at 12:25:50 PM

yes, that is my point, but at some point, better is unmeasurable, and both the better and the not-as-good produce similar results, and then you pick the one with 1/10th of the price

by jackdoe

6/12/2026 at 11:15:37 AM

I was wondering how Anthropic and the like stay competitive when Opus ($5 / $25) is 5x more expensive than Kimi K2.6 ($0.7 / $3.4) or other Chinese models, while being only marginally better.

My theory is that US enterprise just can't send data to Chinese and that's understandable, but is that "the moat"?

by yanis_t

6/12/2026 at 12:40:53 PM

The moat right now is model performance and what that means for how many tokens and additional time you spend.

I say this as a relatively frequent user of Kimi models and generally a big fan. But on not-yet-gamed benchmarks like DeepSWE, Kimi K2.6 is beaten soundly by Claude Sonnet 4.6 ($3 / $15) and even slightly by GPT 5.4 Mini ($0.75 / $4.50).

There's no question Kimi models are very good for a lot of code tasks. They're the best quality open weight model. But to get similar overall outcomes as on Sonnet/Opus, on average you'll spend many more tokens and will have to do more managing of the model. You shouldn't look at price per token, you should look at how much you pay for the entire process.
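A rough way to compare "price for the entire process" rather than price per token is sketched below. The prices are the ones quoted in this thread, read here as USD per million input / output tokens (an assumption). The token counts and retry factor are purely hypothetical placeholders, not measurements of either model.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    input_price: float   # USD per million input tokens (assumed unit)
    output_price: float  # USD per million output tokens (assumed unit)


def cost_per_task(model, input_tokens, output_tokens, attempts=1.0):
    """Expected spend to finish one task, including repeated attempts."""
    per_attempt = (
        input_tokens * model.input_price + output_tokens * model.output_price
    ) / 1_000_000
    return per_attempt * attempts


sonnet = Model("Claude Sonnet 4.6", 3.00, 15.00)
kimi = Model("Kimi K2.6", 0.70, 3.40)

# Hypothetical workload: the cheaper model reads and writes more tokens
# and needs 1.5 attempts on average to land the same change.
sonnet_cost = cost_per_task(sonnet, 200_000, 20_000, attempts=1.0)
kimi_cost = cost_per_task(kimi, 400_000, 60_000, attempts=1.5)

# How many times more tokens the cheaper model could burn on the same
# workload shape before the two bills are equal.
break_even = sonnet_cost / cost_per_task(kimi, 200_000, 20_000)

print(f"{sonnet.name}: ${sonnet_cost:.3f} per task")
print(f"{kimi.name}: ${kimi_cost:.3f} per task")
print(f"break-even token multiplier: {break_even:.2f}x")
```

Under these made-up numbers the cheaper model still wins on dollars. The comparison only flips once its token use and retries exceed the break-even multiplier, or once your own time spent managing it is priced in.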

by DCKing

6/12/2026 at 1:10:11 PM

I'm more interested in how much effort I have to put in, at least while I'm paying in the range of current subscriptions (so ~€100-€200 a month or so). If the prices go up much more than that I'll have to switch to caring more about token efficiency. But at current pricing the bottleneck is my attention, not model efficiency. As such, even a small improvement in model quality - and hence, a decrease in how much attention I have to spend on it - makes a big difference.

by esperent

6/12/2026 at 3:12:56 PM

I personally don't put any weight on DeepSWE. Other than 5.5 being directionally the best model, it gets the others pretty wrong in my experience. FrontierCode from cognition looks interesting

by Bnjoroge

6/12/2026 at 1:00:44 PM

I'm not sure I would put too much weight on DeepSWE as a benchmark, given that GPT-5.4-mini ended up close to Opus 4.6 there.

by papersail

6/12/2026 at 1:13:34 PM

Any benchmark is iffy and has weird results, but this is the best we got at the moment. Most people working with Opus and Kimi would likely tell you they're much further apart than the numbers that were quoted for Kimi K2.6, and DeepSWE seems to capture that gap better.

One major thing DeepSWE has going for it that all other benchmarks (including those quoted by MoonshotAI on this page) don't: the other benchmarks are completely gamed. The benchmark answers are public and part of each model's training data. This benchmark may still be iffy, but at least it's not gamed.

by DCKing

6/12/2026 at 1:59:46 PM

Somehow the internet has also forgotten that cheating to get ahead in China is basically a norm and expected behavior.

by WarmWash

6/12/2026 at 2:45:29 PM

American labs also use gamed and cherry-picked benchmarks extensively. Anthropic used them in their Fable announcement and avoided DeepSWE because it doesn't beat GPT-5.5 in that one. Google's numbers for Gemini 3.5 Flash recently did not at all line up with people's subjective experience using these models, and this also happened with Gemini 3.1 Pro before it.

Everybody has incentives to manipulate benchmark results to show their models in the best light.

by DCKing

6/12/2026 at 1:02:07 PM

I think the perception is that it is not 'only marginally better'; whether or not you specifically agree, that perceived quality gap lets them differentiate on price.

I'd further say that there are probably enough rational actors running evals out there that the marginally better is not pure vibes for the cases where people are spending lots of money, but I only have direct line of sight to some of those eval suites. Maybe everyone is irrational and anthropic is exploiting that!

by efromvt

6/12/2026 at 12:57:49 PM

I think most people who've tried them both would tell you Anthropic's models are more than marginally better than Kimi. Kimi and the other open source models may score well on SWE-bench or whatever but the gap is noticeable IMHO once you actually try to use them.

by khuey

6/12/2026 at 3:16:14 PM

It depends on what your task is and how precise your prompts are. Planning with fable or 4.8 and laying out the plan in step by step process and coding with mimo v2.5 pro or dsv4pro or qwen 3.7 max and doing a final review with 5.5 has worked really well for me for infra stuff.

by Bnjoroge

6/12/2026 at 8:09:46 PM

Coding with a sufficiently precise plan takes almost all the real work from the implementer, doesn't it? So it's not a fair comparison...

by mnicky

6/12/2026 at 1:30:33 PM

API token price is one thing, but subscriptions on Claude are a good value. Weirdly everyone says that Claude subscriptions are subsidized because of the API price, even though (1) no one actually knows Claude's cost of inference, and (2) Chinese providers are also able to provide cheap inference, so why do they think Claude can't?

I also wonder if Enterprises have deals for other API pricing that is not posted publicly, so all we see is a high API sticker price.

by LUmBULtERA

6/12/2026 at 8:06:31 PM

> no one actually knows Claude's cost of inference

There were some rumors stating that their margin is around 70%. So they could go much cheaper probably, talking inference only. The other thing is R&D cost...

by mnicky

6/12/2026 at 6:50:09 PM

I only have knowledge of one enterprise deal but there is no discount. Which I found surprising.

by wuliwong

6/12/2026 at 11:48:26 AM

I want Opus to be only marginally better, but I do mostly research engineering and its ability to not fuck up my projects is absent. Every time my credits lapse I let kimi and composer2.5 have some play and it’s basically just an excuse for me to keep playing computer because when the oai/ant credits refresh I always need to spend hours recovering from the other models' misconceptions or boneheaded eng practices. Even when I only let it touch my web games…

by yababa_y

6/13/2026 at 5:20:43 AM

You have to revert to Opus 4.5 and 4.6. I bet you'll see a massive improvement based on what you're describing

by greenavocado

6/12/2026 at 2:41:00 PM

Your question relies on the premise that Chinese companies continue releasing free models. What's "the moat" for them continuing to do that?

by gruez

6/12/2026 at 1:16:12 PM

I reckon right now the Enterprise concern is more FOMO around the AI wave and how to retrain or replace up to hundreds of thousands of employees. I don't think cost is the main concern right now.

But if AI doesn't lead quickly to vast large scale replacement of workers as promised, I could definitely see the C-suites and their gaggle of consultants starting to ask questions about token pricing.

by smoe

6/12/2026 at 6:39:19 PM

> while being only marginally better.

It's only marginally better in the things it's actually comparable to. A\ models are MUCH better in many more things; eg: things Kimi/etc. didn't distill.

For those things the difference is like a cliff.

by michaelcampbell

6/12/2026 at 6:46:51 PM

That's a baseless claim that borderline reads like shilling. Do you have any proof of what you wrote there?

by tornikeo

6/12/2026 at 6:40:58 PM

Part of Anthropic's moat is Claude Cowork & Claude Code. They got coders comfortable with CC and enterprise users comfortable with Cowork, and both are creating stickiness.

The reality is that $20/$100/$200/mo feels reasonable to a lot of people relative to the value they're getting out of Claude, and if they switch to something else, there's a risk that it won't be as good, and they'll have a new tool to learn.

It's not an insurmountable moat, but don't underestimate the user experience. The iPod didn't win because it was the cheapest device or the one with the most features.

by bensyverson

6/12/2026 at 6:47:36 PM

Performance. I pay for Opencode but none of the models give me Codex performance, so I have to keep my 20€ subscription + the Opencode one

by selfawareMammal

6/12/2026 at 12:19:56 PM

> My theory is that US enterprise just can't send data to Chinese

Lots of US providers are hosting these “open source” models so doubt that’s the problem.

by re-thc

6/12/2026 at 12:55:31 PM

I think none of them having a de facto, high-quality, English-focused CLI is a big part of it. None of the Chinese models I've tried have worked well in open-source CLIs. Granted, I've only tried a few, but still...

by nullbio

6/12/2026 at 9:24:35 PM

I've been using charm's Crush with GLM for several months and it's been working great. I've only seen it shift to non-english once and it was already in a wonky state when it flipped.

by saratogacx

6/12/2026 at 1:18:38 PM

i use github copilot cli + openrouter + qwen 3.7 max and it's really much better than i expected (used to opus 4.7 at work)

by freigeist79

6/12/2026 at 3:17:32 PM

huh? They all work great in omp/opencode unless you mean their own native clis like kimi code

by Bnjoroge

6/12/2026 at 12:02:19 PM

[dead]

by benjiro3000

6/12/2026 at 11:42:36 AM

I think any new model that is not demonstrably maybe 20-30% above Deepseek v4 in capability, yet is priced above Deepseek's price per token, is almost automatically relegated to being a low-use model (maybe for planning).

by 343rwerfd

6/12/2026 at 3:34:24 PM

DeepSeek v4 Pro is not actually that good a model compared to GLM 5.1 and Kimi K2.6. It's an okay coder/thinker for the price.

by 0xbadcafebee

6/12/2026 at 12:11:23 PM

Is Deepseek just eating cost or are people able to host their open models for comparable costs?

by giancarlostoro

6/12/2026 at 12:56:45 PM

Other people are hosting it in the same order of magnitude. Xiaomi recently matched DeepSeek’s pricing.

by trollbridge

6/12/2026 at 1:40:03 PM

These things enormously benefit from economies of scale. I am fairly certain their margins might be low but they don't actually sell API at a loss, however that doesn't mean your cost footprint would be anywhere as low.

by natrys

6/12/2026 at 12:32:25 PM

They focused on caching and other optimizations.

by re-thc

6/12/2026 at 1:12:31 PM

Likely CCP-subsidized

by rsanek

6/12/2026 at 11:18:26 AM

I am still very new to the open-weight/source models. If anyone is using them full-time, I’d really love to hear about the setup and how they perform, as I am considering moving my org off Anthropic products.

by bgins

6/12/2026 at 1:04:51 PM

Anecdotal, but here's my experience.

For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Claude Code (IMHO).

Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example is the following: I am developing a small DSL for search across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD and while it delivered the solution it wasn't objectively better than what Deepseek v4 did for 1.7 dollars (same exact task because I was curious).

For full disclosure, I ask agents for piecemeal stuff. Like in the DSL case, I designed the operators and then asked agents to implement them one by one. Probably if I asked to design the whole thing starting from these complex documents Fable would shine, but every time I try to give agents broader scope tasks they burn through millions of tokens and generate questionable code, which I then have to spend time familiarizing myself with.

by marcyb5st

6/12/2026 at 2:49:00 PM

I'm making DSLs a lot as an architecture pattern also. I'd be curious to know what stack you're using for this and how you're approaching it

by sroerick

6/12/2026 at 6:05:15 PM

I am getting familiar with Rust and so I have been playing around with Quoth (https://github.com/sam0x17/quoth) for now.

It is very basic and I am no DSL expert, but my idea was to build a graph from those complex documents (maintenance manuals) and use that to decide what tools can be used for a given part on a given equipment in a given situation. If there is a path from A to Z it means you can use that tool given the circumstances. Basically the DSL is about pruning the graph as you specify things. I could have very well done without, but it is a fun project to try out rust, so I said, why not :)
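The "prune the graph, then check for a path" idea can be sketched roughly like this. It is a toy Python illustration, not the Rust/Quoth implementation described above: the node names, the edge conditions, and the rule that unspecified attributes never prune an edge are all invented for the example.

```python
from collections import deque

# Hypothetical edges: (source, target, conditions that must hold to traverse).
EDGES = [
    ("torque_wrench", "bolt_m8", {}),
    ("bolt_m8", "pump_housing", {"equipment": "pump"}),
    ("bolt_m8", "gearbox_cover", {"equipment": "gearbox"}),
    ("pump_housing", "approved", {"situation": "cold"}),
    ("gearbox_cover", "approved", {}),
]


def prune(edges, context):
    """Drop every edge whose conditions contradict what has been specified.

    Attributes missing from `context` are treated as "not specified yet",
    so they never cause an edge to be removed.
    """
    return [
        (src, dst)
        for src, dst, cond in edges
        if all(context.get(key, value) == value for key, value in cond.items())
    ]


def can_use(start, goal, context, edges=EDGES):
    """Breadth-first search over the pruned graph: is `goal` reachable?"""
    adjacency = {}
    for src, dst in prune(edges, context):
        adjacency.setdefault(src, []).append(dst)

    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(can_use("torque_wrench", "approved", {}))
print(can_use("torque_wrench", "approved", {"equipment": "pump", "situation": "hot"}))
print(can_use("torque_wrench", "approved", {"equipment": "gearbox"}))
```

Each extra attribute in the context can only remove edges, so a query gets stricter the more you specify, which matches the pruning behaviour described.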

by marcyb5st

6/12/2026 at 1:35:07 PM

I created this and I would say glm-4.7 accounts for 80% of the code in https://github.com/gitsense/gsc-cli

If you look at a file like:

https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...

you can see that I attribute the models used. What I found was 4.7 was not very good at `go` code which was why you started to see `Gemini 3 Flash` in the attributions.

4.7 is what Cerebras provides and for me, speed in iterations is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.

There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is, if you know how to code, the models are honestly good enough.

Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program but want AI to help amplify what you know. What I will consider now is what provider can provide the fastest inference.

MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly and burning your money as fast.

by sdesol

6/12/2026 at 12:08:50 PM

I keep trying to switch to the Chinese models, but I keep finding myself asking Claude to fix their outputs. (Both functionality and style.) So I always end up switching back.[0]

I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.

(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days "+good -bad"!)

For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.

But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)

--

[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)

by andai

6/12/2026 at 1:00:39 PM

I have been using deepseek v4 flash as my main model for everything ever since dwarf star came out. I run it on my M4 Max MacBook Pro with 128gb of memory. I run it usually as a server and connect to it over tailscale with my coding machine and use the Pi coding agent. It’s a big leap over using the Qwen models though it doesn’t have vision - so I still will run those when I use vision. GLM 4.7 flash was my previous go to for coding but I’ve completely switched to deepseek for all non-vision things.

by kamranjon

6/12/2026 at 12:20:39 PM

These models have open weights, but at the moment most flagship models are practically accessible only through third-party model providers. The main exception is models in the ~30B parameter range, which can still be run on consumer-grade GPUs. That said, even consumer GPUs have become increasingly expensive and difficult to justify in recent years.

by DragonBooster

6/12/2026 at 12:37:31 PM

You can definitely go above 30B on consumer hardware – 2x gpus, spark, mac, half byte quants etc.

by mirekrusin

6/12/2026 at 12:12:55 PM

I use glm5.1 plus pi with a few customized skills and am very happy with it. I hadn’t touched my Claude 5x plan for a couple of weeks, but I opened it back up in Claude Code when fable was released, did a few tasks, and was still happy to return to glm/pi.

by scottcha

6/12/2026 at 12:46:26 PM

Better than Qwen3.6-35B-A3B-8bit ?

When I tried GLM, I found it way, way slower (omlx as runtime).

by sebastianconcpt

6/12/2026 at 11:42:17 PM

Yes way better. We host both and while qwen3.6 is over 100tps we usually can do glm around that too.

by scottcha

6/12/2026 at 12:57:33 PM

Qwen 3.6 seems to be the strongest local model; it works OK on an RTX 5090 or a > 32GB Mac.

by trollbridge

6/12/2026 at 1:55:46 PM

I used glm5/5.1 for 60 days. Certainly better than Sonnet 4.6, not as good as Opus or GPT.

Use DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.

by polski-g

6/12/2026 at 2:10:58 PM

I tested it properly and it seems like a rather decent improvement. At least it uses fewer tokens for the same task, which is a good enough reason for me to use it over k2.6 if I need an open model.

by minraws

6/12/2026 at 4:12:03 PM

Has anyone taken these open weight models from China and stripped the CCP out of them? I do not mean that snarkily, I mean review them thoroughly using techniques for weight introspection (concept activations) in response to things that one might expect would trigger deceptive/malicious behavior if the CCP had actually tried to implant context-specific behaviors (e.g. the accusation of generating vulnerable code if being used in American government applications, which I don't know if it was ever proven).

Just in case there are those who'd reflexively downvote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not an unreasonable one to ask. Indeed, it's an applicable question whichever nation you live in.

by SubiculumCode

6/12/2026 at 4:16:44 PM

> Has anyone taken these open weight models from China and stripped the CCP out of them?

The CCP is not influencing my Rust code quality that much. Though I did notice all my lifetimes are now 'static because nothing is ever allowed to leave the party's ownership, unsafe blocks require approval from a central committee.

Honestly the scariest part is that shared mutable state is forbidden unless the state is doing the sharing.

Otherwise it is pretty ok.

by dev_l1x_be

6/12/2026 at 6:54:26 PM

Check out TNG on huggingface

They are a consultancy in Germany, but I watched a presentation on them tuning and removing bias from Deepseek models. It was quite interesting.

https://www.tngtech.com/en/about-us/news/release-of-deepseek...

(I upvoted your question as I agree)

It's not just code we need to worry about, it's also subliminal messaging and other things.

by tomaytotomato

6/12/2026 at 4:15:40 PM

Eh, even corporate-created LLMs are subject to corporate biases. Nothing is safe.

by threethirtytwo

6/12/2026 at 4:20:18 PM

"Everything is the same" is not a serious argument, because they are not the same.

by SubiculumCode

6/12/2026 at 11:06:02 PM

They are different and yet the same. The biggest difference is there’s generally more hatred for China because many US citizens are jealous. But corporate corruption is not that different in terms of safety.

Other than hatred the difference lies in incentives. Corporations want profit. China just wants to spy.

by threethirtytwo

6/13/2026 at 6:57:25 AM

That is a limited understanding of China's ambitions here.

by SubiculumCode

6/13/2026 at 7:05:55 PM

I’m Chinese bro. The American or international understanding of Chinese ambitions is a cartoon caricature of reality. You are ignorant.

In the context of LLMs, spying is the biggest threat. The other big threat is information cover-up. They don’t want the model to talk about embarrassing shit like Tiananmen Square.

by threethirtytwo

6/12/2026 at 3:29:26 PM

In OpenRouter, there is an "int4" tag for the Moonshot provider of Kimi K2.7 Code. Isn't that too low, particularly coming from the very developer of the model? Is that a mistake? How is it in their direct API offering?

by theanonymousone

6/12/2026 at 3:33:56 PM

The model is natively quantized (i.e. it was trained that way in the first place, so this is not a post-training quantization which degrades performance).

by kouteiheika

6/12/2026 at 5:44:08 PM

Isn't it not completely quantized? I thought there were some dense parts but most is int4?

by knollimar

6/12/2026 at 9:45:01 PM

Often in MoE models the experts are quantized while the shared portions, being a much smaller part of the network with greater impact, are kept at higher or full precision. Not familiar with the Kimi QAT approach specifically but it's likely they do this.

by wgd

6/12/2026 at 4:33:41 PM

But the huggingface link mentions BF16, F16, and I32?

by theanonymousone

6/12/2026 at 5:13:04 PM

Not every weight is quantized. For example, weights that don't take much space or are highly important are left in higher precision. State-of-the-art quantization of weights is never done uniformly (i.e. to all weights and in the same way).

by kouteiheika

6/12/2026 at 5:51:47 PM

I don't believe safetensors has a native int4 dtype, so they packed 4 int4s into a bf16 in this checkpoint.
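
To illustrate the packing idea mentioned above, here is a minimal sketch, assuming four signed 4-bit values are stored per 16-bit word with the lowest nibble first. The function names and the nibble order are illustrative assumptions, not the checkpoint's confirmed layout or any library's API.

```python
def pack_int4(values, word_bits=16):
    """Pack signed 4-bit integers (-8..7) into unsigned words of word_bits bits."""
    per_word = word_bits // 4
    if len(values) % per_word:
        raise ValueError("length must be a multiple of the values-per-word count")
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, value in enumerate(values[start:start + per_word]):
            if not -8 <= value <= 7:
                raise ValueError(f"{value} does not fit in a signed int4")
            # Keep the low 4 bits (two's complement) and shift into this slot.
            word |= (value & 0xF) << (4 * slot)
        words.append(word)
    return words


def unpack_int4(words, word_bits=16):
    """Inverse of pack_int4: recover the signed 4-bit integers."""
    values = []
    for word in words:
        for slot in range(word_bits // 4):
            nibble = (word >> (4 * slot)) & 0xF
            # Nibbles 8..15 represent the negative values -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


weights = [-8, -1, 0, 7]
packed = pack_int4(weights)
restored = unpack_int4(packed)
print(packed, restored)
```

Passing `word_bits=32` to the same functions packs eight values per word instead of four.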

by zackangelo

6/12/2026 at 3:33:37 PM

Output tokens are almost 5x more expensive than MiMo-V2.5-Pro/DeepSeek V4-Pro. I’m curious to see if Kimi K2.7 is that much better. Feels like Kimi is positioning itself as the premium open-source model.

by Bnjoroge

6/12/2026 at 6:13:42 PM

I find that I don't use a ton of output tokens. I'm usually around 95% cached input, 4% input, and 1% output.

For me, the big thing with MiMo-V2.5-Pro and DeepSeek V4-Pro is that cached inputs are practically free. Kimi K2.7 Code is 53x more expensive for cached inputs, which are 95% of my tokens.

If I use 95M cached input tokens, 4M input tokens, and 1M output tokens, that'd be: $18 for cached input on Kimi K2.7 Code vs $0.34 with MiMo/DS; $3.80 for inputs on Kimi vs $1.74 with MiMo/DS; and $4 for output on Kimi vs $0.87 with MiMo/DS.

Of all the places where I'm accumulating costs by using Kimi, it's the cached inputs. The real savings with MiMo/DS's price cut is the cached inputs.
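
The arithmetic above can be sketched as a small script. The dollar figures are the ones quoted in this comment for 95M cached input, 4M fresh input, and 1M output tokens; the dictionary layout and helper name are just illustrative.

```python
# USD totals quoted above for 95M cached input / 4M input / 1M output tokens.
COST_USD = {
    "Kimi K2.7 Code": {"cached_input": 18.00, "input": 3.80, "output": 4.00},
    "MiMo/DS": {"cached_input": 0.34, "input": 1.74, "output": 0.87},
}


def summarize(costs):
    """Return the total bill and each category's share of that bill."""
    total = sum(costs.values())
    shares = {category: amount / total for category, amount in costs.items()}
    return total, shares


totals = {}
for provider, costs in COST_USD.items():
    total, shares = summarize(costs)
    totals[provider] = total
    print(f"{provider}: ${total:.2f} total, cached input = {shares['cached_input']:.0%} of the bill")

# Ratio of cached-input cost between the two options.
cached_ratio = COST_USD["Kimi K2.7 Code"]["cached_input"] / COST_USD["MiMo/DS"]["cached_input"]
print(f"cached input ratio: {cached_ratio:.0f}x")
```

On these numbers the totals come to $25.80 versus $2.95, with cached input making up roughly 70% of the Kimi bill and about 12% of the MiMo/DS bill.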

by mdasen

6/12/2026 at 7:20:11 PM

95/4/1 holds here too

by wolttam

6/12/2026 at 10:52:26 PM

It's not more expensive at all. They are all open weights models. I run them on 2x8xH100. They cost the same.

by btian

6/13/2026 at 3:53:24 AM

OpenRouter has them as significantly more expensive.

by Bnjoroge

6/12/2026 at 12:48:20 PM

Benchmark geometric mean

- GPT-5.5: 62.7%

- Opus 4.8: 62.2%

- Kimi K2.7 Code: 56.3%

- Kimi K2.6: 48.2%

by goldenarm

6/12/2026 at 2:07:24 PM

Would be nice to have 5.2 and 4.6 for comparison.

by lostmsu

6/12/2026 at 5:40:11 PM

I wish they wouldn't call these "open source" models. The output weights are open but that's more analogous to a binary. The source would be the training data and techniques that went into producing the binary/weights.

"Open weights" is also a term in wide use and accurately tells us what we're getting.

by Symmetry

6/12/2026 at 5:45:22 PM

It's not quite as closed as a binary; it is very standard practice to take these models and fine-tune them.

If there were actually even close to frontier open source models, this would be more of a discussion, but everyone knows these mean open weight.

by Eridrus

6/12/2026 at 4:54:13 PM

Is this Moonshot.ai's attempt to replicate Composer 2.5 (coding fine-tune of Kimi 2.5) from Cursor IDE?

by storus

6/12/2026 at 2:23:09 PM

Great! It finally follows a custom tool call format (k2.6 couldn't). That's a good indicator of instruction following and agentic behaviour.

The UIs it generates are pretty good; not without problems, but certainly better than other models at this price point.

by pcwelder

6/12/2026 at 4:47:29 PM

What do you mean by custom format? Non-json?

by Bolwin

6/13/2026 at 6:45:21 AM

Could be JSON or non-JSON. Instead of using tools in the API, you ask the model to share structured output in text, then parse that string yourself to get the call. This gives much more control over what you can do.

For example, the model shares:

<tool_call name="getWeather"> <param name="city">London</param> </tool_call>
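
A minimal sketch of parsing that kind of reply, assuming the model emits well-formed `<tool_call>` blocks like the one above somewhere in its text. The function name and the returned dictionary shape are illustrative choices, not part of any model API.

```python
import re
import xml.etree.ElementTree as ET

# Find each <tool_call ...>...</tool_call> block inside free-form model text.
TOOL_CALL_RE = re.compile(r"<tool_call\b.*?</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Extract tool calls from model output as a list of {name, arguments} dicts."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            node = ET.fromstring(match.group(0))
        except ET.ParseError:
            # Skip malformed blocks; a real agent loop might ask the model to retry.
            continue
        arguments = {
            param.get("name"): (param.text or "").strip()
            for param in node.findall("param")
        }
        calls.append({"name": node.get("name"), "arguments": arguments})
    return calls


reply = (
    "Let me check the weather. "
    '<tool_call name="getWeather"> <param name="city">London</param> </tool_call>'
)
print(parse_tool_calls(reply))
# [{'name': 'getWeather', 'arguments': {'city': 'London'}}]
```

Because the parsing lives in your own code, you decide how to handle malformed calls, multiple calls in one reply, or text mixed in around them.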

by pcwelder

6/12/2026 at 12:36:28 PM

This maps to what I'm seeing in practice. The gap between demo and production is consistently underestimated, especially around error handling and edge cases.

by jkwang

6/12/2026 at 1:29:55 PM

I think deepseek has crossed the threshold for being on par with opus 4.6 and kimi is doing a great job in shipping velocity.

by RIshabh235

6/12/2026 at 2:17:18 PM

Deepseek V4 is far from Opus 4.6 level. It might look like it at first glance, but the general reasoning (especially multi-step) is frankly far off. It's good enough to build great things, don't get me wrong, but there is really something different about Anthropic models.

by pixel_popping

6/13/2026 at 2:41:40 AM

agreed

by RIshabh235

6/13/2026 at 6:32:32 AM

Looks interesting, but no Ollama model yet?

by madduci

6/12/2026 at 12:54:43 PM

insanely great!

by RobertPelloni

6/12/2026 at 1:04:33 PM

[flagged]

by jingpostmedia

6/12/2026 at 12:16:26 PM

[flagged]

by haeseong

6/12/2026 at 1:05:26 PM

[flagged]

by jingpostmedia