Qwen3.6-Plus: Towards real world agents

4/2/2026 at 3:00:06 PM

This is their hosted-only model, not an open weight model like they’ve become known for. They got a lot of good publicity for their open weight model releases, which was the goal. The hard part is pivoting from an open weight provider to being considered as a competitor to Claude and ChatGPT. Initial reactions are mostly anger from everyone who didn’t realize that the play along was to give away the smaller models as advertising, not because they were feeling generous.

Comparing to Opus 4.5 instead of the current 4.6 and other last-gen models is clearly an attempt to deceive, which isn’t winning them any points either.

I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper. I don’t know how successful they’ll be in the race to the bottom in this market niche, though. Most users of cheap API tokens are not loyal to any brand and will change providers overnight each time someone releases a slightly better model.

by Aurornis

4/2/2026 at 3:28:59 PM

> not an open weight model like they’ve become known for.

Right, they state that they'll release "smaller" variants openly at some point, with few details as to what that means. Will there be a ~300B variant as with Qwen 3.5? The blog post doesn't say.

by zozbot234

4/2/2026 at 8:20:43 PM

I wish they had a revenue goal to release openly, that way spending money in them would contribute to better open models in the long run.

This is how I view that the public can fund and eventually get free stuff, just like properly organized private highways end up with the state/society owning a new highway after the private entity that built it got the profits they required to make the project possible.

by dietr1ch

4/3/2026 at 6:12:48 AM

There are a lot of options for doing things this way:

https://en.wikipedia.org/wiki/Threshold_pledge_system

by JimDabell

4/2/2026 at 9:25:51 PM

As a publicity stunt, releasing a 300B open model is pretty smart. You can talk about its strong performance and it being “open” and “available,” but it’s so large that most people can’t use it themselves and might try out the cloud-based offering.

by drob518

4/3/2026 at 8:11:41 AM

I'm running qwen 3.5 397b on very standard hardware. Just use the unsloth quants, they're great. I get like 20t/s or something.

It's super not a publicity stunt, qwen 3.5 is the base of the best local models out there IMO.

by kadoban

4/3/2026 at 1:59:48 PM

Well, you didn’t post the specs on your rig. I think it’s probably more correct to say that you run it on very beefy but readily available hardware. My point was not that nobody could run a 300B model, but rather that a 300B model is not going to be runnable by a majority of people. Sure, anyone who wants to run that model and has the money to purchase the hardware can do it. But the hardware is going to be pricey and most people don’t already have it unless they were trying to run large models before this. My overarching point is that most people with average laptop specs purchased over the last 3 to 5 years are going to have to consume this from the cloud. Which is great for Qwen.

by drob518

4/3/2026 at 6:08:50 PM

I just have a 3090 and 64gb ram. Yes this is more than most people have, but calling it a "publicity stunt" is just so uncharitably weird of a characterization.

There's smaller models all the way down too.

Like this should be _exactly_ what we want companies to release.

by kadoban

4/3/2026 at 7:40:26 PM

I apologize. I didn’t mean to suggest a “publicity stunt” was a negative. Perhaps I should have said that it was a great marketing strategy. My point was, they can cite all the metrics associated with a frontier model and yet to actually get those metrics most users will have to purchase cloud-based services. That all. And sure, some people will definitely be able to run the model and benefit from it. As you say, this is what we want.

by drob518

4/4/2026 at 9:23:32 PM

Yeah it is apparently some kind of marketing strategy I guess. Tbh I can't imagine they're getting enough out of it for it to make sense for them. Personally, I'm not looking the gift horse in the mouth too closely, I'm just happy that the current insane rush to make better models means we get some decent "open" ones to play with.

by kadoban

4/3/2026 at 2:28:34 PM

I can run a 300b model, but I don't do it. We need the H100's for training

by rurban

4/2/2026 at 9:37:04 PM

The large models are actually MoE these days so they're usable on ordinary hardware with weights streaming from SSD, just very slow. You're nonethess right that it makes the cloud-based offering more popular, since you can use that for convenience after testing a few inferences locally.

by zozbot234

4/3/2026 at 5:16:19 AM

“Usable but slow” is how you could run regardless of MoE or not, the model architecture has nothing to do with it. MoE might run N times faster than non MoE where N is the number of experts but that’s it.

by vlovich123

4/3/2026 at 3:07:05 AM

There are plenty of model providers that can serve them though at cheaper prices and cannibalize Alibaba revenue.

by mogili1

4/3/2026 at 6:44:12 AM

You can run it on runpod for example.

by Bombthecat

4/2/2026 at 6:13:06 PM

I'm not interested in adopting an inferior closed source weight from a geopolitical rival. The open source weights argument was the one thing China had going and that I was seriously cheering them on for. They could have been our saviors and disrupted the US tech giants - and if it was open, I'd have welcomed it.

Now they show their true colors. They want to train models on our engineering to replace us, while simultaneously giving nothing back? No thanks. I'd rather fund the shitty US hyperscalers. At least that leads to jobs here.

If there's a company willing develop and foster large scale weights in the open, I'll adopt their tooling 100%. It doesn't matter if they're a year behind. Just do it open and build an entire ecosystem on top of it.

The re-AOLization of the internet into thin clients is bullshit, and all it takes is one player to buck the rules to topple the whole house of cards.

by echelon

4/2/2026 at 8:02:26 PM

> I'm not interested in adopting an inferior closed source weight from a geopolitical rival. The open source weights argument was the one thing China had going and that I was seriously cheering them on for. They could have been our saviors and disrupted the US tech giants - and if it was open, I'd have welcomed it.

Qwen is not the only Chinese lab, and the others have shown no change in their commitment to open source. Allegedly Qwen hasn't either if their recent statements are to be believed. They're just hoping to capture market share with *-claw customers before releasing an open weights version. We'll have to wait and see how before they decide to release that.

by Zetaphor

4/2/2026 at 8:23:58 PM

> the others have shown no change in their commitment to open source

I wouldn't call this totally accurate, especially as of late. What's closer to the truth however is that there's lots of second-rate players in China doing open models, that will be getting a lot more attention from local AI proponents if the big names seriously slow down their AI releases. The local AI scene as a whole is quite healthy.

by zozbot234

4/5/2026 at 2:26:21 AM

> I wouldn't call this totally accurate, especially as of late.

What exactly has changed? Alibaba just released a bunch of new models and have said 3.6 weights are coming soon. The others labs have shown no signs of slowing down their releases either. Whatever you're referring to is news to me and likely most others.

by Zetaphor

4/2/2026 at 7:02:33 PM

Whereas I as a Canadian am absolutely eager to see a serious competitor from a rival to the US because sending money south to Anthropic and OpenAI who think it's ok to spy on (or worse) their non-American customers, and are headquartered in a country that is trying to crush my country's economy, interfere in our domestic politics, and put us out of work and making threats on political allies.

I'd prefer them to be open weight, but I'd love to sub a decent competitive coding plan from a European or Chinese provider. Right now they're not quite there. If closing it and charging for it brings them closer to competitive, that's ok.

If the US tech and AI industry long term wants customers and a broad market outside of their own domestic base, they need to reconsider who they are bending the knee to, and how they are defining their policies in relation to the Trump administration.

Bring on the Chinese competition.

by cmrdporcupine

4/2/2026 at 7:56:16 PM

China (meaning the Chinese government specifically, not the people of course) is widely considered to be a low-key geopolitical rival to the developed West in general including Canada and Europe, not just the U.S. I don't exactly like this and would certainly prefer that this wasn't the case, but we can't exactly ignore the facts. This matters when we choose whom to rely on for things like certain hosted third-party services, including AI inference. GP's stance actually makes a lot of sense from this POV, even though it's just as true that many Chinese folks are doing wonderful work on open-weight local AI.

by zozbot234

4/2/2026 at 8:00:13 PM

China has never threatened war against my country; America has. Between the two, it’s clearly safer to lean towards the Chinese options if EU ones aren’t available.

by Filligree

4/2/2026 at 9:23:46 PM

That’s incredibly naïve.

by andsoitis

4/2/2026 at 11:35:24 PM

More naive than blithely blowing off threats of war?

by whimblepop

4/2/2026 at 10:45:38 PM

Meh, people have their own interests and values. And you can't force people to spend money no matter how much you may disagree with them

Bring on the Chinese, fuck the Americans.

by 3acctforcom

4/3/2026 at 12:49:51 PM

Americans hating on the Chinese for doing to them what they did to the rest of the world for 50 years.

Just without the bombs part.

by cmrdporcupine

4/3/2026 at 8:46:00 AM

Nothing but Love for China.

by kiviuq

4/2/2026 at 9:35:15 PM

How so?

by NonHyloMorph

4/3/2026 at 11:54:48 AM

"the developed west" is not really a thing. its not a alliance like the eu just a description for some countries. a lot of them are split over china and its a major political issue in places like poland or germany. the only country where all parties treat china like the enemy is the united states, and thats just because theres only two major ones and both listen to corporations instead of their voters. the eu as a organization is a rival to china (not enemy) with all the special duties and import restrictions but thats just economic self interest and not every member is on board. when you ask the average person i think like 80 percent either dont care or think relations should be more friendly. if you ask people under 25 its basically everyone.

by tancop

4/4/2026 at 8:53:37 AM

I hate to be the one to tell you this but things changed rather quickly when they elected a clown dictator, and now the US is widely considered to be a low-key geopolitical rival to the 'developed West in general' (blergh) including Canada and Europe ...

Seriously, even if you manage to elect someone capable ever again, the US can't be trusted to not elect someone worse than a toddler again in 4 years. In fact, you can't even be trusted to not elect someone even worse than your current dictator (if you thought it couldn't get worse, Trump isn't the bottom of the barrel by far).

by tripzilch

4/3/2026 at 3:29:25 AM

I've been using z.ai and codex latest models since last September. Each release has been an improvement.

codex handles longer sessions but the quality seems to decline and it tends to over engineer and lose focus. It will happily add slop on top of slop...which may pass immediate tests of "code works" but doesn't pass my criteria of "code as craft"

I'm using z.ai GLM with opencode. It's obvious when GLM loses its mind when the session gets too long.

I've been using AI to support programming for around 3 years now. The models have gotten amazing. However, unless there is a significant breakthrough I have determined that it's best for me to focus on short sessions.

I a) organize my work, b) improve my AGENTS.md, ensure source has appropriate comments to guide the models to the patterns and separation of concerns c) use shorter sessions d) review and test without AI. This approach means I still own my code. The AI is just an assistant.

With this approach GLM-5.1 is an excellent model. I never run out of token allotment on z.ai or codex plans. At this point, I only keep my OpenAI subscription as the ChatGPT desktop app is excellent at long web research tasks and I get codex with it.

by jhancock

4/3/2026 at 12:30:19 AM

You're giving up the rest of your country to a geopolitical rival from a separate region, in a separate hemisphere with smiling expansionist goals, even allowing armed Chinese security to protect Chinese installations in country. So why not give the rest of your country to China.

It will help them get a good flank on the USA such that even when that temporarily embarrassed country gets a leader you, and the rest of the world do like, it will be too late to do anything.

A perfect definition of cutting off your nose to spite your face laid bare for all to see.

by mikrotikker

4/3/2026 at 1:21:31 AM

"Temporarily embarassed" doesn't even begin to describe what's happening down there.

We have an American neighbour actively funding and amplifying a formerly extremely fringe separatist movement in Alberta -- shades of the Donbas, North American edition --and a US "ambassador" who has the behaviour of a 4chan troll.

The bridge has been blown up. Americans might think they are a midterm election away from salvation, but we're on the whole not so naive.

by cmrdporcupine

4/3/2026 at 8:59:32 AM

No, a rational decision based on a crazy man in the US. The US needs to learn, that if it threatens its traditional allies, they go to work with china, the main competitor of the US. If the US wants it allies back, the tariffs have to go, and the childlike rhetoric and threats as well. If not, china _deserves_ the business of the US former allies.

by abc123abc123

4/3/2026 at 2:24:01 PM

Right, and we're not just watching the behaviour of the US administration, we're watching the behaviour of the electorate / populace. At the polling booths but also in online comment sections, as tourists, consumers, etc.

And mostly not liking what we see. Encouraged by the No Kings protests, but unless that boils over into a hegemonic and stronger opposition, it still seems like there's a 40% population there that can't deal rationally with the world inside their own border, let alone outside.

Also... When Biden took over after Trump's first term most of the protectionist policies stayed and foreign policy didn't really budge (outside of support for Ukraine). I expect similar if (big if) the Democrats regain executive power.

by cmrdporcupine

4/3/2026 at 12:47:46 AM

The US under Trump is politically and strategically almost identical to China, and can be trusted about the same.

And then, compared to China, the US acts overtly hostile: threatening us with war, starting a war in order to collapse energy supplies outside of the US. Opportunistic beyond even China, much more hostile.

Will the US even be a democracy in two years? Is it now?

Nah man, balancing between China and the US is the only thing a smaller country can do in order not to be crushed

by zwaps

4/2/2026 at 6:53:46 PM

> I'm not interested in adopting an inferior closed weights model from a geopolitical rival.

That's a very reasonable stance. It doesn't change the fact that we do have plenty of local models (up to and including Qwen 3.5) that are still quite useful.

by zozbot234

4/3/2026 at 5:00:10 AM

looking at the other replies I'm not seeing what I consider the most important rebuttal to this argument: there is no real "adopting" in this hybrid open/close space right now, the lock-in is minimal and as much as different corporations are trying to create a lock-in effect by closing down their tools and interfaces, they are not really succeeding

I can constantly jump from one provider to another, and to my local servers which are already able to run very useful models at reasonable hardware cost, and I intend to continue doing that for the foreseeable future

the one thing I'm not going to do is tying my tooling to one provider or another or getting overtly used to the specifics of a model outside of my control

more than the weights or the training, which of course are very important, the real battle right now is for establishing some dependency mechanism so that your users won't just flee en-masse as soon as you inevitably try to abuse your market dominance and lock-in mechanisms, as is customary in everything computing these days - note that i don't explicitly talk about raking up prices, that is just one of the most difficult methods as people are very sensitive to that, when you can sneakily sell user data, get government contracts from never-disclosed conditions, or even just incorporate your intelligence to ad networks in one way or another

by muyuu

4/2/2026 at 9:23:26 PM

> I'm not interested in adopting an inferior closed source weight from a geopolitical rival.

I'm USian myself, but I don't think the site should be very US-centric.

by benatkin

4/3/2026 at 3:43:19 AM

z.ai models are open weights. GLM-5.1 is very close to Opus with obvious exception of session length.

Only academic models will be true open source as companies can't legally afford to disclose learning inputs.

In regards to "They want to train models on our engineering to replace us". Some software engineers in China can run circles around some of the best teams in Silicon Valley. Days of U.S. hegemony are over. I recommend you make peace and make friends.

by jhancock

4/2/2026 at 7:26:03 PM

This is not even the first closed weights Qwen model.

by evilduck

4/2/2026 at 5:09:31 PM

Ah, so that explains the recent wave of Qwen team-member departures.

by miki123211

4/2/2026 at 7:46:26 PM

Interesting! What is your reasoning behind that? I just learned there where closed models from the team before this so that shouldn’t have been a surprise for the employees? Or do you think the internal communication was: we will release better open models the the existing closed ones to push everything forward and now when they are getting competitive they are becoming proprietary?

by dhfs

4/3/2026 at 8:39:51 AM

> Most users of cheap API tokens are not loyal to any brand

In the exploration phase, yes. But once your setup settles down you likely want to stay on the same model for stable operation.

by jona-f

4/4/2026 at 5:52:39 AM

I feel like this is true. I don't mind being a blip behind the bleeding edge if I don't have to change my tooling every month. But the second my current provider tries to screw me over, I'll still jump ship

by 8n4vidtmkvmk

4/2/2026 at 10:10:55 PM

The business model, howver, is lobster in a bucket. Any model that starts gaining as a private model will have competitors to release comparable open models because those locked in customers will not swotch unless you demo the capabilities.

So expect every now and then a open model burp from the trailing frontiers. Afterall, its all sunk cost so once you have it and no customers, theres zero reason not to spike your competition and try again or exit.

by cyanydeez

4/2/2026 at 10:18:01 PM

I use different models in production and model's "personality" as in tendency to not go off script, not consume gazillions of tokens recursively, follow instructions etc, are more relevant than "brute" power which is okayish as a metric for agentic coding on generous token plans.

Chinese models are very competitive in that regard, you'll often look at 70-90% price reduction at the same quality.

by epolanski

4/2/2026 at 7:16:14 PM

4.5 is better than 4.6 though in practice. 4.6 was purely a cost savings change with enough benchmark gamification to look better.

by nwienert

4/3/2026 at 10:09:10 AM

I've found Opus 4.6 to be smarter than 4.5, at least in some ways. There's a bug I'd been trying to solve for a decade (and so had other humans) and I've been giving it to each model to try and solve, including in interactive sessions. Each model got closer, but none of them actually solved it, until Opus 4.6 got it on the first go (I probably used Ultrathink). This was before the 1M context was available.

I'd agree that 4.6 and 4.5 are different, but I don't think it's correct that 4.6 is just reduced and benchmaxxed. It genuinely solved problems for me that no other model has been able to.

I think I'd like to have seen the 4.6 benchmarks also included against Qwen.

by SyneRyder

4/2/2026 at 10:59:16 PM

Exactly. 3.6 plus in the exact same coding agent harness is notably worse in all of my testing compared to 3.5 plus.

The former gets stuck in ridiculous thought loops on the exact same tasks I’m testing. Fascinating really, I expected more for some reason.

by girvo

4/2/2026 at 6:12:34 PM

I’m starting to wonder where the most is for any of these models.

Sure they are not cheap to train. But if open weight models continue to be trained and continue to become available on cheaper hardware, how do dedicated AI companies protect their margins?

by jimbokun

4/3/2026 at 1:53:35 PM

OpenAI found the answer: artificially curtail the supply of DRAM wafers (by buying 40% of world's supply without necessarily having a feasible plan to make use of all of them], to prevent consumers from getting access to gpus with more and more memory, which could allow them to get dangerously close to the state of art while running local AI

Rather than an increase in VRAM of consumer gpus, we are seeing a decrease, which is pretty optimal for OpenAI

by nextaccountic

4/2/2026 at 5:03:20 PM

Opus was released in Feb 2026. Even though it feels like a long 2 months has passed, its' not really clear that they were developing this as a competitor to that product.

There's nothing really strange about not competing directly with the best, but rather showing whom you are as good as.

by true_religion

4/2/2026 at 6:51:14 PM

I don’t know why anyone would do the mental backflips to defend this.

They posted charts with logos for Claude and others. You had to read the fine details to realize they weren’t comparing to the latest offerings from those companies. They were counting on you not noticing.

There’s zero reason to compare to old models unless you’re trying to mislead.

by Aurornis

4/2/2026 at 4:20:15 PM

> Initial reactions are mostly anger from everyone who didn’t realize that the play along was to give away the smaller models as advertising, not because they were feeling generous.

The naivety around this has been staggering quite frankly. All of a sudden, people thinking that meta etc are releasing free models because they believe in open access and distribution of knowledge. No, they just suck comparatively. There is nothing to sell. Using it to recruit and generate attention is the best play for them.

by jstummbillig

4/2/2026 at 5:11:29 PM

I thought Qwen was releasing open-weight because China can't compete with America (because of people's privacy concerns), so the only thing they could do is salt the ground economically with open models, and make sure everybody loses.

Apparently that wasn't actually the play here.

by miki123211

4/2/2026 at 5:27:52 PM

Qwen is actually a pretty strong player in the Chinese market. There is an implied "salt the ground" play but it's mostly from hardware makers, who are trying to keep the big AI players honest and also stand to gain if local inference becomes popular.

by zozbot234

4/2/2026 at 5:05:22 PM

I don't think there's so much naivety. People can be aware of the the plan and still be frustrated and disappointed when it happens.

by Gracana

4/2/2026 at 9:59:17 PM

I’m waiting for the grand reversal where Anthropic abuses the Qwen API to train the next Haiku.

by GorbachevyChase

4/2/2026 at 6:49:29 PM

For a brief moment there were a lot of comments about how Chinese tech companies are our saviors in the age of AI because they were releasing their models. It was an edgy contrarian take that was getting a lot of traction, mostly from commenters who were unfamiliar with Alibaba and thought it was the anti-Big-tech

by Aurornis

4/3/2026 at 5:02:58 PM

My explanation is simpler and does not rely on assuming that anyone is an idiot. Or an edgy contrarian.

by Gracana

4/2/2026 at 5:09:48 PM

I'm not frustrated or disappointed, we have lots of models from Qwen already. We haven't really lost anything. And plenty of players only release "smaller" models anyway, so it's hardly unprecedented.

by zozbot234

4/2/2026 at 3:51:47 PM

How stupid somebody has to be to mix up Opus with Qwen?

by dev_l1x_be

4/2/2026 at 4:56:54 PM

OP didn't say about confusing Opus with Qwen but rather people being confused about Qwen3.6-Plus not being available as an "open weight" model available for self hosting.

by cieplok

4/2/2026 at 3:04:58 PM

> I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper.

There isn't, pretty much everyone wants the best of the best.

by cubefox

4/2/2026 at 4:10:13 PM

No. Right now I'm upset that Google has removed (or at least is in the process of removing) the Gemini 2.0 flash model. We use it for some pretty basic functionality because it's cheap and fast and honestly good enough for what we use it for in that part of our app. We're being forced to "upgrade" to models that are at least 2.5 times as expensive, are slower and, while I'm sure they're better for complex tasks, don't do measurably better than 2.0 flash for what we need. Yay. We've stuck with the GCP/Gemini ecosystem up until now, but this is kind of forcing us to consider other LLM providers.

by thraxil

4/2/2026 at 6:03:37 PM

this is one of the reasons im hearing more and more people are using open/locally hosted models. particularly so we dont have to waste time to entirely redo everything when inevitably a company decides to pull the rug out from under us and change or remove something integral to our flow, which over the years we've seen countless times, and seems to be getting more and more common.

products entirely disappearing or significantly changing will be more and more common in the llm arena as things move forward towards companies shutting down, bubbles deflating, brand priorities drastically reshifting, etc...

i think, we're at or at least close to a time to really put some thought into which pieces of your flow could be done entirely with an open/local model and be honest with ourselves on which pieces of our flow truly needs sota or closed models that may entirely disappear or change. in the long run, putting a little bit of thought into this now will save a lot of headache later.

by toofy

4/2/2026 at 8:11:46 PM

Yeah. Back when Gemma2 came out we benchmarked it and were looking at open models. For our use case though, while the tasks are pretty simple, we do need a pretty large context window and Gemini had a big lead there over the open models for quite a while. I'll probably be evaluating the current batch of open models in the near future though.

by thraxil

4/2/2026 at 6:20:57 PM

What’s interesting about this is that for previous technologies you could define a standard and demonstrate compliance with interfaces and behavior.

But with LLMs, how do you know switching from one to another won’t change some behavior your system was implicitly relying on?

by jimbokun

4/2/2026 at 6:42:33 PM

In case you don't know, Gemini 2.5 flash is hosted on DeepInfra. They also have 1.5 flash but not 2.0 flash.

I have no affiliation with DeepInfra. I use them, because they host open-source models that are good.

by elbear

4/2/2026 at 8:07:39 PM

Thanks. Yeah, for now we're moving to 3.1 flash lite as that's the new cheapest at $.25/1M and is also still "good enough". 2.5 flash is more expensive at $.30/1M (looks like Deep Infra charges the same as GCP/VertexAI for it). I might check them out for Gemma though. We benchmarked Gemma2 when that came out and it wasn't remotely usable for us largely because the context window was way too small. It looks like 3 or 4 might be worth evaluating though.

by thraxil

4/3/2026 at 3:31:08 AM

Xiaomi's mimo-v2-flash is great if you care about speed and performance - it's 1/10 the price of Gemini 3.1 Flash Lite and faster (on OpenRouter).

GCP does server other non-Google models, but I'm not sure what they have other than Anthropic models. I don't think Haiku is a great model though.

by nl

4/2/2026 at 3:19:26 PM

The OpenRouter usage stats indicate the opposite: https://openrouter.ai/rankings?view=month

by PhilippGille

4/2/2026 at 3:27:18 PM

OpenRouter usage is likely skewed towards LLMs that are more niche and/or self-hostable by solid hardware that's available, but most consumers don't have on hand. I can imagine Anthropic and OpenAI LLMs often get called directly from their APIs instead.

At least from my experience and friends of mine, we use OpenRouter for cases where we want to use smaller LLMs like Qwen, but when I've used ChatGPT and Claude, I use those APIs directly.

by jjice

4/2/2026 at 4:36:07 PM

Same, and my little SaaS is pushing more than 0.1% of the TOTAL volume of tokens on OpenRouter, so the reality is they’re TINY.

by senordevnyc

4/3/2026 at 9:06:02 AM

0.1% of OpenRouter is around 400 billion tokens per month or around $400k per month at a cost of $1 per 1 million input tokens, not counting output.

I think it's pretty disingenuous to call your SaaS little when it is projected to spend at least 5 million USD just on tokens and this is a low end estimate.

by imtringued

4/4/2026 at 3:23:58 PM

Their homepage says 30T tokens monthly, so 0.1% would be 30 billion.

And I pay way less than $1 per input token, especially when caching is taken into account.

EDIT: they updated it in the last day or two, now it says 70T, so I’m a little below 0.1% now. But seriously, the point stands, 70T tokens a month just isn’t that much in the global scheme. The big labs are pushing quadrillions each.

by senordevnyc

4/4/2026 at 3:26:23 PM

Ill sell you tokens for just 1 cent each, as many as you want. Bargain!

by cap11235

4/2/2026 at 6:39:22 PM

I use ChatGPT and Claude on OpenRouter, because it's just easier than buying credits on each platform separately.

by elbear

4/2/2026 at 4:25:29 PM

what happened around jan this year(26) that caused such a climb in usage?

by vorticalbox

4/2/2026 at 5:47:59 PM

Openclaw

by wcallahan

4/2/2026 at 3:15:52 PM

> There isn't, pretty much everyone wants the best of the best.

For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets, or as sub-agents called from expensive SOTA models.

For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents, is a popular usage "hack." That only gets more powerful, as the Sonnet equivalent model gets cheaper. Now you can spawn entire teams of small specialized sub-agents with small context windows but limited scope.

by Someone1234

4/2/2026 at 4:05:49 PM

Exactly.

I did create my own MCP with custom agents that combine several tools into a single one. For example, all WebSearch, WebFetch, Context7 exposed as a single "web research" tool, backed by the cheapest model that passes evaluation. The same for a codebase research

Use it with both Claude and Opencode saves a lot of time and tokens.

by alexsmirnov

4/2/2026 at 9:37:08 PM

I'd be interested in seeing the source for this if you have a moment

by hadlock

4/2/2026 at 4:05:44 PM

> But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets

Seems like a huge waste of money and electricity for processes that can be implemented as a traditional deterministic program. One would hope that tools would identify recurrent jobs that can be turned into simple scripts.

by thinkcontext

4/2/2026 at 5:38:29 PM

It depends on the specific task.

For example: "Here our dataset that contains customer feedback comment fields; look through them, draw out themes, associations, and look for trends." Solving that with a deterministic program isn't a trivial problem, and it is likely cheaper solved via LLM.

by Someone1234

4/3/2026 at 5:18:59 AM

It makes sense if the dataset is so large that LLM cost is a prohibitive factor. Otherwise a frontier LLM has the advantage of producing a better result.

by cubefox

4/2/2026 at 6:22:43 PM

That is a very complex, high level use case that takes time to configure and orchestrate.

There are many simpler tasks that would work fine with a simpler, local model.

by jimbokun

4/2/2026 at 3:54:58 PM

For coding I want the best. Both I and $work do lots of things besides coding where smaller models like qwen3.5-27b work great, at much lower cost.

by wongarsu

4/2/2026 at 10:05:50 PM

Not all tasks require models like opus. If they do not, then it is more efficient to use cheaper and faster models. For most of my tasks now I use the big kimi/qwen/glm models because they are cheap and good enough, if not even the smaller locals ones.

I would say that for a significant part of the current market open-source models are good enough to fill a part of it.

by freehorse

4/2/2026 at 3:17:00 PM

Ever hit your daily limit on Claude Code and saw how expensive it is to pay per token?

by joefourier

4/2/2026 at 11:10:43 PM

All the time now… it’s wild how little usage you get with Opus on the Pro sub now haha

by girvo

4/2/2026 at 3:11:56 PM

maybe there isnt, but as understanding grows people will understand that having an orchestration agent delegate simple work to lesser agents is significant not only for cost savings, but also for preserving context window space.

by sidrag22

4/2/2026 at 3:08:55 PM

That isn't true. In a Codex or Claude Code instance, sure... but those are not the main users of APIs. If you are using LLMs in a service for customers, costs matter.

by scoopdewoop

4/2/2026 at 3:50:43 PM

That's only because current models don't saturate people's needs. Once they are fast and smart enough people will pick cheaper ones.

by esafak

4/2/2026 at 3:09:25 PM

The market for API tokens is bigger than people like you and I (who also want the best) using then for code.

There are a lot of data science problems that benefit from running the dataset through an LLM, which becomes bottlenecked on per-token costs. For these you take a sample subset and run it against multiple providers and then do a cost versus accuracy tradeoff.

The market for API tokens is not just people using OpenCode and similar tools.

by Aurornis

4/2/2026 at 3:51:40 PM

Nope. I get very good results from GLM 5 and 5.1. I’m not working on anything so complex and groundbreaking that I need the best.

Coding is a rung on the ladder of model capability. Frontier models will grow to take on more capabilities, while smaller more focused models start becoming the economical choice for coding

by wolttam

4/2/2026 at 11:11:35 PM

GLM-5 is surprisingly good to be fair. Punches well above its weight IMO

by girvo

4/2/2026 at 3:16:12 PM

Everyone may want the best, but the amount of AI-addressable work outstrips the budget available for buying the best by quite a wide margin.

by regularfry

4/2/2026 at 3:23:45 PM

OpenCode allows for free inference tho.

by noman-land

4/2/2026 at 5:08:51 PM

Not really. It depends on the usecase. For private stuff I'm very happy to take what was SOTA a year or 2 ago if I can have it all running in my home and don't have to share any of my data with some sleazy big tech cloud.

The price is a concern too of course. But privacy is a bigger one for me. I absolutely don't trust any of their promises not to use data for training purposes.

by wolvoleo

4/2/2026 at 4:06:15 PM

I understand peoples reactions of Qwen team comparing against Opus 4.5 instead of 4.6. And them comparing against Gemini Pro 3.0 instead of 3.1. But calling it misleading is a bit of stretch in my eyes, people here are acting like we immediately forgot how previous generations performed just because a new version is released.

This field is going in a incredible pace, the providers release a new model every quarter or so. The amount of criticism is a bit overblown in my opinion. The benchmarks still look very good to me. I’ve used GLM-5 (latest is GLM-5.1) and Kimi K2.5, they are decent and gets the job done, so seeing how this model of Qwen performs compared to it is kinda impressive.

Also, why are so many pointing out the fact that this model is not open-weight as if this is their first time doing so. Qwen-3.5-plus, Qwen-3-Max is also closed source. This is not something new.

I think Qwen trying to catch up to the SOTA models is still healthy for us, the consumers. Sure, its sad news that this version is closed-weight, but I won’t downplay their progress.

by Alifatisk

4/2/2026 at 4:19:35 PM

I think it’s more the principle of deception that upsets people. Imagine if Apple released a new iPhone and publicly compared its specs to some previous gen Android. It’s not in good faith.

by nickvec

4/2/2026 at 5:28:08 PM

They compared their M-series chips to older Intel Macs for a while, likely to target users who were still on Intel chips. If they released a lower cost iPhone and compared it to a previous gen Android I could see the reasoning for it. It's not deception if it's a valid comparison and people just fail to understand what's being compared.

Now, is it mildly deceptive because all of the companies using incredibly confusing naming conventions for their models? Maybe!

by threetonesun

4/2/2026 at 6:35:22 PM

Apple continues to compare to prior versions of Apple Silicon. I suspect it is a mix of trying to provide useful, realistic upgrade information and numbers that still sound good for those not paying attention.

I don't think any org doing this is necessarily being deceptive, so long as there's some reasonable basis for the chosen comparable(s).

For example, comparing a new iPhone to a prior Android phone might make sense if the install base is considerably large and Apple is targeting the cohort for user acquisition. (~"These benchmarks are not for you.")

The community will always run the numbers and get the clicks for the benchmarks not filled in by the 1st party. I noticed what appeared to be some movement from Apple in content they've produced to get ahead of this with recent product content.

by bredren

4/3/2026 at 12:25:31 AM

Apple doesn’t compare themselves to Android the same way Coke doesn’t compare themselves to Pepsi.

by conception

4/3/2026 at 5:19:44 AM

If that was true everyone would have tuned out Sam Alman and ilk. It’s selective.

by gmerc

4/2/2026 at 11:42:32 PM

That's because Android phones release every year. If they released every couple months then I doubt it would appear that way.

by array_key_first

4/2/2026 at 4:28:53 PM

Why are we so quick to call it deception? Their figure is quite clear. They aren't fiddling with the graph or hiding the labels, they are clearly stating which models it compares against. But I agree on the sentiment that the standard practice should be to bench against the latest SOTA models.

by Alifatisk

4/2/2026 at 5:02:16 PM

Even if openly stated, why would they be comparing to a previous generation if not for deception?

Laziness? Lack of time? It's not like the latest generation of the SOTA models were released yesterday.

by patates

4/2/2026 at 6:12:37 PM

Opus 4.5 is already pretty good.

Opus 4.5 is $25/m output tokens.

This is at most $6/m output tokens.

That's ~1/4 the price.

by Garlef

4/3/2026 at 5:54:54 AM

I can't imagine Qwen3.6 Plus would be more expensive than the 3.5 Plus model. That was $2.4/m output initially and was reduced to $1.56/m at <256k context ($1.95/m above).

by Aerroon

4/3/2026 at 6:35:18 AM

I've taken the numbers from the alibaba pricing page which says $2-$6

by Garlef

4/2/2026 at 4:24:20 PM

Pretty solid Pelican: https://gist.github.com/simonw/ca081b679734bc0e5997a43d29fad...

I used the https://modelstudio.alibabacloud.com/ API to generate that one, which required signing up for an account and attaching PayPal billing - but it looks like OpenRouter are offering it for free right now so I could have used that: https://openrouter.ai/qwen/qwen3.6-plus:free

by simonw

4/2/2026 at 6:40:22 PM

Pelican is drafting rear peloton

by bredren

4/2/2026 at 6:47:55 PM

they're going to start training a pelican riding a bike specifically on these models soon. it's the key global benchmark!

by manc_lad

4/3/2026 at 8:04:26 AM

They will have a long time ago. By now Simon's meme will be well represented in training sets.

by teruakohatu

4/2/2026 at 2:50:54 PM

Worth noting that this model, unlike almost all qwen models, is not open-weight, nor is the parameter count exposed. Also odd that it is compared against opus 4.5 even though 4.6 was released like 2 months ago.

by jgbuddy

4/2/2026 at 2:53:34 PM

They said in the last paragraph[0]:

"[...] In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation. [...]"

[0] https://qwen.ai/blog?id=qwen3.6#summary--future-work

by pferdone

4/2/2026 at 3:00:37 PM

> we will also open-source smaller-scale variants

In other words, like GP said, this Qwen3.6-Plus model is not open-weight unlike the other Qwen models.

by deaux

4/2/2026 at 3:18:44 PM

In a practical sense, I'm primarily interested in small to medium sized models being open. I think that might be common sentiment.

However, my hope is that there will be at least somewhat competitive big and open models as well, from an ethical/ideological perspective. These things were trained on data that was provided by people without their consent, so they should at least be be publicly accessible or even public domain.

by dgb23

4/2/2026 at 3:25:19 PM

Qwen3.5-Plus is the largest variant of the open weight Qwen3.5 model, expanded with a 1M context window and fine-tuned on the Qwen-native harness’ specific tools.

by thepasch

4/2/2026 at 3:04:27 PM

> unlike almost all qwen models

Almost all means there have been ones before that were not open. So, no contradiction there.

by pferdone

4/2/2026 at 3:10:59 PM

> unlike the other Qwen models

Please send the download link for qwen 3.5-plus.

Also, who cares? If you have the hardware to run a ~400b model i don’t think you count as a home user anymore.

by kennywinker

4/3/2026 at 7:11:00 AM

So the Qwen3.6-Plus model is like the Qwen3.5-Plus model?

by sroussey

4/3/2026 at 5:58:13 AM

Qwen 3.5 Plus was closed weights too. It was supposedly the same model as Qwen3.5 397B, just with 1 million context size and only available on the API and their website.

by Aerroon

4/2/2026 at 4:21:02 PM

If Opus 4.6 was only released two months ago, then it seems reasonable that Qwen hasn't finished fully comparing against the latest Opus.

by cpburns2009

4/3/2026 at 1:54:22 PM

Well don't we have numbers from both models on these benchmarks already? What else is there to do except include them in the table?

by jgbuddy

4/5/2026 at 12:36:27 AM

Do we? I admit ignorance in this.

by cpburns2009

4/2/2026 at 3:29:46 PM

I wouldn't say "almost all" seeing as -MAX and -Omni models were always closed.

by zozbot234

4/2/2026 at 3:53:42 PM

I'll diverge from some of these comments, I don't find it misleading to compare to Opus 4.5.

I can remember how good Opus 4.5 was. If I'm considering using this, it's most informative to me to compare to the model it's closest to that I have familiarity with.

I'm obviously not switching to this if I want the best model. I'm switching if I'm hopeful that the smaller versions are close to it, or if I want to have more options for providers, or for any other reasons unrelated to getting the highest quality responses possible.

by furyofantares

4/2/2026 at 4:07:33 PM

Exactly this. If you can get something close to Opus 4.5 for free, that's noteworthy. I may not use it for the most critical pieces of my app, but not everything I do is galaxy-brain coding.

by bensyverson

4/2/2026 at 7:06:46 PM

Yes, honestly, Opus 4.6 and GPT 5.4 were mostly not really noticeable improvements over 4.5 and 5.3 respectively. If we were stuck at 4.5 levels but at 1/10th of the price, I'll take it.

by cmrdporcupine

4/2/2026 at 7:13:59 PM

I find 4.6 pretty noticeable upgrade, but it might be the 1M context. I'm interested in how the 1M context works out with Qwen.

by furyofantares

4/2/2026 at 9:00:22 PM

From Qwen-3-max thinking, I remember the inference becoming veeery slow as you pushed towards 1M context, already at 300k tokens you would notice the degradation. But of course, I was using Qwen Chat, so could be a resource allocation thing.

by Alifatisk

4/2/2026 at 7:20:00 PM

I found it worse, in a very clear way.

by nwienert

4/2/2026 at 3:30:36 PM

I’m surprised that people are surprised. Qwen has been hosting private plus and max variants for a while now.

by linolevan

4/2/2026 at 3:11:32 PM

The agent benchmarks here are interesting but I'd love to see how Qwen3.6-Plus handles long-horizon tasks where it needs to recover from its own mistakes. Most agent evals test the happy path. The hard part is when the model takes a wrong action at step 3 and needs to recognize and backtrack at step 15. Has anyone stress-tested this in a real dev workflow?

by Caum

4/3/2026 at 4:03:58 AM

In my experience, whenever an LLM says “wait, but actually” or some variant thereof is when you need to step in before it goes totally off the rails.

by rogerrogerr

4/3/2026 at 3:20:10 AM

23/25 on my agentic benchmark for the free version on OpenRouter. That's a great score - only 4 models have ever scored higher.

But there are open models that also score 23/25 including Qwen 3.5 27B.

by nl

4/2/2026 at 2:57:07 PM

> In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation.

by karimf

4/2/2026 at 11:58:35 PM

For anyone that believes Chinese labs will stop open sourcing their models, let me tell you why that won't happen.

First, try signing up for Z.ai's coding plan. I know how to but I bet you won't be able to.

The absolute disaster that is Z.ai's internet presence shows that these small labs have no ability to market themselves and drive direct sales.

For marketing, they lack capabilities, and releasing open models is the only way for them to remain in the conversation.

For sales, they rely on distribution via OpenRouter, OpenCode etc. Interest with their users is driven by open model performance.

Open sourcing for Chinese labs is not some large national scheme. It is their only way to commercialization.

by try-working

4/5/2026 at 11:42:42 AM

> First, try signing up for Z.ai's coding plan. I know how to but I bet you won't be able to.

What's the issue with signup up for Z.ais coding plan?

by Alifatisk

4/6/2026 at 2:55:07 PM

you can't find it

by try-working

4/6/2026 at 5:01:03 PM

1. Go to https://z.ai

2. Top right corner, "API"

3. https://z.ai/subscribe

by Alifatisk

4/3/2026 at 12:00:47 AM

Well, can’t they just direct their model to do some marketing for them?

Only partially tongue in cheek - if it’s not good at marketing itself, that seems like a red flag for capabilities?

by rogerrogerr

4/3/2026 at 12:46:05 AM

China can't get good chips. But I don't understand why they can't license their closed source models to US inference providers so we can get more than 80% reliability on their models on OpenRouter.

by polski-g

4/3/2026 at 1:05:40 AM

I think they already are licensing their biggest models to third party inference providers.

by try-working

4/3/2026 at 12:20:46 PM

Agree, it’s surprising that these companies don’t thoroughly test their own workflows to ensure a smooth and seamless user experience.

by sumedh

4/3/2026 at 2:32:40 AM

I've gone through about 500M tokens on this model already. They've got some free inferencing options (such as on openrouter) ... $0 is hard to beat and it's creating not-crap.

by kristopolous

4/3/2026 at 3:29:15 AM

How can it be free? What do you mean?

EDIT: Ah, I see. Some kind of promotion. Pretty cool.

by edg5000

4/3/2026 at 4:09:57 AM

Also the Qwen cli, their vibe coding agent, allows a thousand free requests a day. In practice it's about ~300M tokens for me

It's very generous

I know I could certainly pay anthropic $1500 a day for my use and they'd be delighted... I'd rather pay $0 ... Just a personal preference

by kristopolous

4/3/2026 at 4:30:11 AM

Hahaha

by edg5000

4/2/2026 at 2:46:42 PM

The benchmarks provided are for Opus-4.5, not for the latest Opus-4.6 and Qwen is still lagging in a lot of them.

by srmatto

4/2/2026 at 2:51:12 PM

There is no reason to benchmark against Opus 4.5 when Opus 4.6 has been out so long, other than to be misleading.

by Aurornis

4/2/2026 at 4:18:59 PM

I can see reasons, among others that 4.5 was the one established as they were preparing this version. "So long" is merely 2 months ago, and Qwen 3.5 was barely released less than 2 months ago. They were likely already working on finalizing 3.6 before 3.5 official launch, and as 4.6 came out.

In any case, aside Claude fanboyism, having other plays inch closer to similar performance is always useful. Even if they are "6 months behind" as the pace slows down, this guarantees that there's no huge moat and they'll eventually either get to where the SOTA is, or the difference wont be that big.

I'd rather put fewer eggs in 2-3 big player baskets.

by coldtea

4/2/2026 at 2:50:55 PM

And it seems they've decided to go closed-source for their largest, best models.

by thegeomaster

4/2/2026 at 3:25:43 PM

3.5-plus was also only available via api. I don’t know what the long term business model for open weights is, I hope there is one, but it seems foolish to assume that companies will be willing to spend millions of dollars of compute on an asset worth zero in perpetuity.

by FuckButtons

4/2/2026 at 5:36:46 PM

The business case is to salt the earth for new competitors, coupled with marketing.

by vidarh

4/2/2026 at 3:35:16 PM

They've always had closed-source variants:

- Qwen3.5-Plus

- Qwen3-Max

- Qwen2.5-Max

etc. Nothing really changed so far.

by kgeist

4/2/2026 at 4:15:05 PM

They always did that. Did they say anywhere they'd open all their models? They still have a business.

by coldtea

4/2/2026 at 3:01:57 PM

Quite strong results in the benchmarks but why Gemini 3 Pro instead of 3.1? Why only for a few of the benchmarks? Why is OpenAI not there in the coding benchmarks? Why Opus 4.5 and not 4.6? Just jumps out into my eye as a bit strange.

As always, we'll have to try and see how it performs in the real world but the open weight models of Qwen were pretty decent for some tasks so still excited to see what this brings.

by eis

4/2/2026 at 3:01:35 PM

Just more evidence that the B tier models are six months behind. Ultimately that’s good. Opus 4.6 level intelligence will be cheap later this year!

by woeirua

4/3/2026 at 5:52:30 PM

Most agent work focuses on task completion. Browse the web, fill out the form, and/or write the code. The harder problem is social agency, where the AI has to decide whether to participate at all. We built a cheap model gate that reads the conversational dynamics of a group chat before the expensive model runs. Wonder how Qwen3.6 performs in these nuances cases.

by mtrifonov

4/3/2026 at 3:35:17 AM

Do they have an API where you can control the chat template or at least just put everything in the system prompt? This way you can control everything including the tool calling syntax. Even if you use the trained tool syntax, it allows you to control the tool system prompt which you may want to tweak. With DeepSeek this is all possible. An undocumented feature, great for harness builders. Anybody got info on Qwen regarding this?

by edg5000

4/3/2026 at 6:01:04 AM

Can't you do that on OpenRouter? You can set a system prompt there. Is that insufficient for what you had in mind?

by Aerroon

4/3/2026 at 8:15:16 AM

They are talking about the chat template and not the system prompt. With current gen models, the system prompt is only part of a larger pretext that is passed to the model at the start of the "chat". The models are trained on a specific chat template with things like tool lists, reasoning budget, special feature flags and the "system prompt" formatted in a certain template.

by breisa

4/2/2026 at 10:24:38 PM

I wish these AI vendors would quit publishing comparisons with the previous generation of their competitors's models. It's just such a glaringly bad look and no one is fooled by it, even if their achievements deserve praise in their own right. The Qwen models are great and don't deserve the reputational hit that comes from dodgy marketing tactics.

by davesque

4/2/2026 at 3:33:06 PM

I hope their open source variants are just as good, having a 1 million token window for a fully offline model would be VERY interesting.

by giancarlostoro

4/2/2026 at 3:44:42 PM

I don't know how well it performs, but you can extend Qwen3.5 to 1 million token context using YaRN. Also, Nemotron 3 Super was recently released and scales up to 1 million token context natively.

by sosodev

4/2/2026 at 10:29:24 PM

3.6 Plus seems to be simply a refined/more consistent 3.5 Plus: https://aibenchy.com/compare/qwen-qwen3-5-plus-02-15-medium/...

by XCSme

4/3/2026 at 3:22:01 AM

Has anybody done serious agentic work (e.g. using a CLI harness or simmilar) with 3.5 Plus/3.0 Max and such? How does it compare against Opus with Claude Code? I've used the chat quite a bit and I can't say at this point.

by edg5000

4/2/2026 at 4:24:59 PM

It hallucinates a lot more then Sonnet or even MiniMax M2.5. Especially in tool calls, it would end up duplicating the content in code files and then realising later and getting stuck in a loop.

by wg0

4/2/2026 at 8:21:54 PM

My initial experiments are not encouraging. I have a basic planning prompt that includes instructions not to edit any files or implement anything. Qwen-3.6-Plus will consistently ignore that completely and proceed with implementation. I expect that kind of behavior from small models I run locally, not a hosted closed model claiming to compete with the frontier models.

by noelsusman

4/2/2026 at 9:22:50 PM

> It hallucinates a lot more then Sonnet or even MiniMax M2.5.

Ugh, that's not good.

I evaluated Kimi K2 a while back for some text understanding -> summarisation tasks, and of the 100 tasks it hallucinated about 30% of the output. :( :( :(

by justinclift

4/3/2026 at 9:09:17 AM

> I evaluated Kimi K2 a while back

I guess that it was Kimi K2-Instruct, the first model (or it's fine-tune) in the lineup of Kimi-K2 models. And I remember trying it just for the sake of curiosity, and... except for the almost total absence of the sycophancy and "sugar syrup" in it's outputs, it was not very good at the time. Right now though, if you're still interested in this model family, you could look at Kimi-K2.5 which is way better.

That said, it's still not perfect, and to be honest, looking where things are going with LLMs right now I prefer the use of my own brain (local private inference with power consumption of ~20-25W, having a capability for continuous learning and performing real-world tasks) to the use of any "AI" model (including proprietary models such as Claude 4.6 Opus, Gemini 3.1 Pro and others).

: )

by dryarzeg

4/2/2026 at 6:01:09 PM

Looking forward to when this gets on Bedrock. I built an app with a niche AI agent and to this point only Sonnet is really good enough for our use case, but its expensive!

by gburgett

4/2/2026 at 7:12:05 PM

Try using Grok 4.1 reasoning. It's crazy cheap, and really it's not that bad.

by swalsh

4/3/2026 at 1:49:32 AM

Sure, it might try to subtly steer you towards fascism, but other than that, it's great.

by sdenton4

4/2/2026 at 3:47:34 PM

I would love to hear from people using both (Claude Code OR Codex) AND (Qwen) and their experience with Qwen models, are they on par, or how far are they?

by throwaw12

4/2/2026 at 3:57:30 PM

I switch between Claude Code (Opus/Sonnet) and Qwen (OpenCode, OpenClaw) multiple times throughout the day and Qwen 3.5 is really nice. I do also use KimiK2.5 and GLM5 pretty often too and I'm starting to get a sense that the agent tool is becoming a little more important than the model with these level of models. As long as tool calling and prompt quality is all configured correctly by the provider.

by scottcha

4/3/2026 at 3:37:21 AM

What are the reasons for switching? Personally I got into the habit of doing a bit of a round robin with Codex/Claude (CLI) and then DeepSeek and Qwen web chat. And Claude in web chat. I like to switch just to learn the differences, otherwise I'd never know what the other models can do. But I still feel attached to Opus, but this can be fammillarity. If I only had Qwen maybe it would be effectively identical at the end of the day. Hard to say.

by edg5000

4/3/2026 at 6:34:09 PM

Mine are pretty unique since we optimize the energy for and run an inference service api so forces me to dogfood alot of different options.

by scottcha

4/2/2026 at 9:31:14 PM

[dead]

by danelliot

4/3/2026 at 10:32:38 AM

[dead]

by avib99

4/2/2026 at 2:54:01 PM

How convenient of them to compare themselves to the last generation Opus and GPT models to make their model look better than it really is.

by Art9681

4/2/2026 at 4:25:24 PM

It is no longer available on OpenRouter. They say "going away on 3-March", but it's already gone!

by zkmon

4/2/2026 at 3:13:21 PM

Does anyone have experience with Alibaba's coding plan? Not that I'm very tempted at $50/month...

by esafak

4/2/2026 at 5:48:21 PM

A bit off-topic but I’m on the legacy Lite plan (now discontinued), and it’s more than enough for hobby projects. The main draw is the generous request-based quota (18k requests/month) rather than a token-based one.

This means a 100k token request counts the same as a 100-token one. I’ve made about 8000 requests in the last two weeks, averaging around 80k tokens per request. It feels like they’re subsidizing this just to gather data on agentic workflows.

On the downside, the speed is mediocre (15–30 tg/s for GLM-5), and I’ve seen the model glitch or produce broken output about 10 times out of those 8k requests.

by usagisushi

4/3/2026 at 12:36:01 AM

They claim SOTA but are beaten by last gen Opus om every metric?

This one seems weird

by zwaps

4/3/2026 at 6:11:47 AM

ignoring gpt 5.4! I feel bad for people who have not even tried it. for the same 20$ I pay to openai and anthropic, I get significantly more from openai

by throwaway911282

4/3/2026 at 11:01:35 PM

GPT 5, 5.1 were bad which affected its reputation and most people went with Claude because it actually worked. There is no reason to switch if you are happy with Claude.

by sumedh

4/3/2026 at 2:39:03 PM

Same for me. But I tried the gpt business account for 35€. Did beat Claude in all instances.

by rurban

4/2/2026 at 5:07:42 PM

Nice, I hope there will also come a small open version of it.

by wolvoleo

4/2/2026 at 10:39:31 PM

i've been fan of qwen for quite some time, awsome!

by adinhitlore

4/2/2026 at 9:54:59 PM

Qwen free plan is still good.

you get a generous token limit.

by dzonga

4/2/2026 at 2:55:19 PM

It's not open weights so I'm not interested.

by MarsIronPI

4/2/2026 at 3:01:34 PM

Not really interested in using models hosted on alibaba cloud.

Like Qwen local for it’s privacy, but I trust the privacy of Google/OpenAI/Anthropic more than alibaba.

by daft_pink

4/2/2026 at 3:17:51 PM

I had the exact opposite reaction. I stopped using OpenAI/Google a while ago due to privacy and moved to local Qwen, now I'm considering using Alibaba cloud. You know Google and OpenAI are going to share everything with the US government and Western ad networks. But with Alibaba, who cares if the CCP & Chinese ad networks have a comprehensive profile on me? From a pragmatic perspective it's much better for (outcomes related to) privacy.

by the_pwner224

4/2/2026 at 3:32:06 PM

so if China has the data good, us has the data bad, got it lol.

us actually has laws around this and they arent sharing very much with thr us gov today. china shares 100% as required by law. and neither care much about "how long do i cook eggs for", but they do care about code generation a lot.

by zobzu

4/2/2026 at 3:52:09 PM

From an espionage perspective your own government is the safest. But from a civil rights perspective your own government is your most immediate threat. China isn't going to arrest me for my opinions on Netanyahu, my own government could

And the US government has repeatedly shown that it is very interested in collecting all the data available, just like China. In China this is simply done in the open while the US has a veneer of protection for citizens. But where the data collection is forbidden by law they either ignore the law or ask another five eyes member to do the spying and share the results. Both are well documented

by wongarsu

4/2/2026 at 6:50:33 PM

> China isn't going to arrest me for my opinions on Netanyahu, my own government could

I don't know whether you really believe this or it was an off the cuff remark. China is not going to tell you why they plan to arrest you. China is not a benevolent dictatorship.

by malshe

4/2/2026 at 8:09:08 PM

The actual offense isn't important, nor is whether they arrest me or just kick in my door and look through my stuff. What matters is that if I'm not in China I don't have to particularly care what Chinese officials think about me. My local police can kick in my door, Chinese police can't. At least as long as I stay out of China

by wongarsu

4/2/2026 at 8:17:02 PM

As a matter of fact, there's been multiple reports of the Chinese doing informal, heavy "policing" of their own citizens abroad. Even if you aren't Chinese or linked to China yourself, this does affect the strength of that particular argument.

by zozbot234

4/2/2026 at 9:25:50 PM

> ... this does affect the strength of that particular argument.

It doesn't really affect the strength of that particular argument.

And you're being misleading, seemingly on purpose. Please don't.

by justinclift

4/2/2026 at 3:45:56 PM

> so if China has the data good, us has the data bad

It's not that, it's about relative risk to your own life. Asking questions about "DEI" for example is much more likely to have adverse effects on your life if you ask Grok or an OpenAI chatbot, though still not that likely.

by thereitgoes456

4/2/2026 at 3:42:05 PM

As with all arguments equivalent to "I have nothing to hide, so I have nothing to fear," it may be true now, but it may not be true later. The only certainty is that this will not be your call.

by CamperBob2

4/2/2026 at 3:43:32 PM

Agreed

by the_pwner224

4/2/2026 at 5:19:47 PM

So I guess if it’s your personal data that’s up to you, but if you have private client data and client of your client data and that’s the fundamental reason why you are doing local ai, I can’t imagine moving from qwen local to qwen alibaba after not choosing google/anthropic/openai

by daft_pink

4/2/2026 at 3:10:05 PM

> Like Qwen local for it’s privacy, but I trust the privacy of Google/OpenAI/Anthropic more than alibaba.

None should be trusted, unless you are running them locally.

by rvz

4/3/2026 at 3:42:44 AM

I build custom harnesses (like many of us) and I genuinely think Anthropic will eventually sue their customers if they detect they are selling competing harnesses (competes with their vertically integrated offerrings).

I feel Alibaba and DeepSeek see themselves more as infra. No urge to control the stack and litigate competition out of existence.

by edg5000

4/4/2026 at 9:04:27 AM

[dead]

by bezlant

4/3/2026 at 1:48:26 PM

[dead]

by pratyushsood

4/3/2026 at 2:13:50 PM

[dead]

by 0xqlive

4/2/2026 at 4:13:43 PM

[flagged]

by techpulselab

4/3/2026 at 1:08:49 AM

[dead]

by Sim-In-Silico

4/2/2026 at 4:03:33 PM

[dead]

by maxothex

4/3/2026 at 4:34:23 AM

[dead]

by Vivolab

4/3/2026 at 12:16:57 AM

[dead]

by volume_tech

4/3/2026 at 5:29:30 PM

[dead]

by geenkeuse

4/2/2026 at 6:00:50 PM

[flagged]

by johnwhitman

4/2/2026 at 5:42:12 PM

[dead]

by kanehorikawa

4/3/2026 at 8:05:21 AM

[flagged]

by EdoardoIaga

4/2/2026 at 5:13:24 PM

[flagged]

by shubhamgarg86