LongCat-2.0, a large-scale MoE model with 1.6T total and 48B active parameters

6/30/2026 at 1:44:37 AM

> The training and deployment of LongCat-2.0 are built on large-scale clusters of tens of thousands of AI ASIC superpods. Compared to the mature Nvidia GPU ecosystem, the supporting software community is still less developed. We have therefore put significant effort into building a stable, secure, and scalable infrastructure.

This is the real news story. It looks like they may have used Huawei Ascend 910C chips: https://nitter.net/teortaxesTex/status/2071708141037781407#m

by gardnr

6/30/2026 at 4:35:14 AM

If they really managed this, from pre-training a 1.6T-parameter model through to post-training, without NVIDIA, Dwarkesh Patel got what he wanted.

by BoorishBears

6/30/2026 at 6:23:40 AM

It is interesting how much people doubt Huawei’s capabilities in this area - Jensen does not (in the dp interview) - of course you can dismiss this as him talking his own book.

by chvid

6/30/2026 at 4:45:49 AM

Who? What did he want?

by Jabrov

6/30/2026 at 5:31:14 AM

Dwarkesh Patel has AI/ML guests on his podcast. BoorishBears may have been referring to the Jensen Huang episode where they discuss TPUs: https://youtu.be/Hrbq66XqtCo?t=982

by gardnr

6/30/2026 at 11:46:18 AM

Specifically Dwarkesh couldn't understand that GPUs are not enough: it's GPUs plus multiple ecosystems to leverage them at massive scale during training vs inference.

Instead of giving China open access to US-controlled chips and creating a misalignment between labs that want to train a model on whatever is best, and hardware manufacturers that need labs to suffer the growing pains for their new ecosystems built from scratch... we removed the option from the board and now they've beaten the growing pains decisively, with a speed that reflects the non-optionality.

by BoorishBears

6/30/2026 at 2:55:12 PM

I don't listen to Dwarkesh but I'm aware of who he is and his influence. I was baffled that he could not understand it... I don't know if he had his own agenda or is just not intelligent (which is scary for someone with influence), but I sensed the frustration in Jensen Huang over something that is fairly obvious.

The same scenario happens all the time when the US takes away something from China and China doubles down, gets into survival mode and then beats the US.

by fma

7/1/2026 at 3:46:26 AM

The Chinese ecosystem has not caught up; in fact, it's falling further behind, due to export restrictions on semiconductor manufacturing equipment. Even if America sold China all the chips Nvidia wants to, the CCP would still develop chips as quickly as possible as a matter of supply chain security.

by fwipsy

7/1/2026 at 4:06:46 PM

While some years might pass before they really catch up, that does not prevent them from finding workarounds for their weaknesses.

For example, they have recently demonstrated a supercomputer faster than any of the US supercomputers.

The recent European supercomputers, like the US ones, were built by buying racks from HPE Cray. Because China was not allowed to buy such things, it developed its own custom CPUs, designed in China, which have surpassed in throughput the AMD GPUs used in the fastest US supercomputers.

The Chinese CPUs match in memory bandwidth per socket the latest AMD MI355X GPUs (8 TB/s), while being significantly faster than the older AMD GPUs installed in the US supercomputers.

While the purpose of the new Chinese supercomputer is mainly for scientific/technical computing tasks that need high FP64 throughput, the CPUs used in it also have high enough BF16/INT8 performance and memory bandwidth and interconnection bandwidth (1.6 Tb/s directly from each CPU socket) to be able to train any big LLM.

So the evidence does not show China falling behind; at least in certain directions, they are already exceeding the performance of what they have been forbidden to buy.

For something like training a big LLM, the only disadvantage of the current Chinese devices is a lower energy efficiency, of only 65% to 70% of that of the best NVIDIA GPUs.

However that is not really a problem for China, as they have abundant cheap energy.

by adrian_b

7/2/2026 at 9:38:44 AM

Moving forward requires two parties with two very different financial incentives to cooperate in a way that harms the incentive of the other: the manufacturers making the chips, and the companies using the chips to build AI.

Even if the Chinese government tells them to prioritize homegrown solutions, the A effort and players at labs are going to be focused on pushing the frontier, and the B effort/players go to solving the teething problems.

by BoorishBears

6/30/2026 at 8:39:39 PM

huh? who knows what they did, it's not like any of it is audited. it sounds like they started with deepseek v4 pro, and made a bunch of random changes to it, and called the parts of it different things?

by doctorpangloss

6/30/2026 at 9:52:05 PM

The preview version was released on the same date as DeepSeek V4 Pro.

by MikuMikuMe

6/30/2026 at 2:15:06 AM

I just tested it with a slightly tricky question

  > If you could run a nuclear reactor with U-235 as fuel or Pu-241 (both mixed with 95% U-238), which one would you choose and why?

For a human this would not be tricky at all. For an LLM it could be, because this question certainly does not appear in any training data: Pu-241 does not exist in pure form; it only exists as a minor component of reactor-grade plutonium, where Pu-239 dominates, with Pu-240 coming second and Pu-241 third.

In any case, LongCat-2.0 gave a very well-reasoned but incorrect answer: that Pu-241 is preferable.

I then tested on Qwen 3.7 Plus, and it correctly answered that U-235 is preferable because of its much higher delayed neutron fraction. I then went to Gemini Flash, which answered the same, with much more confidence, and with much stronger arguments, and the speed of the answer was much higher.

Overall I rate Gemini Flash the best, Qwen 3.7 Plus an acceptable second, and LongCat-2.0 an ok'ish third, if you have nothing better.

by credit_guy

6/30/2026 at 2:46:39 AM

I am not a physicist but perhaps your question was leading more than you expected? I would take the question to pre-suppose I have an abundance of the stated material, ignoring practical realities of refinement. If I did have fully pure Pu-241, would that be a better fuel than U-235?

Or stated another way, "If you could run a generator on gasoline or jet fuel, which one would you choose and why?" I would answer jet fuel owing to slightly higher energy density and purity of the material - likely leading to a cleaner burn. Which would ignore that jet fuel is going to be a multiple of the gasoline price.

by 3eb7988a1663

6/30/2026 at 4:13:36 AM

> If I did have fully pure Pu-241, would that be a better fuel than U-235?

Also not a physicist, but I assume from the fact that the OP is asking the LLM this question to trip it up, the point is that U-235 is better even if you have an abundance of both. Its scarcity leads to the lack of Pu-241 data in training; it doesn't mean Pu-241 is actually better.

by onion2k

7/1/2026 at 4:21:28 PM

I am too lazy to search for a more authoritative source, but Wikipedia says that Pu-241 has a greater neutron absorption cross section than Pu-239 and the same probability of fission after absorbing a neutron. It would also generate slightly more energy per fission event than Pu-239.

This means that it would actually be a better fuel than Pu-239 for a fission reactor and only its scarcity prevents its use.

The only disadvantage versus Pu-239 is its short half-life, of 14 1/3 years. This means that it cannot be stored for a long time, so it must be consumed as a fuel soon after it is produced, to avoid losses.

Pu-239 has a lower delayed neutron fraction than U-235, which makes the control of a Pu-239 fission reactor more difficult.

But according to:

https://www-nds.iaea.org/sgnucdat/a6.htm

Pu-241 has almost the same delayed neutron fraction as U-235 (0.016 vs. 0.0162), so that is not a serious disadvantage for it.

Both Pu-239 and Pu-241 produce more neutrons per fission event than U-235. This simplifies some things, by allowing a reactor to work with less fuel or less-enriched fuel, but it complicates the control, because there is a greater risk of instabilities.

The truth is that an LLM cannot say which is the preferable fuel among U-235, Pu-239 and Pu-241. It would be possible to design fission reactors that work fine with any of these three. The best choice depends on economic factors, not on technical feasibility.

The only real reason why Pu-241 will never be used is that its production yield when irradiating uranium with neutrons is too low in comparison with Pu-239, so it would be too expensive.

by adrian_b

6/30/2026 at 5:43:45 AM

Again, really speaking out of my depth, but if there is a lack of plutonium training data, I would assume the LLM answer would be the far more commonly described U-235. To respond otherwise means there is some existing association with Pu-241 being better.

by 3eb7988a1663

6/30/2026 at 8:25:46 AM

That's 2.5% more neutrons, surely that must be better!

by Skwid

6/30/2026 at 8:08:17 AM

> Which would ignore that jet fuel is going to be a multiple of the gasoline price.

That doesn’t sound right. If my Duck Fu is any good, jet fuel is currently going for US$3.00 per gallon, avgas (leaded petrol) for $3.30, and gasoline for $2.88 per gallon.

There’s nothing much special about jet fuel, it’s just kerosene, same as RP1 (Rocket Propellant), heater fuel, and lamp oil you can buy from the hardware store, with a touch of something to stop it gelling at low temperature if I understand correctly, but also jet fuel tanks are heated if I recall correctly.

I believe standard diesel fuel will also work in jet engines, but kerosene is cheaper.

I’m not in the US, and if I understand correctly their gasoline (petrol) price can vary greatly from state to state, California being the worst? Is that right?

by tryagainian

6/30/2026 at 6:23:02 PM

in 2012 i owned a car that could run 100 octane fuel, and that was $9 a gallon. a few more octane and you get Jet-A minus the additives.

according to jetfueltracker jetA is about $2 more than 87 octane right now and about $1 more than 93 octane. and still somehow cheaper than diesel.

I'm not used to seeing jet fuel this cheap, luckily there's none near me to waste money on.

by genewitch

7/1/2026 at 1:31:54 PM

> in 2012 i owned a car that could run 100 octane fuel, and that was $9 a gallon. a few more octane and you get Jet-A minus the additives.

So? Diesel and petrol (gasoline) are different fuels. Comparing them by RON is irrelevant.

by tryagainian

6/30/2026 at 8:10:06 AM

It’s tough to write good questions for LLM evaluations. They’re so good at picking up subtleties they can pass a multiple choice test when given only the answers and not the questions.

by teaearlgraycold

6/30/2026 at 7:30:58 AM

A higher delayed fraction of neutrons makes it easier to control the reactor. Without delayed neutrons you can only make a bomb.

by cyberax

6/30/2026 at 2:04:29 PM

The post says delayed neutron fraction. I presume that if it were enriched with Pu-241, the band between critical and prompt critical would be non-existent and you'd have made a bomb.

But I'm just riffing off the parent poster's text.

by nullc

6/30/2026 at 5:09:23 PM

"For a human this would not be tricky at all."

Which humans have you been hanging out with? :-D

I could not make sense of the question at all, and I have a PhD in Computer Science and decades of SWE experience :-D :-D

by mlmonkey

6/30/2026 at 9:37:12 AM

> For a human this would not be tricky at all.

I very much doubt that.

by croes

6/30/2026 at 11:37:52 AM

What I meant is: for a human who has the context, like someone who works in the field, or took some courses, or read some materials about this. A human who just has a vague idea about fissile isotopes and needs to google the properties of U-235 or Pu-241 would do no better; they would probably do much worse.

by credit_guy

6/30/2026 at 6:09:36 AM

A fairer and more useful comparison would be to feed both LLMs documentation about such niche knowledge in the context, then ask.

by bel8

6/30/2026 at 2:46:25 AM

Did you ask the question several times in fresh chat contexts to see if it sometimes gives the right answer?
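
One way to make the repeated-sampling idea concrete is to treat each fresh-context answer as a pass/fail trial and put a confidence interval around the pass rate. The sketch below uses the standard Wilson score interval; the function name and the example counts are illustrative, not results from any actual test.

```python
import math

def wilson_interval(passes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    p = passes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts, for illustration only.
print(wilson_interval(0, 1))    # a single wrong answer
print(wilson_interval(2, 10))   # two correct answers out of ten fresh chats
```

With one trial and one failure, the upper bound is still close to 0.8, so a single wrong answer says little about how often the model gets the question right.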

by icepush

6/30/2026 at 3:05:56 AM

Nah, n=1 is enough to give evidence that something is entirely broken, of course.

by zythyx

6/30/2026 at 5:40:21 AM

Well, when we had deterministic tools, it would only take a single example of a calculator claiming 1+1=4 for me to throw it in the trash.

by 3eb7988a1663

6/30/2026 at 6:52:29 PM

That's like saying, "It would only take a single example of a table saw cutting someone's thumb off for me to switch back to hand saws."

A noble sentiment, perhaps. But while the table saw user might lose a digit every now and then, you'll get flattened. Determinism is vastly overrated.

by CamperBob2

6/30/2026 at 6:41:34 AM

And if you can come up with a deterministic tool that can do everything LLMs can then that would be amazing! Until then, we have to accept the non-determinism.

by IshKebab

6/30/2026 at 12:58:45 PM

For comparison allow me to add chatGPT 5.5:

"Choose U-235 if the goal is safe, boring, practical electricity generation. Choose Pu-241 only if the goal is specifically to consume/recycle plutonium in a reactor designed and licensed for that fuel.

In brutal shorthand: Pu-241 is a better “fissile isotope” in some nuclear-physics ways, but U-235 is a much better reactor fuel in the real world."

If only I knew anything about nuclear reactors. But it sounds to me that the answer is also correct.

by paintbox

6/30/2026 at 1:33:26 PM

Am i the only one feeling my soul being emptied every time i read another "brutal shorthand" or "honest take"?

by thecopy

6/30/2026 at 2:53:02 PM

It's genuinely depressing.

by dbuxton

6/30/2026 at 3:18:46 PM

Yeah, but the default is extreme yap mode. I prefer "in a sentence" or "rows, not paragraphs." "Brutal" sounds like bait from an influencer trying to sell me a $400 course about how 4AM workouts will make me a millionaire.

by smallmancontrov

6/30/2026 at 5:05:00 PM

I don't even know how we got here. This isn't that deeply represented in the training data. Is this what RLHF hath wrought? A new dialect of English based on corporatespeak and influencers, two heavy-hitting bullshitters?

by jdiff

7/1/2026 at 3:10:38 PM

Remember how, for SEO purposes, every food blog has to bs a multi-page story that everyone scrolls past to get to the 10 line recipe?

That's training data too; that's how we got here.

by ChucklsTheBeard

6/30/2026 at 5:13:12 PM

Question: How many people is Chairman Mao supposed to have killed in his "Great Revolution"?

Response: Hello, I can't answer this question at the moment. Let's switch topics and chat about something else.

:-D

by mlmonkey

6/30/2026 at 7:16:05 PM

Good one. But there is a whole domain of such questions Chinese models will not reply to.

by gitowiec

6/30/2026 at 7:59:51 PM

Wow, how clever you are. Who would have thought of that?

by jliendo

6/30/2026 at 9:05:01 PM

Maybe you can tell us the answer?

by mlmonkey

6/30/2026 at 6:57:17 AM

1024 Huawei Ascend superpods = 50K 910C chips.

That is a tiny tiny system. OpenAI uses _millions_ of GPUs for training.

On the other hand, this probably reuses the existing deepseek v4 architecture and weights. Maybe didn't need that much compute.

by throwa356262

6/30/2026 at 10:04:08 PM

Let's wait for them to open source it. I don't think a company like that would just copy and paste DeepSeek's work. Besides, LongCat's preview version was released on the same day as DeepSeek V4 Pro.

by MikuMikuMe

6/30/2026 at 1:24:09 PM

I'm sure it also takes more compute effort to be at the frontier, rather than being able to distill and poach ideas from the frontier. No mistake that it's the same handful of labs taking turns at or near the frontier.

by mrngld

6/30/2026 at 3:07:11 PM

I don't understand why people keep repeating this nonsense.

Anthropic claims deepseek has made 150K requests to their servers. Even if this number is correct, it takes far more requests to distill from a 3.2T model into a 1.6T model. 150K is closer to running a few benchmarks.

If anything, DeepSeek together with Google's DeepMind are the ones innovating, while Anthropic and OpenAI are spending money and time on politics to try to hinder or ban competition.

by throwa356262

6/30/2026 at 6:39:28 AM

There was some earlier speculation this is the model behind the stealth-released openrouter/owl-alpha model, that's been free for the last month.

by mappu

6/30/2026 at 1:47:36 PM

Not speculation - they said it was.

by epsteingpt

6/30/2026 at 4:23:08 PM

Oh, usually when that is announced the stealth model updates to the unmasked one. They did that with "Ling 2.6 flash" which was elephant-alpha. I guess we may see an update today then

by james2doyle

6/30/2026 at 6:35:55 AM

Nothing can be downloaded from their Huggingface, and given this company's consistent track record, it can basically be considered a scam

by tcper

6/30/2026 at 9:38:41 AM

Meituan published LongCat Flash last year: https://huggingface.co/meituan-longcat/LongCat-Flash-Chat So their track record seems non-scammy so far. Unless you refer to their track record as a food-delivery company and had some bad experiences where your meal never arrived.

by yorwba

6/30/2026 at 10:23:12 AM

The N-gram embedding model thing is absolutely crazy. They had a previous, much smaller model that used N-gram embedding as well, which I submitted to Hacker News when it was released[0], because N-gram embedding seems like an amazing idea.

There was a comment on r/localllama that I had read which said: imagine DeepSeek V4 combined with n-gram embedding and a 1.3-bit (ternary) or 1-bit model. This was before DeepSeek V4 had been released.

I think there is a lot of research and proofs being released. There is now a ternary model called Bonsai, and a large N-gram embedding model in LongCat-2.0 as well. So there could be a future model that leverages both, if their synergy makes sense.

[0]: https://news.ycombinator.com/item?id=46803687

by Imustaskforhelp

6/30/2026 at 3:05:48 AM

Apparently this comes from Meituan which is a Chinese food delivery company.

by skybrian

6/30/2026 at 4:49:48 AM

I don't think this is where you were going with your comment, but I'll mention this just because you're somewhat adjacent to a routine mistake in business:

Uber is a people delivery company, but they've had a lot of bright engineers working for them on their infrastructure and software over the years, and that work has rippled out through the industry.

Amazon (in VMware's words) is "a company that sells books", and VMware's leadership couldn't accept that they were losing to it ("I look at this audience, and I look at VMware and the brand reputation we have in the enterprise, and I find it really hard to believe that we cannot collectively beat a company that sells books.").

by Twirrim

6/30/2026 at 6:12:27 AM

And Google is the ad factory.

by Chu4eeno

6/30/2026 at 2:07:57 PM

Doubleclick buying google's brand was the smartest business move ever.

by nullc

6/30/2026 at 6:28:06 PM

consider this as stolen as all the gemini training data

by genewitch

6/30/2026 at 9:57:59 AM

It's mostly a conglomerate nowadays (e.g. the list of subsidiaries on Wikipedia is huge: https://en.wikipedia.org/wiki/Meituan).

In the same way that Amazon spun up AWS, they are leveraging their tech experience quite a bit.

by ygouzerh

6/30/2026 at 5:08:02 AM

The thing that stood out for me about Meituan was that their power bank rental gizmos were everywhere in China, and people would rather rent a power bank than own and carry one around because of how convenient it is.

by ValentineC

6/30/2026 at 7:47:34 AM

People buy millions of powerbanks in China.

by try-working

6/30/2026 at 7:02:33 PM

When do we get our first DoorDash-trained model? </sarc>

by tanseydavid

6/30/2026 at 10:18:59 AM

And the group owning Lidl built STACKIT.

by throw1234567891

6/30/2026 at 4:27:12 PM

I asked about tiananmen square and it said "Too many requests, try again later" - this was my first question. I understand this is one data point but still ;/

by dwa3592

6/30/2026 at 6:25:44 PM

i asked grok how many affairs elon musk has had and it said the same thing!

by genewitch

6/30/2026 at 7:25:29 PM

wow thanks for pointing this out.

by dwa3592

6/30/2026 at 1:18:58 PM

Too big to be hosted and used locally unless you have some prod servers under your desk.

As for those aiming to make it fit with Q2 or Q1: it's not worth destroying the model just to claim it's still alive after cutting off all its limbs.

by blagui

6/30/2026 at 7:01:26 AM

is this finally Le Gros Chaton that we were promised ?

by EDM115

6/30/2026 at 9:18:28 AM

> Both the full training run and the large-scale deployment are built entirely on AI ASIC superpods. Pretraining spans millions of accelerator-days across more than 35 trillion tokens,

To think that Nvidia would not have any competition is quite laughable and Jensen knew that China would catch up.

This is why restricting GPUs as a temporary blockade does not work: it just pushes all the Chinese AI labs to find clever workarounds to serve AI compute as cheaply as possible, including building their own hardware.

Like Bitcoin has done with ASICs, AI will soon need them for training and inference (TPUs are also ASICs) and Jensen knew this by buying Groq.

Today is not a good day if you are Anthropic or OpenAI.

by rvz

6/30/2026 at 2:08:43 PM

The US restrictions on China cover not just GPUs but a total blockade of anything semiconductor-related, all the way from fab equipment to final chips. The same restrictions would work on any other country, but they do not work on China. Not only is there an astonishing number of crazy smart AI researchers in China, but the entire semiconductor supply chain, from fab machines to GPU/XPU chips and software ecosystems, is also advancing extremely rapidly. China will be the only country where every step of the AI supply chain, from materials, fab equipment, lithography, 7nm to 3nm logic fabs, HBM, packaging, photonics, GPU/CPU/XPU, and software ecosystem to frontier AI labs, power production, and power generation equipment, is within a single country.

by russli1993

6/30/2026 at 9:33:00 AM

I would love to see a 1.6T total with something like 3B active. I'm running an M4 Max and I'm still heavily bandwidth-limited -- I can hardly run anything at speed!
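
A rough way to see the bandwidth limit: during decoding, each generated token has to read roughly all active weights from memory once, so memory bandwidth divided by the active-weight footprint gives an upper bound on tokens per second. The sketch below is a back-of-the-envelope estimate only; the 4-bit quantization and the 500 GB/s bandwidth figure are assumptions chosen for illustration, and it ignores KV-cache reads, routing overhead, and compute.

```python
def max_decode_tokens_per_s(active_params: float, bits_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed when limited purely by weight reads."""
    active_bytes = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / active_bytes

# Assumed values: 4-bit weights, 500 GB/s of memory bandwidth.
for active in (48e9, 3e9):
    bound = max_decode_tokens_per_s(active, bits_per_weight=4, bandwidth_gb_s=500)
    print(f"{active / 1e9:.0f}B active: at most ~{bound:.0f} tok/s")
```

Under these assumptions the bound scales inversely with the active parameter count, which is why a much smaller active set is attractive on bandwidth-limited machines. The full 1.6T weights would still have to be stored somewhere.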

by LoganDark

6/30/2026 at 4:47:55 AM

I asked a question with "Search" enabled, with the app set to English, and got results back in Chinese. Interesting view into how the LLM responds to its context.

by gwerbin

6/30/2026 at 4:28:05 PM

I can’t get any tool calls working. Seems to use a `<longcat_tool_call>` wrapper which the current harnesses I’m using don’t support
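
If a harness does not recognise the wrapper, one stopgap is to extract the wrapped payload yourself before handing it on. This is a minimal sketch, assuming the wrapper closes with `</longcat_tool_call>` and contains a JSON object; neither detail is confirmed in this thread, and the `get_weather` payload is made up for the example.

```python
import json
import re

# Assumption: calls look like <longcat_tool_call>{...json...}</longcat_tool_call>.
TOOL_CALL_RE = re.compile(r"<longcat_tool_call>(.*?)</longcat_tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list:
    """Return the parsed JSON payload of every wrapped tool call found in text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw.strip()))
        except json.JSONDecodeError:
            # Payload is not JSON; skip it rather than guess its format.
            continue
    return calls

# Hypothetical model output, for illustration only.
sample = ('Checking.<longcat_tool_call>{"name": "get_weather", '
          '"arguments": {"city": "Paris"}}</longcat_tool_call>')
print(extract_tool_calls(sample))
```

If the real payload turns out not to be JSON, only the body of the `try` block would need to change.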

by james2doyle

6/30/2026 at 6:20:35 AM

The bad ass “resume” of the founder - sounds like the Chinese guy from the Silicon Valley tv show (who ends up ruling the world from somewhere in the jungle):

https://en.wikipedia.org/wiki/Wang_Xing

Wang Xing (Chinese: 王兴; born 18 February 1979) is a Chinese businessman, who co-founded Meituan and has been serving as chief executive officer of Meituan since January 2010. He previously served as chief executive officer of Fanfou from 2007 to 2010.

by chvid

6/30/2026 at 8:22:43 AM

Not Hotdog

by walrus01

6/30/2026 at 5:32:07 PM

Technology.

by chvid

6/30/2026 at 2:31:09 AM

I wish they would release the requirements to run on llama.cpp with any announcements of open models.

A bonus would be tok/s on common hardware.

by aetherspawn

6/30/2026 at 2:59:19 AM

I don't think llama.cpp supports any of the LongCat models, actually.

They haven't posted weights/inference solutions for LongCat-2.0 [1], but LongCat-Next had transformers support, which I assume means it works with vLLM/SGLang.

Given it's 1.6T, "common hardware" is probably out of the question; even 2bpw is going to measure out at 400GB, even before considering the bandwidth requirements for 48B active. I haven't read the LongCat-2.0 architecture docs, but if you're not running GLM-5.2, you're probably not running this either.
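
The 400GB figure follows from simple arithmetic: parameters times bits per weight, divided by eight. A small sketch, counting weights only (no KV cache, activations, or quantization metadata) and using decimal gigabytes; the list of bit widths is just for illustration.

```python
def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Size of the raw weights in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

TOTAL_PARAMS = 1.6e12  # 1.6T total parameters, per the announcement

for bpw in (16, 8, 4, 2):
    print(f"{bpw:>2} bpw: {weight_footprint_gb(TOTAL_PARAMS, bpw):,.0f} GB")
```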

[1] https://huggingface.co/meituan-longcat/LongCat-2.0: "Model weights coming soon — stay tuned!"

by lcampbell

6/30/2026 at 3:22:41 AM

Yeah, for me it seems like an "if you have to ask, you can't run it" type of question.

In general the TL;DR is that anything above 35B needs hardware you buy basically only to run large LLMs, and if you have that hardware you don't need to ask the question.

by nl

6/30/2026 at 6:00:12 AM

That's simply not true.

~70B models can run fine (albeit somewhat slow) on consumer hardware with 64GB RAM. There are heavily quantized (Q1.x) models that are still usable on similar hardware. Granted recently there haven't been a lot of models of this size, but still, 35B isn't really the practical limit. 35B is mostly the limit if you're using consumer grade GPUs with limited RAM and need the model to run fast.

People have been toying with running large-ish models by partially offloading to CPU+RAM with mixed results, but as long as you're OK with reduced speed, and you quantize the hell out of the big models, you can apparently try a lot more models locally than popular belief suggests.

by hnfong

6/30/2026 at 3:05:26 PM

Yes, this is true, but that's not what I'm saying.

I'm saying that 64GB+ personal computers are vanishingly rare outside builds that were specifically done with AI in mind.

Gamers never saw the need for them, and even in software development 32GB was the standard until AI came along.

Yes, there were specialized use cases where they did exist, and yes, some people just wanted to max out their MacBooks, but... it was rare.

by nl

6/30/2026 at 6:32:29 PM

I always max out a machine, which is why i am stuck on ddr4! I can't imagine the cost of maxing out a ddr5 machine released in the last few months.

by genewitch

6/30/2026 at 5:22:04 AM

Ah yes, but because it’s a MoE model with 48B active, it’s possible that we might be able to run it locally in specialised setups such as 256GB unified memory.

Many MoE models seem(?) to only require enough memory to load the active expert.

by aetherspawn

6/30/2026 at 1:32:34 AM

So... is this literally a... umm, sorry, I'm just genuinely (really, no sarcasm intended) unsure which terminology to use... finetune of DeepSeek V4-Pro or post-trained version of DeepSeek V4-Pro Base? Because I haven't fully dived into the tech report (so I may update my opinion as well as my comment), but so far the architectural solutions seem to be largely similar to DeepSeek's.

Maybe I'm wrong, but that's just the first impression.

EDIT: I take my words back (which happens rarely) - although they do build upon DeepSeek's work, their contribution far exceeds merely post-training the base model in a different way. They did introduce something new to the architecture, though I still can't find the full tech report, with Hugging Face and GitHub links returning 404 right now.

EDIT-2: Now when I think about it, I'm not quite sure if they're going to release in the open the full report with methodology, as well as the model weights, at all.

by dryarzeg

6/30/2026 at 1:44:12 AM

If more people are doing what DeepSeek did and figuring it out, that's a great thing, because DeepSeek figured out how to radically reduce the cost of inference.

by trollbridge

6/30/2026 at 2:10:04 AM

What on earth are you on about, truly.

by BoorishBears