alt.hn

1/1/2026 at 2:03:18 PM

DeepSeek kicks off 2026 with paper signalling push to train bigger models for less

https://www.scmp.com/tech/big-tech/article/3338427/deepseek-kicks-2026-paper-signalling-push-train-bigger-models-less

by ksec

1/1/2026 at 3:48:47 PM

Jesus. Why do people who clearly don't understand this field insist on writing on the subject?

> DeepSeek kicks off 2026 with paper signalling push to train bigger models for less

> DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

Both the title and the first paragraph are completely and unambiguously wrong.

First, while the method improves stability (preventing training collapse), it technically increases the computational cost per step rather than reducing it. The benefit is reliability, not raw cost reduction (page 4: "mHC supports training at scale and introduces only a 6.7% additional time overhead").

Secondly, the proposed mHC is an extension of HC, and while cool, it's nowhere near a "rethink of its core architecture". If it holds up beyond the small models they tried (27B), this method fixes some instability issues, but the "core" architecture stays the same.
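
For context on what HC refers to: as I understand it, hyper-connections replace the single residual stream with several parallel streams whose read/write mixing weights are learned. A rough sketch of that general idea in plain PyTorch (the module name, stream count, and initialisation are my own illustrative choices, not the paper's mHC):

    import torch
    import torch.nn as nn

    class HyperConnectionBlock(nn.Module):
        """Residual block with n learnable streams instead of one."""
        def __init__(self, d_model, n_streams=4):
            super().__init__()
            # Stand-in for a transformer sublayer (attention or MLP).
            self.layer = nn.Sequential(
                nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
            )
            # Learnable "read" weights: how the streams mix into the layer input.
            self.read = nn.Parameter(torch.full((n_streams,), 1.0 / n_streams))
            # Learnable "write" weights: how the old streams and the layer output
            # combine into the new streams. Initialised so each new stream equals
            # old stream + layer output, i.e. an ordinary residual connection.
            self.write = nn.Parameter(
                torch.cat([torch.eye(n_streams), torch.ones(1, n_streams)], dim=0)
            )

        def forward(self, streams):
            # streams: (n_streams, batch, seq, d_model)
            x = torch.einsum("n,nbsd->bsd", self.read, streams)       # read
            y = self.layer(x)                                          # compute
            stacked = torch.cat([streams, y.unsqueeze(0)], dim=0)      # (n_streams+1, ...)
            return torch.einsum("mn,mbsd->nbsd", self.write, stacked)  # write

    streams = torch.randn(4, 2, 16, 64)             # 4 streams, batch 2, seq 16, dim 64
    print(HyperConnectionBlock(64)(streams).shape)  # torch.Size([4, 2, 16, 64])

The extra einsums over the streams are where the per-step overhead comes from, which is consistent with the paper's reported ~6.7% additional time.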

by NitpickLawyer

1/1/2026 at 4:29:31 PM

Personally, I'm interested in the prospect of changing a model's learning process based on its topological structure.

by ranyume