3/19/2026 at 9:59:24 AM
> I replicated David Ng's RYS method [...] found something I didn't expect.
> Transformers appear to have discrete "reasoning circuits" — contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.
How did you not expect that if you read his post? That's literally what he discovered, two years ago.
For anyone interested, there's more meat in the post and comments from last week: https://news.ycombinator.com/item?id=47322887
by simgt
3/19/2026 at 10:50:02 AM
That's explicitly not the unexpected part. Read the rest of the post.
by regularfry
3/19/2026 at 11:32:38 AM
After reading both the original post and this submission, what do you think is new here?
by yorwba
3/19/2026 at 1:18:06 PM
> The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM, different routing.
As far as I can see that's not implied by the original post.
But that's beside the point: quoting the bit where the poster says "here's what I'm building on top of" and using that to imply they haven't done anything new is a bit pointless, no?
by regularfry
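The duplication patterns being debated above can be made concrete with a toy sketch. This is not the RYS implementation — it is a minimal NumPy illustration, with made-up layer count, dimensions, and random frozen weights, of the one idea both posts share: repeating indices in the layer order re-runs the same frozen weights, so "double-pass" and "interleaved doubling" change only the routing, never the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8        # toy hidden size (hypothetical)
n_layers = 20  # toy layer count (hypothetical)

# Toy "layers": frozen random weight matrices. Nothing is ever trained
# or modified below -- only the order they are applied in changes.
weights = [rng.standard_normal((dim, dim)) / np.sqrt(dim)
           for _ in range(n_layers)]

def forward(x, order):
    """Run the layer stack in the given order. A repeated index re-runs
    the same frozen layer, so duplication costs no extra VRAM for
    weights -- it is purely a routing change."""
    for i in order:
        x = np.tanh(x @ weights[i]) + x  # toy residual block
    return x

baseline = list(range(n_layers))

# "Double-pass" over a block: run layers 13..16, then run them again.
double_pass = baseline[:13] + [13, 14, 15, 16] * 2 + baseline[17:]

# "Interleaved doubling" from the thread: 13,13,14,14,15,15,16
interleaved = baseline[:13] + [13, 13, 14, 14, 15, 15, 16] + baseline[17:]

x = rng.standard_normal(dim)
y0 = forward(x, baseline)
y1 = forward(x, interleaved)
```

Whether either routing actually helps on any benchmark is exactly what the thread is arguing about; the sketch only shows that the two patterns are distinct computations over identical weights.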
3/19/2026 at 1:43:32 PM
You're right that my quote was misleading, I overlooked "the weird part" in the post because it didn't seem new to me either.
Here's the section in the original post that covers it: https://dnhkng.github.io/posts/rys/#the-brain-scanner All heatmaps are split by tasks and show an optimal point for each. The resulting routing he chose is a trade-off for both tasks, there isn't much else to do unless you intend to train a router anyway.
> So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.
by simgt
3/19/2026 at 5:19:27 PM
This is stated in the original post as well, under the "The Beginning of LLM Neuroanatomy?" section:
> From end-position 43 to 46, we then see solid boosts in math scores (red = good, yay). But include layer 46 or beyond, and the benefits collapse again. The hypothesis: position 47 is where a different circuit begins. Including even one step of the next recipe messes up the current recipe.
> So the ‘math organ’ has boundaries on both sides. Too few layers and you get nothing — you’ve cut into the circuit and it can’t complete its operation. Too many layers and you also get nothing — you’ve included tissue from a neighbouring circuit that doesn’t belong. Pre-training carved these structures out of the layer stack, and they only work whole. It also doesn’t translate to other tasks, as the heatmap for EQ scores doesn’t have this patch.
> This is a much more specific claim than “middle layers do reasoning.” It’s saying the reasoning cortex is organised into functional circuits: coherent multi-layer units that perform complete cognitive operations. Each circuit is an indivisible processing unit, and the sweeps seen in the heatmap is essentially discovering the boundaries of these circuits.
by gavinray
3/20/2026 at 12:29:43 PM
That's just saying there are circuits. It's not saying you get different effects by stacking the same circuit in different ways.
by regularfry
3/19/2026 at 1:05:19 PM
It's all new to me.
by jstanley