2/16/2026 at 9:23:05 AM
Amazing work! Reminded me of LLM Visualization (https://bbycroft.net/llm), except this is a lot easier to wrap my head around and I can actually run the training loops, which makes sense given the simplicity of the original microgpt.

To give a sense of what the loss value means, maybe you could add a small explainer section as a question and include this explanation from Karpathy's blog:
> Over 1,000 steps the loss decreases from around 3.3 (random guessing among 27 tokens: −log(1/27)≈3.3) down to around 2.37.
to reiterate that the model is being trained to predict the next token out of 27 possible tokens and is now doing better than the random-guessing baseline.
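For reference, a minimal sketch of where that 3.3 baseline comes from (assuming the 27-token vocabulary from the quote and natural-log cross-entropy):

```python
import math

vocab_size = 27  # character-level vocabulary from the quoted example

# Cross-entropy of a uniform, random-guessing prediction: -log(1/V) = log(V)
baseline_loss = -math.log(1.0 / vocab_size)
print(f"random-guess baseline: {baseline_loss:.2f}")  # ~3.30

# A trained loss of ~2.37 means the model assigns, on average,
# exp(-2.37) ~ 0.09 probability to the correct next token,
# versus 1/27 ~ 0.04 under random guessing.
print(f"avg prob of correct token after training: {math.exp(-2.37):.3f}")
```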
by kengoa