4/20/2026 at 10:23:06 PM
> 25K parameters is about 70 million times smaller than GPT-4. It will produce broken sentences. That's the point - the architecture works at this scale.

Since it seems to just produce broken and nonsensical sentences (at least based on the one example given), I'm not sure if it does work at this scale.
Anyway, as written this passage doesn't really make a whole lot of sense (the point is that it produces broken sentences?), and given that it was almost certainly written by an AI, it demonstrates that the architecture doesn't work especially well at any scale (I kid, I kid).
by wk_end
4/20/2026 at 10:38:03 PM
How does it compare to a Markov chain generator, I wonder.
by forinti
4/21/2026 at 12:20:36 AM
The Transformer is a more powerful model than a Markov chain, but on a machine as weak as the C64, an MC could output text faster - though it would surely sound "psychedelic", as memory limits an MC to a first- or second-order model: to predict one word, only the one or two words before it are taken into account as context (and there is no attention).

On a plain vanilla C64, the Transformer cannot really show what it's capable of. An implementation using 2 bits per weight (vectorized) could be slightly better, perhaps.
by jll29
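The trade-off described above can be sketched in a few lines. This is a minimal second-order word-level Markov chain, not any implementation mentioned in the thread; the names `build_chain` and `generate` are illustrative:

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each context of `order` consecutive words to the words seen after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        chain[context].append(words[i + order])
    return chain

def generate(chain, length=20, seed=None):
    """Extend the seed by repeatedly sampling a continuation of the last
    `order` words; stop early if the context was never seen in training."""
    context = seed if seed is not None else random.choice(list(chain))
    out = list(context)
    for _ in range(length):
        candidates = chain.get(tuple(out[-len(context):]))
        if not candidates:
            break
        out.append(random.choice(candidates))
    return " ".join(out)
```

Note that the only state kept per prediction is the last two words, which is exactly why the output drifts "psychedelically" - anything further back is forgotten.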
4/21/2026 at 8:47:22 AM
You can build an unlimited-order Markov chain by, instead of pre-computing a table of counts for all possible contexts, using a substring-search index on the training data to count possible continuations on the fly: https://arxiv.org/abs/2401.17377

That paper uses suffix arrays, but more compact indices are possible: https://arxiv.org/abs/2506.12229
by yorwba
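The on-the-fly counting idea can be sketched as follows. `continuations` is an illustrative name, and a naive linear substring scan stands in for the suffix-array index the paper actually builds - the retrieval logic is the same:

```python
from collections import Counter

def continuations(corpus_tokens, context):
    """Return (matched_suffix, counts of next tokens) for the longest suffix
    of `context` that occurs anywhere in the corpus. A linear scan replaces
    the suffix-array lookup a real implementation would use."""
    corpus_tokens = tuple(corpus_tokens)
    context = tuple(context)
    for start in range(len(context)):              # try the longest suffix first
        suffix = context[start:]
        n = len(suffix)
        counts = Counter(
            corpus_tokens[i + n]
            for i in range(len(corpus_tokens) - n)
            if corpus_tokens[i:i + n] == suffix
        )
        if counts:
            return suffix, counts
    return (), Counter(corpus_tokens)              # back off to unigram counts
```

Because the effective order is "however much of the context actually occurs in the training data", no count table has to fit in memory - only the index over the corpus.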