alt.hn

4/7/2026 at 1:06:28 PM

Hybrid Attention

by JohannaAlmeida

4/7/2026 at 5:38:07 PM

I've been interested in faster attention and smaller models for some time, but I haven't had the time to do serious research, so I can't answer your questions.

However, everything you're doing sounds very interesting, useful, and well thought out. Please keep at it; I'd encourage others to work in the same direction too.

I hope more of us can find the time for more than best wishes in the near future.

by bigbadfeline

4/8/2026 at 9:20:38 AM

Thank you so much. The next thing I want to tackle is the training bottleneck we have right now.

That will probably be another HN post when I figure it out.

by JohannaAlmeida

4/7/2026 at 1:32:48 PM

Full attention O(n²): 17.96s / 5.6 tok/s

HybridAttention O(n·W + n·D): 0.35s / 286.6 tok/s
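Rough sketch of where the gap comes from (a toy decode loop, not the actual benchmark code; sizes and shapes here are made up for illustration): with full attention every new token attends to the whole growing KV cache, while a bounded window keeps the per-token work roughly constant.

    import time
    import torch

    torch.manual_seed(0)
    d, n_steps, W = 64, 2048, 256

    def decode(window=None):
        # Simulate autoregressive decoding: each step, one query attends to the cached keys/values.
        k_cache = torch.empty(0, d)
        v_cache = torch.empty(0, d)
        t0 = time.perf_counter()
        for _ in range(n_steps):
            q = torch.randn(1, d)
            k_cache = torch.cat([k_cache, torch.randn(1, d)])
            v_cache = torch.cat([v_cache, torch.randn(1, d)])
            if window is not None:  # sliding window: only keep the last `window` cache entries
                k_cache, v_cache = k_cache[-window:], v_cache[-window:]
            attn = torch.softmax(q @ k_cache.T / d ** 0.5, dim=-1)
            _ = attn @ v_cache
        return n_steps / (time.perf_counter() - t0)

    print(f"full cache : {decode():.1f} tok/s")
    print(f"window={W}  : {decode(window=W):.1f} tok/s")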

by JohannaAlmeida

4/7/2026 at 2:07:18 PM

Is this just for autocomplete? You're not going to get anything very useful out of a code-only training set.

by empath75

4/7/2026 at 2:27:30 PM

Yeah, autocomplete is an amazing use case. I needed a small model that used transformers and could fit on my weak consumer GPU.

So I needed to make fundamental architecture changes and do some KV cache tricks.

And then prove with benchmarks that the new architecture was faster and that perplexity was acceptable.
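The "KV cache tricks" aren't spelled out here; one common flavour is to keep recent tokens exact and pool older ones, roughly like the toy below (class name, window size, and pooling rule are all assumptions for illustration, not the actual implementation).

    import torch

    class CompressedKVCache:
        # Keep the last `window` key/value pairs exact and mean-pool older ones
        # in blocks of `stride`, so the cache length stays roughly bounded.
        def __init__(self, window=256, stride=4):
            self.window, self.stride = window, stride
            self.k = None
            self.v = None

        def append(self, k_new, v_new):  # k_new, v_new: (1, d)
            self.k = k_new if self.k is None else torch.cat([self.k, k_new])
            self.v = v_new if self.v is None else torch.cat([self.v, v_new])
            if self.k.shape[0] - self.window >= self.stride:
                # Compress the oldest `stride` entries into a single averaged entry.
                self.k = torch.cat([self.k[:self.stride].mean(0, keepdim=True), self.k[self.stride:]])
                self.v = torch.cat([self.v[:self.stride].mean(0, keepdim=True), self.v[self.stride:]])
            return self.k, self.v

Each decode step appends one pair and reads back a cache whose length stays near `window`, which is what keeps per-token cost flat instead of growing with sequence length.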

by JohannaAlmeida

4/7/2026 at 5:52:06 PM

Well, coding is a kind of extended autocomplete. I prefer that way of working because I don't like the mess LLMs create when you let them work on their own. Smaller models, specialized in a single language, make a lot of sense.

by bigbadfeline

4/7/2026 at 3:06:30 PM

I think it's more of a proof of concept, since it's locally trained. It would take a lot of resources and time to train something non-trivial.

by altruios

4/7/2026 at 2:51:13 PM

Look into RWKV.

by woodson

4/7/2026 at 3:04:21 PM

Yeah, RWKV is definitely related in spirit (recurrent state for long context). Here I'm combining local windowed attention with a gated recurrent path plus KV cache compression, so it's more of a hybrid than a full replacement for attention.
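For a feel of the shape of it, a single-head toy sketch (layer names, the fixed EMA decay on the state, and the sigmoid gate are guesses for illustration, not the real model):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class HybridAttention(nn.Module):
        # Exact attention over a local window, plus a gated read of a recurrent
        # summary of everything that has fallen out of the window.
        def __init__(self, d_model=64, window=128):
            super().__init__()
            self.window = window
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.gate = nn.Linear(d_model, d_model)        # how much recurrent state to mix in
            self.state_proj = nn.Linear(d_model, d_model)
            self.out = nn.Linear(d_model, d_model)

        def forward(self, x):                              # x: (seq, d_model), no batch for brevity
            seq, d = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            state = torch.zeros(d)                         # summary of tokens outside the window
            outs = []
            for t in range(seq):
                lo = max(0, t - self.window + 1)
                if lo > 0:                                 # the token leaving the window updates the state
                    state = 0.9 * state + 0.1 * v[lo - 1]
                w = F.softmax(q[t] @ k[lo:t + 1].T / d ** 0.5, dim=-1)
                local = w @ v[lo:t + 1]
                g = torch.sigmoid(self.gate(x[t]))
                outs.append(local + g * self.state_proj(state))
            return self.out(torch.stack(outs))

    y = HybridAttention()(torch.randn(512, 64))            # y: (512, 64)

The loop is written for clarity, not speed; the point is that the full n×n score matrix is never materialized, and the out-of-window history is carried at O(D) cost per token, matching the O(n·W + n·D) figure above.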

by JohannaAlmeida
