12/11/2025 at 3:59:56 PM
https://github.com/beehive-lab/GPULlama3.java
by mikepapadim
12/12/2025 at 1:56:54 AM
Does it support flash attention? Does it use tensor cores? Can I write custom kernels?
UPD: I found no evidence that it supports tensor cores, so it is going to be many times slower than implementations that do.
by lostmsu
12/12/2025 at 8:32:32 AM
Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here:
https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
by mikepapadim
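For readers unfamiliar with TornadoVM's kernel model, here is a minimal sketch of what a user-written kernel looks like. The API names (`TaskGraph`, `TornadoExecutionPlan`, `FloatArray`, `@Parallel`, `DataTransferMode`) are assumed from recent TornadoVM releases; the kernel itself is illustrative and not taken from the linked files.

```java
import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
import uk.ac.manchester.tornado.api.TaskGraph;
import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
import uk.ac.manchester.tornado.api.annotations.Parallel;
import uk.ac.manchester.tornado.api.enums.DataTransferMode;
import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

public class CustomKernelSketch {

    // A user-written kernel: plain Java, with @Parallel marking the
    // loop that TornadoVM may map onto GPU threads.
    public static void scaleAdd(FloatArray a, FloatArray b, FloatArray out, float alpha) {
        for (@Parallel int i = 0; i < out.getSize(); i++) {
            out.set(i, alpha * a.get(i) + b.get(i));
        }
    }

    public static void main(String[] args) {
        int n = 1024;
        FloatArray a = new FloatArray(n);
        FloatArray b = new FloatArray(n);
        FloatArray out = new FloatArray(n);
        a.init(1.0f);
        b.init(2.0f);

        // Build a task graph: copy inputs once, run the kernel,
        // copy the result back after every execution.
        TaskGraph graph = new TaskGraph("s0")
                .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                .task("t0", CustomKernelSketch::scaleAdd, a, b, out, 0.5f)
                .transferToHost(DataTransferMode.EVERY_EXECUTION, out);

        ImmutableTaskGraph snapshot = graph.snapshot();
        new TornadoExecutionPlan(snapshot).execute();

        System.out.println(out.get(0)); // expected: 0.5 * 1.0 + 2.0 = 2.5
    }
}
```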
12/12/2025 at 10:12:35 AM
TornadoVM's GitHub has no mention of tensor cores or WMMA instructions. The only mention of tensor cores is from 2024 and states they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393
by lostmsu
12/12/2025 at 1:27:06 PM
https://github.com/beehive-lab/TornadoVM/pull/732
https://github.com/beehive-lab/TornadoVM/pull/313
by mikepapadim
12/14/2025 at 8:48:37 AM
I believe these are SIMD. Tensor cores require the MMA family of instructions. Ask me how I know. :)
https://github.com/m4rs-mt/ILGPU/compare/master...lostmsu:IL...
Good article: https://alexarmbr.github.io/2024/08/10/How-To-Write-A-Fast-M...
by lostmsu
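To make the distinction being drawn here concrete: SIMD means lane-wise operations over vector registers, which plain Java can express through the incubating Vector API, whereas tensor-core MMA means warp-level matrix-multiply-accumulate instructions (PTX `mma`/`wmma`) that have no Java-level equivalent and must be emitted by the compiler backend itself. A sketch of the SIMD style follows; it is purely illustrative and not code from the linked PRs.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

// Compile and run with: --add-modules jdk.incubator.vector
public class SimdVsMma {

    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // SIMD: one fused multiply-add per lane of a vector register.
    // A tensor-core MMA instruction instead computes an entire small
    // matrix product (e.g. 16x16x16) per warp; there is no Java
    // intrinsic for it, so a JIT backend has to emit PTX mma/wmma.
    static void fmaSimd(float[] a, float[] b, float[] c) {
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            FloatVector vc = FloatVector.fromArray(SPECIES, c, i);
            va.fma(vb, vc).intoArray(c, i); // c[i] = a[i] * b[i] + c[i]
        }
        for (; i < a.length; i++) { // scalar tail
            c[i] = a[i] * b[i] + c[i];
        }
    }
}
```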