3/3/2026 at 7:22:20 PM
How could this lend insight into why the Fast Fourier Transform approximates self-attention?

> Because self-attention can be replaced with FFT for a loss in accuracy and a reduction in kWh [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.
[1] "FNet: Mixing tokens with Fourier transforms" (2021) https://arxiv.org/abs/2105.03824 .. "Google Replaces BERT Self-Attention with Fourier Transform: 92% Accuracy, 7 Times Faster on GPUs" https://syncedreview.com/2021/05/14/deepmind-podracer-tpu-ba...
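For context on what the FNet paper actually substitutes: the token-mixing sublayer becomes a parameter-free 2D Fourier transform (over the sequence and hidden dimensions), keeping only the real part. A minimal numpy sketch of that idea, assuming nothing beyond the paper's description (function and variable names here are illustrative, not the paper's code):

```python
import numpy as np

def fnet_mixing(x):
    """FNet-style token mixing: a 2D DFT over the sequence and
    hidden dimensions, keeping only the real part. Unlike
    self-attention, there are no learned parameters here."""
    # x: (seq_len, d_model) array of token embeddings
    return np.fft.fft2(x).real

# Toy input: 4 tokens with 8-dimensional embeddings
x = np.random.default_rng(0).normal(size=(4, 8))
y = fnet_mixing(x)
assert y.shape == x.shape  # mixing preserves the embedding shape
```

The full model interleaves this with the usual feed-forward sublayers; the FFT only replaces the attention (mixing) step.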
"Why formalize mathematics – more than catching errors" (2025) https://news.ycombinator.com/item?id=45695541
Can the QFT (Quantum Fourier Transform) and the IQFT (Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs, and do Lean formalisms provide any insight into how or why?
by westurner
3/3/2026 at 10:56:29 PM
> Because self-attention can be replaced with FFT for a loss in accuracy and a reduction in kWh [1], I suspect that the Quantum Fourier Transform can also be substituted for attention in LLMs.

Couldn't figure out where you are quoting this from.
> Can the QFT Quantum Fourier Transform (and IQFT Inverse Quantum Fourier Transform) also be substituted for self-attention in LLMs
No. The quantum Fourier transform is just a particular factorization of the DFT as run on a quantum computer. It's not any faster if you run it on a classical computer. And running (part of) an LLM would be more expensive on a quantum computer (because using arbitrary classical data with a quantum computer is expensive).
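A sanity check on that point: the n-qubit QFT applies the same N×N unitary DFT matrix (N = 2^n) that classical code computes directly; the quantum circuit is only a gate-level factorization of it. A numpy sketch (using the common sign/normalization convention for the QFT; conventions vary by textbook):

```python
import numpy as np

n = 3                       # qubits
N = 2 ** n                  # amplitudes
j, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
# QFT unitary: F[j, k] = exp(2*pi*i*j*k / N) / sqrt(N)
F = np.exp(2j * np.pi * j * k / N) / np.sqrt(N)

state = np.random.default_rng(1).normal(size=N) + 0j
state /= np.linalg.norm(state)

# Same linear map as the classical inverse DFT, up to normalization:
assert np.allclose(F @ state, np.sqrt(N) * np.fft.ifft(state))
# F is unitary, as any sequence of quantum gates must be:
assert np.allclose(F.conj().T @ F, np.eye(N))
```

The quantum speedup is that the circuit applies F to 2^n amplitudes with O(n^2) gates, but reading the resulting amplitudes back out classically is exactly the expensive data-loading problem mentioned above.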
by wasabi991011
3/4/2026 at 4:50:48 PM
My mistake. That's actually a quote of myself, from an also tangential comment re: "Transformer is a holographic associative memory" (2025) https://news.ycombinator.com/item?id=43029899 .. https://westurner.github.io/hnlog/#comment-43029899

There's more to that argument though.
Is quantum logic more appropriate for universal function approximation than LLMs (self-attention), which must not do better than next-word prediction unless asked (due to copyright)?
If quantum probabilistic logic is appropriate for all physical things, then quantum probabilistic logic is probably better at simulating physical things.
If LLMs, like [classical Fourier] convolution, are an approximation and they don't do quantum logic, then they cannot be sufficient for simulating physical things.
But we won't know until we have enough coherent qubits and we determine how to quantum embed these wave states. (And I have some notes on this, involving stars in rectangular lattices, nitrogenated lignin, and solitons.)
Or, it's possible to reason about what will be possible given sufficient QC to host an artificial neural network. How to quantum embed a trained LLM into qubit registers (or qubit storage) and use programmable/reconfigurable quantum circuits to lookup embeddings and do only feed-forward better than convolution?
But the QFT and IQFT solve the discrete logarithm problem (via period finding, as in Shor's algorithm).
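For reference, the role the QFT plays in discrete-log and factoring is period finding: the Fourier transform of a periodic sequence is supported only at multiples of N/r, where r is the period. The same spectral structure can be illustrated with a classical FFT on the modular-exponentiation sequence from factoring 15 (a toy instance; the quantum version operates on amplitudes rather than samples):

```python
import numpy as np

a, M = 2, 15
N = 16                                   # number of samples (a power of 2)
seq = np.array([pow(a, x, M) for x in range(N)], dtype=float)
# 2^x mod 15 cycles 1, 2, 4, 8, 1, 2, ... with period r = 4
spectrum = np.abs(np.fft.fft(seq))

# Spectral energy appears only at multiples of N/r = 4
peaks = {i for i in range(N) if spectrum[i] > 1e-9}
assert peaks == {0, 4, 8, 12}
```

Reading the period r = 4 off those peak spacings is the step the QFT performs on a quantum register; the rest of Shor's algorithm is classical post-processing.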
There's probably a place for quantum statistical mechanics in LLMs. Probably also counterfactuals including Constructor Theory counterfactuals.
by westurner
3/3/2026 at 9:15:54 PM
This is just standard Fourier theory of being able to apply dense global convolutions as pointwise operations in frequency space? There's no mystery here. It's no different from a more general learnable parameterization of "Efficient Channel Attention (ECA)".
by gyrovagueGeist
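The convolution theorem being invoked here, sketched: a dense circular convolution over the token axis equals a pointwise multiplication in frequency space, so a learnable per-frequency filter amounts to a learnable global convolution. A minimal numpy check (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
x = rng.normal(size=n)        # signal, e.g. one channel over n tokens
h = rng.normal(size=n)        # a dense "global" filter

# Direct circular convolution: (x * h)[i] = sum_j x[j] * h[(i - j) mod n]
direct = np.array([sum(x[j] * h[(i - j) % n] for j in range(n))
                   for i in range(n)])

# Equivalent pointwise multiplication in frequency space
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)).real

assert np.allclose(direct, via_fft)
```

Learning np.fft.fft(h) directly (one complex weight per frequency) is the "more general learnable parameterization" reading of frequency-domain token mixing.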
3/3/2026 at 9:21:05 PM
> There’s no mystery here.
Yes and no. Yes, "no mystery," because for some reason there's this belief that studying math is useless and that suggesting it's worthwhile is gatekeeping. But also no, because there are deeper and more nuanced questions here; of course there are, but for some reason we're proud of our black boxes and act like there's no other way.
by godelski