πFS | alt.hn

6/10/2026 at 8:40:24 PM

Reminds me of when I tried to use the library of babel as a data compression tool. It led me down a fun rabbit hole and was my first introduction to information theory.

The conclusion being that you basically need the same amount of data to represent the address of your data as the data itself, so it's not really effective at compression, just a fun thought experiment.

The cool part of this in modern times is that LLMs are basically a form of lossy compression that actually achieves the gist of what these tools fail at. Although it is lossy, and requires a massive substrate. This is related to the idea of AI/LLMs being a form of language compression.

by jamwise

6/10/2026 at 11:33:49 PM

You'll find this an interesting watch:

Reinventing Entropy Compression is Intelligence Part 1

3Blue1Brown https://youtu.be/l6DKRf-fAAM?is=ne73FCJ7ErXhzZ-v

by ithkuil

6/11/2026 at 11:37:32 AM

You, and the HN users, `lojban`, `klingon`, `ido`, `brithenig`, `solresol`, `babm`, and `tokipona`, may want to start a club. Amusingly, nobody seems to have registered the `esperanto`, `volapuk`, `interslavic`, `balaibalan`, and `dothraki` usernames.

by nz

6/11/2026 at 1:56:51 PM

What can I say other than thank you for the inspiration.

by dothraki

6/11/2026 at 1:32:39 PM

I feel like I am having a stroke reading this comment

by idiotsecant

6/11/2026 at 2:10:13 PM

The usernames are all names of conlangs[0]. I'd suggest nz join as well, though, considering only a true conlang connoisseur would actually notice.

[0]: https://en.wikipedia.org/wiki/Constructed_language

by lompad

6/11/2026 at 3:15:35 PM

I don’t see users with ‘khuzdul’, ’sindarin’, or ‘quenya’ either.

by cestith

6/11/2026 at 6:12:27 AM

Also this article by Ted Chiang as a literary explanation of the connection between intelligence and compression: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...

by sam_lowry_

6/10/2026 at 11:06:37 PM

In some sense, science is the most extreme form of compression - Newtonian mechanics explains an incredible number of phenomena in a few lines of text.

by ainch

6/11/2026 at 4:27:31 PM

It does, but only vaguely unless you already know how it works and can work backwards to Newton's laws. E.g., Newtonian mechanics can explain how flying works, but if you don't already know, it's hard to go from Newton's 3 laws to a functional explanation of why planes don't fall out of the sky.

Some of that is also the domain. It's less that science is an extreme form of compression, and more that natural phenomena are highly compressible. They're a small number of kinds of interactions repeated a bajillion times. How many equations does it take to explain electricity (ignoring equations derived from ones already included)? I think it's fewer than 5.

On some level, you could probably reduce all of the Standard Model down to models of atoms, their motion, and the basic subatomic particles (the non-quantum ones). That would explain almost everything that happens on Earth in a very short form, though few people would be able to go from that to explaining how lightning works.

by everforward

6/11/2026 at 6:40:25 PM

I agree it's an oversimplification. The example I think of is something like Newton's law of gravitation vs Ptolemaic epicycles: one simple explanation replaced many layers of tweaks.

It's also a relevant example for AI - one paper tested the ability of Transformers to model planetary orbits: unlike Newton's Law, the implicit forces they learn are nonsense.

https://arxiv.org/pdf/2507.06952

by ainch

6/11/2026 at 7:15:07 AM

Yes, but /lossy/ compression: laws (scientific, philosophical, etc.) compress an abstract narration of events into that tiny, hard, fundamental, predictive detail.

(Then it depends on your concern: "Aagh, the aunt fell!" // "Oh yes, that'd be Newton")

by mdp2021

6/11/2026 at 7:27:16 AM

> "Aagh, the aunt fell!" // "Oh yes, that'd be Newton"

This is totally lost on me.

by esquivalience

6/11/2026 at 9:04:53 AM

> This is totally lost on me.

Appears to be lossy then ;)

(Sorry, you have to admit that was too easy to not say)

by user_7832

6/11/2026 at 9:01:39 AM

Compression minimizes the representation of information.

Laws (scientific, philosophical, etc.) as compression represent what classes of events have in common: an abstraction of those events that strips away the irrelevant (irrelevant from some perspective, or made irrelevant by a potential Procrustean bed). So, laws are compression, but compression so extremely lossy that the loss can be relevant.

Bluntly: "there may be more to the story of an elderly person's fall than just gravitation", also in the sense that there are details behind the event.

Laws are compression - yes, with caveats.

On a more scientific, epistemological side: Einstein extended Newton, covering more exceptions (reducing the abstraction, and with it the loss).

by mdp2021

6/10/2026 at 10:39:05 PM

3Blue1Brown just released a video about this Intelligence-Compression connection.

https://youtu.be/l6DKRf-fAAM

by quirino

6/10/2026 at 10:43:21 PM

The idea was fresh in my mind because I watched this yesterday. Great video; the illustrations and intuition-building around the compressibility of information were so good! I'm so grateful for 3Blue1Brown.

by jamwise

6/11/2026 at 12:02:20 PM

That conclusion is similar to the concept of 'unconditional security' especially WRT one-time pads. The key must be at least as long as the message itself.

Other forms of encryption are based on assumptions and conditions being true (e.g. factoring is a hard problem, etc.) that may or may not be true. We don't know.

by seethishat

6/10/2026 at 11:36:05 PM

The level of compression is pretty impressive when you think about it. I wrote a comment a while back which is still true (although bytes should be bits, so in that sense it’s still wrong): https://news.ycombinator.com/item?id=39559969

Back-of-the-envelope calculation for storing valid 4-grams (sequences of four words) is around 10 billion x 14 bits per word = 17 GB for all 10 billion. There are LLMs 100x smaller which can write coherent prose.

by janalsncm
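
For reference, the arithmetic behind the 17 GB figure works out as follows, taking the comment's 14 bits per entry at face value (14 bits can index a vocabulary of $2^{14} = 16{,}384$ words):

```latex
10^{10}\ \text{entries} \times 14\ \tfrac{\text{bits}}{\text{entry}}
  = 1.4 \times 10^{11}\ \text{bits}
  = 1.75 \times 10^{10}\ \text{bytes}
  \approx 17.5\ \text{GB}
```

Note that the product counts 14 bits once per 4-gram. Storing all four words of every 4-gram explicitly would take $4 \times 14 = 56$ bits each, about 70 GB, so the 17 GB figure presumes some sharing between entries (our reading; the comment doesn't say).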

6/11/2026 at 9:29:56 AM

If you combine the LLM probability distribution with arithmetic coding, you can actually use it to compress text losslessly. When people report 'bits per byte', it is actually the compression rate for text.

GPT-2, for instance, achieves roughly 1 bit per byte, so it can be used to compress (English) text 8-fold. Modern models are likely much better.

by frotaur
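
To make the "bits per byte" idea concrete: an arithmetic coder driven by a model's next-symbol probabilities produces output within a couple of bits of the ideal code length, the sum of $-\log_2 p$ over every symbol. The sketch below computes that ideal length with a toy adaptive order-2 byte model standing in for an LLM. The function name and the add-one (Laplace) smoothing are our own illustrative choices, not anything from the comment.

```python
import math
from collections import defaultdict


def ideal_bits_per_byte(data: bytes, order: int = 2) -> float:
    """Ideal code length (bits per input byte) under an adaptive context model.

    An arithmetic coder fed the same probabilities would land within a few
    bits of this total; a toy byte model stands in for an LLM here.
    """
    # Add-one (Laplace) smoothing: every byte value starts with count 1 in every context.
    counts = defaultdict(lambda: [1] * 256)
    totals = defaultdict(lambda: 256)
    total_bits = 0.0
    for i, byte in enumerate(data):
        context = data[max(0, i - order):i]  # the preceding `order` bytes
        p = counts[context][byte] / totals[context]
        total_bits += -math.log2(p)  # cost of coding this byte given its context
        # Update after coding, as a decoder could too, so no model is transmitted.
        counts[context][byte] += 1
        totals[context] += 1
    return total_bits / len(data)


text = b"the quick brown fox jumps over the lazy dog. " * 200
print(f"{ideal_bits_per_byte(text):.2f} bits/byte (8.00 = no compression)")
```

With this crude model, the repetitive sample comes out at a few bits per byte, while input whose contexts never repeat, such as `bytes(range(256))`, costs exactly 8. A better predictor lowers the number further, which is the sense in which an LLM's bits-per-byte score is a compression rate.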

6/11/2026 at 11:36:41 AM

LLMs seem to be the weird, interesting outcome of applying lossy (de)compression concepts to text instead of the audio/image/video domains where they have traditionally been used.

by briansm

6/11/2026 at 11:55:20 AM

If you set temperature to 0.0 you almost have a key-value store, but finding the right key for your value might take some effort.

by jnovek

6/12/2026 at 6:00:25 AM

https://github.com/philipl/inferencefs/ by the same author in case you missed it

by 47282847

6/12/2026 at 5:45:17 PM

I did miss it, thank you!

by jnovek

6/11/2026 at 5:53:21 AM

> you basically need the same amount of data to represent the address of your data as the data itself

Almost like the other Borges work where “the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire”.

by divbzero

6/11/2026 at 12:03:16 PM

[flagged]

by aafaqzahid

6/10/2026 at 9:08:49 PM

Related. Others?

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=36357466 - June 2023 (107 comments)

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=28699499 - Sept 2021 (30 comments)

PiFS – The Data-Free Filesystem - https://news.ycombinator.com/item?id=26208704 - Feb 2021 (1 comment)

Πfs: Never worry about data again - https://news.ycombinator.com/item?id=21359338 - Oct 2019 (1 comment)

The π Filesystem for FUSE: Store Your Data in π - https://news.ycombinator.com/item?id=19223032 - Feb 2019 (1 comment)

pifs - Avoid disk space usage by saving your files in the digits of Pi - https://news.ycombinator.com/item?id=18687275 - Dec 2018 (1 comment)

πfs – A data-free filesystem - https://news.ycombinator.com/item?id=13869691 - March 2017 (105 comments)

Πfs: Stores your data in π - https://news.ycombinator.com/item?id=10856108 - Jan 2016 (1 comment)

Πfs: Never worry about data again - https://news.ycombinator.com/item?id=10847693 - Jan 2016 (1 comment)

File system that stores location of file in Pi - https://news.ycombinator.com/item?id=8018818 - July 2014 (98 comments)

100% Compression Using Pi - https://news.ycombinator.com/item?id=6698852 - Nov 2013 (32 comments)

(Reposts are fine after a year or so; links to past threads are just to satisfy extra-curious readers)

by dang

6/10/2026 at 9:15:14 PM

How are you generating these lists?

by Levitating

6/10/2026 at 9:17:31 PM

If you click the website's name to the right of the title, it pulls up all the submissions from the same site:

https://news.ycombinator.com/from?site=github.com/philipl

by programjames

6/10/2026 at 9:23:29 PM

Even then I don't see a direct way to extract a list like this.

by Levitating

6/10/2026 at 9:36:00 PM

I think it's safe to assume that dang has access to tools that we mortals are unable to comprehend, without being driven to madness.

by ChrisMarshallNY

6/10/2026 at 10:02:53 PM

For this use case, finding related threads, I thought he doesn't use special tools but rather just

https://hn.algolia.com/

by lukan

6/10/2026 at 10:17:18 PM

Even using algolia, I don't see a way to generate a list in this exact format.

I think ChrisMarshallNY is right, dang has access to eldritch powers.

by Levitating
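
For what it's worth, a list in the same shape can be produced from the public HN Search API at hn.algolia.com. The sketch below assumes the `search_by_date` endpoint with the `query`, `tags` and `hitsPerPage` parameters, and the `title`, `objectID`, `created_at_i` and `num_comments` fields of each hit. The month abbreviations are inferred from dang's list and are our guess, not a documented convention. This says nothing about how dang actually builds his lists, and a free-text query will also return unrelated stories that need pruning by hand.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Month style inferred from dang's list ("June 2023", "Sept 2021", "Dec 2018").
MONTHS = ["Jan", "Feb", "March", "April", "May", "June",
          "July", "Aug", "Sept", "Oct", "Nov", "Dec"]


def fetch_stories(query: str, hits_per_page: int = 50) -> list:
    """Newest-first story hits from the HN Search API (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": query, "tags": "story", "hitsPerPage": hits_per_page})
    url = f"https://hn.algolia.com/api/v1/search_by_date?{params}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["hits"]


def format_related(hit: dict) -> str:
    """Render one hit as 'Title - link - Month Year (N comments)'."""
    created = datetime.fromtimestamp(hit["created_at_i"], tz=timezone.utc)
    n = hit.get("num_comments") or 0  # may be missing or null on some hits
    noun = "comment" if n == 1 else "comments"
    link = f"https://news.ycombinator.com/item?id={hit['objectID']}"
    return f"{hit['title']} - {link} - {MONTHS[created.month - 1]} {created.year} ({n} {noun})"
```

Usage: `for hit in fetch_stories("pifs"): print(format_related(hit))`.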

6/11/2026 at 5:26:03 AM

The Glider HN app for Android shows related posts with high overlap with dang's list, so it must be possible for mere mortals after all.

by khimaros

6/11/2026 at 10:05:26 AM

It was really just a rhetorical joke, but I did write an app that is a whole system, based on a custom backend and a native frontend.

I wrote a special native management app, and often use it for dashboard functionality, like the kind of thing that the HN mods do.

Yeah, I could, for example, feed the logs into an LLM, and get fancy reports, but it’s a lot easier to simply hit the charts button in the navbar, and view interactive graphs, customized exactly for my workflow.

by ChrisMarshallNY

6/11/2026 at 12:41:46 AM

I can only imagine opening DBeaver and running "select * from hn.posts where site like '%github.com/philipl'"

by whynotmaybe

6/10/2026 at 9:16:34 PM

He’s the mod hero from HN

by jwpapi

6/10/2026 at 9:41:53 PM

[flagged]

by gnaritas99

6/10/2026 at 9:53:40 PM

Citation needed

by LoganDark

6/10/2026 at 10:20:06 PM

See https://news.ycombinator.com/item?id=44861185 and the links back from there.

by dang

6/10/2026 at 10:01:53 PM

Reminds me of nsafs, the National Security Agency Filesystem ("free" because the government pays for it) - https://github.com/freedomtools/nsafs

by emptyroads

6/10/2026 at 10:46:21 PM

I once interviewed for a company and the interviewer was telling me how he (a VC) funded a project to generate large streams of random numbers; you would select an index at random, share that private key with somebody, and then the subsequent text could be used as a one-time-pad. The NSA would be forced to buffer/save the entire stream, which could be generated at GB/sec, if they wanted to decrypt.

It didn't seem very practical.

by dekhn

6/10/2026 at 11:16:19 PM

I wonder if we could mess with NSA-style surveillance by having a good chunk of the population streaming lots of random data over the internet. Essentially, Alice piping her /dev/random to Bob's /dev/null over netcat or something. Make a slick looking app that does it 24/7 in the background using excess bandwidth and tell people it sticks it to the NSA.

Spy agencies would not only have to store it all in case it was something valuable, but at some point they may try to crack it because it's indistinguishable from encrypted data and waste resources on it. If enough people did it, total web surveillance could become impractical.

by helterskelter

6/11/2026 at 7:22:46 AM

That's known as cover traffic and is a tactic employed by many of the anonymity oriented overlay networks.

I'll note that any observer already has this problem to the extent that video streams are also encrypted. However most observers presumably recognize the endpoints as well as being able to classify the traffic by means of statistical analysis.

What might be useful would be a tool to generate arbitrary user data of various forms, including HTML, video, audio, and various message formats. Then it could assemble a convincing traffic stream full of gibberish to exchange with peers at random. You wouldn't even necessarily need all that much of it to overwhelm any would-be observers, considering the volume of streaming-service traffic that already exists.

by fc417fc802

6/11/2026 at 8:10:40 AM

Local LLMs and diffusion models can help you with that.

by nairboon

6/11/2026 at 11:01:31 AM

https://github.com/marcus0x62/quixotic - bot and LLM obfuscator

by initramfs

6/10/2026 at 11:31:07 PM

I suspect this would have an effect similar to early internet worms that caused significant strain

https://en.wikipedia.org/wiki/Melissa_(computer_virus)

by danielmeskin

6/11/2026 at 12:58:14 AM

This violates law #27, "do not unnecessarily increase the entropy of the universe"

by dekhn

6/11/2026 at 2:01:02 AM

It’s not unnecessary. If anything it seems necessary.

by iwontberude

6/11/2026 at 5:31:43 PM

That's kind of happening right now. Telegram's proxies produce huge amounts of garbage data to hide traffic and avoid blocking.

by poilcn

6/11/2026 at 12:27:05 AM

> generate large streams of random numbers; you would select an index at random, share that private key with somebody, and then the subsequent text could be used as a one-time-pad.

This is what stream ciphers are

by agnishom
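
In that spirit, the "index into a shared random stream" can be replaced by a short key expanded into a pseudorandom keystream. The toy below uses SHAKE-256 from Python's `hashlib` (an extendable-output function, so one call yields as many bytes as requested), seeded with a key and a nonce. It is a minimal illustration with hypothetical function names, not a vetted cipher: there is no authentication, and a key/nonce pair must never be reused. Unlike a true one-time pad, its security rests on an assumption about SHAKE-256, which is the unconditional-versus-conditional distinction drawn upthread.

```python
import hashlib
import secrets


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Expand a 32-byte key plus a per-message nonce into `length` pseudorandom bytes."""
    if len(key) != 32:
        raise ValueError("toy scheme expects a 32-byte key")
    # The key has a fixed length, so key + nonce encodes the pair unambiguously.
    return hashlib.shake_256(key + nonce).digest(length)


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the same keystream is its own inverse."""
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)  # must be fresh for every message under this key
ciphertext = xor_crypt(key, nonce, b"meet me at index 31415")
assert xor_crypt(key, nonce, ciphertext) == b"meet me at index 31415"
```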

6/11/2026 at 12:42:14 AM

This is just WOM with extra steps.

https://en.wikipedia.org/wiki/Write-only_memory_(joke)

by gowld

6/10/2026 at 8:30:27 PM

It is worth noting that as the length of data increases it becomes extremely unlikely that the index and length of the sequence within pi would actually be smaller than the data.

by adzm
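
A quick experiment shows this already at the smallest scale. The sketch below computes hexadecimal digits of π with Machin's formula ($\pi/4 = 4\arctan\frac15 - \arctan\frac1{239}$) in integer arithmetic, then records where each of the 256 byte values first appears as a pair of hex digits. Looking bytes up as hex-digit pairs is our assumption for illustration, not a reading of the pifs source, and the function names are hypothetical.

```python
def pi_hex_digits(n: int, guard: int = 10) -> str:
    """First n hexadecimal digits of pi after the point, via Machin's formula."""
    unity = 1 << (4 * (n + guard))  # fixed point: 1.0 == 16**(n + guard)

    def arccot(x: int) -> int:
        # arctan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ..., scaled by `unity`
        total = power = unity // x
        x2, k, sign = x * x, 3, -1
        while True:
            power //= x2
            term = power // k
            if term == 0:
                return total
            total += sign * term
            sign, k = -sign, k + 2

    pi_fixed = 4 * (4 * arccot(5) - arccot(239)) >> (4 * guard)  # drop guard digits
    return format(pi_fixed, "x")[1:1 + n]  # strip the leading "3"


def first_byte_offsets(hex_digits: str) -> dict:
    """Map each byte value to the first offset where it appears as two hex digits."""
    offsets = {}
    for i in range(len(hex_digits) - 1):
        offsets.setdefault(int(hex_digits[i:i + 2], 16), i)
        if len(offsets) == 256:
            break
    return offsets


offsets = first_byte_offsets(pi_hex_digits(8000))
fits = sum(1 for off in offsets.values() if off < 256)
print(f"{len(offsets)} byte values found; {fits} have a first offset that fits in one byte")
print(f"largest first offset: {max(offsets.values())}")
```

Any byte value whose first offset is 256 or more cannot be addressed by a one-byte index, so even a per-byte scheme ends up storing more index than data.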

6/10/2026 at 8:38:44 PM

That seems easy enough to solve. Simply record the index and length in pi of the index and length in pi.

by Aloisius

6/10/2026 at 9:02:18 PM

See also: cofixpoints of co-algebras

by agnishom

6/11/2026 at 1:48:38 AM

check mate

by jesuslop

6/10/2026 at 11:25:46 PM

Back in college, I thought I could compress my phone number by telling people its index in pi, but my 7 digit phone number is at an 8 digit index.

I didn’t have the compute to find my 10 digit number with the area code.

by jastr

6/11/2026 at 4:05:51 AM

HEX should've solved for char length?

by xavortm

6/10/2026 at 9:21:43 PM

The index of your 20 line file is <20TB number>

by mondrian

6/10/2026 at 10:48:20 PM

Unless, in turn, you locate the index itself in pi at a much smaller index. And so on...

Find k candidate indices for your data, then locate each of them. If the smallest one is a significantly smaller index space, repeat.

by russfink

6/10/2026 at 11:27:40 PM

Can't tell if you're in on the joke or not, but for anyone who is genuinely wondering whether this might work: Consider that there are at most 256 different indexes that could be represented by a 1-byte index value, but if you're trying to store 9 bits of data, there are already 512 different possible things it could be that each need to be represented by a different index value, otherwise you won't be able to tell them apart. Those pigeons aren't gonna fit.

by akoboldfrying

6/11/2026 at 4:31:29 AM

That’s what variable length encoding is for!

by jonhohle
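
Variable-length codes don't escape that counting argument. Counting every bit string strictly shorter than $n$ bits, including the empty one:

```latex
\sum_{k=0}^{n-1} 2^k = 2^n - 1 < 2^n
```

So among the $2^n$ possible $n$-bit inputs, at least one must receive a code of $n$ bits or more. If the codes must also be self-delimiting (prefix-free), Kraft's inequality $\sum_x 2^{-\ell(x)} \le 1$ implies that the average code length over uniformly random $n$-bit inputs is at least $n$.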

6/11/2026 at 10:41:47 AM

It's recursive as well, you now need to store how many levels of indirection of indices you had to resolve, which will in turn take 20TB to store, unless you store that in pi as well, which in turn...

by Galanwe

6/10/2026 at 8:43:46 PM

yes I believe that's the joke

by 12_throw_away

6/10/2026 at 9:17:42 PM

He’s aware, he just added some curious information.

by jwpapi

6/10/2026 at 11:04:38 PM

TFA addresses this

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

by hatthew

6/10/2026 at 11:47:08 PM

Why stop at bytes? Let's split it in individual bits and then look up the bits in pi!

But Pi's binary expansion is not very practical for this purpose, since it's 11.0010...

OTOH, e is 10.1011...

Let's stick to fractional digits (the ones right of the binary point): at index 0 we have 1, and at index 1 we have 0.

So, to encode a stream of bytes so that each bit is encoded as the index of that bit in e, all you need to do is XOR it with 0xFF.

by ithkuil
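
The e version of the joke does check out. The sketch below computes the leading fractional bits of e from the series $e = \sum_k 1/k!$ in integer arithmetic, then encodes each bit of a byte as the index of that bit's first occurrence in those fractional bits. The helper names are ours.

```python
def e_fraction_bits(n: int, guard: int = 16) -> str:
    """First n bits of e after the binary point (e = 10.1011... in binary)."""
    unity = 1 << (n + guard)  # fixed point: 1.0 == 2**(n + guard)
    total, term, k = 0, unity, 1
    while term:
        term //= k  # term is now roughly unity / k!
        total += term
        k += 1
    # total ~ (e - 1) * unity, so subtracting unity leaves the fractional part e - 2
    return format((total - unity) >> guard, f"0{n}b")


def encode_byte(byte: int, bits: str) -> int:
    """Replace every bit of `byte` with the index of its first occurrence in `bits`."""
    out = 0
    for shift in range(7, -1, -1):
        bit = (byte >> shift) & 1
        out = (out << 1) | bits.index(str(bit))  # "1" is at index 0, "0" at index 1
    return out


bits = e_fraction_bits(8)
print(bits)  # leading fractional bits of e
assert all(encode_byte(b, bits) == b ^ 0xFF for b in range(256))
```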

6/11/2026 at 12:46:55 AM

Hang on hang on let me write a CUDA kernel for this. This is going to be really huge.

by nvader

6/11/2026 at 12:40:19 AM

genius

by hatthew

6/10/2026 at 10:36:07 PM

Point taken about the index potentially being really long. Why would the length be longer than the data? Don’t you need to find the right sequence?

by liamYC

6/11/2026 at 12:45:15 AM

For a given length of data, considering all possible data of that length, it's impossible for the median index length to be shorter than the data length. There aren't enough strings of that length that early in pi.

by gowld

6/11/2026 at 2:15:49 PM

I wonder if it might make more sense to come at it from the opposite angle. Take pi as a sequence you want to compress with. But pi, being random, has redundancies in it that make it less than optimal. So instead, for a given size of block you want to look up, design the optimal number to use for compression. For instance, if you want to compress "594" in the digits of pi, the sequence 253 appears before it twice, which means any attempt to "compress" any three-digit sequence that only first appears after the second 253 is costing you more to get past the second 253, and "pi, but with all the 253s removed after the first one" is clearly a more efficient encoder for 3-digit numbers than pi itself.

So, instead of using pi, design an optimal number to encode with.

What you'll find is that the optimal sequence ends up being equally efficient as listing the blocks in order and indexing by block number itself. There are a number of other solutions; you could use superpermutations to get "all possible subsequences" with fewer digits in your target number, but you'll end up needing to provide the encoder and decoder a table of where the digit sequences appear since they are no longer regular and indexing into that table will cost exactly the same as just writing your number as the concatenation of all the blocks and its efficient method for indexing into them by indexing on the block rather than the digit number.

This actually has some natural overlap with the "normal numbers" in that one of the earlier normal numbers was: https://en.wikipedia.org/wiki/Champernowne_constant I'm not sure whether this is necessarily optimal for an arbitrary block size. (My quick intuitive check suggests it may be, but "my quick intuitive check" in the time of an HN post is not something I'd count on.) In this scheme, you can include the fact that the person using this constant to encode knows the nature of the constant, so they know that if you give index 0-9, it's single digit, and if you index into the two-length blocks, it must have a length of two. Since the encoder and decoder know that, they can also skip the middle of the block and just index into "the n'th number"... which degenerates into "the index of number N is N", which means this is not a compression scheme.

To put all that in a nutshell, if you want to deeply understand why this compression scheme doesn't work, I think you can attain a deep understanding of why by optimizing it.

by jerf

6/11/2026 at 9:01:10 AM

That just means you'll be creating even more valuable metadata to store your files. Win-win.

by account42

6/11/2026 at 7:31:16 AM

At least as of 15 years ago when I was in grad school that remained an open conjecture.

by bandrami

6/10/2026 at 8:26:19 PM

Reminds me of: https://www.spronck.net/sloot.html

Further reading: https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System

by MisterTea

6/10/2026 at 10:48:36 PM

I looked into this a bit a while ago, what Sloot did was at least a little novel. Basically the way his encoding scheme actually worked was that it would store each line of video into a database, encode each video frame as a series of line lookups, and then store that encoded frame into another database. Then each video is a series of frame lookups. When you hear accounts of him being able to demo smooth playback of 16 videos at once on late 90s hardware, this is how he did it. Because each frame is a series of line lookups, splitting the screen horizontally 16 times and playing 16 videos at once is not any more taxing than playing a single video fullscreen. Similarly, he was able to fast-forward and rewind smoothly because each frame is individually decoded, it's not like traditional video compression where you have to calculate differences from each keyframe. Playing at 2x speed was not any more taxing than 1x speed. Of course he never would have been able to store a video file in 8KB or whatever, but this meant that (for example) if you had a whole season of a TV show in your database, the opening and ending credits would only be stored once.

by ndiddy

6/10/2026 at 11:56:23 PM

Interesting. Do you have any resources you can share?

by MisterTea

6/11/2026 at 1:15:43 AM

There's a good podcast about the whole saga here (with a transcript): https://corecursive.com/sloot-digital-coding-system/ and Sloot's patent is here: https://patents.google.com/patent/NL1009908C2/en .

One thing to note is that Sloot consistently refers to his scheme as "encryption" rather than "compression". His encoding scheme originated as a method to encrypt TV repair manuals for his previous project, RepaBase. The idea was that they'd send out a compressed and encrypted database of repair manuals for free, then whenever a technician needed one he would call up RepaBase and pay for the key for that manual. That way, a tech would only need to pay for the manuals he needed instead of for the whole database. The video encoding scheme was basically the same idea except the key was stored on a smart card. Of course the scammy part was misleading investors into believing that all the video data was somehow stored in that decryption key.

by ndiddy

6/11/2026 at 12:51:13 AM

Block deduplication. This is how enterprise storage arrays (such as NetApp Deduplication), local file systems (like ZFS, and Microsoft ReFS via Windows Server Data Deduplication), and normalized databases in general work.

by gowld
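
The line-and-frame lookup described above, and block deduplication generally, can be sketched as a content-addressed store: split the data into blocks, keep each distinct block once under its hash, and describe the file as an ordered list of hashes. This is a generic illustration using fixed-size blocks and SHA-256 with hypothetical names; it is not how Sloot's system, ZFS, ReFS or NetApp actually lay out data.

```python
import hashlib


def dedup(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks; store each distinct block once, keyed by hash."""
    store = {}   # hash -> block contents (each unique block stored once)
    recipe = []  # the "file": an ordered list of block hashes
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original bytes by looking up each hash in order."""
    return b"".join(store[digest] for digest in recipe)


# Repeated content (like shared opening credits) is stored only once.
intro, episode = b"I" * 4096, b"E" * 4096
data = intro + episode + intro + intro
store, recipe = dedup(data)
print(len(recipe), "blocks referenced,", len(store), "stored")
```

The recipe itself still costs 32 bytes per block, so the savings come only from content that actually repeats.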

6/10/2026 at 9:36:29 PM

> The SDCS is only possible if keys are allowed to become infinite, or the data store is allowed to become infinite (...) This would, of course, make the idea useless.

But Pi is infinite. And thus this genius contraption will work as long as we have Moore's law on our side :)

by Levitating

6/10/2026 at 8:35:38 PM

Never heard of that one, that's amazing! Love it.

by giancarlostoro

6/10/2026 at 9:42:39 PM

I have very fond memories of reading that book.

by beng-nl

6/10/2026 at 10:31:27 PM

>One of the properties that π is conjectured to have is that it is normal

conjectured

Glad to see one of my pet points of pedantry come up. No non-constructed irrational number has never been proven to be normal or disjunctive.

by windward

6/10/2026 at 10:40:39 PM

That’s a lot of negatives!

by oofbey

6/10/2026 at 10:49:07 PM

One of which probably needs to go away.

by mikestew

6/10/2026 at 11:46:00 PM

doh!

by windward

6/11/2026 at 8:14:40 AM

Chaitin's constant does not count? Depends on your definition of constructed, but contrary to "easy" normal numbers such as Champernowne's constant, it's not defined by its sequence of digits.

by vbarrielle

6/11/2026 at 12:07:53 AM

What do you mean by "non-constructed" here?

by umanwizard

6/11/2026 at 12:33:38 AM

You can design a number. Just take all finite digit strings in order of length and numerical order: 0.123456789 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 ... 99 000 001 002 ... 999 0000 0001 ...

Obviously it contains every finite digit string in base 10. I can't prove the digits are uniformly distributed in every base (you'd have to be more clever), but you see the idea.

by pocksuppet
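
This constructed number also shows why such a scheme compresses nothing, as jerf argues upthread. If the decoder knows the construction, the natural address of a digit string $s$ of length $L$ is the start of its own block: all the shorter blocks, plus $L \cdot \mathrm{int}(s)$ digits. The sketch below checks that formula against the explicit digit sequence; the helper names are hypothetical, and it ignores the leading "0." and starts with the length-1 strings 0 through 9.

```python
def constructed_digits(max_len: int) -> str:
    """All digit strings of length 1, then 2, ..., max_len, in numeric order."""
    return "".join(str(i).zfill(length)
                   for length in range(1, max_len + 1)
                   for i in range(10 ** length))


def block_offset(s: str) -> int:
    """Offset of the block holding s: every shorter block, then int(s) blocks of len(s)."""
    length = len(s)
    shorter = sum(k * 10 ** k for k in range(1, length))
    return shorter + length * int(s)


digits = constructed_digits(3)
for s in ["7", "42", "042", "999"]:
    off = block_offset(s)
    assert digits[off:off + len(s)] == s
    print(s, "->", off)
```

For a length-$L$ string the offset is below $2L \cdot 10^{L}$, so writing it down typically takes about $L + \log_{10} L$ digits, more than $s$ itself. A string may also occur earlier across block boundaries, but the block-aligned address is the one the decoder can compute without searching.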

6/11/2026 at 12:52:48 AM

But pi is also "constructed", in the sense that you can write down a constructive definition for it, for example \sqrt{6 \times \sum_{k=1}^\infty \frac{1}{k^2}}.

So I suppose maybe OP meant we haven't proven any number to be normal (or not) that is not designed to be normal (or not) ?

by umanwizard

6/11/2026 at 11:58:33 AM

That's what it means.

by pocksuppet

6/11/2026 at 1:49:45 AM

You must be fun in RL.

by niggischiggi

6/11/2026 at 7:46:42 AM

Outdated! Should have linked directly to https://github.com/philipl/inferencefs/ obviously.

by mkesper

6/11/2026 at 10:12:44 AM

"This file doesn't look like what I remember.

Are you sure? It's been a while since you last opened it. Memory is funny like that. The file is fine — maybe take another look with fresh eyes."

from https://github.com/philipl/inferencefs/

Maybe I do not indeed remember properly. Anyway, back to watching "Eternal Sunshine of the Spotless Mind" for the first time, I think.

by utopiah

6/11/2026 at 8:51:22 PM

The FAQ just keeps getting creepier...

by matneyx

6/10/2026 at 8:23:18 PM

It's disturbing to realize that pi then contains all past and future knowledge, including when I'll pass away.

by bobim

6/10/2026 at 8:32:56 PM

So does every other random infinite sequence of bits. The unintuitive part comes from infinity, not pi.

It also doesn't contain all past and future knowledge because it also contains all possible falsehoods about the past and future in a way that's indiscernible from the truth.

Encoding information as an offset into a pseudorandom sequence is no more storage efficient than storing the information directly.

by mike_hock

6/10/2026 at 11:45:16 PM

Keyword is conjectured.

Infinitely many random sequences exist that can be shown not to contain all data: a random sequence of the digits 0-8 (in base 10) is trivially proven never to contain a 9...

There are no known patterns to pi, but, (I am legitimately curious about this), are there any known sequences e.g. of 1 million 0s and a single other digit within the decimal sequence of pi?

Given how it (pi) looks, my strong suspicion is that the answer is "no". But of course, proving that requires that some property of the randomness is provable. It does feel as if, given there are different infinities, there are also different kinds of randomness, hence the conjecture is ill-formed and probably incorrect...

by smaudet

6/11/2026 at 12:54:32 AM

The longest run of a single repeated decimal digit found in pi is a sequence of 13 8s. All other digits have a longest run of length 12.

https://bellard.org/pi/pi2700e9/pidigits.html

by gowld

6/10/2026 at 9:08:09 PM

You are aware this is meant as a joke, right?

by sph

6/10/2026 at 9:54:25 PM

Jokes can be educational too.

by LoganDark

6/10/2026 at 8:34:06 PM

The worst part is that it contains Star Wars 4-6 from an alternate timeline where Disney did a reboot casting Chris Pratt as Han Solo.

(Fun fact: "Chrispratt" is an ancient Californian word that means "Joel McHale didn't want the role.")

by nosioptar

6/10/2026 at 8:35:51 PM

Thank you for this Prattfall

by 1attice

6/10/2026 at 9:38:35 PM

Around here it just means chrisp ratt.

by Yokohiii

6/10/2026 at 10:13:09 PM

You will love reading Jorge Luis Borges's The Library of Babel.

https://dn760100.eu.archive.org/0/items/TheLibraryOfBabel/ba...

by arialdomartini

6/11/2026 at 5:33:09 AM

I thought of this as soon as I saw the repo as well.

by teapourer

6/11/2026 at 6:31:23 AM

All knowledge already exists. Humans are merely discovering it.

All knowledge is information. All information is sequences of bits. All sequences of bits are numbers. All numbers already exist.

All files in a computer are sequences of bits. Intellectual work creates files. Intellectual work is number discovery.

Humans are interesting number generators. Humans are anti-random number generators.

by matheusmoreira

6/10/2026 at 9:27:05 PM

If it makes you feel better, consider that it also contains all plausible and implausible falsehoods about your demise as well.

by xp84

6/10/2026 at 9:40:32 PM

The person who starts reading ahead into pi will always get the freshest numbers.

Perfect crypto!

by Yokohiii

6/10/2026 at 8:43:46 PM

So does a calendar, if you buy them enough years in advance.

by OkayPhysicist

6/10/2026 at 9:17:37 PM

It also contains all possible falsehoods and comes with no way to distinguish what's true from what isn't.

by thih9

6/10/2026 at 9:44:42 PM

But enough about LLMs

by vadansky

6/10/2026 at 8:26:54 PM

This statement would follow from "pi is a normal number" (strictly, it only needs pi to be disjunctive, which normality implies). While almost all real numbers are normal and pi is suspected to be so, it isn't known.

https://en.wikipedia.org/wiki/Normal_number

by skulk

6/10/2026 at 8:34:19 PM

Fear not! It’s probably so deep in pi that you’d pass away listening to someone tell you where!

by cadamsdotcom

6/11/2026 at 1:27:51 PM

Worse - it's also possible to calculate your time of death! That is: assuming enough compute & knowing how many digits to calculate.

by RetroTechie

6/11/2026 at 3:00:27 AM

A not so distant calendar also has the day you will pass away.

by deadbabe

6/10/2026 at 8:46:32 PM

And also all the days you don’t, so by itself it's not very meaningful, especially since you can’t tell in advance which one is right. In some sense, so does a calendar.

by nighthawk454

6/10/2026 at 8:42:21 PM

It isn't actually proven true.

by koolala

6/10/2026 at 11:32:12 PM

It also contains all past and future fake news, and you don’t know which is which.

by layer8

6/10/2026 at 8:54:33 PM

So does a random number generator

by anthonj

6/11/2026 at 12:55:31 AM

You need to be more specific in order to make that statement falsifiable.

by gowld

6/10/2026 at 8:04:23 PM

Love it! This feels very much in the spirit of Tom7's Harder Drive [1]

[1] https://www.youtube.com/watch?v=JcJSW7Rprio

by Lalabadie

6/10/2026 at 8:59:14 PM

I vaguely remember an entry to a compression-benchmark that gamed the benchmark by treating the filename as part of the input to the decompression-algorithm, thus beating the metric that only measured the size of the file.

by aidenn0

6/10/2026 at 8:14:57 PM

Finally, someone is doing something about the rising prices of storage!

by partsch

6/10/2026 at 10:21:12 PM

Just a heads up, this is writing 16 bits for every 8 bits of input:

https://github.com/philipl/pifs/blob/fded8bf7b8f4fc64233e37b...

by nyc_pizzadev

6/10/2026 at 11:29:54 PM

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

Considering each individual bit separately would be even more performant: you only need the indexes 2 and 33, and there is an efficient mapping of those to the bits in storage.

by layer8

6/10/2026 at 8:30:21 PM

This is probably a dumb question, but do we actually know that pi has an infinite number of decimal digits or are we assuming that it does because we haven’t developed a sufficiently powerful computer to calculate the last digit of pi?

I’m guessing this is something that could be formally proven?

by hnlmorg

6/10/2026 at 8:32:08 PM

Here is a one-page proof that pi is irrational - https://heuklyd.github.io/papers/pdf/Niven-1947.pdf

by hasteg

6/10/2026 at 8:37:39 PM

Thanks for the PDF. I feel like I understand even less now than I did before.

by partsch

6/10/2026 at 10:33:53 PM

For a superb explanation of Niven's proof (which leaves more questions than answers when you first read it), I like Michael Penn's video: https://youtu.be/dFKbVTHK4tU?is=d2DbV5HDP0IpP9tA

Notwithstanding how short the proof is, this is quite a hard problem.

by simonreiff

6/10/2026 at 8:36:08 PM

It's amazing how inscrutable calculus can be when you return to reading it after not doing so for a period of time, much like Lisp or Forth. I don't think I've actually done an integral or taken a derivative in years. I can see the elegance of that proof but I'll be damned if I can actually follow the mathematics from one step to the next.

by stackghost

6/10/2026 at 8:53:11 PM

[dead]

by liglam

6/10/2026 at 8:38:33 PM

Thanks for sharing. That’s a nice read. I’m glad I asked :)

by hnlmorg

6/10/2026 at 8:37:58 PM

We definitely know that Pi is irrational; we just don't know if it's normal (i.e. whether the PiFS joke even works).

by mike_hock

6/10/2026 at 8:31:40 PM

Well, that should get GPT-5.5 extended thinking going for a few weeks.

by pixel_popping

6/10/2026 at 10:04:01 PM

It is actually not proven that the decimal expansion (or any rational base expansion) of pi contains all possible sequences of digits. Intuitively it sounds like it should, since the expansion is infinite, but that is not necessarily true. For example, the number 0.101001... (i.e., the decimal formed by concatenating N zeros and then a 1, for every N from 0 to infinity) is infinite, never-ending, and irrational, but does not contain every sequence of digits.
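A quick Python check of that counterexample (the function name is illustrative). The digits are built exactly as described, N zeros followed by a 1 for N = 0, 1, 2, ..., so only 0 and 1 ever occur and no two 1s are adjacent. The code only inspects a finite prefix; the construction itself guarantees the same for every digit.

```python
def counterexample_digits(blocks: int) -> str:
    """Digits after the point of 0.101001000100001...

    For N = 0, 1, 2, ..., blocks - 1, append N zeros followed by a single 1.
    """
    return "".join("0" * n + "1" for n in range(blocks))


prefix = counterexample_digits(200)
# Only 0s and 1s occur, and every 1 after the first is preceded by a 0,
# so strings such as "2" or "11" never appear in any prefix.
print(set(prefix), "2" in prefix, "11" in prefix)
```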

by anon291

6/10/2026 at 8:26:05 PM

I... I can't tell if this is an elaborate troll or pure genius. I love it.

by giancarlostoro

6/10/2026 at 8:48:16 PM

Both.

by pokstad

6/10/2026 at 8:37:38 PM

https://cs.stackexchange.com/a/53737/1704

> Matches that occur early enough in π to attain significant compression will not be varied. That is, it isn't possible to use π to compress interesting, real-world data because real-world strings are unlikely to arise early.

by thangalin

6/10/2026 at 9:21:59 PM

> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.

> Calculate the number of bits to encode that value using log2(938933556), which is ~29.8

Can someone explain these two statements to me?

by Levitating

6/10/2026 at 9:43:47 PM

Regarding this one:

> Calculate the number of bits to encode that value using log2(938933556), which is ~29.8

This is roughly the same as saying: "If you rewrite 938933556 as a binary number / usize, it will need 30 bits."

Sanity check: 1101111111|0110111111|0100110100 (| delimits every 10 bits).

> Since the file is 128 bits long, one would expect this place to be around the 2^128th bit.

This statement is a bit more subtle. As a first-order approximation, we can treat pi as a sort of RNG.

If we write pi in binary (ignoring the binary point), we get: 110010010000111111011010101000100010000101101000110000100011010011...

You can... kind of squint and pretend this is a random sequence of 1s and 0s.

Now, say you have a file that is 128 bits long (so lots of intermingled 0s and 1s), and each next digit of pi is effectively a coin flip. Pretend 1s are heads and 0s are tails. To get your file back, you need 128 consecutive coin flips that exactly match your file.

Now imagine pi not as a number, but as a sequence of experiments, each flipping the coin 128 times.

  - (11001......)(...........)....
  - ^attempt 1   ^attempt 2

On expectation, you have to try quite a few times to win this game! You could certainly get lucky. But on average, your chance of winning per attempt is roughly 0.5^128. So how many times do you have to try? Something like 2^128 times, and each attempt uses 128 bits as well, so more like 2^128 × 128 = 2^135 bits. But you don't have to start fresh in each attempt; you can slide along one bit at a time, like this:

  - 11001................00100...
  - (       128 flips     )
  -  (  another 128        )
  -   (                     )
  -     ... so on and so on

That's where the 2^128 number came from.
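Both statements can be checked with a short Python sketch. The first part just measures 938933556. The second part simulates the coin-flip argument with a small pattern length k instead of 128: it flips fair coins until the last k flips match a random k-bit pattern. The function names and the seeded `random.Random` are illustrative choices; the averaged waiting time should land near 2^k, the scaled-down version of the 2^128 estimate.

```python
import math
import random

value = 938933556
print(math.log2(value))        # about 29.8
print(value.bit_length())      # 30
print(format(value, "b"))     # 110111111101101111110100110100


def first_match_end(pattern: int, k: int, rng: random.Random) -> int:
    """Flip fair coins until the last k flips spell `pattern`; return flips used."""
    mask = (1 << k) - 1
    window = 0
    flips = 0
    while True:
        # Slide the k-bit window forward by one coin flip.
        window = ((window << 1) | rng.getrandbits(1)) & mask
        flips += 1
        if flips >= k and window == pattern:
            return flips


def average_wait(k: int, trials: int, seed: int = 1) -> float:
    """Average flips needed to first see a random k-bit pattern."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        pattern = rng.getrandbits(k)
        total += first_match_end(pattern, k, rng)
    return total / trials


print(average_wait(10, 300))   # should be close to 2**10 = 1024
```

Scaling k from 10 to 128 is what pushes the expected position out to around 2^128.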

by csunoser

6/10/2026 at 10:13:35 PM

Thank you!!!

by Levitating

6/10/2026 at 10:16:45 PM

np :-)

by csunoser

6/10/2026 at 9:37:14 PM

[dead]

by thangalin

6/11/2026 at 8:04:30 AM

This would be easier using the Champernowne constant (https://en.wikipedia.org/wiki/Champernowne_constant), which is proven to be normal in base 10, not just conjectured.
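A minimal Python sketch of why lookups are easy in Champernowne's constant (0.123456789101112...). Any digit string s is written out, at the latest, inside the integer whose decimal form is "1" followed by s, and the position where that integer starts has a closed form. The names are hypothetical, and `guaranteed_index` returns an occurrence, not necessarily the first one.

```python
def champernowne_digits(limit: int) -> str:
    """Digits after the point of 0.123456789101112...: 1..limit concatenated."""
    return "".join(str(n) for n in range(1, limit + 1))


def digits_before(m: int) -> int:
    """How many digits are written before the integer m (m >= 1) begins."""
    width = len(str(m))
    total = 0
    for d in range(1, width):
        total += d * 9 * 10 ** (d - 1)  # there are 9 * 10**(d-1) d-digit numbers
    return total + width * (m - 10 ** (width - 1))


def guaranteed_index(target: str) -> int:
    """0-based index of one occurrence of `target` (not necessarily the first):
    its digits sit inside the integer int("1" + target), just after the 1."""
    return digits_before(int("1" + target)) + 1
```

The catch shows up in `digits_before`: the guaranteed index is at least 10^len(s), so writing it down takes more digits than s itself, and nothing is compressed.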

by vbarrielle

6/10/2026 at 8:37:05 PM

Short Storage Number - SSN

0x123456789ABCDEF0

use this number as a shorter nibble storage alternative...

by koolala

6/10/2026 at 8:03:46 PM

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

by tptacek

6/11/2026 at 2:22:12 PM

Theoretically, if we had a GPU so fast it could instantly calculate billions of digits of Pi, and a small hard drive, could this actually be made to work?

Cache all the last lookups but otherwise just store the index within pi? And for larger files - split them into chunks of whatever size could be handled?

(I mean, I realize this is a joke and can't make sense - but GPUs can be really really fast, and am willing to make a fool of myself by asking.)

And if we had a quantum computer that stores all of pi on one qubit, that could make things even faster ;/

by sam_goody

6/11/2026 at 6:44:05 AM

This got me thinking about the "simulation theory":

If our universe is simulated, it must be possible to snapshot the entire state for one iteration (however time is quantized, which is an open question). "... From here, it is a small leap to see that if π contains all possible files, why are we wasting exabytes of space storing those files, when we could just look them up in π!" (from pifs, above)

This means that not only does a single snapshot of our universe exist in pi, but every single one does.

The information for our entire universe's simulation is stored in pi (and every other number like it)

by baalimago

6/11/2026 at 6:58:19 AM

A theory is something you can test, model, write proofs about, do calculations with, e.g. quantum field theory, group theory.

Simulation "theory" isn't a theory; it's a conjecture.

There's no meat to it.

by foxes

6/11/2026 at 7:22:16 AM

> This means that not only does a singular snapshot of our universe exists in pi, but every single one does

The word 'exist' is doing a lot of work here. Could any computer actually find the value in pi? Each computation takes energy, and there is a finite amount of energy in the universe. Does the value 'exist' in pi if it could never be rendered? How different is that claim from "our simulated universe is a file on God's computer, located just past the edge of the observable universe or 18 inches away in the 4th dimension"?

A similar thought experiment I've had is with the lottery. With just a few sheets of paper, there exists a sequence of numbers that would completely shut down both major US lotteries, and possibly even get you arrested, if they contained the winning numbers for the next 50, 20, or even 10 consecutive draws. Think about the consequences. You would win, and win again the next draw, and they would be certain you cheated. Then they would confiscate your pad, draw again, and you would win a third time. They would have to shut the contest down because this is "mathematically impossible". But it's not. Just like your thought experiment, it's "just numbers".

by gosub100

6/11/2026 at 7:11:14 AM

If we could restore the snapshot from around the year 2000 that'd be great, would like to make some changes.

by Lorin

6/10/2026 at 8:27:26 PM

I'm intrigued that π was capitalized to Π presumably automatically in the HN headline.

by adzm

6/10/2026 at 8:36:05 PM

    jshell> "πfs".toUpperCase()
    $1 ==> "ΠFS"

    Welcome to Node.js v26.3.0.
    Type ".help" for more information.
    > "πfs".toUpperCase()
    'ΠFS'

    Python 3.14.5 (main, May 10 2026, 10:21:34) [Clang 21.0.0 (clang-2100.0.123.102)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "πfs".upper()
    'ΠFS'

    echo 'πfs' | awk '{print toupper($0)}'
    ΠFS

by cbm-vic-20

6/10/2026 at 9:13:13 PM

Why does your Python terminal report May 10th? Today is June 10th.

by noman-land

6/10/2026 at 9:28:42 PM

It's the build date of their Python binary

by atvrager

6/10/2026 at 9:35:17 PM

He prepared the comment a month ago.

by Yokohiii

6/11/2026 at 4:15:14 AM

Well it was already ready in pi.

by elcritch

6/10/2026 at 9:27:59 PM

Probably daylight savings

by danlitt

6/10/2026 at 10:26:07 PM

Someone should make a service "where in the pi am I" then you could use it as a short link. Then there will be hardware accelerated pi chips. All computers will come with pi preinstalled.

by z3t4

6/10/2026 at 11:31:41 PM

isn't this relying on properties that aren't proven about pi? it needs to be disjunctive or normal, and neither of those are proven

by keithnz

6/10/2026 at 8:49:58 PM

Posted many times before: https://news.ycombinator.com/from?site=github.com/philipl

My favourite issue being about GDPR compliance https://github.com/philipl/pifs/issues/56

by charles_f

6/10/2026 at 10:28:04 PM

Funnily enough I’m reading Service Model and just got to the bit in the Library Archive, which has a very similar vibe to this project. Love it

by chris_sn

6/11/2026 at 7:13:35 AM

Reminds me of https://en.wikipedia.org/wiki/MS_Fnd_in_a_Lbry

Meta: every single comment seems to start with some variation of "Reminds me of". Had to get mine in.

by notatyrannosaur

6/11/2026 at 7:21:53 AM

This isn't really going far enough; the readme says - keep the metadata on a piece of paper or whatever. But: The metadata is data too, you can find it ALSO within \pi. So it's \pi all the way down.

Not even sure if there's an interesting Collatz-like conjecture here.

by golem14

6/10/2026 at 10:27:03 PM

I've simplified it and made it more flexible

3._1_415926535897932384626433832795_0_288419716939

by woah

6/11/2026 at 8:45:29 AM

No thanks, I have all the files I need right here in /dev/urandom.

by torh

6/12/2026 at 8:12:40 AM

Per-byte lookup is the killer. Finding each byte individually in π would take longer than the universe's age.

by adamwright326

6/11/2026 at 7:59:06 AM

Reminded me of PortalRunner's latest video: https://www.youtube.com/watch?v=w6rkhvdAqHU

by outadoc

6/11/2026 at 2:46:59 PM

So this is how I find out that in Verdana lowercase pi looks exactly like lowercase Cyrillic п (pe), i.e. like an open rectangle rather than a bit curvy.

by hnbad

6/11/2026 at 7:29:55 AM

Technically π being normal is still unproven. So if the conjecture is false this whole thing falls apart. But that's what makes it a perfect nerd joke.

by markcollins05

6/10/2026 at 9:37:40 PM

This is why I got pi tattooed. It's a tattoo of all tattoos.

by actusual

6/10/2026 at 10:58:46 PM

yes, but can you get a tattoo of all tattoos that do not contain themselves?

by dekhn

6/11/2026 at 8:01:39 AM

I am curious what this means for copyright. I.e. if all music/songs were already encoded in Pi even before the universe started existing.

by amelius

6/11/2026 at 1:53:36 AM

μῆνιν ἄειδε:

Sing, the wrath. Rendering in LaTeX.

[1]: https://news.ycombinator.com/item?id=48010729

by ctan4

6/10/2026 at 10:22:51 PM

I built something with a similar spirit for Pi day: https://pi.yassi.dev/

by yassi_dev

6/10/2026 at 8:32:46 PM

At what point is the metadata larger than the actual file?

by glitchc

6/10/2026 at 8:57:15 PM

Part of the joke is that, in this implementation, the metadata is guaranteed to be larger than the file:

> Now, we all know that it can take a while to find a long sequence of digits in π, so for practical reasons, we should break the files up into smaller chunks that can be more readily found.

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

by wavemode

6/11/2026 at 12:26:58 PM

It almost always will be, except for the extremely rare case of your string being somewhere early in the pi string.

by bean469

6/10/2026 at 8:40:51 PM

Half the time it should be larger, right?

by mike_hock

6/11/2026 at 8:03:02 AM

Instead of using Pi wouldn't it be better to choose a number for which the conjecture is true?

And for which the index is easy to compute?

by amelius

6/11/2026 at 9:15:07 AM

Where do you store the indices? Blockchain!

by yason

6/10/2026 at 8:34:29 PM

What a brilliant idea! Of course, of course, it’s not in the repository so I can’t apt-get install it. Debian...always so far behind.

by leephillips

6/10/2026 at 9:23:06 PM

> Why is this thing so slow? It took me five minutes to store a 400 line text file!

> Well, this is just an initial prototype, and don't worry, there's always Moore's law!

Seriously? They're only storing individual bytes in pi:

> In this implementation, to maximise performance, we consider each individual byte of the file separately, and look it up in π.

So the whole transformation should be trivially reducible to a 256-element lookup table from source byte to location in pi and a similar table used to convert back the other way. Maybe a fancy formula could be used for the (never actually encountered) case in which a byte is encoded by one of the infinite available noncanonical encodings.

by amluto

6/10/2026 at 10:41:45 PM

I’d guess even the index in pi for my phone number would be more digits than the phone number.

So not really a compression scheme.

by bilsbie

6/11/2026 at 4:20:37 AM

This is part of the plot in Murakami's Hard-Boiled Wonderland and the End of the World.

by psadri

6/10/2026 at 9:19:24 PM

Why would anyone need πfs, since you can already build such a system yourself quite trivially on Linux.

by j3th9n

6/11/2026 at 8:07:33 AM

If you think about it, a piano has all the possible songs in it too!

by mohsen1

6/11/2026 at 8:42:21 AM

No, it doesn't. Think of the intro of We Will Rock You and show me how you perform that on your piano.

by X-Ryl669

6/11/2026 at 2:16:40 AM

Developed a UI with Claude here:

https://ljsimpkin.github.io/pi-compress

It really shows how inefficient such a compression would be. Haha nice idea

by liamYC

6/11/2026 at 3:22:34 PM

Fascinating. Would this actually work?

by jklimosk

6/11/2026 at 1:25:46 AM

Have there been attempts to prove the conjecture?

by stogot

6/11/2026 at 7:32:07 AM

The metadata storage problem is the real punchline here. You end up needing more space for the metadata than the original data, so it's a zero-sum joke.

by adamwright326

6/11/2026 at 2:41:13 AM

Note: this is from 2012.

by keyle

6/11/2026 at 8:27:10 AM

The design is very human

by 0x1ceb00da

6/11/2026 at 6:38:50 AM

This is a classic

by dofcof

6/10/2026 at 8:04:42 PM

absolutely genius

by Levitating

6/11/2026 at 12:01:27 AM

Horrible. Brilliant. Love it.

by dwheeler

6/10/2026 at 9:38:03 PM

Looked at the repo but it says NOTHING about what value this project offers.

I mean, I get that it's "fun" to store information within the digits of pi. But is this just amusement, or is there a value prop for production use here?

(Speaking as a math major, by the way. I'm sympathetic to the cause.)

by mzelling

6/10/2026 at 10:34:39 PM

It's a(n IMO weak) argument raised when discussing illegal files/numbers.

This project makes clear the counter-argument: the input that gets you the file out of π is a badly compressed version of the file.

by windward

6/10/2026 at 9:39:20 PM

I think it's pretty clearly for amusement. And it would kind of spoil the amusement if it were to explicitly mention that it's a joke...

by tcoff91

6/10/2026 at 9:39:08 PM

It's a joke.

by mherkender

6/11/2026 at 12:03:47 PM

[dead]

by aafaqzahid

6/10/2026 at 11:35:28 PM

[dead]

by sonixaep

6/11/2026 at 5:03:12 AM

[dead]

by RedMagicBox

6/11/2026 at 1:27:25 PM

[flagged]

by yamakasi007

6/10/2026 at 10:53:08 PM

[dead]

by RedMagicBox

6/10/2026 at 9:37:20 PM

This is interesting, but I feel like my use cases would better align with a different irrational number. Could I get an option to do this with e instead? /s

by spchampion2

6/11/2026 at 10:16:10 AM

https://math.stackexchange.com/questions/1817064/bbp-formula...

by gatestone

6/11/2026 at 11:29:50 AM

[dead]

by insumanth

6/10/2026 at 9:17:18 PM

[dead]

by Lapsa