5/8/2026 at 4:41:43 PM
This is surprisingly common.
The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.
It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.
UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
by jandrewrogers
5/8/2026 at 5:52:47 PM
This is why CloudFlare has done what they did with the lava lamp wall. Not that the wall is such a great source of entropy on its own - I'm sure it's not their only source, but you can never have too many sources of entropy - but it makes it visible in a way that can grab those who don't fully understand the concepts of RNGs and how entropy plays into that.
The more sources of entropy, the more closely you approach "perfect" randomization. And a large chunk of those entropy sources need to be non-deterministic. Even on the small level, local applications running on local systems, like games, can use things like the mouse coordinates, the timings between button presses, the exact frame count since game start before the player presses Start to greatly enhance randomness while still using PRNGs under the hood.
Yes, for the latter, that's technically deterministic (and the older the game considered, the more deterministic it is, see TAS runs of old games obliterating the "RNG"). But when you have fifty different parameters feeding into the initial seed, that's fifty things an attacker would have to perfectly predict or replay (and there are other ways to avoid replay attacks that can be layered on top).
If CloudFlare had fewer than 100 different sources of entropy, I'd be disappointed. And that's assuming their algorithm for blending those entropy sources into a single seed value is good.
by LocalH
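A minimal sketch of "blending entropy sources into a single seed value", assuming a cryptographic hash is an acceptable mixer. The function name `mix_entropy` and the three example sources are illustrative assumptions, not anything Cloudflare is known to use.

```python
import hashlib
import os
import time


def mix_entropy(sources: list[bytes]) -> bytes:
    """Blend several byte strings into one 32-byte seed with SHA-256."""
    h = hashlib.sha256()
    for chunk in sources:
        # Length-prefix each chunk so [b"ab", b"c"] and [b"a", b"bc"]
        # do not hash to the same value.
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    return h.digest()


if __name__ == "__main__":
    seed = mix_entropy([
        os.urandom(32),                             # OS entropy pool
        time.perf_counter_ns().to_bytes(8, "big"),  # timing jitter (weak alone)
        os.getpid().to_bytes(4, "big"),             # process id (not secret)
    ])
    print(seed.hex())
```

The seed is only as unpredictable as the best source fed in; adding a weak or attacker-known source does not hurt as long as at least one strong source remains.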
5/8/2026 at 9:39:08 PM
> you can never have too many sources of entropy
This is so true. And the beauty is that with algorithms, we don't even need to know much about the entropy to be able to extract it.
There is the Von Neumann method of generating an unbiased coin from a biased coin. Of throwing it twice, and checking if you got HT or TH. And completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.
There are more modern algorithms that can be even better (in that they need fewer coin tosses if you have a very unbalanced coin).
And then there is modern cryptographic hashing. Feed it all the bits you can. Collisions end up only happening in the real world if every single one of those bits is identical. So if you have actual entropy being fed, that cannot be controlled, predicted, or replicated, modern cryptography tells you that the end result is unique.
by greiskul
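The Von Neumann trick described above can be sketched in a few lines. This assumes the flips are independent and share the same bias; `biased_flips` is a hypothetical helper that simulates such a coin with a seeded PRNG.

```python
import random
from typing import Iterable, Iterator


def von_neumann(flips: Iterable[int]) -> Iterator[int]:
    """Yield unbiased bits from independent, identically biased 0/1 flips."""
    it = iter(flips)
    # zip(it, it) consumes the stream in non-overlapping pairs.
    for first, second in zip(it, it):
        if first != second:
            yield first  # (1, 0) -> 1, (0, 1) -> 0; equal pairs are dropped


def biased_flips(p_heads: float, n: int, seed: int = 0) -> list[int]:
    """Simulate n flips of a coin that lands heads (1) with probability p_heads."""
    rng = random.Random(seed)
    return [1 if rng.random() < p_heads else 0 for _ in range(n)]


if __name__ == "__main__":
    bits = list(von_neumann(biased_flips(0.8, 200_000)))
    print(len(bits), sum(bits) / len(bits))
```

The price is throughput: with an 80/20 coin only about 32% of pairs survive, so roughly six flips are spent per output bit.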
5/9/2026 at 4:41:11 AM
> There is the Von Neumann method of generating an unbiased coin from a biased coin. Of throwing it twice, and checking if you got HT or TH. And completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.
This blew my mind. Thank you!
I had to think about it a bit, so for anyone scratching their head right now trying to figure it out, consider it this way:
what matters is the ordering, of heads-then-tails, or tails-then-heads.
It doesn't matter that it's biased one way or the other, if you keep flipping pairs until you get a result with two different values, it's a 50/50 chance whether the less-likely result comes first, or second.
You might only have a 20% chance of any particular pair having a tails (for example), but in the cases where you do have a tails, it's a 50/50 chance that it comes first or second.
by ms_menardi
5/9/2026 at 5:19:58 AM
And for people who like equations, here is my attempt at explaining it.
Assume each flip is independent and the bias remains the same in each flip.
Let
P(H) = p,
P(T) = 1 - p.
Then P(HH) = p^2,
P(HT) = p(1 - p),
P(TH) = (1 - p)p,
P(TT) = (1 - p)^2.
Therefore P(HT or TH) = 2p(1 - p).
Now calculate P(HT | HT or TH) = p(1 - p) / (2p(1 - p)) = 1/2,
P(TH | HT or TH) = (1 - p)p / (2p(1 - p)) = 1/2.
by susam
5/9/2026 at 7:23:25 AM
You don't need conditional probability here, as the flips are independent.
It's just p(H)p(T).
And p(H)p(T) = p(T)p(H), thus 2*p(H)p(T) = 2p(1-p).
by taegee
5/10/2026 at 1:14:14 AM
Independence tells us how to compute the probability of a sequence like HT or TH: P(HT) = P(H)P(T) = p(1 - p)
But the question I am addressing is not just "what is the probability of HT?" It is "given that the two flips are different, what is the probability that the order was HT rather than TH?"
That is a conditional probability:
P(HT | HT or TH)
by susam
5/9/2026 at 6:02:15 PM
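That wasn’t what he was trying to prove, but the proof could be done without conditionals like this:
If: p(HT) = p(H)p(T) = p(T)p(H) = p(TH)
And: after discarding HH and TT, HT and TH are the only outcomes left, so their renormalized probabilities sum to 1
Then: HT and TH each have probability 0.5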
by clickety_clack
5/9/2026 at 3:40:52 PM
That's how I noodled thru it internally
by cdaringe
5/9/2026 at 9:55:36 AM
Thanks for your explanation. I did not get it on the first read, and was too lazy to think, until I saw your comment.
Just want to point out that if one is actually doing the experiment with a biased coin, then one must discard every rejected pair as a whole.
e.g. in the case of a coin which is heavily biased, say 0.9 H and 0.1 T, one should skip each HH pair entirely and only start a new pair at an odd index. Otherwise, with a sequence like HHHHT, one might greedily pick up the first HT (because the second HH pair was not skipped as a unit), which makes the experiment biased toward HT.
by aws_ls
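A small simulation of the pitfall described above, as a sketch: `first_bit_pairwise` reads non-overlapping pairs, while `first_bit_greedy` slides one flip at a time and takes the first change it sees. Both function names and the 200-flip window are assumptions made for the illustration.

```python
import random
from typing import Callable, Optional


def first_bit_pairwise(flips: list[int]) -> Optional[int]:
    """Correct: scan non-overlapping pairs (indices 0-1, 2-3, ...)."""
    for i in range(0, len(flips) - 1, 2):
        if flips[i] != flips[i + 1]:
            return flips[i]
    return None


def first_bit_greedy(flips: list[int]) -> Optional[int]:
    """Incorrect: slide one flip at a time and take the first change."""
    for i in range(len(flips) - 1):
        if flips[i] != flips[i + 1]:
            return flips[i]
    return None


def heads_first_rate(picker: Callable[[list[int]], Optional[int]],
                     p_heads: float, trials: int, seed: int) -> float:
    """Fraction of trials in which the extracted bit is 1 (heads came first)."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        flips = [1 if rng.random() < p_heads else 0 for _ in range(200)]
        bit = picker(flips)
        if bit is not None:
            results.append(bit)
    return sum(results) / len(results)


if __name__ == "__main__":
    print("pairwise:", heads_first_rate(first_bit_pairwise, 0.9, 5000, seed=1))
    print("greedy:  ", heads_first_rate(first_bit_greedy, 0.9, 5000, seed=1))
```

The greedy version simply returns the value of the very first flip (everything before the first change is identical to it), so it inherits the coin's full bias.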
5/9/2026 at 6:25:37 AM
Afaics it's just basic commutativity – p(H)p(T) = p(T)p(H) – since instances are independent.
Same, of course, holds for flipping it multiple times. But there you get more than Head or Tail (binom(n, k)).
by taegee
5/9/2026 at 1:36:13 PM
I was doubting this for a minute as I wondered with a significantly biased coin towards the head side would you be more likely to get HT. With probability problems like Monty Hall I like to think about extreme cases like say it's 99 heads to every 1 tails. You'd expect HT 0.99% of the time. Ditto TH.
by sporkland
5/9/2026 at 2:03:27 PM
You can’t flip coins until you get the first different outcome… You have to flip twice each time, until you get a pair with different outcomes.
by dash2
5/9/2026 at 3:37:28 PM
Oh wow that’s really amazing. What’s the source - I love Von Neumann.
by pyuser583
5/9/2026 at 4:50:19 AM
Not very random if it's only TH or HT. Trivial to brute force with no more than two tries!
by throwaway89865
5/9/2026 at 11:53:55 PM
I remember hearing about an interview problem from a while back, and the trick was to use exclusive-or. Now I understand why.
by MathMonkeyMan
5/9/2026 at 10:50:10 AM
(Note that this still assumes that each biased-coin toss is i.i.d.)
by hun3
5/8/2026 at 6:36:05 PM
If I understand it the Lava lamps are 90% PR/fun. They have a lot of other sources for entropy that scales better.
by victorbjorklund
5/8/2026 at 8:38:32 PM
Yes, they also have wave machines, pendulums, and mobiles :)
https://blog.cloudflare.com/harnessing-office-chaos/
https://blog.cloudflare.com/chaos-in-cloudflare-lisbon-offic...
by pverheggen
5/9/2026 at 8:09:13 AM
Wouldn’t thermal noise in a resistor make more practical sense?
by geon
5/9/2026 at 1:45:31 PM
I prefer cosmic microwave background radiation (CMBR) as my RNG of choice
by drzaiusx11
5/11/2026 at 12:21:09 AM
until someone uses microwave lol
by iberator
5/9/2026 at 12:42:13 AM
The original from SGI back in the mid 90's, before CPUs had RDRAND instructions etc... was an actually practical solution.
At the time I was at the Internet company that originally got online-gaming banned in the US, we were looking at CCDs and Cesium emitters that required a license etc...
While I am not sure, it seems cloudflare basically implemented one after SGI's[0] patent expired.
The patent and the licensing cost and adding SGI was a major blocker for us doing it, the startup closed before we found a real solution. But the best PRNGs like Blum Blum Shub were way too slow at the time. But things did improve quickly at that time.
by nyrikki
5/9/2026 at 3:38:52 PM
SGI was pretty amazing. I know some folks who worked there - Cray too. There’s a loyalty that just doesn’t exist any more - and arguably isn’t earned anymore.
by pyuser583
5/8/2026 at 6:57:52 PM
Ant farm ? Hamster wheels ? Anything critter-driven should provide some entropy.
by euroderf
5/8/2026 at 8:57:25 PM
Speaking of ants, Fourmilab (i.e. John Walker, of Autodesk fame) used to provide a random number generator powered by background radiation: https://www.fourmilab.ch/hotbits/
by throw-the-towel
5/8/2026 at 7:02:14 PM
I once read that noise of camera in total darkness is apparently a good source.
by BSVogler
5/8/2026 at 9:27:15 PM
You can already have a good entropy source from a single resistor.
by amelius
5/9/2026 at 3:39:46 PM
This is what gets me - entropy is hard, but not that hard. I get it goes against everything a computer is built to do, but so does telling time.
by pyuser583
5/9/2026 at 3:14:01 AM
Would a CRT TV tuned to channel 3 and no RF input be a good source?
by wpm
5/10/2026 at 6:01:44 AM
I imagine that there might still be a way to swing by with RF equipment and tip the scales in your favor. And if you're important enough, I'm sure there'll be someone motivated enough to do this. After all, Polymarket was motivating enough for someone to take a hair dryer to a weather station...
https://www.theguardian.com/world/2026/apr/23/hairdryer-or-l...
by nxobject
5/9/2026 at 10:39:05 AM
In the sense that RF noise can be a source of entropy: Sorta*. But one doesn't need the whole thrift-store television set to do that; the visual aspect of a CRT displaying analog video snow just adds style points**.
*: Sorta, because if someone discovers that the entropy is derived from an analog TV tuned to channel 3, then they also know how to influence it from outside.
**: Style points can have value; it's OK to have fun with work. But that's a secondary function.
by ssl-3
5/11/2026 at 12:20:33 AM
better to just switch to... random channel every while :) Not perfect but something.
by iberator
5/8/2026 at 7:12:23 PM
The noise probably makes the lava lamp wall just as effective as pointing the camera at the Mona Lisa - the lamps themselves are not that unpredictable frame-to-frame.
by unilynx
5/8/2026 at 8:18:13 PM
For the record, the lamps and camera are present in their lobby afaik, so you can actually go there, stand in front of them, and slightly affect the entropy.
A cool parlor trick, certainly.
by LocalH
5/8/2026 at 8:40:54 PM
https://www.random.org/ Uses atmospheric noise. These dudes use dice? https://youtube.com/shorts/ncoDq5EcPFg?si=lI6f9cw8dWcaDZ4Y
by conception
5/8/2026 at 8:04:18 PM
https://www.idquantique.com/random-number-generation/product...
by FuriouslyAdrift
5/8/2026 at 8:54:26 PM
The lava lamps are just for show.
You can get entropy just by plugging an oscilloscope into a pile of dirt and cranking the gain up.
by dheera
5/8/2026 at 9:13:16 PM
Any high-gain amplifier can be used, with its input connected to a resistor or a diode.
For instance you can use the microphone input of a PC, together with an additional external amplifier made with an audio amplifier integrated circuit or an operational amplifier integrated circuit and with a diode or a resistor at its input. The microphone input of PCs provides a 5 V voltage that can be sufficient as a power supply for a noise source plugged in it.
Such a true RNG can be made on a small PCB with an audio jack, so you can plug it into any PC with microphone input and have a true RNG that you can trust better than the RNG included in modern Intel and AMD CPUs. In the past, many AMD CPUs had defective internal RNGs. Moreover, both for Intel and for AMD it is impossible to verify whether the internal RNG does what it claims to do or it generates predictable pseudo-random numbers.
by adrian_b
5/9/2026 at 9:34:05 AM
Meh. The problem is that it might start receiving your local radio station and end up deterministic enough to screw you. So you need to shield the dirt properly.
by tliltocatl
5/9/2026 at 8:34:32 AM
> This is why CloudFlare has done what they did with the lava lamp wall.
Interesting. I wonder how true it actually is that they use it like they claim here: https://www.cloudflare.com/learning/ssl/lava-lamp-encryption.... It's in one of their lobbies, so doesn't that make it susceptible to an attack in some way? I'm not knowledgeable enough to know, but I figured if they actually used that method, they'd have a more controlled environment.
I also don't fully understand it. A large part of that wall is static. And the camera isn't going to pick up on the stochastic properties of the lava as much as exists in the real world. So it feels like their images will be very statistically similar.
by bmitc
5/9/2026 at 7:44:44 PM
It's probably just one of many sources. Just by being in one physical location it would be vulnerable to a network outage (ignoring any potential for attacks)
by Melatonic
5/8/2026 at 8:14:50 PM
Old games are RTA viable to RNG manip: https://m.youtube.com/watch?v=Bgh30BiWG58
by __s
5/8/2026 at 7:59:50 PM
Yep - I've seen legitimate-looking dups on bad hardware, and "there are a ton of trailing zeros" is also an incredibly common duplicate mode for some UUID libraries (like earlier Go ones that didn't validate the "requested N bytes, returned 3, you must re-request to get N-3 more" return values. it doesn't happen on most hardware or OSes, so people never check it, so it just comes up in production some day with tens of thousands of collisions).
by Groxx
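The short-read failure mode described above comes down to a missing loop. A sketch of that loop follows, written in Python for illustration only; `read_exact` and the `ShortReader` test double are hypothetical names, and this is not the code of any particular UUID library. (Python's own `os.urandom(n)` already returns exactly `n` bytes, so the loop matters for lower-level read APIs.)

```python
def read_exact(source, n: int) -> bytes:
    """Read exactly n bytes from source.read(), re-requesting after short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = source.read(n - len(buf))
        if not chunk:
            # Failing loudly beats silently padding the rest with zeros.
            raise EOFError(f"entropy source ended after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


class ShortReader:
    """Hypothetical test double: returns at most 3 bytes per read() call."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def read(self, size: int) -> bytes:
        take = min(size, 3)
        chunk, self._data = self._data[:take], self._data[take:]
        return chunk


if __name__ == "__main__":
    print(read_exact(ShortReader(bytes(range(16))), 16).hex())
```

Without the loop, a caller that asked for 16 bytes and got 3 would leave the remaining 13 bytes of its buffer zeroed, which is the "ton of trailing zeros" pattern.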
5/8/2026 at 5:06:24 PM
Thanks for the insight! Mind expanding on what alternatives are being used in high reliability systems instead of UUIDv4?
by thecloud
5/8/2026 at 6:00:59 PM
In high-reliability systems a criterion for identifier design is easy detection of defective identifiers. This includes buggy systems and adversarial manipulation.
The problem with UUIDs that rely on entropy sources is that it is computationally expensive to detect if the statistical distribution of identifiers is diverging from what you would expect from a random oracle. I've written systems that can detect entropy source anomalies but you'll want to turn it off in production.
It is pretty cheap to sanity check most non-probabilistic identifier schemes. UUIDs that use broken hash algorithms (e.g. UUIDv3/5) or leak state (e.g. UUIDv7) are exposed to adversarial exploitation.
The identifier scheme is dependent on the use case. Does the uniqueness constraint apply to the instance of the object or the contents of the object? Is the generation of identifiers federated across untrusted nodes? How large is the potential universe of identifiers?
The basic scheme I've seen is a 128-bit structured value that has no probabilistic component. These identifiers can be encrypted with AES-128 when exported to the public, guaranteeing uniqueness while leaking no internal state. The benefit of this scheme is that it is usually drop-in compatible with standard UUID even though it is technically not a UUID and the internal structure can carry useful metadata about the identifier if you can decrypt it.
Federated generation across untrusted nodes requires a more complex scheme, particularly if the universe of identifiers is extremely large. These intrinsically have a collision risk regardless of how the identifiers are generated.
None of the standardized UUIDs were really designed with the requirements of scalable high-reliability systems in mind. They were optimized for convenience and expedience, which is a perfectly reasonable objective. Most people don't need an identifier system engineered for extreme reliability, even though there is relatively little cost to having one.
by jandrewrogers
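A rough sketch of what a "128-bit structured value with no probabilistic component" might look like. The field layout (`node`, `epoch`, `sequence`) is an assumption made up for this example, not the commenter's actual scheme, and `encrypt_block` is a placeholder parameter standing in for single-block AES-128 encryption from a real crypto library; no cipher is implemented here.

```python
import struct
import uuid
from typing import Callable, Dict, NamedTuple, Tuple


class StructuredId(NamedTuple):
    node: int      # 32-bit generator identity (hypothetical field)
    epoch: int     # 32-bit restart counter, bumped on every process start
    sequence: int  # 64-bit counter within (node, epoch)


_LAYOUT = struct.Struct(">IIQ")  # 4 + 4 + 8 = 16 bytes, big-endian


def pack(ident: StructuredId) -> bytes:
    return _LAYOUT.pack(ident.node, ident.epoch, ident.sequence)


def unpack(raw: bytes) -> StructuredId:
    return StructuredId(*_LAYOUT.unpack(raw))


def is_plausible(ident: StructuredId,
                 issued: Dict[Tuple[int, int], int]) -> bool:
    """Cheap sanity check: the (node, epoch) exists and the counter was issued."""
    last = issued.get((ident.node, ident.epoch))
    return last is not None and ident.sequence <= last


def to_external(ident: StructuredId,
                encrypt_block: Callable[[bytes], bytes]) -> uuid.UUID:
    """Export as a UUID-shaped value; encrypt_block must map 16 bytes to 16 bytes."""
    return uuid.UUID(bytes=encrypt_block(pack(ident)))


if __name__ == "__main__":
    ident = StructuredId(node=7, epoch=2, sequence=41)
    # Identity function used only so the sketch runs; it hides nothing.
    print(to_external(ident, lambda block: block))
```

Because a block cipher is a permutation of 16-byte blocks, distinct internal values stay distinct after encryption, which is where the uniqueness guarantee in the comment comes from.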
5/8/2026 at 7:36:57 PM
> leak state (e.g. UUIDv7)
But according to PostgreSQL, UUIDv7 provides better performance in the database, so is this essentially a trade off between security and speed?
by eaf7e281
5/8/2026 at 7:48:55 PM
Yes, because UUIDv7 gives up some random bits in order to include the timestamp, which is done in a way that makes UUIDv7s quick to sort by timestamp.
by jubilanti
5/8/2026 at 9:53:44 PM
How does including the timestamp expose me to adversarial exploitation?
by ai_slop_hater
5/8/2026 at 11:56:40 PM
It reveals the time you created the UUID, for one. That can lead to a bunch of problems.
by danpalmer
5/9/2026 at 1:16:38 PM
The same way using an auto increment integer ID does, but imagine that integer also leaked your created timestamp column too.
by devmor
5/9/2026 at 12:10:26 AM
I’ve not come across any.
by goalieca
5/9/2026 at 12:57:09 PM
How does a high-reliability system have a broken /dev/random? You're better off fixing it rather than trying to fix every downstream component that uses it. You can put your AES-128 counter there if you can count reliably.
by dchest
5/8/2026 at 5:23:39 PM
The latest UUID (7?) uses half random gen, half timestamp. This not only makes it sortable by creation, but would also make a collision like this impossible.
by filcuk
5/8/2026 at 5:45:22 PM
It's still possible in most implementations of UUIDv7.
UUIDv7 assigns the first 48 bits for the timestamp in milliseconds. You can generate a lot of UUID's in a millisecond though!
Then you have another 12 bits that you can use as you wish; "rand_a". The spec has a few methods they suggest on how to use these bits including 12 bits of random data, using it for sub-millisecond timestamps, or creating a monotonic counter, but each have their downsides:
- Purely random data means you can still run into collisions and anything within the same millisecond is unordered
- Sub millisecond you can run into collisions; there's nothing stopping you from generating two UUID's with the same 62 bits of rand_b data in the same sub-millisecond timestamp.
- Monotonic counters can overflow before the next tick, then what? Rollover? Once you roll over it's no longer monotonic and you can generate the same random data within the same monotonic cycle. Also; it's only monotonic to the system that's generating the UUID. If you have a distributed system and they each have their own monotonic cycles then you'll be generating UUID's with the same timestamp + monotonic counter, and again, are relying on not generating the same random data.
You can steal some of the 62 bits in rand_b if you want as well; you can use rand_a for sub-millisecond accuracy, and then use a few bits of rand_b for a monotonic counter. There's still a chance of collision here, but it's exceedingly low at the expense of less truly random data at the end.
If you want truly collision free, you'd also need to assign a couple of bits to identify the subsystem generating the UUID so that the monotonic counter is unique to that subsystem. You lose the ordering part of the monotonic counter this way though, but I guess you could argue that in nearly 100% of cases the accuracy of sub-millisecond order in a distributed system is a lie anyways.
by stanmancan
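To make the bit budget above concrete, here is a sketch that assembles a UUIDv7 by hand from its three variable fields, following the RFC 9562 layout (48-bit timestamp, 4-bit version, 12-bit rand_a, 2-bit variant, 62-bit rand_b). The helper name `make_uuid7` is an assumption; it is not a standard-library function.

```python
import secrets
import time
import uuid


def make_uuid7(unix_ms: int, rand_a: int, rand_b: int) -> uuid.UUID:
    """Assemble the UUIDv7 bit layout from its three variable fields."""
    if not (0 <= unix_ms < 1 << 48 and 0 <= rand_a < 1 << 12
            and 0 <= rand_b < 1 << 62):
        raise ValueError("field out of range")
    value = unix_ms << 80   # 48-bit Unix timestamp in milliseconds
    value |= 0x7 << 76      # 4-bit version
    value |= rand_a << 64   # 12 bits: random, sub-ms time, or a counter
    value |= 0b10 << 62     # 2-bit variant
    value |= rand_b         # 62 bits of random data
    return uuid.UUID(int=value)


if __name__ == "__main__":
    now_ms = time.time_ns() // 1_000_000
    print(make_uuid7(now_ms, secrets.randbits(12), secrets.randbits(62)))
```

Two calls that land in the same millisecond and happen to draw the same `rand_a` and `rand_b` produce the same UUID, which is the residual collision case the comment describes.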
5/8/2026 at 7:13:30 PM
I think by the time you're building a system that needs to generate (and persist!) billions of identifiers per millisecond, you're solidly past the point where all your design decisions need to be vetted for whether they make sense on your extremely exotic setup.
by naniwaduni
5/8/2026 at 11:48:05 PM
But 12 bits is not "billions of identifiers" -- it's 4096. Once you exhaust that counter in the same millisecond, you are still relying on a gamble that your random source will not generate the exact same bit sequence for the previous same counter value. And this thread started out with the OP explaining that random collisions are much more common than we'd like them to be, for various reasons.
by tremon
5/8/2026 at 7:01:28 PM
We have a dedicated snowflake id generator service that returns batch ids. It's also distributed, each service adds its own instance number to the id. When it overflows it just blocks for the next ms. For our traffic, it's never a bottleneck.
by rootlocus
5/8/2026 at 9:51:45 PM
Something I use on my own distributed system (where I wanted 64-bit IDs), is use 32 bits for the time in seconds (with an epoch from 2020, so good until 2088), 8 bits for the device ID and 24 bits for a serial number (resets to 0 every time the seconds increments).
That's generally enough IDs per second for most of my edge nodes, but the central worker nodes need more, so I give them a different split and use 4 bits for the device ID and 28 bits for serial number instead.
If a node overflows its serial number that second, I kind of cheat and increment the seconds field early. Every time this happens, I persist the seconds field to the database, and when the app restarts, it starts its seconds count at the last persisted seconds plus one. If the current time in seconds is greater than the last used seconds, I also update it and reset the serial number. Works remarkably well for smoothing out very occasional spikes in ID generation while still approximately remaining globally sortable.
I also "waste" a bit of the 32-bit time field by considering it to be signed, even though it's not really because I don't expect this system to last long enough to reach times where the MSB gets set. But if I ever change my system, I'll set that bit and everything will stay ordered. I'll probably reset the epoch at that point too.
by ralferoo
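A sketch of the 64-bit scheme described above (seconds since a 2020 epoch, then device ID, then serial), including the "borrow the next second early" overflow rule. The class name, the injectable `clock`, and the omission of the database persistence step are all simplifications assumed for this example.

```python
import time
from typing import Callable

EPOCH_2020 = 1_577_836_800  # 2020-01-01T00:00:00Z as Unix seconds


class IdGenerator:
    """64-bit IDs: 32-bit seconds | device_bits device ID | remaining serial."""

    def __init__(self, device_id: int, device_bits: int = 8,
                 clock: Callable[[], float] = time.time) -> None:
        if not 0 <= device_id < 1 << device_bits:
            raise ValueError("device_id does not fit in device_bits")
        self.serial_bits = 32 - device_bits
        self.device_id = device_id
        self.clock = clock
        self.seconds = 0
        self.serial = 0

    def next_id(self) -> int:
        now = int(self.clock()) - EPOCH_2020
        if now > self.seconds:
            # Real time moved past the last used second: restart the serial.
            self.seconds, self.serial = now, 0
        elif self.serial >= 1 << self.serial_bits:
            # Serial exhausted: borrow the next second early.
            # (The comment also persists self.seconds here; omitted.)
            self.seconds += 1
            self.serial = 0
        value = ((self.seconds << 32)
                 | (self.device_id << self.serial_bits)
                 | self.serial)
        self.serial += 1
        return value


if __name__ == "__main__":
    gen = IdGenerator(device_id=3)
    print([hex(gen.next_id()) for _ in range(3)])
```

IDs from one generator are strictly increasing; across devices they sort by second first, then by device ID, which matches the "approximately globally sortable" property mentioned in the comment.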
5/8/2026 at 5:38:58 PM
Considering the context I think it's worth pointing out that it's technically not impossible - it's just even less likely.
Everything in crypto is always a probability - never a certainty
by ffsm8
5/8/2026 at 5:45:30 PM
True, but it makes the specific collision the post observed completely impossible.
by nitsky
5/8/2026 at 5:56:07 PM
I left a more detailed comment on the parent, but it's definitely not impossible!
by stanmancan
5/8/2026 at 6:40:18 PM
The scenario in this post is that the first uuid was created one year before the duplicate uuid. That isn’t possible with v7
by ryanmonroe
5/8/2026 at 6:50:44 PM
You're heavily leaning on "collision like this" to relate to the exact time stamps for your statement to be true.
It's equally possible to interpret the "like this" as referring to the collision itself, without a focus on the 1 year distance between the creation dates.
So I guess both views are valid.
by ffsm8
5/8/2026 at 11:15:57 PM
The inclusion of a timestamp in v7 makes collisions impossible unless the generating systems think that the time is the same down to the millisecond, which makes the temporal distance quite relevant.
by calfuris
5/8/2026 at 11:39:34 PM
Plenty of systems end up generating multiple UUID's in a single millisecond.
The issue with UUIDv7 is that you also have significantly less entropy since you only have 62 bits (sometimes less, depending on implementation) of "random" data. So while the time aspect of the format lowers the chances of collisions, generating two UUIDv7's in the same millisecond (depending on implementation) has a significantly higher chance of collision than two UUIDv4's.
It's still incredibly unlikely, but it's also incredibly unlikely you generate two matching UUIDv4's, but it does happen.
TLDR; It's possible to generate matching UUIDv7's, don't assume otherwise.
by stanmancan
5/10/2026 at 3:34:11 PM
I answered this in another HN topic just the other day: https://news.ycombinator.com/item?id=48061098
But essentially, using UUID v7 you actually have less risk of collisions than with UUID v4.
Because of the birthday paradox, if you have N bits of randomness, you can expect a collision approximately after (2^((N/2)-1)) random numbers.
With v4, you have 122 bits of entropy over all time, so will see a collision after 2^60 allocations, approx 1.2 x 10^18.
With v7, you sacrifice 48 bits of entropy to give you 74 bits of entropy every millisecond, so you will see a collision after approximately 2^36 allocations per millisecond, approx 6.8 x 10^10 per millisecond.
You could argue that the risk of a collision is too high per millisecond because it's likely that 68 billion UUIDs are generated every millisecond. And maybe I'd agree. But the counter argument is that with v4 you'd expect a collision after 2^24 milliseconds, or 280 minutes, allocating at the same rate of 68 billion UUIDs per millisecond.
Obviously "all time" is longer than "280 minutes", so v7 is actually statistically less likely to cause collisions than v4, even though it seems counter-intuitive because it has a smaller space devoted to entropy. The key insight is that the time provides bits that are guaranteed to be unique, so only collisions within the same timestamp are significant, and every bit used to provide known-unique values is worth 2 bits of entropy.
by ralferoo
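The arithmetic in the comment above can be checked directly. This sketch uses the comment's own rule of thumb, 2^(N/2 - 1); the textbook 50% birthday figure is closer to 1.18 x 2^(N/2), which changes the constants by a small factor but not the comparison. The function name `collision_threshold` is an assumption for the example.

```python
def collision_threshold(random_bits: int) -> int:
    """The comment's rule of thumb: about 2^(N/2 - 1) draws from N random bits."""
    return 2 ** (random_bits // 2 - 1)


v4_total = collision_threshold(122)   # UUIDv4: 122 random bits over all time
v7_per_ms = collision_threshold(74)   # UUIDv7: 74 random bits per millisecond

# How long v4 lasts if IDs are generated at the v7 per-millisecond threshold.
ms_until_v4_collision = v4_total // v7_per_ms

if __name__ == "__main__":
    print(f"v4 threshold:  {v4_total:.3e} IDs in total")
    print(f"v7 threshold:  {v7_per_ms:.3e} IDs per millisecond")
    print(f"v4 at v7 rate: {ms_until_v4_collision / 60_000:.1f} minutes")
```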
5/10/2026 at 5:08:34 PM
Sorry if I worded poorly but you’re definitely less likely to run into a collision with v7, but it’s not impossible, which is what I was trying to point out.
Thanks for a more articulate answer!
by stanmancan
5/8/2026 at 6:57:53 PM
Surely the scenario where he generates the same number of items as he did between 2025 and now, but did it in 1 tick of v7 UUIDs also runs into it?
by JamesSwift
5/8/2026 at 8:34:10 PM
The scenario being the collision itself, the time period isn’t particularly relevant aside from it occurring much quicker than expected.
by stanmancan
5/9/2026 at 4:16:38 PM
Almost impossible, it depends on how fast they're being generated and the precision of the timestamp. The real problem is two years later when someone finds and removes that usleep(10000); /* sleep 10 ms */ that was the hard speed brake needed for the UUID generator, and suddenly duplicate IDs start showing up a few times per day or something similar.
by erlkonig
5/9/2026 at 12:39:04 AM
The spec doesn't require the use of actually random numbers though.
by majorchord
5/8/2026 at 6:54:01 PM
UUIDv7 is arguably better, because it is entropy plus time.
by matt-p
5/8/2026 at 8:09:19 PM
It is what I usually use for its sorting, but some people don't want to leak time info.
by otherme123
5/9/2026 at 12:39:34 AM
Entropy is not a requirement in the UUID spec.
by majorchord
5/8/2026 at 5:10:32 PM
Sequences, generally.
by lazide
5/9/2026 at 12:02:40 PM
That depends on your definition of high-availability. If high availability includes distributed writers, (global) sequences are not the best solution because generating unique sequence values requires synchronisation between all writers. In those cases, you might need to explicitly partition the ID space so that individual writers are guaranteed not to get in each others' hair.
by tremon
5/9/2026 at 1:23:51 PM
That is merely a sequence generation strategy.
by lazide
5/8/2026 at 5:10:31 PM
How is UUIDv4 to blame for a broken source of entropy? Or am I misinterpreting your words?
by perching_aix
5/8/2026 at 5:35:34 PM
I wouldn't say it's "to blame", but it is more susceptible to bad RNG.
If the RNG is bad, you'll get more benefit from adding non-random bits than you would from additional badly RNG'd bits.
The probability of future collisions also rises the more IDs you generate. If you incorporate non-random bits, you can alleviate that:
- timestamps make the collision probability not grow over time as you accumulate more existing UUIDs that could collide
- known-distinct machine IDs make the collision probability not grow as you add more machines
by hmry
5/8/2026 at 6:18:40 PM
I never blamed UUIDv4 for broken entropy sources. A broken entropy source breaks UUIDv4 even if you are using it correctly.
There is a long history of broken entropy sources showing up in real systems. No matter how hard people try to prevent this it keeps happening. Consequently, a requirement for high-quality entropy sources is correctly viewed as an unnecessary and avoidable foot-gun in high-reliability software systems.
by jandrewrogers
5/8/2026 at 5:15:59 PM
Presumably they mean using randomness as unique IDs.
by hombre_fatal
5/9/2026 at 12:34:13 AM
Reading the UUID spec leads me to believe that good entropy is not even a requirement for any version:
> Implementations SHOULD utilize a cryptographically secure pseudorandom number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique").
From https://www.rfc-editor.org/rfc/rfc9562.html#unguessability
So I don't think technically we can say entropy or random numbers at all are even "required for UUIDv4 to work as advertised."
by ranger_danger
5/9/2026 at 2:24:36 PM
Any PRNG, including a CSPRNG is simple to predict if you know its inputs.
You need entropy to seed your CSPRNG.
by toast0
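A small demonstration of the point above, as a sketch: two generators given the same seed emit identical "random" UUIDs. It uses Python's `random.Random` (Mersenne Twister, which is not a CSPRNG) purely because it can be seeded explicitly, so the determinism is easy to see; `uuid4_from` is a hypothetical helper.

```python
import random
import uuid


def uuid4_from(rng: random.Random) -> uuid.UUID:
    """Build a version-4 UUID from whatever bits the given generator produces."""
    return uuid.UUID(int=rng.getrandbits(128), version=4)


if __name__ == "__main__":
    # Two "machines" that booted with the same seed, e.g. a cloned VM image.
    machine_a = random.Random(42)
    machine_b = random.Random(42)
    for _ in range(3):
        print(uuid4_from(machine_a), uuid4_from(machine_b))
```

Every pair printed is a collision, even though each UUID is well-formed and looks random on its own.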
5/9/2026 at 3:32:32 PM
I think you misunderstood the meaning of the word "SHOULD" in the spec.
It means it's not strictly necessary, as in, a PRNG is not a requirement in order to support UUIDs in a compliant way.
To me this means UUID itself is not a viable solution if randomness is a requirement for you, because even if one claims they are using a UUID implementation that is compliant with the spec, and it is in fact compliant, that doesn't mean it's actually random at all.
by ranger_danger
5/9/2026 at 3:57:31 AM
For a while we’ve been fixing telemetry-reported crash bugs in the project I maintain, and now hardware bugs are showing up with some frequency. I was amazed how common they are. Sometimes data values (e.g. SP register) are corrupted, but other times even infallible operations (e.g loads of rodata constants) crash, indicating that the instruction itself was corrupted. So, yeah, I believe you’ll eventually see UUID collisions, but not because the underlying cryptanalysis was wrong.
by adonovan
5/8/2026 at 8:04:59 PM
> UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.
Hmm. What do those systems do for cryptography? Just assume it won't work and not rely on it at all?
by Hizonner
5/8/2026 at 8:18:15 PM
In these kinds of systems the cryptographic components often aren't even accessible from the software. It isn't a thing you need to worry about.
This makes it easier to audit for use of entropy sources in the software since there really isn't a valid use case for it.
by jandrewrogers
5/9/2026 at 3:28:27 PM
> We just had an actual UUID v4 collision...
(...)
> This is surprisingly common
And there goes through the window my blind faith in UUID v4 :)
I recently read about UUID v7 (https://en.wikipedia.org/wiki/Universally_unique_identifier#...) that became my favourite random identifier.
by BrandoElFollito
5/11/2026 at 12:17:47 AM
What else do you suggest instead of UUIDv4?
by iberator
5/8/2026 at 9:38:41 PM
[dead]by aaron695
5/8/2026 at 7:35:42 PM
Super simple to detect and try again.by erikerikson
5/8/2026 at 7:58:30 PM
A collision is simple to detect but it requires you to actually check, which is expensive at scale. The entire point of UUIDv4 is that you don't have to check for collisions because it should never happen. But if you don't check and it does happen you are in UB territory which is generally very bad.A risk of collision before it happens is non-trivial to detect but this is really what you'd want.
by jandrewrogers
5/8/2026 at 10:49:57 PM
Only expensive if you have unsorted keys or lack an index. Neither of which are unscalable.by erikerikson
5/8/2026 at 11:06:38 PM
You must have missed the “at scale” part. There is nothing inexpensive about extra network hops, cache misses, and page faults implied by your solution. Indexing at scale is almost always lossy for performance reasons. The location where you insert a new record is frequently not the same location as where you have to search for an existing record.It is resource amplification all the way down. In a lot of systems that index these keys the cost of that check is several times that of doing a blind insert.
by jandrewrogers
5/8/2026 at 11:10:25 PM
No, I didn't miss it.
DynamoDB works fine, using CQRS if necessary.
by erikerikson
5/8/2026 at 11:52:07 PM
literally the whole point of randomly generating UUIDs is that you don't need to check for collision. that's what the "U"s are for. that is the abstraction that is supposedly being provided. "using <insert Amazon AWS Certification Test Answer #7>" is not in any way a "scalable solution" for that with no other context. nor is just throwing out <random Martin Fowler concept #27>. the whole point is that it is a global (well, per name, "universal") abstraction that can, in practice, have holes that make it so you can't use it "universal"-ly.
by keeganpoppen
5/9/2026 at 12:08:06 AM
I totally appreciate what you are complaining about. It's always been part of the documentation for a UUID. Having had Martin Fowler as a colleague and meeting with him weekly for a bit, I'd expect him to nod along with what I've written. It's standard knowledge and part of the technical corpus. As is actually distributed unique ID generation, which is also not hard.
by erikerikson
5/8/2026 at 11:06:29 PM
AKA centralising a decentralised identifier generator?
by orf
5/8/2026 at 11:12:22 PM
There are better approaches, like pre-avoiding collisions, but generating tends to be more expensive than checking.
by erikerikson
5/11/2026 at 9:39:58 AM
UUID is used where checking is difficult, think distributed devices offline at a plantation. How could checking be easier in that case? It would require infrastructure that doesn't exist. There are many other cases where it's easier to handle collisions.
by soni96pl
5/11/2026 at 1:46:28 PM
Agreed. While it's an uncommon scenario, it is a significant case, and UUID shouldn't be used there because, as you write, checking doesn't work well. Better to use an alternative, such as coordinating/reserving monotonic producer IDs that get paired with monotonic per-producer sequence IDs to produce guaranteed-unique, well-ordered IDs.
[edit: in IoT it's common to issue X.509 certificates as a type of authentication, which could be used instead of producer IDs. Solutions always have to be paired to the use case.]
by erikerikson
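The producer-ID-plus-sequence scheme described in the comment above can be sketched roughly as follows. This is one possible reading, not the commenter's implementation: the class name `ProducerIdGenerator`, the 24/40 bit split, and the in-memory counter are all assumptions made for illustration. A real offline device would also have to persist the counter across restarts, which this sketch does not do.

```python
import itertools

PRODUCER_BITS = 24  # assumption: up to ~16.7M producers
SEQ_BITS = 40       # assumption: up to ~1.1 trillion IDs per producer


class ProducerIdGenerator:
    """Issue IDs of the form (producer_id, sequence) packed into one integer."""

    def __init__(self, producer_id):
        # producer_id is assumed to have been reserved once from a coordinator,
        # e.g. at provisioning time, so no two producers share a value.
        if not 0 <= producer_id < (1 << PRODUCER_BITS):
            raise ValueError("producer_id out of range")
        self.producer_id = producer_id
        self._seq = itertools.count()  # monotonic per-producer counter

    def next_id(self):
        seq = next(self._seq)
        if seq >= (1 << SEQ_BITS):
            raise OverflowError("sequence exhausted; reserve a new producer ID")
        # Uniqueness comes from the structure, not from an entropy source.
        return (self.producer_id << SEQ_BITS) | seq
```

The only coordination happens once, when the producer ID is reserved; after that each producer can generate IDs offline, and IDs from a single producer are ordered by construction.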
5/8/2026 at 11:14:59 PM
In what world is generating a UUID more expensive than checking for duplicates? At any scale?
Walk me through that please
by orf
5/8/2026 at 11:18:44 PM
Yeah, that was a little sloppy, but the point is that generating is more expensive than not generating. In more words, generating an ID and validating uniqueness is more expensive than only validating uniqueness.
by erikerikson
5/11/2026 at 9:42:43 AM
Wild edge case. Curious if they ever found the root cause.
by tommoneytools
5/11/2026 at 1:49:25 PM
Welcome to Hacker News.
I suspect you meant to reply to a different comment. Regardless, the most plausible speculation I've read here is that the RNG used to generate the UUID is low quality.
by erikerikson
5/8/2026 at 11:53:19 PM
exactly lmao. that is exactly what is being presented as "scalable <full stop>". sigh.
by keeganpoppen
5/9/2026 at 12:12:31 AM
No one has yet defined the scale, but almost all of the real-world scenarios people are actually encountering would be handled by either of the offered solutions.
by erikerikson
5/9/2026 at 2:36:22 AM
In this specific case. In the case of trace IDs (an example of which is [1]) where the equivalent of UUIDs are explicitly used to avoid coordination, it’s hard to imagine how you’d reliably detect and retry.
by squirrellous
5/9/2026 at 3:45:58 AM
A lot of databases have a uniqueness constraint that is basically a register-level compare and replace. Others have an if_not_exists, which is nearly the same. If you're not targeting a serious throughput use case, it's enough. If you are, then there are lots of solutions/alternatives that completely avoid coordination. On the other hand, maybe tracing protocols are robust to out-of-order delivery. If that won't do, then sequence numbers tied to monotonic sequence IDs should be plenty. If not, then I'd need very serious conversations to be convinced you're not wasting everyone's time.
by erikerikson
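As a rough illustration of the "uniqueness constraint, then retry" approach discussed above, here is a minimal sketch using SQLite from the Python standard library. The table name `events`, the helper names, and the retry limit are assumptions for the example; it shows the single-node, low-throughput case only and says nothing about the cost of the same check in a distributed store, which is the point disputed earlier in the thread.

```python
import sqlite3
import uuid


def create_store():
    """Open an in-memory database whose primary key enforces ID uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, payload TEXT)")
    return conn


def insert_with_retry(conn, payload, new_id=uuid.uuid4, max_attempts=3):
    """Insert payload under a fresh ID, regenerating the ID on a collision."""
    for _ in range(max_attempts):
        candidate = str(new_id())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO events (id, payload) VALUES (?, ?)",
                    (candidate, payload),
                )
            return candidate
        except sqlite3.IntegrityError:
            # The primary key rejected a duplicate ID: try a new one.
            continue
    raise RuntimeError("could not find a free ID; the ID source may be broken")
```

Repeated collisions are treated as an error rather than retried forever, since with a healthy generator even one collision would be extraordinary and several in a row suggest a faulty entropy source.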