Ask HN: We just had an actual UUID v4 collision...

5/8/2026 at 4:41:43 PM

This is surprisingly common.

The security of UUIDv4 is based on the assumption of a high-quality entropy source. This assumption is invalidated by hardware defects, normal software bugs, and developers not understanding what "high-quality entropy" actually means and that it is required for UUIDv4 to work as advertised.

It is relatively expensive to detect when an entropy source is broken, so almost no one ever does. They find out when a collision happens, like you just did.

UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.

by jandrewrogers

5/8/2026 at 5:52:47 PM

This is why CloudFlare has done what they did with the lava lamp wall. Not that the wall is such a great source of entropy on its own - I'm sure it's not their only source, but you can never have too many sources of entropy - but it makes it visible in a way that can grab those who don't fully understand the concepts of RNGs and how entropy plays into that.

The more sources of entropy, the more closely you approach "perfect" randomization. And a large chunk of those entropy sources need to be non-deterministic. Even on the small level, local applications running on local systems, like games, can use things like the mouse coordinates, the timings between button presses, the exact frame count since game start before the player presses Start to greatly enhance randomness while still using PRNGs under the hood

Yes, for the latter, that's technically deterministic (and the older the game considered, the more deterministic it is, see TAS runs of old games obliterating the "RNG"). But when you have fifty different parameters feeding into the initial seed, that's fifty things an attacker would have to perfectly predict or replay (and there are other ways to avoid replay attacks that can be layered on top).

If CloudFlare had less than 100 different sources of entropy, I'd be disappointed. And that's assuming their algorithm for blending those entropy sources into a single seed value is good

by LocalH

5/8/2026 at 9:39:08 PM

> you can never have too many sources of entropy

This is so true. And the beauty is that with algorithms, we don't even need to know much about the entropy to be able to extract it.

There is the Von Neumann method of generating an unbiased coin from a biased coin. Of throwing it twice, and checking if you got HT or TH. And completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.

There are more modern algorithms that can be even better (in that they need fewer coin tosses if you have a very unbalanced coin).

And then there is modern cryptographic hashing. Feed it all the bits you can. Collisions end up only happening in the real world if every single one of those bits is identical. So if you have actual entropy being fed, that cannot be controlled, predicted, or replicated, modern cryptography tells you that the end result is unique.
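To make the "feed it all the bits you can" idea concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash and uses made-up inputs; `mix_entropy` and the camera-frame placeholder are hypothetical names for illustration, not anyone's production design.

```python
import hashlib
import os
import time


def mix_entropy(*sources: bytes) -> bytes:
    """Hash several entropy inputs into one 32-byte seed.

    Each input is length-prefixed so that (b"ab", b"c") and (b"a", b"bc")
    do not feed the same byte stream into the hash.
    """
    h = hashlib.sha256()
    for chunk in sources:
        h.update(len(chunk).to_bytes(8, "big"))
        h.update(chunk)
    return h.digest()


seed = mix_entropy(
    os.urandom(32),                             # OS entropy pool
    time.perf_counter_ns().to_bytes(8, "big"),  # timing jitter (weak on its own)
    b"hypothetical-camera-frame-bytes",         # placeholder for a physical source
)
print(seed.hex())
```

The output is only as unpredictable as the best input: a weak source does no harm, but it does not add anything if every other input is broken too.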

by greiskul

5/9/2026 at 4:41:11 AM

> There is the Von Neumann method of generating an unbiased coin from a biased coin. Of throwing it twice, and checking if you got HT or TH. And completely discarding all HH or TT results. It doesn't matter if the coin you are using is 20% or 80%, the result will be a true 50/50.

This blew my mind. Thank you!

I had to think about it a bit, so for anyone scratching their head right now trying to figure it out, consider it this way:

what matters is the ordering, of heads-then-tails, or tails-then-heads.

It doesn't matter that it's biased one way or the other, if you keep flipping pairs until you get a result with two different values, it's a 50/50 chance whether the less-likely result comes first, or second.

You might only have a 20% chance of any particular pair having a tails (for example), but in the cases where you do have a tails, it's a 50/50 chance that it comes first or second.
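If it helps to see it run, here is a small Python sketch of the procedure. It assumes independent flips with a fixed bias; `biased_flips` is a hypothetical stand-in for a real biased coin, and the fixed seed is only there to make the demo repeatable.

```python
import random
from typing import Iterable, Iterator


def von_neumann(flips: Iterable[int]) -> Iterator[int]:
    """Yield unbiased bits from independent, identically biased flips.

    Flips are consumed in non-overlapping pairs: (1, 0) -> 1, (0, 1) -> 0,
    and equal pairs are discarded.
    """
    it = iter(flips)
    for first in it:
        second = next(it, None)
        if second is None:
            return  # odd leftover flip: not enough data for a pair
        if first != second:
            yield first


def biased_flips(p_heads: float, n: int, rng: random.Random) -> Iterator[int]:
    """Hypothetical biased coin: 1 (heads) with probability p_heads."""
    return (1 if rng.random() < p_heads else 0 for _ in range(n))


rng = random.Random(12345)
bits = list(von_neumann(biased_flips(0.8, 200_000, rng)))
print(len(bits), sum(bits) / len(bits))  # fraction of ones should be near 0.5
```

Note how many flips get thrown away: with an 80/20 coin only about a third of the pairs produce a bit.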

by ms_menardi

5/9/2026 at 5:19:58 AM

And for people who like equations, here is my attempt at explaining it.

Assume each flip is independent and the bias remains same in each flip.

Let

  P(H) = p,
  P(T) = 1 - p.

Then

  P(HH) = p^2,
  P(HT) = p(1 - p),
  P(TH) = (1 - p)p,
  P(TT) = (1 - p)^2.

Therefore

  P(HT or TH) = 2p(1 - p).

Now calculate

  P(HT | HT or TH) = p(1 - p) / (2p(1 - p)) = 1/2,
  P(TH | HT or TH) = (1 - p)p / (2p(1 - p)) = 1/2.

by susam

5/9/2026 at 7:23:25 AM

You don't need conditional probability here, as the flips are independent.

It's just p(H)p(T).

And p(H)p(T) = p(T)p(H), thus 2*p(H)p(T) = 2p(1-p).

by taegee

5/10/2026 at 1:14:14 AM

Independence tells us how to compute the probability of a sequence like HT or TH:

  P(HT) = P(H)P(T) = p(1 - p)

But the question I am addressing is not just "what is the probability of HT?" It is "given that the two flips are different, what is the probability that the order was HT rather than TH?"

That is a conditional probability:

  P(HT | HT or TH)

by susam

5/9/2026 at 6:02:15 PM

That wasn’t what he was trying to prove, but the proof could be done without conditionals like this:

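If: p(H)p(T) = p(T)p(H), so HT and TH are equally likely

And: HH and TT are discarded, so the kept outcomes HT and TH account for all the probability: p(HT kept) + p(TH kept) = 1

Then: p(HT kept) = p(TH kept) = 0.5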

by clickety_clack

5/9/2026 at 3:40:52 PM

That's how I noodled through it internally

by cdaringe

5/9/2026 at 9:55:36 AM

Thanks for your explanation. I did not get it on the first read, and was too lazy to think it through until I saw your comment.

Just want to point out that if one is actually doing the experiment with a biased coin, one must consume the flips in fixed, non-overlapping pairs and discard every HH or TT pair.

E.g. with a heavily biased coin, say 0.9 H and 0.1 T, one should skip each HH pair and start the next pair only at an odd index. Otherwise, given a run like HHHHT, one might greedily pick up the first HT (straddling a pair boundary instead of skipping the second HH pair), which biases the experiment toward HT.

by aws_ls

5/9/2026 at 6:25:37 AM

Afaics it's just basic commutativity – p(H)p(T) = p(T)p(H) – since instances are independent.

Same, of course, holds for flipping it multiple times. But there you get more than Head or Tail (binom(n, k)).

by taegee

5/9/2026 at 1:36:13 PM

I was doubting this for a minute as I wondered with a significantly biased coin towards the head side would you be more likely to get HT. With probability problems like Monty Hall I like to think about extreme cases like say it's 99 heads to every 1 tails. You'd expect HT 0.99% of the time. Ditto TH.

by sporkland

5/9/2026 at 2:03:27 PM

You can’t flip coins until you get the first different outcome… You have to flip twice each time, until you get a pair with different outcomes.
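A quick simulation shows the difference. This is a sketch under the usual assumption of independent flips with fixed bias; `paired` and `greedy` are illustrative names, and `greedy` is the incorrect "flip until the newest flip differs from the previous one" variant.

```python
import random


def paired(rng: random.Random, p: float) -> int:
    """Von Neumann: draw non-overlapping pairs until the two flips differ."""
    while True:
        a = rng.random() < p
        b = rng.random() < p
        if a != b:
            return int(a)  # 1 means HT, 0 means TH


def greedy(rng: random.Random, p: float) -> int:
    """Incorrect variant: keep flipping until the newest flip differs from the last."""
    prev = rng.random() < p
    while True:
        cur = rng.random() < p
        if cur != prev:
            return int(prev)  # 1 means the differing pair was HT
        prev = cur


rng = random.Random(2026)  # fixed seed only to make the demo repeatable
p, trials = 0.9, 50_000
paired_ht = sum(paired(rng, p) for _ in range(trials)) / trials
greedy_ht = sum(greedy(rng, p) for _ in range(trials)) / trials
print(f"paired: {paired_ht:.3f}  greedy: {greedy_ht:.3f}")
```

The greedy version just reports the first flip in disguise: a run that starts with H can only ever end in HT, so it returns HT with probability p rather than 1/2.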

by dash2

5/9/2026 at 3:37:28 PM

Oh wow that’s really amazing. What’s the source - I love Von Neumann.

by pyuser583

5/9/2026 at 4:50:19 AM

Not very random if it's only TH or HT. Trivial to brute force with no more than two tries!

by throwaway89865

5/9/2026 at 11:53:55 PM

I remember hearing about an interview problem from a while back, and the trick was to use exclusive-or. Now I understand why.

by MathMonkeyMan

5/9/2026 at 10:50:10 AM

(Note that this still assumes that each biased-coin toss is i.i.d.)

by hun3

5/8/2026 at 6:36:05 PM

If I understand it, the lava lamps are 90% PR/fun. They have a lot of other sources of entropy that scale better.

by victorbjorklund

5/8/2026 at 8:38:32 PM

Yes, they also have wave machines, pendulums, and mobiles :)

https://blog.cloudflare.com/harnessing-office-chaos/

https://blog.cloudflare.com/chaos-in-cloudflare-lisbon-offic...

by pverheggen

5/9/2026 at 8:09:13 AM

Wouldn’t thermal noise in a resistor make more practical sense?

by geon

5/9/2026 at 1:45:31 PM

I prefer cosmic microwave background radiation (CMBR) as my RNG of choice

by drzaiusx11

5/11/2026 at 12:21:09 AM

until someone uses microwave lol

by iberator

5/9/2026 at 12:42:13 AM

The original from SGI back in the mid 90's, before CPUs had RDRAND instructions etc... was an actually practical solution.

At the time I was at the Internet company that originally got online-gaming banned in the US, we were looking at CCDs and Cesium emitters that required a license etc...

While I am not sure, it seems Cloudflare basically implemented one after SGI's[0] patent expired.

The patent and the licensing cost and adding SGI was a major blocker for us doing it, the startup closed before we found a real solution. But the best PRNGs like Blum Blum Shub were way too slow at the time. But things did improve quickly at that time.

[0] https://patents.google.com/patent/US5732138A/en

by nyrikki

5/9/2026 at 3:38:52 PM

SGI was pretty amazing. I know some folks who worked there - Cray too. There’s a loyalty that just doesn’t exist any more - and arguably isn’t earned anymore.

by pyuser583

5/8/2026 at 6:57:52 PM

Ant farm ? Hamster wheels ? Anything critter-driven should provide some entropy.

by euroderf

5/8/2026 at 8:57:25 PM

Speaking of ants, Fourmilab (i.e. John Walker, of Autodesk fame) used to provide a random number generator powered by background radiation: https://www.fourmilab.ch/hotbits/

by throw-the-towel

5/8/2026 at 7:02:14 PM

I once read that the noise of a camera sensor in total darkness is apparently a good source.

by BSVogler

5/8/2026 at 9:27:15 PM

You can already have a good entropy source from a single resistor.

https://en.wikipedia.org/wiki/Johnson%E2%80%93Nyquist_noise

by amelius

5/9/2026 at 3:39:46 PM

This is what gets me - entropy is hard, but not that hard. I get it goes against everything a computer is built to do, but so does telling time.

by pyuser583

5/9/2026 at 3:14:01 AM

Would a CRT TV tuned to channel 3 and no RF input be a good source?

by wpm

5/10/2026 at 6:01:44 AM

I imagine that there might still be a way to swing by with RF equipment and tip the scales in your favor. And if you're important enough, I'm sure there'll be someone motivated enough to do this. After all, Polymarket was motivating enough for someone to take a hair dryer to a weather station...

https://www.theguardian.com/world/2026/apr/23/hairdryer-or-l...

by nxobject

5/9/2026 at 10:39:05 AM

In the sense that RF noise can be a source of entropy: Sorta*. But one doesn't need the whole thrift-store television set to do that; the visual aspect of a CRT displaying analog video snow just adds style points**.

*: Sorta, because if someone discovers that the entropy is derived from an analog TV tuned to channel 3, then they also know how to influence it from outside.

**: Style points can have value; it's OK to have fun with work. But that's a secondary function.

by ssl-3

5/11/2026 at 12:20:33 AM

better to just switch to... random channel every while :) Not perfect but something.

by iberator

5/8/2026 at 7:12:23 PM

The noise probably makes the lava lamp wall just as effective as pointing the camera at the Mona Lisa - the lamps themselves are not that unpredictable frame-to-frame.

by unilynx

5/8/2026 at 8:18:13 PM

For the record, the lamps and camera are present in their lobby afaik, so you can actually go there, stand in front of them, and slightly affect the entropy.

A cool parlor trick, certainly.

by LocalH

5/8/2026 at 8:40:54 PM

https://www.random.org/ Uses atmospheric noise. These dudes use dice? https://youtube.com/shorts/ncoDq5EcPFg?si=lI6f9cw8dWcaDZ4Y

by conception

5/8/2026 at 8:04:18 PM

https://www.idquantique.com/random-number-generation/product...

by FuriouslyAdrift

5/8/2026 at 8:54:26 PM

The lava lamps are just for show.

You can get entropy just by plugging an oscilloscope into a pile of dirt and cranking the gain up.

by dheera

5/8/2026 at 9:13:16 PM

Any high-gain amplifier can be used, with its input connected to a resistor or a diode.

For instance you can use the microphone input of a PC, together with an additional external amplifier made with an audio amplifier integrated circuit or an operational amplifier integrated circuit and with a diode or a resistor at its input. The microphone input of PCs provides a 5 V voltage that can be sufficient as a power supply for a noise source plugged in it.

Such a true RNG can be made on a small PCB with an audio jack, so you can plug it into any PC with microphone input and have a true RNG that you can trust better than the RNG included in modern Intel and AMD CPUs. In the past, many AMD CPUs had defective internal RNGs. Moreover, both for Intel and for AMD it is impossible to verify whether the internal RNG does what it claims to do or it generates predictable pseudo-random numbers.

by adrian_b

5/9/2026 at 9:34:05 AM

Meh. The problem is that it might start receiving your local radio station and end up deterministic enough to screw you. So you need to shield the dirt properly.

by tliltocatl

5/9/2026 at 8:34:32 AM

> This is why CloudFlare has done what they did with the lava lamp wall.

Interesting. I wonder how true it actually is that they use it like they claim here: https://www.cloudflare.com/learning/ssl/lava-lamp-encryption.... It's in one of their lobbies, so doesn't that make it susceptible to an attack in some way? I'm not knowledgeable enough to know, but I figured if they actually used that method, they'd have a more controlled environment.

I also don't fully understand it. A large part of that wall is static. And the camera isn't going to pick up on the stochastic properties of the lava as much as exists in the real world. So it feels like their images will be very statistically similar.

by bmitc

5/9/2026 at 7:44:44 PM

It's probably just one of many sources. Just by being in one physical location it would be vulnerable to a network outage (ignoring any potential for attacks)

by Melatonic

5/8/2026 at 8:14:50 PM

Old games are RTA viable to RNG manip: https://m.youtube.com/watch?v=Bgh30BiWG58

by __s

5/8/2026 at 7:59:50 PM

Yep - I've seen legitimate-looking dups on bad hardware, and "there are a ton of trailing zeros" is also an incredibly common duplicate mode for some UUID libraries, like earlier Go ones that didn't validate the "requested N bytes, returned 3, you must re-request to get N-3 more" return values. It doesn't happen on most hardware or OSes, so people never check it, so it just comes up in production some day with tens of thousands of collisions.
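For illustration, the missing loop looks roughly like this in Python. It is a sketch of the general short-read pattern, not the code of any particular Go library; `read_exact` is a hypothetical helper, and the pipe merely stands in for a random device.

```python
import os


def read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes from fd, looping over short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))  # may legally return fewer bytes
        if not chunk:
            raise EOFError(f"wanted {n} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


# Demo: a pipe standing in for a device that hands data over in pieces.
r, w = os.pipe()
os.write(w, b"\x01\x02\x03")
os.write(w, b"\x04" * 13)
os.close(w)
data = read_exact(r, 16)
os.close(r)
print(data.hex())
```

Skipping the loop and using a partially filled, zero-initialized buffer is exactly what produces the trailing zeros.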

by Groxx

5/8/2026 at 5:06:24 PM

Thanks for the insight! Mind expanding on what alternatives are being used in high reliability systems instead of UUIDv4?

by thecloud

5/8/2026 at 6:00:59 PM

In high-reliability systems a criterion for identifier design is easy detection of defective identifiers. This includes buggy systems and adversarial manipulation.

The problem with UUIDs that rely on entropy sources is that it is computationally expensive to detect if the statistical distribution of identifiers is diverging from what you would expect from a random oracle. I've written systems that can detect entropy source anomalies but you'll want to turn it off in production.

It is pretty cheap to sanity check most non-probabilistic identifier schemes. UUIDs that use broken hash algorithms (e.g. UUIDv3/5) or leak state (e.g. UUIDv7) are exposed to adversarial exploitation.

The identifier scheme is dependent on the use case. Does the uniqueness constraint apply to the instance of the object or the contents of the object? Is the generation of identifiers federated across untrusted nodes? How large is the potential universe of identifiers?

The basic scheme I've seen is a 128-bit structured value that has no probabilistic component. These identifiers can be encrypted with AES-128 when exported to the public, guaranteeing uniqueness while leaking no internal state. The benefit of this scheme is that it is usually drop-in compatible with standard UUID even though it is technically not a UUID and the internal structure can carry useful metadata about the identifier if you can decrypt it.
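A rough Python sketch of the shape of such a scheme follows. Loud assumptions: the field layout (32-bit node number plus 96-bit counter) is invented for illustration, and because the standard library has no AES, a 4-round Feistel network keyed with HMAC-SHA256 stands in for AES-128 here. It is a keyed, invertible mapping on 128-bit values, which is the property that preserves uniqueness, but it is not vetted cryptography.

```python
import hashlib
import hmac
import uuid

MASK64 = (1 << 64) - 1


def internal_id(node: int, counter: int) -> int:
    """Hypothetical layout: 32-bit node number, then a 96-bit per-node counter."""
    if not (0 <= node < 1 << 32 and 0 <= counter < 1 << 96):
        raise ValueError("field out of range")
    return (node << 96) | counter


def _round(key: bytes, i: int, half: int) -> int:
    """Round function: 64 bits of HMAC-SHA256 over the round index and one half."""
    msg = bytes([i]) + half.to_bytes(8, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:8], "big")


def encrypt_id(key: bytes, value: int, rounds: int = 4) -> int:
    """Stand-in for AES-128: an invertible keyed mapping on 128-bit integers."""
    left, right = value >> 64, value & MASK64
    for i in range(rounds):
        left, right = right, left ^ _round(key, i, right)
    return (left << 64) | right


def decrypt_id(key: bytes, value: int, rounds: int = 4) -> int:
    """Undo encrypt_id by running the rounds in reverse."""
    left, right = value >> 64, value & MASK64
    for i in reversed(range(rounds)):
        left, right = right ^ _round(key, i, left), left
    return (left << 64) | right


key = b"demo key, not a real secret"
public = [uuid.UUID(int=encrypt_id(key, internal_id(7, n))) for n in range(3)]
print(public)
```

Since the mapping is a bijection, distinct internal values can never produce the same exported value, and anyone holding the key can recover the node and counter.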

Federated generation across untrusted nodes requires a more complex scheme, particularly if the universe of identifiers is extremely large. These intrinsically have a collision risk regardless of how the identifiers are generated.

None of the standardized UUIDs were really designed with the requirements of scalable high-reliability systems in mind. They were optimized for convenience and expedience, which is a perfectly reasonable objective. Most people don't need an identifier system engineered for extreme reliability, even though there is relatively little cost to having one.

by jandrewrogers

5/8/2026 at 7:36:57 PM

> leak state (e.g. UUIDv7)

But according to PostgreSQL, UUIDv7 provides better performance in the database, so is this essentially a trade off between security and speed?

by eaf7e281

5/8/2026 at 7:48:55 PM

Yes, because UUIDv7 gives up some random bits in order to include the timestamp, which is done in a way that makes UUIDv7s quick to sort by timestamp.

by jubilanti

5/8/2026 at 9:53:44 PM

How does including the timestamp expose me to adversarial exploitation?

by ai_slop_hater

5/8/2026 at 11:56:40 PM

It reveals the time you created the UUID, for one. That can lead to a bunch of problems.

by danpalmer

5/9/2026 at 1:16:38 PM

The same way using an auto increment integer ID does, but imagine that integer also leaked your created timestamp column too.

by devmor

5/9/2026 at 12:10:26 AM

I’ve not come across any.

by goalieca

5/9/2026 at 12:57:09 PM

How does a high-reliability system have a broken /dev/random? You're better off fixing it rather than trying to fix every downstream component that uses it. You can put your AES-128 counter there if you can count reliably.

by dchest

5/8/2026 at 5:23:39 PM

The latest UUID (7?) uses half random gen, half timestamp. This not only makes it sortable by creation, but would also make a collision like this impossible.

by filcuk

5/8/2026 at 5:45:22 PM

It's still possible in most implementations of UUIDv7.

UUIDv7 assigns the first 48 bits for the timestamp in milliseconds. You can generate a lot of UUID's in a millisecond though!

Then you have another 12 bits that you can use as you wish; "rand_a". The spec has a few methods they suggest on how to use these bits including 12 bits of random data, using it for sub-millisecond timestamps, or creating a monotonic counter, but each have their downsides:

- Purely random data means you can still run into collisions and anything within the same millisecond is unordered

- Sub millisecond you can run into collisions; there's nothing stopping you from generating two UUID's with the same 62 bits of rand_b data in the same sub-millisecond timestamp.

- Monotonic counters can overflow before the next tick, then what? Rollover? Once you roll over it's no longer monotonic and you can generate the same random data within the same monotonic cycle. Also; it's only monotonic to the system that's generating the UUID. If you have a distributed system and they each have their own monotonic cycles then you'll be generating UUID's with the same timestamp + monotonic counter, and again, are relying on not generating the same random data.

You can steal some of the 62 bits in rand_b if you want as well; you can use rand_a for sub-millisecond accuracy, and then use a few bits of rand_b for a monotonic counter. There's still a chance of collision here, but it's exceedingly low at the expense of less truly random data at the end.

If you want truly collision free, you'd also need to assign a couple of bits to identify the subsystem generating the UUID so that the monotonic counter is unique to that subsystem. You lose the ordering part of the monotonic counter this way though, but I guess you could argue that in nearly 100% of cases the accuracy of sub-millisecond order in a distributed system is a lie anyways.
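For reference, the bit layout described above can be assembled by hand. This is a minimal Python sketch; `uuid7_like` is a hypothetical helper, and how rand_a and rand_b get filled is left to the caller, which is the whole point of the discussion.

```python
import os
import time
import uuid


def uuid7_like(unix_ms: int, rand_a: int, rand_b: int) -> uuid.UUID:
    """Assemble the UUIDv7 bit layout from its three fields."""
    if not (0 <= unix_ms < 1 << 48 and 0 <= rand_a < 1 << 12 and 0 <= rand_b < 1 << 62):
        raise ValueError("field out of range")
    value = unix_ms << 80   # 48-bit millisecond timestamp
    value |= 0x7 << 76      # 4-bit version
    value |= rand_a << 64   # 12 bits: random, sub-ms time, or counter
    value |= 0b10 << 62     # 2-bit variant
    value |= rand_b         # 62 bits of random data
    return uuid.UUID(int=value)


now_ms = time.time_ns() // 1_000_000
rand_b = int.from_bytes(os.urandom(8), "big") >> 2  # keep 62 bits
a = uuid7_like(now_ms, 0, rand_b)  # rand_a used as a counter here
b = uuid7_like(now_ms, 1, rand_b)
print(a, b, a < b)
```

Two calls with the same timestamp, the same rand_a and the same rand_b return the same value, so uniqueness within a millisecond rests entirely on those 74 bits.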

by stanmancan

5/8/2026 at 7:13:30 PM

I think by the time you're building a system that needs to generate (and persist!) billions of identifiers per millisecond, you're solidly past the point where all your design decisions need to be vetted for whether they make sense on your extremely exotic setup.

by naniwaduni

5/8/2026 at 11:48:05 PM

But 12 bits is not "billions of identifiers" -- it's 4096. Once you exhaust that counter in the same millisecond, you are still relying on a gamble that your random source will not generate the exact same bit sequence for the previous same counter value. And this thread started out with the OP explaining that random collisions are much more common than we'd like them to be, for various reasons.

by tremon

5/8/2026 at 7:01:28 PM

We have a dedicated snowflake id generator service that returns batch ids. It's also distributed, each service adds its own instance number to the id. When it overflows it just blocks for the next ms. For our traffic, it's never a bottleneck.
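A stripped-down sketch of that idea in Python, for anyone curious. The layout here (millisecond timestamp, then a 10-bit instance number, then a 12-bit sequence) is an assumption borrowed from the classic Snowflake design, not a description of the service above, and the class is not thread-safe.

```python
import time


class SnowflakeLike:
    """Hypothetical layout: ms timestamp | 10-bit instance | 12-bit sequence."""

    def __init__(self, instance: int, clock=lambda: time.time_ns() // 1_000_000):
        if not 0 <= instance < 1 << 10:
            raise ValueError("instance must fit in 10 bits")
        self.instance = instance
        self.clock = clock
        self.last_ms = -1
        self.seq = 0

    def next_id(self) -> int:
        now = self.clock()
        if now < self.last_ms:
            raise RuntimeError("clock moved backwards")
        if now == self.last_ms:
            self.seq = (self.seq + 1) & 0xFFF
            if self.seq == 0:               # sequence exhausted in this ms
                while now <= self.last_ms:  # block until the next millisecond
                    now = self.clock()
        else:
            self.seq = 0
        self.last_ms = now
        return (now << 22) | (self.instance << 12) | self.seq


gen = SnowflakeLike(instance=3)
ids = [gen.next_id() for _ in range(10_000)]
print(ids[0], ids[-1])
```

There is no random component at all, so a broken entropy source cannot cause a collision; the failure modes move to clock regressions and duplicate instance numbers instead.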

by rootlocus

5/8/2026 at 9:51:45 PM

Something I use on my own distributed system (where I wanted 64-bit IDs), is use 32 bits for the time in seconds (with an epoch from 2020, so good until 2088), 8 bits for the device ID and 24 bits for a serial number (resets to 0 every time the seconds increments).

That's generally enough IDs per second for most of my edge nodes, but the central worker nodes need more, so I give them a different split and use 4 bits for the device ID and 28 bits for serial number instead.

If a node overflows its serial number that second, I kind of cheat and increment the seconds field early. Every time this happens, I persist the seconds field to the database, and when the app restarts, it starts its seconds count at the last persisted seconds plus one. If the current time in seconds is greater than the last used seconds, I also update it and reset the serial number. Works remarkably well for smoothing out very occasional spikes in ID generation while still approximately remaining globally sortable.

I also "waste" a bit of the 32-bit time field by considering it to be signed, even though it's not really because I don't expect this system to last long enough to reach times where the MSB gets set. But if I ever change my system, I'll set that bit and everything will stay ordered. I'll probably reset the epoch at that point too.
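The packing itself is only a few shifts. In this Python sketch the field order (seconds in the high 32 bits, then device ID, then serial) is an assumption, and `pack_id` / `unpack_id` are hypothetical names; the `device_bits` parameter covers both the 8/24 and the 4/28 split.

```python
from datetime import datetime, timezone

# Seconds are counted from this epoch rather than from 1970.
EPOCH_2020 = int(datetime(2020, 1, 1, tzinfo=timezone.utc).timestamp())


def pack_id(seconds: int, device: int, serial: int, device_bits: int = 8) -> int:
    """Pack 32-bit seconds, a device ID and a per-second serial into 64 bits."""
    serial_bits = 32 - device_bits
    if not (0 <= seconds < 1 << 32 and 0 <= device < 1 << device_bits
            and 0 <= serial < 1 << serial_bits):
        raise ValueError("field out of range")
    return (seconds << 32) | (device << serial_bits) | serial


def unpack_id(value: int, device_bits: int = 8) -> tuple[int, int, int]:
    """Return (seconds, device, serial)."""
    serial_bits = 32 - device_bits
    return (value >> 32,
            (value >> serial_bits) & ((1 << device_bits) - 1),
            value & ((1 << serial_bits) - 1))


# With the top bit of the seconds field kept clear, 2**31 seconds are usable.
last_year = datetime.fromtimestamp(EPOCH_2020 + 2**31, tz=timezone.utc).year
print(last_year, hex(pack_id(100, 5, 42)))
```

Because the seconds sit in the high bits, plain integer comparison sorts IDs by time first, which is where the "approximately globally sortable" property comes from.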

by ralferoo

5/8/2026 at 5:38:58 PM

Considering the context I think it's worth pointing out that it's technically not impossible - it's just even less likely.

Everything in crypto is always a probability - never a certainty

by ffsm8

5/8/2026 at 5:45:30 PM

True, but it makes the specific collision the post observed completely impossible.

by nitsky

5/8/2026 at 5:56:07 PM

I left a more detailed comment on the parent, but it's definitely not impossible!

by stanmancan

5/8/2026 at 6:40:18 PM

The scenario in this post is that the first uuid was created one year before the duplicate uuid. That isn’t possible with v7

by ryanmonroe

5/8/2026 at 6:50:44 PM

You're heavily leaning on "collision like this" to relate to the exact time stamps for your statement to be true.

It's equally possible to interpret the "like this" as referring to the collision itself, without a focus on the 1 year distance between the creation dates.

So I guess both views are valid.

by ffsm8

5/8/2026 at 11:15:57 PM

The inclusion of a timestamp in v7 makes collisions impossible unless the generating systems think that the time is the same down to the millisecond, which makes the temporal distance quite relevant.

by calfuris

5/8/2026 at 11:39:34 PM

Plenty of systems end up generating multiple UUID's in a single millisecond.

The issue with UUIDv7 is that you also have significantly less entropy, since you only have 62 bits (sometimes less, depending on implementation) of "random" data. So while the time aspect of the format lowers the chances of collisions, two UUIDv7's generated in the same millisecond (depending on implementation) have a significantly higher chance of collision than two UUIDv4's.

It's still incredibly unlikely, but it's also incredibly unlikely you generate two matching UUIDv4's, but it does happen.

TLDR; It's possible to generate matching UUIDv7's, don't assume otherwise.

by stanmancan

5/10/2026 at 3:34:11 PM

I answered this in another HN topic just the other day: https://news.ycombinator.com/item?id=48061098

But essentially, using UUID v7 you actually have less risk of collisions than with UUID v4.

Because of the birthday paradox, if you have N bits of randomness, you can expect a collision approximately after (2^((N/2)-1)) random numbers.

With v4, you have 122 bits of entropy over all time, so will see a collision after 2^60 allocations, approx 1.2 x 10^18.

With v7, you sacrifice 48 bits of entropy to give you 74 bits of entropy every millisecond, so you will see a collision after approximately 2^36 allocations per millisecond, approx 6.8 x 10^10 per millisecond.

You could argue that the risk of a collision is too high per millisecond because it's likely that 68 billion UUIDs are generated every millisecond. And maybe I'd agree. But the counter argument is that with v4 you'd expect a collision after 2^24 milliseconds, or 280 minutes, allocating at the same rate of 68 billion UUIDs per millisecond.

Obviously "all time" is longer than "280 minutes", so v7 is actually statistically less likely to cause collisions than v4, even though it seems counter-intuitive because it has a smaller space devoted to entropy. The key insight is that the time provides bits that are guaranteed to be unique, so only collisions within the same timestamp are significant, and every bit used to provide known-unique values is worth 2 bits of entropy.
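The arithmetic above is easy to reproduce. This Python sketch simply applies the 2^((N/2)-1) rule of thumb from this comment; the more commonly quoted figure for a 50% collision chance is about 1.18 x 2^(N/2), so these numbers are on the cautious side by a bit more than a factor of two.

```python
def draws_before_collision(bits: int) -> int:
    """Rule of thumb used above: about 2^(bits/2 - 1) random draws."""
    return 2 ** (bits // 2 - 1)


v4_total = draws_before_collision(122)   # v4: 122 random bits over all time
v7_per_ms = draws_before_collision(74)   # v7: 74 random bits per millisecond

# At the v7 break-even rate, how long until v4 has used up its budget?
ms_until_v4_collision = v4_total // v7_per_ms
minutes = ms_until_v4_collision / 60_000

print(f"v4: {v4_total:.2e} IDs in total")
print(f"v7: {v7_per_ms:.2e} IDs per millisecond")
print(f"v4 at that rate: {minutes:.0f} minutes")
```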

by ralferoo

5/10/2026 at 5:08:34 PM

Sorry if I worded poorly but you’re definitely less likely to run into a collision with v7, but it’s not impossible, which is what I was trying to point out.

Thanks for a more articulate answer!

by stanmancan

5/8/2026 at 6:57:53 PM

Surely the scenario where he generates the same number of items as he did between 2025 and now, but did it in 1 tick of v7 UUIDs also runs into it?

by JamesSwift

5/8/2026 at 8:34:10 PM

The scenario being the collision itself, the time period isn’t particularly relevant aside from it occurring much quicker than expected.

by stanmancan

5/9/2026 at 4:16:38 PM

Almost impossible, it depends on how fast they're being generated and the precision of the timestamp. The real problem is two years later when someone finds and removes that usleep(10000); /* sleep 10 ms */ that was the hard speed brake needed for the UUID generator, and suddenly duplicate IDs start showing up a few times per day or something similar.

by erlkonig

5/9/2026 at 12:39:04 AM

The spec doesn't require the use of actually random numbers though.

by majorchord

5/8/2026 at 6:54:01 PM

UUIDv7 is arguably better, because it is entropy plus time.

by matt-p

5/8/2026 at 8:09:19 PM

It is what I usually use for its sorting, but some people don't want to leak time info.

by otherme123

5/9/2026 at 12:39:34 AM

Entropy is not a requirement in the UUID spec.

by majorchord

5/8/2026 at 5:10:32 PM

Sequences, generally.

by lazide

5/9/2026 at 12:02:40 PM

That depends on your definition of high-availability. If high availability includes distributed writers, (global) sequences are not the best solution because generating unique sequence values requires synchronisation between all writers. In those cases, you might need to explicitly partition the ID space so that individual writers are guaranteed not to get in each others' hair.

by tremon

5/9/2026 at 1:23:51 PM

That is merely a sequence generation strategy.

by lazide

5/8/2026 at 5:10:31 PM

How is UUIDv4 to blame for a broken source of entropy? Or am I misinterpreting your words?

by perching_aix

5/8/2026 at 5:35:34 PM

I wouldn't say it's "to blame", but it is more susceptible to bad RNG.

If the RNG is bad, you'll get more benefit from adding non-random bits than you would from additional badly RNG'd bits.

The probability of future collisions also rises the more IDs you generate. If you incorporate non-random bits, you can alleviate that:

- timestamps make the collision probability not grow over time as you accumulate more existing UUIDs that could collide

- known-distinct machine IDs make the collision probability not grow as you add more machines

by hmry

5/8/2026 at 6:18:40 PM

I never blamed UUIDv4 for broken entropy sources. A broken entropy source breaks UUIDv4 even if you are using it correctly.

There is a long history of broken entropy sources showing up in real systems. No matter how hard people try to prevent this it keeps happening. Consequently, a requirement for high-quality entropy sources is correctly viewed as an unnecessary and avoidable foot-gun in high-reliability software systems.

by jandrewrogers

5/8/2026 at 5:15:59 PM

Presumably they mean using randomness as unique IDs.

by hombre_fatal

5/9/2026 at 12:34:13 AM

Reading the UUID spec leads me to believe that good entropy is not even a requirement for any version:

> Implementations SHOULD utilize a cryptographically secure pseudorandom number generator (CSPRNG) to provide values that are both difficult to predict ("unguessable") and have a low likelihood of collision ("unique").

From https://www.rfc-editor.org/rfc/rfc9562.html#unguessability

So I don't think technically we can say entropy or random numbers at all are even "required for UUIDv4 to work as advertised."

by ranger_danger

5/9/2026 at 2:24:36 PM

Any PRNG, including a CSPRNG is simple to predict if you know its inputs.

You need entropy to seed your CSPRNG.

by toast0

5/9/2026 at 3:32:32 PM

I think you misunderstood the meaning of the word "SHOULD" in the spec.

It means it's not strictly necessary, as in, a PRNG is not a requirement in order to support UUIDs in a compliant way.

To me this means UUID itself is not a viable solution if randomness is a requirement for you, because even if one claims they are using a UUID implementation that is compliant with the spec, and it is in fact compliant, that doesn't mean it's actually random at all.

by ranger_danger

5/9/2026 at 3:57:31 AM

For a while we’ve been fixing telemetry-reported crash bugs in the project I maintain, and now hardware bugs are showing up with some frequency. I was amazed how common they are. Sometimes data values (e.g. SP register) are corrupted, but other times even infallible operations (e.g loads of rodata constants) crash, indicating that the instruction itself was corrupted. So, yeah, I believe you’ll eventually see UUID collisions, but not because the underlying cryptanalysis was wrong.

by adonovan

5/8/2026 at 8:04:59 PM

> UUIDv4 is explicitly forbidden for a lot of high-assurance and high-reliability software systems for this reason.

Hmm. What do those systems do for cryptography? Just assume it won't work and not rely on it at all?

by Hizonner

5/8/2026 at 8:18:15 PM

In these kinds of systems the cryptographic components often aren't even accessible from the software. It isn't a thing you need to worry about.

This makes it easier to audit for use of entropy sources in the software since there really isn't a valid use case for it.

by jandrewrogers

5/9/2026 at 3:28:27 PM

> We just had an actual UUID v4 collision...

(...)

> This is surprisingly common

And there goes my blind faith in UUID v4, right out the window :)

I recently read about UUID v7 (https://en.wikipedia.org/wiki/Universally_unique_identifier#...) that became my favourite random identifier.

by BrandoElFollito

5/11/2026 at 12:17:47 AM

What else do you suggest instead of UUIDv4?

by iberator

5/8/2026 at 9:38:41 PM

[dead]

by aaron695

5/8/2026 at 7:35:42 PM

Super simple to detect and try again.

by erikerikson

5/8/2026 at 7:58:30 PM

A collision is simple to detect but it requires you to actually check, which is expensive at scale. The entire point of UUIDv4 is that you don't have to check for collisions because it should never happen. But if you don't check and it does happen you are in UB territory which is generally very bad.

A risk of collision before it happens is non-trivial to detect but this is really what you'd want.

by jandrewrogers

5/8/2026 at 10:49:57 PM

Only expensive if you have unsorted keys or lack an index. Sorted keys and indexes both scale.

by erikerikson

5/8/2026 at 11:06:38 PM

You must have missed the “at scale” part. There is nothing inexpensive about extra network hops, cache misses, and page faults implied by your solution. Indexing at scale is almost always lossy for performance reasons. The location where you insert a new record is frequently not the same location as where you have to search for an existing record.

It is resource amplification all the way down. In a lot of systems that index these keys the cost of that check is several times that of doing a blind insert.

by jandrewrogers

5/8/2026 at 11:10:25 PM

No I didn't miss it.

DynamoDb works fine, using CQRS if necessary.

by erikerikson

5/8/2026 at 11:52:07 PM

literally the whole point of randomly generating UUIDs is that you don't need to check for collision. that's what the "U"s are for. that is the abstraction that is supposedly being provided. "using <insert Amazon AWS Certification Test Answer #7>" is not in any way a "scalable solution" for that with no other context. nor is just throwing out <random Martin Fowler concept #27>. the whole point is that it is a global (well, per name, "universal") abstraction that can, in practice, have holes that make it so you can't use it "universal"-ly.

by keeganpoppen

5/9/2026 at 12:08:06 AM

I totally appreciate what you are complaining about. It's always been part of the documentation for a UUID. Having had Martin Fowler as a colleague and meeting with him weekly for a bit, I'd expect him to nod along with what I've written. It's standard knowledge and part of the technical corpus. As is actually distributed unique ID generation which is also not hard.

by erikerikson

5/8/2026 at 11:06:29 PM

AKA centralising a decentralised identifier generator?

by orf

5/8/2026 at 11:12:22 PM

There are better approaches, like pre-avoiding collisions, but generating tends to be more expensive than checking.

by erikerikson

5/11/2026 at 9:39:58 AM

UUID is used where checking is difficult, think distributed devices offline at a plantation. How could checking be easier in that case? It would require infrastructure that doesn't exist. There are many other cases where it's easier to handle collisions.

by soni96pl

5/11/2026 at 1:46:28 PM

Agreed. While it's an uncommon scenario, it is a significant case, and UUID shouldn't be used there because, as you write, checking doesn't work well. Better to use an alternative, such as coordinating/reserving monotonic producer IDs that get paired with monotonic per-producer sequence IDs to produce guaranteed unique, well-ordered IDs.

[edit: in IOT it's common to issue x509 certificates as a type of authentication which could be used instead of using producer IDs. Solutions always have to be paired to use case.]
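
A minimal sketch of the producer-ID-plus-sequence idea, for readers who want to see it concretely. Everything here is hypothetical: the class name, the string format, and the assumption that each device gets its producer ID through one coordinated step (for example at provisioning time) and persists its sequence counter across restarts.

```javascript
// Hypothetical sketch: IDs are unique by construction, with no randomness
// and no collision check at insert time.
class ProducerIdGenerator {
  // producerId is assumed to have been reserved once through some
  // coordinated step; startSeq is assumed to be loaded from durable storage.
  constructor(producerId, startSeq = 0n) {
    this.producerId = BigInt(producerId);
    this.seq = BigInt(startSeq);
  }

  // Returns "<producerId>-<sequence>"; the sequence only ever increases.
  next() {
    this.seq += 1n;
    return `${this.producerId}-${this.seq}`;
  }
}
```

Two offline devices with different producer IDs can never emit the same ID, and IDs from a single device sort in creation order by sequence number. The cost is the one-time coordination needed to hand out producer IDs.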

by erikerikson

5/8/2026 at 11:14:59 PM

In what world is generating a UUID more expensive than checking for duplicates? at any scale?

Walk me through that please

by orf

5/8/2026 at 11:18:44 PM

Yeah, that was a little sloppy, but the point is that generating is more expensive than not generating. In more words, generating an ID and validating uniqueness is more expensive than only validating uniqueness.

by erikerikson

5/11/2026 at 9:42:43 AM

Wild edge case. Curious if they ever found the root cause.

by tommoneytools

5/11/2026 at 1:49:25 PM

Welcome to Hacker News.

I suspect you meant to reply to a different comment. Regardless, the most plausible speculation I've read here is that the RNG used to generate the UUID is low quality.

by erikerikson

5/8/2026 at 11:53:19 PM

exactly lmao. that is exactly what is being presented as "scalable <full stop>". sigh.

by keeganpoppen

5/9/2026 at 12:12:31 AM

No one has yet defined the scale but almost all of the real world scenarios people are actually encountering would be handled by either of the offered solutions.

by erikerikson

5/9/2026 at 2:36:22 AM

In this specific case. In the case of trace IDs (an example of which is [1]) where the equivalent of UUIDs are explicitly used to avoid coordination, it’s hard to imagine how you’d reliably detect and retry.

[1] https://news.ycombinator.com/item?id=48033853

by squirrellous

5/9/2026 at 3:45:58 AM

A lot of databases have a uniqueness constraint that is basically a register-level compare-and-replace. Others have an if_not_exists, which is nearly the same. If you're not targeting a serious throughput use case, it's enough. If you are, then there are lots of solutions/alternatives that completely avoid coordination. On the other hand, maybe tracing protocols are robust to out-of-order delivery. If that won't do, then sequence numbers tied to monotonic sequence IDs should be plenty. If not, then I'd need very serious conversations to be convinced you're not wasting everyone's time.

by erikerikson

5/8/2026 at 10:39:03 AM

Funny story no one will believe, but it’s true. A good friend of mine joined a startup as CTO 10 years ago, high growth phase, maybe 200 devs… In his first week he discovered the company had a microservice for generating new UUIDs. One endpoint with its own dedicated team of 3 engineers …including a database guy (the plot thickens). Other teams were instructed to call this service every time they needed a new ‘safe’ UUID. My pal asked wtf. It turned out this service had its own DB to store every previously issued UUID. Requests were handled as follows: it would generate a UUID, then ‘validate’ it by checking its own database to ensure the newly generated UUID didn’t match any previously generated UUIDs, then insert it, then return it to the client. Peace of mind I guess. The team had its own kanban board and sprints.

by throwaway_19sz

5/8/2026 at 6:48:15 PM

> One endpoint with its own dedicated team of 3 engineers

> The team had its own kanban board and sprints.

My early jobs were at startups with limited resources. Every decision to build something or hire someone was carefully made after much consideration. This story would have looked like fiction to me at the time.

Later in my career I joined a startup like this where every new concern someone could think up turned into a new microservice with new hires to form a new team. It didn't matter how small it was, everything was a reason to hire new people and form a new team. I sat in meetings where the express goal of the quarter was communicated as growing the engineering team.

It was a weird time. We had this same situation where there were 3-4 person teams who had their own sprints and planning sessions where they would come up with more ways to make work for themselves. Some of them moved so slowly that they could spend entire sprints doing tiny changes. Others were working on the most over-engineered solutions you'd ever seen for trivial problems.

There was one meeting where I suggested we re-assign some people on a stable project to work on something that we needed urgently, but I got shut down. That would have removed another excuse to hire more people, which would have conflicted with someone's KPIs to grow the engineering team to a specific number.

by Aurornis

5/8/2026 at 7:21:50 PM

> My early jobs were at startups with limited resources. Every decision to build something or hire someone was carefully made after much consideration. This story would have looked like fiction to me at the time.

This was pre-2015

> Later in my career I joined a startup like this where every new concern someone could think up turned into a new microservice with new hires to form a new team. It didn't matter how small it was, everything was a reason to hire new people and form a new team. I sat in meetings where the express goal of the quarter was communicated as growing the engineering team.

This was post-2015

---

Am I right?

You're describing exactly what I've tried to express in various comments. There was a point in the latter half of the 2010s when it became genuinely hard to find tech work where you were building useful stuff. Startups became increasingly absurd and the focuses of their engineering teams even more so.

In 2019 I was working for a company that was so desperate to hire new engineers that at one point they decided to just start offering jobs to candidates who had failed interviews. It was absolutely insane.

by kypro

5/8/2026 at 8:43:35 PM

Ah, the heady days when we shipped a new AWS service with a team of 40, and when I came into work the next day we had 120 people and 80 of them were just inventing work out of whole cloth…

by swiftcoder

5/8/2026 at 10:56:49 PM

I need to hear more stories, I'm begging

by momojo

5/8/2026 at 7:32:55 PM

> someone's KPIs to grow the engineering team to a specific number

Sigh!

Specific numbers!

I believe a more common specific number nowadays is the yearly EBITDA or ARR (or some other acronym in this alley that I don't care to memorize), for investors' sake. Like in our company. Since we were acquired, and for some time before, the only talk in company meetings has been EBITDA and ARR, compared to a number dreamed up by someone and to be reached in 5 years' time. Specific financial results in a specific timeframe. Our goals are specific numbers being above today's numbers by a chosen margin. The company talk is marketing campaigns and reach, campaign efficiency measurements, pricing strategies, subscription-centric licensing, sales strategies, churn, and other slang around customer bullying I also do not care about, plus organizational streamlining (what a loaded word!), bla bla bla, all for the specific sacred number put up on the pedestal.

What do we have zero talk about? Functionality, engineering.

I seriously do not understand these people. Why are they fiddling around with selling software in a niche sensitive to global economic fluctuations instead of selling ... I don't know. Shoes? Or better yet sugary water ... no, better is vitamin water ... no, the trendiest is protein water. That is something that needs no balanced functionality and no engineering, which is laborious and therefore resource-intensive to achieve. And is in the way of reaching the sacred number put up there. Engineers are in the way of our goals. We are pulling back the cart! We are a cost center now!!

I do not stay long.

by mihaaly

5/8/2026 at 12:44:55 PM

At some point someone optimizes the system to a global company-wide incrementing 128 bit counter. Instead of needing a costly database lookup against a growing database the microservice just fetches the current counter, increments it by one and hands out the new value. Easy, fast O(1) operation.

This even allows you to shard the service to provide high availability and distribute the service globally to reduce latency. Just give each instance a dedicated id range it can hand out. I'd suggest reserving some of the high bits to indicate data center id, and a couple more bits for id-generator instance within that dc.

Wait a second, this starts to look familiar ... does Twitter still do that, or did they eventually switch?
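
For anyone who hasn't seen the scheme being alluded to, here is a rough sketch of a sharded counter with the instance identity packed into the bits. The layout (41 bits of milliseconds, 5 bits data center, 5 bits instance, 12 bits sequence), the custom epoch, and the function names are all assumptions for illustration, not a description of any particular production system.

```javascript
// Hypothetical 64-bit layout, most significant bits first:
// 41 bits ms since a custom epoch | 5 bits data center | 5 bits instance | 12 bits sequence
const EPOCH_MS = 1700000000000n; // arbitrary custom epoch (assumption)

function makeIdGenerator(datacenterId, instanceId, now = () => Date.now()) {
  if (datacenterId < 0 || datacenterId > 31) throw new RangeError("datacenterId must fit in 5 bits");
  if (instanceId < 0 || instanceId > 31) throw new RangeError("instanceId must fit in 5 bits");
  let lastMs = -1n;
  let sequence = 0n;

  return function nextId() {
    let ms = BigInt(now());
    if (ms < lastMs) throw new Error("clock moved backwards; refusing to issue IDs");
    if (ms === lastMs) {
      sequence = (sequence + 1n) & 0xfffn;
      if (sequence === 0n) {
        // 4096 IDs already issued in this millisecond: wait for the clock to advance.
        while (ms <= lastMs) ms = BigInt(now());
      }
    } else {
      sequence = 0n;
    }
    lastMs = ms;
    return ((ms - EPOCH_MS) << 22n)
      | (BigInt(datacenterId) << 17n)
      | (BigInt(instanceId) << 12n)
      | sequence;
  };
}
```

Each (data center, instance) pair owns a disjoint slice of the ID space, so instances never need to talk to each other. Uniqueness then rests on the instance IDs being assigned without duplicates and on the clock not running backwards, rather than on entropy.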

by wongarsu

5/8/2026 at 5:08:27 PM

Define a random 128 bit key that you will never change. Use that key to encrypt 128 bit integers in sequence using AES-128, each one comes out as a, for all practical purposes, unique unpredictable ID.
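
A sketch of what this could look like with Node's built-in crypto module. The function names are made up for illustration, and the sketch assumes a single trusted place that owns both the key and the counter; it does not remove the need to coordinate the counter itself.

```javascript
const { createCipheriv, createDecipheriv, randomBytes } = require("node:crypto");

// Generate once, store securely, never change (per the comment above).
const key = randomBytes(16);

// Encode a counter (0 <= counter < 2^128) as one 16-byte big-endian block.
function counterToBlock(counter) {
  const block = Buffer.alloc(16);
  block.writeBigUInt64BE(counter >> 64n, 0);
  block.writeBigUInt64BE(counter & 0xffffffffffffffffn, 8);
  return block;
}

// AES with a fixed key is a permutation of 128-bit blocks, so two different
// counters can never encrypt to the same ID. ECB is used here only because
// exactly one block is encrypted per call.
function encryptCounter(aesKey, counter) {
  const cipher = createCipheriv("aes-128-ecb", aesKey, null);
  cipher.setAutoPadding(false);
  return Buffer.concat([cipher.update(counterToBlock(counter)), cipher.final()]).toString("hex");
}

// Whoever holds the key can recover the original counter from an ID.
function decryptId(aesKey, idHex) {
  const decipher = createDecipheriv("aes-128-ecb", aesKey, null);
  decipher.setAutoPadding(false);
  const block = Buffer.concat([decipher.update(Buffer.from(idHex, "hex")), decipher.final()]);
  return (block.readBigUInt64BE(0) << 64n) | block.readBigUInt64BE(8);
}
```

The uniqueness here comes from the counter, not from entropy at generation time; the key only makes the output unpredictable to anyone who doesn't hold it.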

by kuratkull

5/9/2026 at 4:22:41 AM

> each one comes out as a, for all practical purposes, unique unpredictable ID

I don't have much cryptography experience, but this seems _suuuuper_ suspicious. I think the "for all practical purposes" is doing a lot of lifting here? If it was this easy, surely this is what we'd use, and there wouldn't be UUID v4 to begin with.

by pinkmuffinere

5/9/2026 at 4:38:58 AM

The value of uuid is the lack of coordination. “…integers in sequence…” requires quite a bit of coordination if you have more than one computer ;)

by kristjansson

5/8/2026 at 4:43:57 PM

Twitter snowflakes haven't changed. Most of the bits go to the timestamp, which I guess is a global incrementing counter as you described

by sheept

5/8/2026 at 4:43:08 PM

> At some point someone optimizes the system to a global company-wide incrementing 128 bit counter.

Some UUID versions include time, so there's a bit of a counter in that.

by throw0101c

5/9/2026 at 4:40:58 AM

What is the arrow of time if not a single global monotonically increasing sequence?

by kristjansson

5/8/2026 at 12:50:26 PM

I've seen similar, buried deep within a major SV tech co.

Their process was a bit more complex because the master list of in-use UUIDs was stored in an external CMDB service run by a different department. They got a daily dump of that db, so were able to check that when generating a "provisional" id. Only once it had been properly submitted to the CMDB did it become "confirmed".

They had guardrails in place to prevent "provisional" ids being used in production, and a process for recycling unused "confirmed" ids. Oh, and they did regular audits which were taken very seriously by management.

Last I heard, they were 18 months into a 6 month project to move their local database cache to Zookeeper...

by roryirvine

5/8/2026 at 8:36:09 PM

They should upgrade to Zookeeper II: Zookeepier.

https://www.youtube.com/watch?v=_F-RyuDLR4o

by DonHopkins

5/8/2026 at 5:05:27 PM

I can believe it, and I've often wondered "can I win the UUID misfortune lottery?" I wonder if this is equally common with Microsoft's flavor, aka GUIDs.

by giancarlostoro

5/8/2026 at 5:52:33 PM

GUIDs and UUIDs are effectively the same thing... the issues often come down to the means of generation and storage... where UUIDs have versions with specific implementation details that aren't always followed, MS has internal implementations that also aren't always followed. Also worth being aware of are COMB, sequential IDs (MS-SQL) and other serialization approaches, as well as how they affect indexes in practice.

Alternatives include sequential number generator services, or sequence services that may be entirely sequential, etc., but may lead to out-of-order inserts in practice.

Also, generally worth considering UUIDv7, assuming your storage and indexing use the time portion at the front of the index process.
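
To make the "time portion at the front" concrete, here is a rough sketch of building a UUIDv7-shaped value by hand: a 48-bit millisecond timestamp first, then the version and variant bits, with the rest random. The function name is hypothetical and this is not a substitute for a maintained library; note that the random tail still depends on the same entropy source as UUIDv4.

```javascript
const { randomBytes } = require("node:crypto");

function uuidv7(nowMs = Date.now()) {
  const bytes = randomBytes(16);

  // Bytes 0..5: 48-bit big-endian Unix timestamp in milliseconds.
  const ts = BigInt(nowMs);
  for (let i = 0; i < 6; i++) {
    bytes[i] = Number((ts >> BigInt(8 * (5 - i))) & 0xffn);
  }

  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version 7 in the high nibble
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits 10xx xxxx

  const hex = bytes.toString("hex");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}
```

Because the timestamp occupies the leading bytes, values generated in different milliseconds sort by creation time as plain strings, which is what keeps index inserts roughly append-only.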

by tracker1

5/8/2026 at 6:54:24 PM

You would think they could automate the entire process by “creating-ahead” a certain number of UUID values in the DB, storing them in memory to reduce DB latency, and then recording the assignment to the DB once it had been assigned.

And the microservice could easily be crafted to only accept assignment requests from other known endpoints.

by rekabis

5/8/2026 at 5:27:24 PM

We have had a service to add two numbers. What makes you think this is not realistic? :-)

by mrbonner

5/8/2026 at 5:31:37 PM

I too have witnessed an "add two numbers" service! Turns out you can be too extreme with rules for isolating out business logic.

by morkalork

5/8/2026 at 7:18:31 PM

Same! It had validation on each number before adding them. Poor design, but that's how it worked.

by Schiendelman

5/9/2026 at 7:00:38 AM

I find this so hard to believe, but I've nearly always worked in small groups/companies. Can you, or any of the commenters above, explain why the reasoning that leads to such a service isn't rejected by, well, common sense? Some super-special requirements?

by tgv

5/9/2026 at 7:37:42 AM

In the case I mentioned at https://news.ycombinator.com/item?id=48062322, it was because the Infrastructure org had grown out of what had previously been Datacenter Operations.

So they had a team of SWEs who knew the system they were responsible for was absurd, but they weren't able to adequately explain that to the senior management folk who came from that DCOps culture and held asset management & configuration tracking to be paramount. The uniqueness was seen less as an inherent property, and more as a constraint that needed to be enforced.

My team of DevOps-y proto-Platform Engineers struggled with the org's culture in similar ways, so I had a lot of sympathy for the situation they found themselves in and how they were handling it. I believe their Zookeeper-based system was intended to be more of generic lightweight config registry which would eventually have replaced the gigantic SOAP-based CMDB nightmare - basically Consul a year or two before Consul existed.

The reason why they struggled to get it into production was that it would have been so obviously useful that they kept having additional requirements and use cases forced into their "MVP". That sort of scope creep, driven by tech leadership wanting to make their mark on a successful project, is also pretty common in large orgs.

by roryirvine

5/9/2026 at 3:08:30 PM

Fortunately, I've never encountered that. But still, I can see the usefulness of a guaranteed globally unique UUID, at least for certain purposes. However, a service to add numbers baffles me. The operations needed to create, send, receive and check the message are so much more complex than addition...

I must say, I did experience some lousy tech+sales leadership in one company, which was indeed the biggest I ever worked at. A decent product with a well understood scope was completely scrapped and rewritten. Some team spent more than a year on the (waterfall) design of the new system, which was then scrapped too. When I joined, there was an 8 man team for just the message bus for the new new system. Which didn't even work correctly. The whole was flexible, but in nearly every other aspect inferior to the original product. And it needed much heavier hardware.

by tgv

5/11/2026 at 11:21:44 AM

Sure. In this case, this started as a method with two parameters; each were validated internally before addition.

The validation was long running, as it required checking two other services to confirm both of the numbers were OK.

Because of issues calling those services, instead of two nasty synchronous calls, it turned into calling a microservice asynchronously and using a callback. Then that microservice was owned by the team that owned those two other services.

Don't underestimate the power of Conway's law.

by Schiendelman

5/9/2026 at 12:56:54 AM

[dead]

by halfcat

5/8/2026 at 6:05:08 PM

I get the microservice to ensure this. But 3 people dedicated to it? I guarantee you they spent their days trudging dungeons, playing CoD and ping pong.

by CodeWriter23

5/9/2026 at 12:51:55 AM

You need at least 3 for this. People go on vacation, turnover, can’t risk losing that critical institutional knowledge.

by halfcat

5/8/2026 at 6:01:57 PM

I'd believe it.

What I'd find harder to believe is that it wasn't really a table with more information than just "list of assigned UUIDs". I'd be really surprised (pleasantly!) if it was only that. I'd figure most startups would make sure that table links to customer info so that they know which customer has a specific UUID, for easy searching and crossreferencing with the main db

by LocalH

5/8/2026 at 6:44:46 PM

That sort of table can be quite handy when every entity in the business's data stew is identified with a UUID, and there is no way of telling just from looking at an identifier what kind of entity it is. Particularly when the business has disparate databases and/or microservices with their own sets of UUIDs.

In such businesses, inevitably, someone will ask you to run process X for widget 8dbcd950-14c1-4877-a8b0-90c081ce033c, and that particular identifier will actually be an ID of some associated data, not the widget. You can push back and say, "That isn't a widget identifier, can you please look up the widget identifier?" It's better to be able to look that ID up in your ID ⮕ entity type lookup table, and say "the ID you provided is a widget production run ID, which produced a copy of widget a84969be-137a-41ca-97c4-515497184df9. Can you confirm this is the widget you need process X done for?", with a link to the product-facing widget page.

(Also handy for the case where some code was intended to log an ID for one entity, but actually logs the ID for an associated entity with the wrong entity type indicated.)

by tomjakubowski

5/10/2026 at 10:24:15 AM

Stripe handle this interestingly, with a prefix to the ID indicating the type of entity.

https://dev.to/4thzoa/designing-apis-for-humans-object-ids-3...
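
A small sketch of the prefixed-ID idea. The prefixes and function names below are invented for illustration and are not taken from any real API.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical prefixes; a real system would choose its own.
const PREFIXES = { widget: "wid", productionRun: "run" };

// Create an ID that carries its entity type in a short prefix.
function newId(entityType) {
  const prefix = PREFIXES[entityType];
  if (!prefix) throw new Error(`unknown entity type: ${entityType}`);
  return `${prefix}_${randomUUID()}`;
}

// Recover the entity type from an ID without any lookup table or database.
function entityTypeOf(id) {
  const prefix = id.split("_", 1)[0];
  const match = Object.entries(PREFIXES).find(([, p]) => p === prefix);
  return match ? match[0] : null;
}
```

With this, "run process X for widget run_..." can be rejected immediately, because the ID itself says it names a production run rather than a widget.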

by connorgurney

5/9/2026 at 12:14:23 AM

Senior, Staff and Principal UUID Engineer.

UUID Database Admin.

by tintor

5/8/2026 at 6:38:36 PM

At one of my previous jobs, there was a function `createEntityWithRandomUUID` which would basically do the same thing as a light wrapper around database inserts. If a conflict occurred, it would generate a new ID and try again, up to 5 times I think. No logging to indicate whether any conflict actually ever happened.
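
Roughly what such a wrapper might look like, with the missing logging added. This is a reconstruction from the description above, not the original code: the in-memory table stands in for a database with a UNIQUE constraint, and the `DUPLICATE_KEY` error code is an assumption.

```javascript
const { randomUUID } = require("node:crypto");

// Stand-in for a table with a UNIQUE constraint on the ID column.
class InMemoryTable {
  constructor() {
    this.rows = new Map();
  }
  insert(id, row) {
    if (this.rows.has(id)) {
      const err = new Error(`duplicate key: ${id}`);
      err.code = "DUPLICATE_KEY"; // hypothetical error code
      throw err;
    }
    this.rows.set(id, row);
  }
}

function createEntityWithRandomUUID(
  table,
  row,
  { generateId = randomUUID, maxAttempts = 5, log = console.warn } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const id = generateId();
    try {
      table.insert(id, row);
      return id;
    } catch (err) {
      if (err.code !== "DUPLICATE_KEY") throw err;
      // Record that a conflict really happened, so a broken RNG gets noticed.
      log(`UUID conflict on attempt ${attempt}: ${id}`);
    }
  }
  throw new Error(`no unique ID after ${maxAttempts} attempts`);
}
```

The retry is cheap here because the insert itself is the uniqueness check. Without the log line, a misbehaving entropy source would be silently papered over.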

by ssalka

5/9/2026 at 4:53:45 AM

No that kind of critical data would be sent to pendo so it could be reported on or shown in a dashboard!

by other_herbert

5/8/2026 at 11:35:20 AM

Who has the balls to form that team? Were they disbanded?

by franktankbank

5/8/2026 at 5:08:27 PM

I will gladly assume that this team was formed after several UUID collisions. My assumption is that they had a tremendous amount of data and enough revenue to justify all of this, at least financially. I would have re-evaluated the UUID version used, or whether adopting Snowflakes would be better, at some point.

by giancarlostoro

5/8/2026 at 1:18:00 PM

Pffft - they didn't need to store the whole UUID, just a hash. Dummies.

by ryandvm

5/8/2026 at 1:44:23 PM

They thought of that, but they were still working on hiring a team to maintain the hashing microservice.

by dd8601fn

5/8/2026 at 2:46:35 PM

Hashing microservice deployment was blocked by random generator microservice stuck in Pending because it needed an UUID from UUID microservice which was blocked by hashing.

by mstaoru

5/8/2026 at 5:04:05 PM

"Learned a lot today, love Galactus"

by alserio

5/8/2026 at 3:04:18 PM

already laughing from parent comment this is well done

by mrsvanwinkle

5/8/2026 at 5:24:47 PM

one hash is insufficient, they need k-hashes.

i get the joke, but seriously a bloomfilter would be a good idea.
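
For the curious, a toy Bloom filter along those lines. The sizing and the double-hashing scheme are illustrative choices, not tuned values. A "no" answer means the ID was definitely never added; a "yes" only means "maybe", so it would still need an authoritative check.

```javascript
const { createHash } = require("node:crypto");

class BloomFilter {
  constructor(bitCount, hashCount) {
    this.bitCount = bitCount;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(bitCount / 8));
  }

  // Derive k bit positions from one SHA-256 digest via double hashing:
  // position_i = (h1 + i * h2) mod bitCount.
  positions(value) {
    const digest = createHash("sha256").update(value).digest();
    const h1 = digest.readUInt32BE(0);
    const h2 = (digest.readUInt32BE(4) | 1) >>> 0; // force odd and unsigned
    const out = [];
    for (let i = 0; i < this.hashCount; i++) {
      out.push((h1 + i * h2) % this.bitCount);
    }
    return out;
  }

  add(value) {
    for (const p of this.positions(value)) {
      this.bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // false: definitely never added. true: possibly added (false positives happen).
  mightContain(value) {
    return this.positions(value).every(
      (p) => (this.bits[p >> 3] & (1 << (p & 7))) !== 0
    );
  }
}
```

Used in front of a database, the filter turns most uniqueness checks into a few in-memory bit tests and sends only the "maybe seen" cases to the real lookup.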

by _3u10

5/8/2026 at 6:00:29 PM

This is the software industry version of "The Onion".

by dboreham

5/9/2026 at 3:58:59 AM

Any chance this company managed cap tables?

by chefpump

5/8/2026 at 6:17:59 PM

This is usually caused by an insufficiently seeded PRNG.

Are you generating the UUID in the backend, or the frontend? Frontend is fundamentally unreliable for many reasons, including deliberate collisions. So in that case you'll need to handle collisions somehow. Though you can still engineer around common sources of collisions, the specifics depend on the environment.

On the other hand making a backend reliable is feasible. What kind of environment is your code running in? Historically VMs sometimes suffered from this problem, though this should be solved nowadays. Heavily sandboxed processes might still run into this, if the RNG library uses an unsafe fallback. Forking processes or VMs can cause state duplication and thus collisions.

by CodesInChaos

5/9/2026 at 12:01:40 AM

I remember hearing that Segment (the analytics company) had their entire product based around UUIDs generated in web browsers. There were collisions all over the place, and the product was seemingly incapable of producing useful data at a fundamental level because of it. Hopefully they've fixed that now.

by danpalmer

5/8/2026 at 7:16:08 PM

This reminds me of a passage from the book "Pro Git".

<https://git-scm.com/book/en/v2>

"Here’s an example to give you an idea of what it would take to get a SHA-1 collision. If all 6.5 billion humans on Earth were programming, and every second, each one was producing code that was the equivalent of the entire Linux kernel history (6.5 million Git objects) and pushing it into one enormous Git repository, it would take roughly 2 years until that repository contained enough objects to have a 50% probability of a single SHA-1 object collision. Thus, an organic SHA-1 collision is less likely than every member of your programming team being attacked and killed by wolves in unrelated incidents on the same night."

Deliberate collisions are addressed in the following paragraph.

SHA-1 hashes are not random, so the issue of poor pseudo-random number generation doesn't apply as it does to uuidv4. And SHA-1 hashes are 160 bits, vs. 128 for uuidv4.

But I love the idea of unrelated wolf attacks.

by _kst_

5/8/2026 at 11:48:44 PM

Reminds me of this page with an example for understanding how many permutations there are for a shuffled deck of cards: https://czep.net/weblog/52cards.html

> So, just how large is it? Let's try to wrap our puny human brains around the magnitude of this number with a fun little theoretical exercise. Start a timer that will count down the number of seconds from 52! to 0. We're going to see how much fun we can have before the timer counts down all the way. Shall we play a game?

> Start by picking your favorite spot on the equator. You're going to walk around the world along the equator, but take a very leisurely pace of one step every billion years. The equatorial circumference of the Earth is 40,075,017 meters. Make sure to pack a deck of playing cards, so you can get in a few trillion hands of solitaire between steps. After you complete your round the world trip, remove one drop of water from the Pacific Ocean. Now do the same thing again: walk around the world at one billion years per step, removing one drop of water from the Pacific Ocean each time you circle the globe. The Pacific Ocean contains 707.6 million cubic kilometers of water. Continue until the ocean is empty. When it is, take one sheet of paper and place it flat on the ground. Now, fill the ocean back up and start the entire process all over again, adding a sheet of paper to the stack each time you’ve emptied the ocean. Do this until the stack of paper reaches from the Earth to the Sun. Take a glance at the timer, you will see that the three left-most digits haven’t even changed. You still have 8.063e67 more seconds to go. 1 Astronomical Unit, the distance from the Earth to the Sun, is defined as 149,597,870.691 kilometers. So, take the stack of papers down and do it all over again. One thousand times more. Unfortunately, that still won’t do it. There are still more than 5.385e67 seconds remaining. You’re just about a third of the way done.

by mega_dean

5/9/2026 at 4:20:07 AM

Damn, I got the paper stack wet with all that ocean water. Guess I'm starting again from scratch...

by dalmo3

5/8/2026 at 8:48:21 PM

On the other hand, it turns out that deliberate collision attacks are quite feasible, and as several people who have thoughtlessly committed the collision attack test case files to git can attest… quite problematic

by swiftcoder

5/10/2026 at 5:41:16 AM

This idea of everyone producing absurd amounts of git objects is less fantastic now [1]. We're still far from these numbers, but an order of magnitude less far than last year [2].

Also an interesting bit of history here: apparently there was a time when people were already writing books on Git but "one enormous Git repository" wasn't yet the most common mode of using it.

[1] https://news.ycombinator.com/item?id=47932422

[2] https://x.com/kdaigle/status/2040164759836778878

by owl57

5/8/2026 at 9:15:35 PM

Hasn't the Git team been hard at work to optionally offer other hashes, like SHA256, in addition to SHA-1?

by TacticalCoder

5/9/2026 at 7:23:07 AM

They have been not at work on doing that.

by ted_dunning

5/8/2026 at 9:40:27 AM

What you're talking about is so extremely rare that it's much more likely that the entire Earth is destroyed by an asteroid right this inst...

by adyavanapalli

5/8/2026 at 5:01:07 PM

It is not quite as rare. I calculated it to be less common than being hit by a meteorite, and added a section about that and the Birthday Paradox to Wikipedia, to the article about UUIDs. It got removed / replaced a few years ago however. (If my source was correct, there was actually a woman hit by a meteorite, but she survived, with a leg injury.)

If you do have a UUID collision, chances are extremely high that it's either a software bug, or glitch in the computer. It could be a cosmic ray. Cosmic rays messing with the computer memory or CPU are actually relatively common.
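
The birthday-paradox arithmetic is short enough to sketch. This assumes the 122 random bits of a UUIDv4 really are uniformly random, which is exactly the assumption that fails when the RNG is broken; the function names are made up for the example.

```javascript
// UUIDv4 has 122 random bits (128 minus 4 version bits and 2 variant bits).
const RANDOM_BITS = 122;
const SPACE = 2 ** RANDOM_BITS; // a power of two, so exact as a double

// Birthday approximation: p ≈ 1 - exp(-n² / (2N)) for n IDs drawn from N values.
// expm1 keeps precision when the probability is tiny.
function collisionProbability(n) {
  return -Math.expm1(-(n * n) / (2 * SPACE));
}

// Number of IDs needed before the collision probability reaches p.
function idsForProbability(p) {
  return Math.sqrt(-2 * SPACE * Math.log1p(-p));
}

console.log(collisionProbability(1e9)); // chance of any collision among a billion IDs
console.log(idsForProbability(0.5));    // IDs needed for a 50% chance of one collision
```

With a sound entropy source, a billion IDs give a collision chance on the order of 1e-19, and it takes on the order of 1e18 IDs to reach even odds. An observed collision is therefore far better explained by a bug or a hardware glitch than by bad luck.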

by thomasmg

5/8/2026 at 12:43:43 PM

About as rare as an asteroid typing an ellipsis and clicking the add comment button.

by delichon

5/8/2026 at 4:44:35 PM

Well, this joke dates back to (at least) the dial-up days where {#`%${%&`+'${`%& NO CARRIER

by throw0101c

5/8/2026 at 4:50:25 PM

That’s just a result of jounce from localized gravity effects and atmospheric pressure disturbances in the moments before impact.

Think the ultrasonic typing hacking scene in Pantheon combined with the keyboard bouncing due to rumbling.

by xerox13ster

5/8/2026 at 2:14:31 PM

Well it would be statistically even rarer for that UUID collision to happen and the earth to be destroyed by an asteroid.

by sebazzz

5/8/2026 at 7:19:07 PM

It's very common if you improperly seed, as others in the thread brought up! Or in your framing, as rare as earth getting hit if it were surrounded by a sci-fi density asteroid field.

by spindump8930

5/8/2026 at 3:04:49 PM

For a single database using UUIDs, yes, it's astronomically rare. But it's quite a different thing to say that no computer system on Earth has ever experienced a UUID collision. The number of systems out there is also astronomical.

by crazylogger

5/8/2026 at 4:34:10 PM

>The number of systems out there is also astronomical.

Not even close

by nathanmills

5/8/2026 at 7:34:42 PM

Sure it does. Planets are astronomical, and we only have 8 of those in our solar system.

by moi2388

5/8/2026 at 5:10:57 PM

Some discussion here:

https://github.com/uuidjs/uuid/issues/546

Eg:

> FWIW, I just tested crypto.getRandomValues() behavior on googlebot and it is also deterministic(!)

by e12e

5/8/2026 at 11:45:35 PM

That makes sense. I'm not sure why anybody would generate UUIDs in browsers though, it seems to defeat the purpose.

by D2OQZG8l5BI1S06

5/9/2026 at 12:02:45 AM

Tell that to Segment. Hopefully they've fixed that, but they didn't seem to think it was a problem years ago (spoiler: it was a big problem).

by danpalmer

5/9/2026 at 1:15:27 PM

When you do CQRS, you basically have to generate some form of unique identifier in the client. Why not a UUID in the browser?

by perlgeek

5/8/2026 at 11:22:43 AM

According to the many-worlds interpretation of quantum mechanics, there's bound to be one branch of universe where every UUID is the same. Can you imagine what those guys are thinking?

by Geee

5/8/2026 at 4:47:47 PM

Not only that, there's vastly more where every UUID except one is the same, but they never got to that one because they didn't ever use them.

Or where the first two are unique, but every following one is one of the first two.

by BobaFloutist

5/9/2026 at 12:17:27 PM

One where all uuids are always different whenever the person who wrote the implementation tries it, regardless of computer. And whenever it’s someone else, always the same.

by beng-nl

5/8/2026 at 1:28:10 PM

This is why I am not a fan of the Everett approach

by nyantaro1

5/9/2026 at 1:53:25 PM

They probably just generate "the UUID" then add an increasing number on the end so it's unique. Problem solved.

by suprjami

5/8/2026 at 4:34:42 PM

"Huh, this is just an identity function. Cool. Let's move on."

by zeeveener

5/8/2026 at 1:52:12 PM

Something off on how the RNG is initialized? Lack of entropy?

If the rng is not customized it will use:

    const rnds8 = new Uint8Array(16);
    export default function rng() {
        return crypto.getRandomValues(rnds8);
    }

getRandomValues doesn't specify a minimum amount of entropy.

by juancn

5/8/2026 at 1:57:08 PM

It's a near certainty that something is badly wrong with the RNG, and, yes, probably in how it's seeded.

It's probably messing up the cryptography, too.

by Hizonner

5/8/2026 at 4:39:29 PM

But defaults should be sane and safe. RNG isn't the sort of thing you want to be messing up. Every JS dev was taught that Math.random is not safe by default, but the crypto package is.

by Onavo

5/8/2026 at 8:20:43 AM

I fully agree. It makes no sense. Yet...

My only guess is that we originally generated UUIDv4s on a user's phone before sending them to the database, and the UUID generated this morning that collided was created on an Ubuntu server.

I don't fully know how UUIDv4s are generated and what (if anything) about the machine it's being generated on is part of the algorithm, but that's really the only change I can think of: it used to be generated on-device by users and, for many months now, has been generated on the server.

by mittermayr

5/8/2026 at 9:38:38 AM

You let users generate a UUID?

To be honest, the chance that you are doing something weird is probably higher than you experiencing a real UUID conflict.

How did your database 'flag' that conflict?

by AntiUSAbah

5/8/2026 at 9:43:30 AM

user-generated (as in: on the user's phone) was only at the very early stages of this product, and we've since moved to on-server. It's a cash-register type of app, where the same invoice must not be stored twice. So we used to generate a fresh invoice_id (uuidv4) on the user's device for each new invoice, and a double-send of that would automatically be flagged server-side (same id twice). This has since moved on to a server-only mechanism.

The database flagged it simply by having a UNIQUE key on the invoice_id column. First entry was from 2025, second entry from today.

by mittermayr

5/8/2026 at 6:04:26 PM

Assuming the phone is using the default JS engine, it's whatever is being shimmed for node:crypto package's random bytes method... which is likely weaker.

I wrote a different implementation that cheats by using browser's methods of getting a uuid.

https://github.com/tracker1/node-uuid4/blob/master/browser.m...

by tracker1

5/8/2026 at 5:59:51 PM

If the server or the user's phone had the wrong time and if the date is used in generating the ID...

by bitsandbits

5/8/2026 at 6:31:45 PM

uuidv4 is random. uuidv7 includes time.

by whatevaa

5/8/2026 at 12:31:38 PM

If it's UUIDv4 and you validate that the UUID is valid and not conflicting I don't really see the issue with user-generated UUIDs. Being able to generate unique keys in an uncoordinated manner is the main selling point of UUIDs

Sure, it's something I'd flag in any design to spend two minutes to talk about potential security implications. But usually there aren't any

by wongarsu
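
A minimal sketch of the "validate, then check for conflicts" step described above. The names `isUuidV4` and `acceptClientId` are hypothetical, and the in-memory `Set` stands in for whatever store actually holds the IDs; in a real database the uniqueness constraint would do the conflict check.

```javascript
// Matches the canonical 8-4-4-4-12 hex layout with version nibble 4
// and an RFC variant nibble (8, 9, a, or b).
const UUID_V4_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Hypothetical helper: true only for well-formed UUIDv4 strings.
function isUuidV4(value) {
  return typeof value === "string" && UUID_V4_PATTERN.test(value);
}

// Hypothetical server-side gate: accept a client-supplied ID only if it is
// well formed and not already present in the set of known IDs.
function acceptClientId(value, knownIds) {
  if (!isUuidV4(value)) return { ok: false, reason: "malformed" };
  const id = value.toLowerCase();
  if (knownIds.has(id)) return { ok: false, reason: "duplicate" };
  knownIds.add(id);
  return { ok: true, id };
}
```

This only checks shape and uniqueness; it says nothing about how much entropy the client actually used.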

5/8/2026 at 1:23:38 PM

Validation etc.: everything that should not be controlled by a user will not be controlled by a user.

by AntiUSAbah

5/8/2026 at 7:20:12 PM

The whole point of UUIDv4 is that you don't need to check if it's conflicting and can just use them right away. This falls apart if you let untrusted sources of UUIDv4's enter your system IMO

by JambalayaJimbo

5/8/2026 at 6:00:28 PM

Likely a unique index... duplicate insert on a primary or 1:1 foreign key. I am currently shimming out a process that will add a trackingid for a job service, and just had my method stub return Guid.Empty... second time I ran my local test it blew up on the duplicate key... then I switched it to null, then it blew up again... I neglected to exclude null from the unique index on the foreign key.

In any case, it's easy enough to do. I mostly use UUIDv7, COMB or NEWSEQUENTIALID ids myself though.

by tracker1

5/8/2026 at 10:58:42 PM

The smart way would be to check if the id is in use, and generate a new one... Repeat a few times if you're extremely unlucky, and bail out with an error if you have the absolute worst rng. It also works for locally generated ids as well.

by nubinetwork

5/8/2026 at 12:36:54 PM

If it was two on-device generated UUIDs I could see a collision happening. There have been instances of cheap end devices not properly seeding their random number generators, leading to colliding "random" values. And cases of libraries using cheap RNGs instead of a proper cryptographic RNG, making it even worse

But on a server that shouldn't happen, especially not in 2026 (in the past, seeding the rngs of VMs used to be a bit of an issue). Even if one UUID was badly generated, a truly random UUID statistically shouldn't collide with it. You'd need an issue in both generators

by wongarsu

5/8/2026 at 6:07:24 PM

The library is using node:crypto, but with a phone target, that's likely shimmed with a JS implementation...

by tracker1

5/8/2026 at 8:46:52 AM

The UUIDv4 collision is statistically extremely unlikely. What is more likely is both systems used the same seed. This might be just a handful of bytes, increasing the chance of collision to one in billions or even millions.

by stubish
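
To illustrate the shared-seed scenario, here is a small sketch using a toy xorshift32 generator. It is deliberately not cryptographically secure, and `xorshift32` / `uuidFromPrng` are hypothetical names; the point is only that two machines starting from the same seed emit the same "random" UUID.

```javascript
// Toy xorshift32 PRNG: deterministic and NOT cryptographically secure.
function xorshift32(seed) {
  let state = seed >>> 0 || 1; // the state must be non-zero
  return function nextWord() {
    state ^= state << 13;
    state >>>= 0;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state;
  };
}

// Fills 16 bytes from four 32-bit words and formats them as a v4-shaped UUID.
function uuidFromPrng(nextWord) {
  const bytes = new Uint8Array(16);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < 4; i++) view.setUint32(i * 4, nextWord());
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // RFC variant
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

const phone = uuidFromPrng(xorshift32(12345));
const server = uuidFromPrng(xorshift32(12345));
const other = uuidFromPrng(xorshift32(12346));
console.log(phone === server); // true: same seed, same "random" UUID
console.log(phone === other);  // false: a different seed gives a different stream
```

With a seed space of only a few bytes, the effective collision odds are those of the seed, not of the 122 random bits in the output.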

5/8/2026 at 6:08:41 PM

The shim for node:crypto in the browser is likely a weaker implementation in JS than the node implementation... you can cheat and use the browser itself to get a UUIDv4...

    function uuid4() {
      var temp_url = URL.createObjectURL(new Blob());
      var uuid = temp_url.toString();
      URL.revokeObjectURL(temp_url);
      return uuid.split(/[:\/]/g).pop().toLowerCase(); // remove prefixes
    }

by tracker1

5/8/2026 at 10:40:50 AM

Better check what crypto.js is actually doing in your exact setup. Weak polyfills exist...

by lazyjones

5/8/2026 at 4:50:05 PM

Good moment to revisit this fun article: https://jasonfantl.com/posts/Universal-Unique-IDs/

If the entire universe were turned into a giant computer and did nothing but generate uuids until its heat death, how many bits would you need for the ID space?

by dweez

5/8/2026 at 6:08:22 PM

If you're gonna go there, this is obligatory https://www.decisionproblem.com/paperclips/

by CodeWriter23

5/8/2026 at 8:54:54 PM

"But are you worried that every human on Earth will be hit by a meteorite right now? That probability is also non-zero, yet it is so infinitesimally small that we treat it as an impossibility."

This might be a bad example because one meteorite could take out the world and given enough time is likely to.

by ipaddr

5/8/2026 at 6:04:14 PM

Are your UUIDs generated client side or server side? If it's client side, it could be due to a crawling bot. Googlebot for example executes Javascript using deterministic "randomness".

by beejiu

5/8/2026 at 9:18:48 PM

Googlebot's lack of randomness was the conclusion of a previous incident for that package https://github.com/uuidjs/uuid/issues/546

by adzm

5/8/2026 at 11:24:49 PM

Yeah, the answer almost certainly has to be this, or that they were using an old version of the package which didn't use the system RNG correctly (the current version appears to do it correctly, but I didn't dive into older versions), or their project has loaded an old broken polyfill re-implementing the JS crypto API, or they were running this on a hosting setup that does something jank like resuming the same VM snapshot with its RNG state on multiple servers. This category of explanation is many orders of magnitude more likely than a true random collision.

by AgentME

5/8/2026 at 4:37:52 PM

Most plausible cause: the uuid package depends on some random number generator package, which has recently been compromised in order to make “random” numbers predictable. As a result, many crypto (ssl + currency) projects are compromised due to a supply-chain attack.

by jbverschoor

5/8/2026 at 5:02:29 PM

Changed 3 weeks ago:

uuid/src/rng.ts: the random array is a module-level const, so every call shares the same buffer. A subsequent call overwrites the bytes you got earlier, so if you generated something important... good luck

The old code used to do a slice() which creates a new copy.

Might be unintentional. Although I have no idea how this would pass any tests, as you would think to test generating 2 random numbers and hope they are not the same.

by jbverschoor

5/8/2026 at 5:09:53 PM

Didn't actually want to write a test myself.. but I miss Claudia confirmed it. Pretty concerning.

Synchronous / serial calls:

   import rng from './rng';
   
   const a = rng();
   console.log('a after first call: ', Array.from(a));
   
   const b = rng();
   console.log('a after second call:', Array.from(a));
   console.log('b after second call:', Array.from(b));
   
   console.log('a === b (same reference)?    ', a === b);
   console.log('a equals b (same contents)?  ', a.every((v, i) => v === b[i]));

output:

   a after first call:  [
     101, 193, 125,  19, 142,
     136, 181, 140, 209, 224,
     176, 153, 179, 248, 246,
     166
   ]
   a after second call: [
       4,  29, 48, 215, 162,  60,
      64,  23, 78, 137,   2, 186,
     230, 249, 70, 224
   ]
   b after second call: [
       4,  29, 48, 215, 162,  60,
      64,  23, 78, 137,   2, 186,
     230, 249, 70, 224
   ]
   a === b (same reference)?     true
   a equals b (same contents)?   true

and asynchronous calls:

   import rng from './rng';
   
   async function getId() {
      const bytes = rng();
      await new Promise(r => setTimeout(r, 0)); // yield to the event loop
      return Array.from(bytes);
   }
   
   const [id1, id2] = await Promise.all([getId(), getId()]);
   console.log('id1:', id1);
   console.log('id2:', id2);
   console.log('identical?', id1.every((v, i) => v === id2[i]));

output:

   id1 captured:  [
      61, 116, 151,  35, 153,
      75, 105,  15,  59, 235,
     162, 215, 224, 115,  31,
     122
   ]
   id2 captured:  [
      13,  3,  84,  28, 22, 176,
     160, 70,  67, 246,  1,  37,
      38, 61, 171,  23
   ]
   id1 after await: [
      13,  3,  84,  28, 22, 176,
     160, 70,  67, 246,  1,  37,
      38, 61, 171,  23
   ]
   id2 after await: [
      13,  3,  84,  28, 22, 176,
     160, 70,  67, 246,  1,  37,
      38, 61, 171,  23
   ]
   ---
   final id1: [
      13,  3,  84,  28, 22, 176,
     160, 70,  67, 246,  1,  37,
      38, 61, 171,  23
   ]
   final id2: [
      13,  3,  84,  28, 22, 176,
     160, 70,  67, 246,  1,  37,
      38, 61, 171,  23
   ]
   identical? true

by jbverschoor

5/8/2026 at 6:57:52 PM

Shouldn't your test follow the pattern of how rng() is actually being used in the uuid.ts code internally?

Your test is more-or-less contrived to fail given the tradeoff to avoid repeated memory allocations but that doesn't say much about the actual usage in uuid generation since it's not exported for general purpose use.

Presumably they had some hot path somewhere where rng() is called in a loop and this optimization made sense with awareness that it could be misused as in your example breaking the contract ensuring randomness, which (hopefully) they're not actually doing anywhere.

Unless I'm missing something replacing the package over this with a less vetted implementation seems excessive and possibly even counterproductive.

by toraway

5/8/2026 at 8:56:28 PM

I don't believe so. Sure it's not an issue after some checks, but it's very easy to shoot yourself in the foot like that. I get the micro-optimization for the allocation.. But it's not clear / documented. At the minimum, the function should be renamed to reflect the inner workings.

The function is a module, and it doesn't do what you'd expect.

by jbverschoor

5/8/2026 at 5:12:05 PM

https://github.com/uuidjs/uuid/blob/e1f42a354593093ba0479f0b...

became

https://github.com/uuidjs/uuid/blob/f2c235f93059325fa43e1106...

Welp.. time to patch and update everything again. Another day, another npm-package headache. Very odd()

Attack vector: call the rng(), and send the result somewhere. You have now overwritten someone else's "random number" and know about it. The fun things you can do with those numbers!

by jbverschoor

5/8/2026 at 5:27:58 PM

Seems to be "safe" because it's not exported, and the results get used in a different way. Still is a bug in my book.

by jbverschoor

5/8/2026 at 10:34:34 AM

It's not happening by chance, there is a bug somewhere.

From what I skimmed the package should just call to the js runtime's crypto.randomUUID(). I think it should always be properly seeded.

I think it is extremely unlikely that the runtime has a bug here, but who knows? What js runtime do you use?

by leni536

5/8/2026 at 2:28:54 PM

Gotta be a seeding issue. If it's not, and you can prove it, you're about to be a little famous probably :P

by merlindru

5/8/2026 at 9:48:19 AM

Poorly seeded prng.

by tumdum_

5/8/2026 at 10:17:10 AM

most likely the culprit indeed

by jdthedisciple

5/8/2026 at 10:29:44 AM

But I used nonstandard nonces!

by nswango

5/8/2026 at 8:05:40 AM

1 in 4.73 × 10²⁸

1 in 47.3 octillion.

i'd be suspecting a race condition or some other naive mistake, otherwise i'd be stocking up on lottery tickets.

(lol at the other user posting at the same time about the lottery ticket.. great minds and all that.)

by serf
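
A figure of this size falls out of the usual birthday approximation if one assumes roughly 15,000 records (the count mentioned elsewhere in this thread) and 122 random bits per UUIDv4. The sketch below is only that approximation; `collisionProbability` is a hypothetical helper, and the 32-bit case is an illustrative assumption about a badly seeded generator.

```javascript
// Approximate probability of at least one collision among `n` IDs drawn
// uniformly from a space of 2^bits values:
//   p ≈ 1 - exp(-(n * (n - 1) / 2) / 2^bits)
function collisionProbability(n, bits) {
  const pairs = (n * (n - 1)) / 2;
  // expm1 keeps precision when the probability is astronomically small.
  return -Math.expm1(-pairs / 2 ** bits);
}

const p122 = collisionProbability(15000, 122); // full UUIDv4 randomness
const p32 = collisionProbability(15000, 32);   // e.g. only a 32-bit seed behind the generator

console.log((1 / p122).toExponential(2)); // "4.73e+28"
console.log(p32.toFixed(3));              // "0.026"
```

Under the 32-bit assumption the same 15,000 records would collide about once in forty runs, which is why a weak seed is a far more plausible explanation than bad luck.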

5/8/2026 at 11:18:35 AM

I've always looked at it the other way - being that lucky would mean you have even less chance of something else lucky happening, good time to save your money

by petee

5/8/2026 at 12:54:06 PM

The lottery ticket part makes no sense. Statistically if such an improbable event just happened to him, then chance of it happening again should be even more improbable.

by k4rli

5/8/2026 at 4:40:57 PM

This is probably (ha) a troll thread, but in case anyone here is among today's lucky (ha) 10,000, https://en.wikipedia.org/wiki/Independence_(probability_theo...

by sowbug

5/8/2026 at 4:40:39 PM

The chance of him winning the lottery is identical to before, however the reward if he wins is slightly greater.

He would win the lottery money + he gets to tell people who don’t understand independence this incredible story!

by jaccola

5/8/2026 at 5:01:20 PM

No, the events are independent. If you have a UUID collide, your chance of winning the lottery if you enter it is exactly the same as it was before the UUID collision.

by angoragoats

5/8/2026 at 6:13:57 PM

> If you have a UUID collide, your chance of winning the lottery is exactly the same as it was before the UUID collision.

True, but only if you were already going to play the lottery anyway.

If you don't normally play the lottery and the UUID collision combined with superstition is what enticed you to play, then the UUID collision will have raised your chances of winning the lottery from 0% to slightly higher than 0%.

by georgemcbay

5/8/2026 at 6:33:03 PM

Colloquially, when I say "your chance of winning the lottery" what I mean is "your chance of winning the lottery given that you enter." And I think you probably know this. But I've updated my post to be clear.

by angoragoats

5/8/2026 at 8:57:58 AM

Please, do not use b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd, I checked my database and I was using it already.

by jordiburgos

5/8/2026 at 11:35:36 AM

I always thought generating UUIDs at random was insane. I now only use LLMs. The prompt is: "generate a UUID. Make sure no one ever used it anywhere in their code or database. Check your work and think hard about each step. Do not output any reasoning or plain English, only the UUID itself".

You're welcome.

by rich_sasha

5/8/2026 at 2:31:43 PM

Actually asking ChatGPT this query led to it giving me this UUID "550e8400-e29b-41d4-a716-446655440000" which happens to be a very common example UUID

by mh2753

5/8/2026 at 5:06:38 PM

The LLM is mechanistically unable to pick something actually random and outside of its training distribution, so... yep.

by wolttam

5/8/2026 at 5:56:39 PM

If you ask it to construct a UUID character by character you should get a somewhat random one, just because of temperature.

by antonvs

5/9/2026 at 12:19:13 AM

This actually worked well when I asked Gemini to generate a random color, character by character. I was getting Indigo/Electric Indigo a lot if I just asked for a random color on new sessions.

by pwython

5/8/2026 at 6:51:06 PM

But all LLM output is token by token, which isn't too far from character by character in the case of a UUID. Why is this different? I do not know.

by recursive

5/8/2026 at 7:50:35 PM

Actually, asking this multiple times to ChatGPT gives me different UUIDs every time, and it checked with a web search that they are not found in public data.

by smokel

5/9/2026 at 4:05:09 PM

That's because of tokens (and temperature). You could piece back the tokens to parts of existing tokens in public data. And given enough iterations, GPT will probably start showing noticeable patterns (since it's not actually random).

by zimmund

5/8/2026 at 9:22:05 AM

I knew it, we're all getting the same cheap UUIDs and the good ones are reserved for the big dogs.

by mittermayr

5/8/2026 at 9:39:35 AM

uuid.uuidv4() recently switched to "adaptive entropy" instead of "xmax entropy" in an effort to save costs on non-premium users.

by Galanwe

5/8/2026 at 5:57:30 PM

You mean you’re not already entropymaxxing? n00b

by antonvs

5/8/2026 at 9:42:21 AM

I'm using 16b55183-1697-496e-bc8a-854eb9aae0f3 and probably some more too. I suppose if we all post our list here, then we can all check for duplicates?

by robshep

5/8/2026 at 9:51:57 AM

You can check https://everyuuid.com/ for collisions.

by jsnell

5/8/2026 at 9:47:38 AM

We should all send our already-generated UUIDs to a shared database, we could just put it on Supabase with a shared username/password posted on HN, so we can all ensure that after generating a UUIDv4 locally, it's not used by anyone else. If it's in the database, we know it's taken.

It's a super simple mechanism, check in common worldwide UUID database, if not in there, you can use it. Perhaps if we use a START TRANSACTION, we could ensure it's not taken as we insert. But that's all easy, I'll ask Claude to wire it up, no problem.

by mittermayr

5/8/2026 at 10:24:30 AM

But then I will claim I have already used all the UUIDs in my spreadsheets, and my lawyer will send cease&desist letters to every database.

by broken-kebab

5/8/2026 at 9:48:20 AM

A site previously posted here could be useful: https://everyuuid.com/

by volemo

5/8/2026 at 10:14:28 PM

Full list here: https://everyuuid.com/

by sedatk

5/8/2026 at 11:01:56 AM

That UUID should have my name sticker on it. Don't your UUIDs have name stickers?

by classified

5/8/2026 at 6:24:37 PM

It's much more likely that you hit an "impossible bug" due to a bit flip somewhere.

Imagine the database having the old UUID in a memory buffer due to a recent index scan, and a bit flip happened somewhere in the logic which basically copied the old UUID into the memory location of the new UUID, or some buffer addresses got swapped, or the operation which allocated the new UUID received a memory buffer containing the old one, and due to a bit flip the memcpy operation was skipped, or something along that line.

Facebook wrote extensively about this, stuff like "if (false) { do_x(); }" and do_x being called. For example their critical RocksDB kv store has extensive redundant protections to defend against such "impossible bugs".

by dist-epoch

5/8/2026 at 8:05:02 PM

Multiple times have I blamed compilers, cosmic rays, quantum effects, or at the very least an obscure kernel bug, before realizing that I was the source of a bug.

A collision at 15,000 records is so unlikely that I would first suspect something else. Duplicate processing, replayed requests, reused objects, misleading logs, or another code path reusing the identifier.

Could you share a bit more of the surrounding code so we can check?

by smokel

5/8/2026 at 8:55:13 PM

I wrote about real world collisions, including that particular library last year (https://alexsci.com/blog/uuid-oops/).

There are a bunch of constraints that must be strictly held for UUIDs to be collision resistant, I'd guess there is a problem with your random number generator.

by 8organicbits

5/9/2026 at 8:07:15 AM

All the comments I've been able to read are missing the elephant in the room: no high-quality entropy source can turn a "should" into a "must".

If you want something that is difficult to guess, ask the cryptography guys. But if you need something that is _guaranteed_ unique, you must build it yourself.

by pif

5/8/2026 at 10:08:18 AM

Buy some lava lamps

by glaslong

5/8/2026 at 10:11:53 PM

> Duplicate UUIDs (Googlebot)

> This module may generate duplicate UUIDs when run in clients with deterministic random number generators, such as Googlebot crawlers. This can cause problems for apps that expect client-generated UUIDs to always be unique. Developers should be prepared for this and have a strategy for dealing with possible collisions, such as:

> - Check for duplicate UUIDs, fail gracefully

> - Disable write operations for Googlebot clients

https://github.com/uuidjs/uuid/commit/91805f665c38b691ac2cbd...

by sedatk
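
One way the "disable write operations for Googlebot clients" suggestion could look, as a rough sketch. `isLikelyGooglebot` and `shouldPersist` are hypothetical helpers and the `request.userAgent` field is an assumed shape; user-agent strings can be spoofed, so this is a heuristic on top of duplicate checks, not a replacement for them.

```javascript
// Hypothetical check: treats any user agent mentioning Googlebot as a crawler.
function isLikelyGooglebot(userAgent) {
  return typeof userAgent === "string" && /Googlebot/i.test(userAgent);
}

// Hypothetical write guard: skip persistence for crawler-originated requests,
// so client-generated IDs from a deterministic RNG never reach the database.
function shouldPersist(request) {
  return !isLikelyGooglebot(request.userAgent);
}
```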

5/9/2026 at 4:35:59 PM

This didn't happen to me... yet. Here's what I found 2 days ago deep in a production PHP codebase:

    private static function createUUID() {
        $md5 = md5(uniqid('', true));
        return substr($md5, 0, 8) . '-' .
            substr($md5, 8, 4) . '-' .
            substr($md5, 12, 4) . '-' .
            substr($md5, 16, 4) . '-' .
            substr($md5, 20, 12);
    }

Holy cow, how didn't this horror come and bite us in the juicy parts ? I don't know.

by wazoox

5/8/2026 at 4:32:59 PM

the vm you're running on virtualized all the entropy away.

by baq

5/8/2026 at 5:32:40 PM

This seems very likely to be the case.

Something tangentially cool which is related: https://eu.mouser.com/new/leetronics/leetronics-infinite-noi...

by Imustaskforhelp

5/9/2026 at 11:39:18 AM

The rule of thumb is simple:

Consider if your ID can contain a timestamp besides a random value. The answer is usually yes. UUIDv7 is fine.

If you've spent the time to really work through the whole problem and have written down a proof of how that leads to an unacceptable info leak: Congratulations, your system is complex and slow enough that you might as well take a strong cryptographic hash, or UUIDv5 if you're lazy.

by athrowaway3z

5/8/2026 at 5:00:16 PM

> I thought this is technically impossible

No, very technically possible... though, with good randomness, very, very unlikely.

But nothing technically prevents a UUIDv4 from generating a duplicate value.

by sbuttgereit

5/8/2026 at 6:00:58 PM

Ultimately it comes down to your entropy source. I always generate and insert in a loop for this reason, if there is a collision, I therefore handle that gracefully.

by nu11ptr

5/8/2026 at 12:52:30 PM

This is why I prefer to use a random base32 string over UUID. At least you get a proper 128 bit entropy instead of just a 122 bit entropy as with UUIDv4. That's a 64x difference in collision probability. I always thought UUIDs were a toy, not for serious use. If you control the strings, you can even make a longer ID.

Also, numerous applications that use a unique ID per record frequently need to check for ID collisions. I know I do for a short URL generator.

by OutOfHere

5/8/2026 at 8:30:48 AM

Just a stupid question, but why not append the date, even in seconds as hex. It's just a few bytes and would guarantee that everything OK now will be OK in the future?

by beardyw

5/8/2026 at 9:03:41 AM

You can just use a different UUID variant which includes timestamp data instead (e.g. v1 or v7), there are also variants which include the MAC address.

by flohofwoe

5/8/2026 at 4:47:11 PM

Might as well just use uuidv7

by itsyonas

5/8/2026 at 8:42:33 PM

But since the randomness is obviously borked, it was much better to use v4 and find out about it after just 15K records instead of X million records later.

by ASalazarMX

5/8/2026 at 8:35:53 AM

yeah, any sort of additional semi-random data could've helped prevent this, I'm sure. That, however, is also kind of the idea of UUIDv4, it has lots of randomness and time built in already.

by mittermayr

5/8/2026 at 9:05:03 AM

UUID v4 consists of only random bits, no timestamp info.

by flohofwoe

5/8/2026 at 10:48:11 PM

Wrong, they have 122 random bits out of 128. The other six bits are to say “hello I am a UUIDv4”.

by Lammy
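
The six fixed bits can be seen in a short sketch that formats 16 bytes as a UUIDv4 string. `formatUuidV4` is a hypothetical helper; the all-zero and all-0xff inputs are used only to make the overwritten bits visible.

```javascript
// Formats 16 bytes as a UUIDv4 string, overwriting the 6 fixed bits:
// 4 version bits (0100) in byte 6 and 2 variant bits (10) in byte 8.
function formatUuidV4(randomBytes) {
  const bytes = Uint8Array.from(randomBytes); // copy, leave the input untouched
  if (bytes.length !== 16) throw new RangeError("expected 16 bytes");
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16), hex.slice(16, 20), hex.slice(20)].join("-");
}

console.log(formatUuidV4(new Uint8Array(16)));            // 00000000-0000-4000-8000-000000000000
console.log(formatUuidV4(new Uint8Array(16).fill(0xff))); // ffffffff-ffff-4fff-bfff-ffffffffffff
```

Everything outside those two masks (122 bits) comes straight from the random source.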

5/8/2026 at 9:22:57 AM

oh, interesting, I didn't know that and this could possibly be part of the problem perhaps depending on what's used as the seed.

by mittermayr

5/8/2026 at 12:12:53 PM

But surely hashing the date still allows for a future collision. Leaving the date as is means it will never collide after that one second has passed.

by beardyw

5/8/2026 at 7:11:17 PM

You could do that, but now you're like 90% of the way to maintaining a monotonically increasing number that you could just use as a unique ID instead without any randomness required (and without the additional 128 bits for collision protection via the appended UUID).

So your ID would take like 64 bits for the time unique to the nanosecond plus 128 bits for the UUIDv4 = 192 bits which is a pretty beefy sized ID.

(I know you said just append a second count but you will want a predictable/fixed size for your data structure in pretty much any use case so need to decide the upper bound and precision ahead of time)

Especially when the alternative is a 128 bit UUIDv4 that's effectively unique with proper usage of a high quality RNG, or a 128 bit UUIDv7 if you have a clock (that's needed for your method anyway), which will be much more forgiving of a flaky source of randomness and more sortable than your monotonic-ish ID for 1/3 fewer bits.

Basically, stapling anything onto a UUID is a waste of space if you don't trust it, so might as well drop it completely and use a significantly smaller source of randomness at that point.

by toraway

5/8/2026 at 4:38:44 PM

UUID 7 does not hash the date. It uses 48 bits to store a millisecond resolution timestamp. This allows you to sort uuids by time.

by kayodelycaon

5/8/2026 at 9:39:16 AM

> but why not append the date

And use uuid v5 to hash it :)

by pan69

5/9/2026 at 4:52:10 AM

I had dup uuids causing soak test failures in a Linux based distributed system. After a long investigation it turned out there was a kernel bug (race condition) that meant two processes on an MP system reading from /dev/random at the same time could (very rarely, like 1 in a million) get the same bytes when reading the device.

I'd look at rng initialisation first.

by xyzzy123

5/9/2026 at 10:52:03 AM

All the probability mathematics aside, the real world we live in is probably a lot less random even with the best hardware random number generators.

I've moved on to something like TSID (where security isn't a factor) or uuidv7 to make sure this never really occurs in practice, rather than over-engineering the code with retries.

by evnix

5/8/2026 at 10:40:00 AM

> I thought this is technically impossible

Actually it's not impossible, but very very improbable.

P.S. You should play a lottery/powerball ticket

P.P.S. Whenever I use the word improbable, the https://hitchhikers.fandom.com/wiki/Infinite_Improbability_D... comes in mind

by NKosmatos

5/8/2026 at 2:15:51 PM

> P.S. You should play a lottery/powerball ticket

Actually, they should not. That collision and winning the lottery would be even rarer.

by sebazzz

5/8/2026 at 5:02:02 PM

Assuming they are independent events, OP is no more or less likely to win the lottery now than before running into the collision. I'd actually have more questions if you claimed the events in question are NOT independent!

by lgeorget

5/8/2026 at 12:48:52 PM

Inconceivable!

by rithdmc

5/8/2026 at 8:39:11 AM

Would UUID v7 be more collision-proof? Hard to say: it takes time into account, but that reduces the number of entropy bits, so UUIDs generated at exactly the same time draw from a much smaller space and could collide more easily.

Thoughts?

by wg0

5/8/2026 at 7:10:11 PM

UUID v7 relies on knowing what time it is.

Speculation: The most likely scenario for a UUID v7 collision is if UUIDs are generated during a system boot sequence, before the system clock is set to the current time. It's always 1970 somewhere. There are still 62 random bits, and optionally another 12 random bits, but those too could be problematic if the system hasn't generated enough entropy yet.

by _kst_

5/8/2026 at 9:35:39 AM

Every millisecond you open up a new block. Should be even more unlikely

by AntiUSAbah

5/8/2026 at 5:49:58 PM

Or there is some other explanation, eg. somebody messed with the request manually, or with the db.

by mdavid626

5/8/2026 at 4:56:05 PM

This is the first time I have experienced some vindication that choosing CUID2[1] for one of my projects was actually a good idea.

1. https://github.com/paralleldrive/cuid2

by sudb

5/8/2026 at 11:04:42 PM

I lost all confidence in the infallibility of software RNG when I was working on an assignment for Data Structures a million years ago (2000?). The assignment was simple: simulate a 2D random walk where you randomly go NSEW, and run 100 cases, collecting stats as to how long it takes to return to the origin.

Super easy assignment, wrote it up probably in C++ (maybe just C?), and ran it on my linux box (probably Debian potato). It finished super quick and gave me an average of like 5.6 steps to return to the origin or something. Cool!

I copied it over to my account on the department's HP-UX machines where I was supposed to run and submit it to my instructor. Compiled fine. And then it... just ran forever. I was doing rand() % 4 or something, and the HP-SUX RNG had crazy bias in its last 2 bits, and it just walked away forever, never returning to the origin. Well crap!

Got an A for my writeup, though!

by QuercusMax
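
Weak low bits are a known property of linear congruential generators with a power-of-two modulus. The sketch below uses a textbook LCG (constants from Numerical Recipes) purely as an illustration; it is not the HP-UX `rand()` from the anecdote, whose exact behaviour is not known here, and `makeLcg` is a hypothetical name.

```javascript
// Textbook linear congruential generator modulo 2^32.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function next() {
    // Math.imul gives an exact 32-bit product; >>> 0 wraps the sum to uint32.
    state = (Math.imul(1664525, state) + 1013904223) >>> 0;
    return state;
  };
}

const next = makeLcg(2000);
const lowBits = [];
const highBits = [];
for (let i = 0; i < 12; i++) {
  const word = next();
  lowBits.push(word & 3);     // same as word % 4: the two weakest bits
  highBits.push(word >>> 30); // the two top bits instead
}

console.log(lowBits.join(" ")); // 3 2 1 0 3 2 1 0 3 2 1 0
console.log(highBits.join(" "));
```

With this generator the two lowest bits repeat every four draws, so a walk driven by `% 4` is not random at all; taking the high bits is the usual workaround, and a CSPRNG avoids the problem altogether.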

5/9/2026 at 4:47:37 AM

Is the uuid generated in the frontend or backend? If frontend, I’d wager the likeliest explanation is that the client code or request was messed with to inject a previously known uuid rather than an entropy issue.

by 0xfffafaCrash

5/9/2026 at 4:11:06 PM

I keep telling the dev teams I'm on that with enough data points, all those random numbers will (probably) eventually collide, and *then* we'll see how robust their software really is. At least your database flagged it, and hopefully nothing major exploded.

And yet, plenty of experienced devs, including team leads and CIOs, are convinced it's impossible. As in, they absolutely don't write code to deal with the condition. So a bad RNG can randomly destroy the system far sooner than expected at any time, and it won't be noticed, caught, re-genned, or anything, with concurrent corruption being entirely possible. They're fine with it. I feel like these are the same guys who don't check to see if malloc() succeeds.

I like to ask them, "If it's impossible, you're using too many bits, right?". I haven't talked any of them into hedging with a brownian motion detector, or a lava lamp or something for better randomness yet, but I'm still trying.

by erlkonig

5/8/2026 at 6:38:18 PM

A check inside the generator function is the best way I've found to avoid this. Wrap uuid or whatever random generator with a check against an ID cache. If it already exists, just run the generator recursively.

by rglover

5/8/2026 at 8:56:08 AM

The chance of a UUIDv4 collision is very low, but it is never zero.

If everything is done properly, then this is very likely the one and only time anyone involved in the telling or reading of this account will ever experience this.

by naikrovek

5/8/2026 at 9:12:48 AM

Classic gamblers fallacy!

by dalmo3

5/8/2026 at 4:47:56 PM

Ironically one of the few comments in this thread that isn’t necessarily the gambler’s fallacy!

The chance anyone involved saw or heard about the first one was near zero; now they’ve seen this one, the chance they see another is still near zero (i.e. unchanged).

by jaccola

5/9/2026 at 6:30:13 AM

One of the most dangerous phrases in engineering is “statistically impossible.” At enough scale, edge cases stop being theoretical and start becoming production events.

by latentframe

5/9/2026 at 5:30:25 PM

Why is nobody asking why the database wasn’t configured to enforce uniqueness on this column? This should have been a log discovery.
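
A sketch of what enforcing uniqueness buys you, using Python's built-in sqlite3 as a stand-in for the real database. The table, column names, and the `insert_with_retry` helper are hypothetical; the point is that the constraint turns a silent overwrite into an error the application can retry:

```python
import sqlite3
import uuid

def insert_with_retry(conn, name, new_id=uuid.uuid4, max_attempts=3):
    """Insert a row, regenerating the ID if the uniqueness constraint rejects it."""
    for attempt in range(1, max_attempts + 1):
        candidate = str(new_id())
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO users (id, name) VALUES (?, ?)", (candidate, name)
                )
            return candidate, attempt
        except sqlite3.IntegrityError:
            continue  # ID already taken: generate a fresh one
    raise RuntimeError("repeated ID collisions; check the entropy source")

conn = sqlite3.connect(":memory:")
# PRIMARY KEY implies uniqueness on the id column.
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
```

Calling `insert_with_retry(conn, "alice")` returns the stored ID and the number of attempts it took; anything above 1 deserves a log line.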

by timbritt

5/8/2026 at 10:38:28 AM

Reminds me of some code I saw running in production. Every time we added a new entry, we were pulling all the UUIDs from this table, generating a new UUID, and checking for collisions up to 10 times.

by not_math

5/8/2026 at 9:23:38 PM

You forgot to use https://www.random.org/ as your source of randomness :)

by zie

5/8/2026 at 6:02:55 PM

> I thought this is technically impossible, and it will never happen,

In an eternal universe, even the most unlikely of events will happen an infinite number of times.

by nozzlegear

5/8/2026 at 5:41:13 PM

Meta, but if I had a question like this, I'd likely have asked on Twitter or Reddit first. I'll keep in mind using HN as an alternative Q&A site.

by sqquima

5/8/2026 at 8:26:15 PM

Were the chances that an npm package is crap factored in?

by coldtea

5/9/2026 at 3:37:11 AM

Glad to be reading the comments here because I also had this happen to me once and thought I must have been going insane.

by radial_symmetry

5/8/2026 at 5:42:50 PM

Always let your DB generate UUIDs. On Postgres this is easy: since v18 it supports UUIDv7!

There is no need to set uuids through javascript or node imo

by danfritz

5/8/2026 at 5:50:44 PM

There's plenty of reasons to set a unique identifier before database save, or to want a unique identifier that doesn't have a 1-to-1 relationship with your object.

For example, in the idempotent kafka consumer pattern we set a unique ID in the header of every kafka message at the time of message publishing. We then have our consumers do a quick check of the ID against their data store to see if they have processed the message before or not. This way there is no impact if a consumer sees the same message twice. This allows us more flexibility during rebalancing events or replaying old offsets.
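
A toy sketch of that pattern, with plain dictionaries standing in for Kafka messages and a set standing in for the consumer's data store (the `message-id` header name and the `amount` payload are made up for illustration):

```python
import uuid

def publish(payload):
    """Attach a unique ID to the message headers at publish time."""
    return {"headers": {"message-id": str(uuid.uuid4())}, "payload": payload}

class IdempotentConsumer:
    def __init__(self):
        self._processed = set()  # stand-in for the consumer's data store
        self.total = 0

    def handle(self, message):
        """Process a message once; return False if it was seen before."""
        message_id = message["headers"]["message-id"]
        if message_id in self._processed:
            return False  # duplicate delivery (rebalance, replay): skip it
        self.total += message["payload"]["amount"]
        self._processed.add(message_id)
        return True

consumer = IdempotentConsumer()
msg = publish({"amount": 10})
consumer.handle(msg)
consumer.handle(msg)  # redelivered: ignored
print(consumer.total)
```

Note the flip side for this thread: in this design an ID collision between two different messages would make the consumer silently drop the second one.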

by hx8

5/9/2026 at 12:33:02 AM

Not every application uses a DB, you know; there are other reasons to use a UUID.

by Cantthink1029

5/11/2026 at 1:36:51 PM

const blacklisted = ['b6133fd6-70fe-4fe3-bed6-8ca8fc9386cd'];

by swah

5/8/2026 at 11:03:16 PM

This is like one of the hardest things for people to understand. Even the best randomness guarantees fuck all. Entropy-based IDs are collision-resistant not collision-proof.

by BugsJustFindMe

5/8/2026 at 5:29:23 PM

The fun thing about randomness is that these things happen. UUIDv7 is less prone to this, as it combines a timestamp with random bits. I’ve been using ULID in a few projects; it has similar attributes to UUIDv7 but is more space-efficient as text (26 characters instead of 36).

by shortercode

5/9/2026 at 3:40:46 PM

What’s the preferred UUID these days?

by pyuser583

5/8/2026 at 11:02:35 AM

Although incredibly rare, it's not impossible, so it's probably best to just plan for collisions. A simple retry should suffice. But I agree, I feel like something is going on somewhere else ...

by lyfeninja

5/9/2026 at 4:31:20 PM

PG 18 supports UUIDv7 natively; your default should be a unique column populated by uuidv7().

by darqis

5/8/2026 at 10:26:03 AM

Why not have a timestamp-based UUID instead?

by AndreyK1984

5/8/2026 at 10:37:57 AM

How confident are you that your machines' clocks are in perfect sync? What about the risk of clock drift + correction, or hardware issues?

by dgellow

5/8/2026 at 12:04:59 PM

Not GP, but: not confident. How confident would I be to avoid a (slightly lower entropy) UUID collision while also avoiding a clock desync landing on the exact same logged millisecond? Very, which is how confident I was about not encountering a UUID collision before this thread, so very++ I guess.

by croon

5/8/2026 at 8:23:12 PM

I get why sync of multiple machines matters for ordering and causality, but why is it a problem for uniqueness?

by kdps

5/9/2026 at 11:28:39 AM

I just read about UUIDv7, something I wasn't familiar with until now. That does address everything I had in mind, yes. I assumed timestamp-uuid meant an ID based purely on the timestamp's precision, in which case you can get conflicts if you don't have a reliable clock across your system.

But UUIDv7 looks pretty awesome:

> What is a Version 7 UUID made from?

> unix_ts_ms: 019e0c7bfe93

> ver: 7

> rand_a: 0b6

> var: b

> rand_b: 55a99e023294673

So please ignore my previous comment :)
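
The field breakdown quoted above can be reproduced with a few shifts and masks. This sketch assumes the RFC 9562 layout (48-bit unix_ts_ms, 4-bit ver, 12-bit rand_a, 2-bit var, 62-bit rand_b), and the UUID string is simply the quoted fields joined back together. The quoted tool prints var as a whole hex digit, whereas the RFC defines it as 2 bits, so rand_b here includes the low 2 bits of that digit:

```python
import uuid
from datetime import datetime, timezone

def split_uuid7(value):
    """Split a UUIDv7 string into its RFC 9562 fields."""
    n = uuid.UUID(value).int
    return {
        "unix_ts_ms": n >> 80,            # top 48 bits: milliseconds since epoch
        "ver": (n >> 76) & 0xF,           # 4-bit version
        "rand_a": (n >> 64) & 0xFFF,      # 12 random bits
        "var": (n >> 62) & 0b11,          # 2-bit variant (binary 10)
        "rand_b": n & ((1 << 62) - 1),    # 62 random bits
    }

fields = split_uuid7("019e0c7b-fe93-70b6-b55a-99e023294673")
created = datetime.fromtimestamp(fields["unix_ts_ms"] / 1000, tz=timezone.utc)
print(fields)
print("generated at", created.isoformat())
```

Two such IDs collide only if they share the same millisecond and the same 74 random bits, so an unsynchronized clock affects ordering across machines, not uniqueness.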

by dgellow

5/9/2026 at 2:25:32 PM

That's insane..

by byteflow

5/8/2026 at 7:36:26 PM

> technically impossible

Not at all! Just very unlikely. It's about odds and statistics. Not physics.

by nhumrich

5/8/2026 at 8:43:58 PM

This undersells the word unlikely. It is very, very, very, very unlikely.

by ASalazarMX

5/8/2026 at 10:30:45 AM

Buy a lottery ticket

by ares623

5/8/2026 at 7:22:14 PM

just uuidv5

by zuzululu

5/8/2026 at 7:49:43 PM

Almost all pseudo-random number generators are absolute garbage. They need you to believe they work because the NSA needs backdoors and to foolproof ransomware attacks. This isn't surprising at all to me.

by kittikitti

5/12/2026 at 11:20:07 AM

[dead]

by pedro_ascenso

5/8/2026 at 6:40:22 PM

[flagged]

by dividendflow

5/8/2026 at 8:25:31 AM

[dead]

by ESAM_C

5/8/2026 at 8:04:06 AM

[flagged]

by samdhar

5/8/2026 at 10:09:21 AM

Statistically speaking, does extremely unlikely mean impossible? If it were replicable I'd raise my eyebrow, otherwise it's fair game, no?

As someone who enjoys the interminable complaints about RNG in the video game scene, I would never trust any human's rationalization of random outcomes.

by uncircle

5/8/2026 at 10:15:19 AM

> Statistically speaking, does extremely unlikely mean impossible?

No, it means extremely unlikely. Collisions can occur, as OP just found out, but the chances are so vanishingly small that most people don't care.

In every application I have worked on, I had a pre-save check to see if the UUID was already present and generated a new one if it was. I don't think it ever triggered unless a bug was introduced somewhere, but it's good practice anyway.

by mschild

5/8/2026 at 10:11:51 AM

You are replying to an AI bot

by nubg

5/8/2026 at 10:19:31 AM

Would be cool to have a plugin that shows % of bot per user, based on their history of comments.

by harperlee

5/8/2026 at 10:18:37 AM

There could be a problem with the way the system generates entropy for randomness.

by ashleyn

5/8/2026 at 10:11:14 AM

Question to fellow HNers, do you recognize that this comment was written by AI?

by nubg

5/8/2026 at 10:17:49 AM

No, to be honest. However, as soon as it was pointed out, I checked again and it made sense.

In my opinion, this kind of intuition has to grow over time. And every time it’s pointed out, you learn. So please, keep pointing it out :).

by prakka

5/8/2026 at 10:16:03 AM

I guess not, and I feel dirty now. I'm logging off for the day.

by uncircle

5/8/2026 at 10:23:47 AM

I did not. Post-conditioning by your comment and the other one, I can see some signs, such as attempting to be unusually comprehensive. The 'atoms in your liver' could be an awkward human trying to be poetic about scales.

I still don't see idiomatic markers of AI so that's scary if your claim is correct.

by tirutiru

5/8/2026 at 10:46:22 AM

Interestingly enough, I skipped it when scrolling through the comments the first time. I think I instinctively do that to most karma-whoring comments, whether manual or LLM-generated.

Only noticed it because I did another pass and saw the replies talking about "AI".

by nottorp

5/8/2026 at 11:13:34 AM

Yes but as a feeling (hunch?) not as something my brain analysed and reached a conclusion.

Weird how I'm already somewhat conditioned to spot it on an intuitive level.

by piva00

5/8/2026 at 10:16:47 AM

Kind of. It reads a bit too much like tech support you'd get when asking one for help.

by mschild

5/8/2026 at 10:31:40 AM

when it started going on about all the different cases in the second bullet point... yeah

by ssenssei

5/8/2026 at 10:36:52 AM

Yes, stupid comparison with atoms in the liver and a bullet list below? I stopped reading.

by speedgoose

5/8/2026 at 6:04:37 PM

This is why it’s stupid to assume a randomly generated ID is unique just because it is random.

by MagicMoonlight

5/8/2026 at 11:02:13 AM

> We're using this: https://www.npmjs.com/package/uuid

Why? There's a built-in for this.

https://nodejs.org/api/crypto.html#cryptorandomuuidoptions

by sublinear

5/8/2026 at 5:10:56 PM

That's what the package uses. And if `crypto.randomUUID()` doesn't exist, it falls back to `crypto.getRandomValues()`, which per the documentation isn't AS strong:

https://developer.mozilla.org/en-US/docs/Web/API/Crypto/getR...

So by using the package you actually lose visibility of cases where `crypto.randomUUID()` would fail.

by OptionOfT

5/8/2026 at 8:40:10 PM

> I thought this is technically impossible, and it will never happen

I always hated this meme/mindset, because if you dig into their history you'll see that their original purpose was to collide. They were labels to identify messages in Apollo's distributed computing architecture. UIDs and later UUIDs were a reversible way to mark an intersection point between two dimensions.

Any two nodes in a distributed system would generate the same UID/UUID for the same two inputs, and a recipient of an identified message could reverse the identifier back into the original components. They were designed as labels for ephemeral messages so the two dimensions were time and hardware ID (originally Apollo serial number, later 802.3 hwaddress etc).
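
A toy illustration of a reversible, deterministic identifier of that kind. The field widths and function names are hypothetical (a millisecond timestamp in the high bits and a 48-bit IEEE 802 hardware address in the low bits); this is not Apollo's actual layout:

```python
NODE_BITS = 48  # an IEEE 802 hardware address is 48 bits wide

def make_uid(timestamp_ms, node):
    """Pack a timestamp and a hardware ID into one integer label."""
    if not 0 <= node < (1 << NODE_BITS):
        raise ValueError("node ID must fit in 48 bits")
    return (timestamp_ms << NODE_BITS) | node

def split_uid(uid):
    """Reverse make_uid: recover (timestamp_ms, node) from the label."""
    return uid >> NODE_BITS, uid & ((1 << NODE_BITS) - 1)

uid = make_uid(1778325913235, 0x001122334455)
print(hex(uid), split_uid(uid))
```

The same two inputs always produce the same label, and the label can be unpacked again, which is the opposite of the random-and-also-random usage the rest of this thread is about.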

I think a lot of the confusion can be traced to the very earliest AEGIS implementation where the Apollo engineers started using “canned” (their term, i.e. static or well-known) UIDs to identify filesystems. Over time the popular usage of UUID fully shifted from ephemeral identifiers where duplicates were intentional toward canned identifiers where duplicates were unwanted and the two dimensions were random-and-also-random.

by Lammy