LLMs can unmask pseudonymous users at scale with surprising accuracy

3/4/2026 at 7:17:12 AM

I thought this would be more about stylometry but it's mostly about users literally posting the same identifiable information across multiple services, including in one example their age, dog name, profession.

It's all classic dox profiling techniques. Even the things like spelling differences being regional signals and commonality to specific things being discussed.

It's why one has to think about what is being posted to which community if using different identities, rather than posting the same things across all of them. Though any such effort would be a waste if reliant on some non-public info that later was exposed in a database breach which tied together previously unrelated profiles.

by Springtime

3/4/2026 at 8:43:31 AM

I’m curious if an LLM-based defense for this could be made. Like a browser plugin that warns you if you type identifiable information (like occupation) into a text field, and highlights turns of phrase that are “unusual” enough to be identifiable.

by setopt

3/4/2026 at 2:46:21 PM

or something which just inserts random untrue details about you every now and again, like they do in Alaska, where I live.

by everyday7732

3/4/2026 at 11:36:24 PM

I'm afraid that if you do this, you won't just stand out among regular users, but you'll actually shine for such llm systems.

by user205738

3/4/2026 at 7:32:31 PM

Ah yes I remember seeing you at the Alaskan local underwater basket weavers meetup, you know the one for our profession.

by Neywiny

3/4/2026 at 5:59:28 AM

There was a tool shared here that could show which accounts belong to the same person based on the writing patterns. Can't remember the name, but it found my old accounts on HN pretty accurately.

by firefoxd

3/4/2026 at 11:54:48 AM

https://news.ycombinator.com/item?id=33755016

Way simpler than hnprofile from the sibling comment. This one used cosine similarity between user vocs - https://web.archive.org/web/20221126225241/https://stylometr...

by eps

3/4/2026 at 3:35:33 PM

Oh yes, this is the one I was referring to. Too bad it's shutdown.

by firefoxd

3/4/2026 at 6:12:43 AM

Hnprofile.com which has since closed down - lettergram was the author - https://news.ycombinator.com/item?id=17942981

by ElCapitanMarkla

3/4/2026 at 12:43:11 PM

> “This is a pretty new capability; previous approaches on re-identification generally required structured data, and two datasets with a similar schema that could be linked together.”

Right up there with Skynet, for me, has been the idea of disparate databases all being linked up by bad actors.

It appears as though DOGE illegally obtained taxpayer data from the IRS. I don’t trust DOGE to safeguard anything.

And the penalties do not seem to be very severe outside of HIPPA.

https://democracyforward.org/news/press-releases/new-details...

by xtiansimon

3/4/2026 at 4:28:41 AM

The internet is getting less interesting by the day.

by zppln

3/4/2026 at 5:10:01 AM

The future is offline.

by RiverCrochet

3/4/2026 at 5:28:27 AM

*selfhosted

by senectus1

3/4/2026 at 4:57:42 PM

I mean literally offline.

Computers were quite prevalent and useful before the notion of cloud computing, or the Internet.

Before the public was unleashed upon the Internet in 1992-1994, there were methods of information storage, indexing, searching, and exchange that didn't rely on real-time communications mediums operated by third parties. Example: CD-ROMs looked promising, the early 90's was smack in the middle of the "Multimedia" hey-day and gobs and gobs of data on nearly any subject was available and browseable at your perusal.

Of course it wasn't globally searchable, but there wasn't anything stopping anyone from making a master global index of CD-ROMs, selling it, and perodically updating it. Somebody (multiple somebodies) probably did. Libraries have been doing that for many decades. Replace chat with in-person meetups. Computer clubs were a thing in the 70s and 80s. DVDs still exist. DVD drives are $20 at my local Wal-Mart. SD cards are cheap and massive (1.5TB SD cards are a thing now).

Operating systems didn't always support TCP/IP. It's still something you can just turn off on a few of them.

by RiverCrochet

3/4/2026 at 5:34:07 PM

Pros and cons to this. It would re-centralize media again, stochastically dictating what we talk about. The Epstein case would have been sealed some five or so years ago without the onslaught of publicly visible interest in it.

by newsy-combi

3/4/2026 at 5:33:09 AM

*analog

by burningChrome

3/4/2026 at 2:53:08 PM

Ha ha, been moving to all three lately.

(My hobby experiments have been a small analog computer these past 4 or so months.)

by JKCalhoun

3/4/2026 at 6:32:43 AM

*doomed

by rissotorous

3/4/2026 at 10:03:33 AM

*short

by nehal3m

3/4/2026 at 12:05:14 PM

*lunch

by rickydroll

3/4/2026 at 11:56:47 AM

*timeless

by gregw2

3/4/2026 at 12:54:04 PM

*infinitely nested

by rich_sasha

3/4/2026 at 6:54:34 AM

[dead]

by ohy333ah

3/4/2026 at 7:10:25 AM

Anonymous account unmasking represents a new threat to anonymity. not just this technique with llms, but the earlier text similarity one.

But I think it would be generally easier to counter in the same way.

Use an llm or heuristics to pose as someone else.

not only do you erase your traces, you add false positives in to the system which reduces the overall effectiveness of these techniques in the future. A bit of poisoning the well.

I hope eventually an easy to use tool, with maybe a small local llm, can make it easy enough to do this, so that any future deanonymization attacks would be too untrustworthy to rely on

by kanemcgrath

3/4/2026 at 10:27:21 AM

Like with browser fingerprinting, making it too unique is also an issue.

It may actually be a fine line. You may be flagged as an LLM later if your style is too generic and identified if your style is too unique.

by notTooFarGone

3/4/2026 at 9:20:26 PM

Stylometry is just the most legible version of this. The harder-to-defend surface: posting time patterns, topic clusters, cross-platform phrase matching, interaction graphs. LLMs synthesize weak signals at scale in a way no single analyst could, which makes the threat model fundamentally larger than "change how you write." Most OPSEC advice is written for the pre-LLM world.

by shubhamintech

3/4/2026 at 1:29:58 PM

So tell an LLM what you would like the post to say, and then post the output?

LLM as the sickness and the cure...

by futune

3/4/2026 at 1:34:11 PM

This is the first thing that comes to mind. However I wonder if not only the “general” vocabulary can be anonymized but also the underlying concepts and references, because they point to a particular place too.

by AreShoesFeet000

3/4/2026 at 6:43:24 AM

To state the obvious, we all need person, local tools to warn us when we’re making opsec errors.

by Lio

3/4/2026 at 6:45:14 AM

There's a default Unix tool for that: https://www.man7.org/linux/man-pages/man1/yes.1.html

(Above 99% accuracy)

by littlestymaar

3/4/2026 at 3:29:27 PM

Figured this is going to happen. And it will just get worse.

I can already see palantir as the new man in the middle. Telling services: this guy with the same IP just posted xxx on yyy

by Bombthecat

3/4/2026 at 7:39:35 AM

As a 32 year old Ghanaian woman living in Luang Prabang and studying as an ophthalmologist, this gives me some food for thought!

by petesergeant

3/4/2026 at 2:58:16 PM

My dogs Lacey and Baxter say "Hi!"

by JKCalhoun

3/4/2026 at 2:31:37 PM

So um, can an AI also inject enough noise at the internet for me to make it harder to unmask me?

Should I like, just as Claude Code to come up with this idea this weekend?

by Balgair

3/4/2026 at 6:49:17 AM

[dupe] Discussion on source: https://news.ycombinator.com/item?id=47139716

by ChrisArchitect

3/4/2026 at 5:32:28 AM

> If you request deletion of your Hacker News account, note that we reserve the right to refuse to (i) delete any of the submissions, favorites, or comments you posted on the Hacker News site

Probably not GDPR-compliant then if comments can be deanonymised by LLMs.

by nprateem

3/4/2026 at 5:53:25 AM

My understanding is that the GDPR “right to be forgotten” applies to personal data. Are publicly available comments considered personal data?

by WalterGR

3/4/2026 at 6:48:25 AM

If they can help to deanonymize you, they must contain something personal. Writing pattern are pretty personal, certain spelling errors too, or the choose of words.

by croes

3/4/2026 at 6:57:47 AM

Absolutely anything relating to an anonymous person could help deanonymization, so that implies that anything relating to any person is personal data. Is that the GDPR’s position?

by WalterGR

3/4/2026 at 6:36:21 AM

Well, the username attached to them would surely be.

by Hamuko

3/4/2026 at 6:52:25 AM

From ico.org.uk: “ It is important to note that opinions and inferences are also personal data, maybe special category data, if they directly or indirectly relate to that individual”

From gdpr-info.eu: “ Subjective information such as opinions, judgements or estimates can be personal data.”

So yes. HN is in violation of the GDPR. I had already filed a complaint about this policy at my local GDPR authority.

by moi2388

3/4/2026 at 2:23:17 PM

If you are posting public comments, then these comments are available publicly... like, what did you expect!?

by garaetjjte

3/4/2026 at 3:17:18 PM

Yes they are. The GDPR doesn’t say you can’t post it.

Under article 17 of the GDPR, EU citizens have the right to be forgotten, in which case this data needs to be deleted.

by moi2388

3/4/2026 at 6:33:27 AM

This is probably the worst piece of policy on whole HN. It has a evil feel to it. If HN wasn't so interesting/valueable, this would be the single reason NOT to use it at all.

by lynx97

3/4/2026 at 10:18:41 AM

Why take away people's choice to use a forum with permanent comments? I know my comments will be here forever, but so will other people's comments. That's what makes HN valuable.

The alternative is what you see on reddit. A lot of threads from the past have posts deleted or overwritten with some script. You now have to dig through archive sites to find the comments, and you usually do find them.

I participate in Signal chats with self-destructing messages, too. But I post different things here and on Signal, under different usernames. Heck, after a few weeks I'll make another account here, anyway.

Even if you somehow deanonymize me, it's a risk I willingly took when I started posting.

Finally, if you go after HN for deleting comments, will you go after the many archive sites?

by diacritical

3/4/2026 at 7:39:07 AM

All these comments live forever in HN datasets that people download anyway

by WithinReason

3/4/2026 at 4:37:26 AM

Only if said users happen to commit OPSEC failures themselves. LLMs aren't magic...

If someone can figure out who I am or what city I live in just by this username or my comments (with proof), I'll personally send you 500,000 JPY. I'm quite confident that's not going to happen though.

The paper referenced in the article does not even explain their exact testing methodology (such as the tools or exact prompts used) because they claim it would be misused for evil. In other words, "trust me bro."

Also see the previous discussion here: https://news.ycombinator.com/item?id=47139716

by ranger_danger

3/4/2026 at 7:15:24 AM

Anyone who says that they can maintain perfect opsec over an extended period of time is seriously mistaken. A sufficiently motivated investigator with enough resources will join the dots eventually. The would-be evader has to be lucky every time whereas the investigator only has to be lucky once.

by seanhunter

3/4/2026 at 5:59:33 AM

You are American, although you've discussed Ryanair before, which isn't exactly American. You have a number of comments and posts about Japan, which is strange, although you do drive a Japanese car.

by iso-logi

3/4/2026 at 6:09:12 AM

A JDM car, probably, to be precise. I think they lived in Japan for at least a little while, e.g.: https://news.ycombinator.com/item?id=44679406#44686142

by daemonologist

3/4/2026 at 4:45:30 AM

You live on Earth. Now that I won let’s go double or nothing. I bet I can guess where you got dem shoes at.

by onionisafruit

3/4/2026 at 6:04:19 AM

He got them on his feet? He got them on the street?

by linkjuice4all

3/4/2026 at 5:11:53 AM

I'm pretty sure they can use the meta data the pull from your various interactions with search and the text you post online. These services build fingerprints of your habits using these techniques to follow you everywhere. At some point in the chain they could easily connect this fingerprint to your identity as soon as you log into and account that contains a piece of identifying information about you. The threat is real. I can foresee someone programming a terminal or app that obfuscates online behavior to avoid this fingerprinting in the future.

Unless I am misreading something. Take a look at surveillance capitalism to see what's possible right now. It's going to be 100x worse as LLMs become more advanced.

It's not the things you post online, it's the nuances behind the way you type and other ways to determine behavior that allows them to be able to build these kinds of profiles.

by trinsic2

3/4/2026 at 5:21:40 AM

Who is they? Which services?

From what I can tell, the article/paper in question does not appear to utilize any of the techniques you mention, but I'd be interested to learn more about it.

> it's the nuances behind the way you type

I found this paper which talks about some of those methods.

https://www.audiolabs-erlangen.de/content/04_fraunhofer/assi...

For example the "Text" section on page 91.

by ranger_danger

3/4/2026 at 6:00:48 AM

The big companies that sell prediction products to advertisers. Google, Amazon, Microsoft, Apple, Meta... all of them are involved. I didn't read the paper but this is a known method they have been using for awhile to track people across sites of many types from social networking to online shopping sites

by trinsic2

3/4/2026 at 3:54:02 PM

[dead]

by JPY_PLS

3/4/2026 at 4:43:30 AM

With low precision, you're in Japan. But I don't need the JPY. of course that could be obfuscation.

by ggm

3/4/2026 at 4:45:01 AM

The currency is not related to my location, I picked a random one, but thanks anyway :)

by ranger_danger

3/4/2026 at 5:34:51 AM

They said low precision. That might mean Spain, the US, etc

by nprateem

3/4/2026 at 5:36:15 AM

They refer to JP and language often enough in their search history and they state they are an american, and on 5G internet. I think working beyond this is doxxing. They could be anywhere.

by ggm

3/4/2026 at 5:07:12 AM

Someone took the bait

by huddert

3/4/2026 at 5:14:34 AM

What does 'of course that could be obfuscation' mean to you? because it doesn't mean 'took the bait' to me.

by ggm

3/4/2026 at 6:34:35 AM

[flagged]

by huddert

3/4/2026 at 6:26:11 AM

Everyone commits opsec failures eventually. With LLMs linking anonymous accounts it just makes it even more likely to be caught.

by comrh

3/4/2026 at 6:40:06 AM

You are ranger_danger

by big-chungus4

3/4/2026 at 6:02:25 AM

I skimmed some of your comments, You seem to be in the US, at least mid30s, you bought a .dev domain and run your own email? I would think those are possible leads. You really don't think you slipped up once or twice in 5 years of posting? I think an llm would go through all your posts and context of the posts to get. and that would be easier to check if you used any other social media with the same name and see if the accounts have similarities.

by tayo42

3/4/2026 at 6:00:53 AM

40 year old software dev in Detroit Michigan?

Not that I care, and that could be wildly off, but opsec is a wide term… and Claude one shot that… so safe out there bro, AI is wild

by Footprint0521

3/4/2026 at 6:25:48 AM

I think Claude is guessing (educatedly - northern midwest does seem plausible). There's probably enough for the feds to track them down, but not me or an LLM.

by daemonologist

3/4/2026 at 1:20:26 PM

[dead]

by JPY_PLS

3/5/2026 at 4:14:51 AM

[dead]

by JPY_plsss

3/4/2026 at 6:11:30 AM

One solution is to flood the network with LLM slop and hide among the noise.

by bitbasher

3/4/2026 at 6:31:25 AM

slop-steganography is that a name || a verb ?

by signa11

3/4/2026 at 5:06:54 AM

[flagged]

by akssassin907