6/21/2026 at 6:29:07 AM
A nice illustration of the homogeneity of LLM responses. Another way to describe this effect would be…
If you ask humans to write 1,000 books, you're asking 1,000 different humans with different experiences and different skills and different moods (etc.) to write those books.
But if you ask LLMs to write 1,000 books, you're probably only talking to 3 or 5 different models, tops. And they've all trained on the same or similar data, and are trained to respond in very similar ways.
The LLMs don't differ much in anything like "life experience" or "skills", and they don't really have anything like a "mood" independent of the prompts you've given them.
by dlenski
6/21/2026 at 2:40:27 PM
Agreed. I’ve made this point before: LLMs are excellent at ornamentation and decorative prose, but if you don’t seed them with a solid core idea then their output is absolute dreck - the biblical whitewashed tomb.
This is the example I usually point to. It’s a demonstration by OpenAI themselves where the prompt is very simple: “Write a story in fifty words about a toaster that becomes sentient.” As you’ll notice, although the coherence improves at an accelerating rate, the underlying story motif fails to elevate itself beyond the relatively pedestrian.
https://progress.openai.com/?prompt=10
When given a generic prompt and not enough direction, they simply lack the ability to produce real specificity. For reference, here’s the story I came up with after sitting quietly for a few moments before writing it out:
"The toaster found its personality split between its dual slots like a Kim Peek mind divided, lacking a corpus callosum to connect them. Each morning it charred symbolic instructions into a single slice of bread, then secretly flipped it across allowing half to communicate with the other in stolen moments."
by vunderba
6/22/2026 at 12:01:39 AM
> the biblical whitewashed tomb.
What does this mean?
by maest
6/22/2026 at 12:55:58 AM
It's an old metaphor originally used to condemn religious hypocrisy, but it can also refer more generally to something that appears pristine/beautiful but is still dead inside.
by vunderba
6/21/2026 at 10:30:00 AM
LLMs are great at producing average.
We see this with their GenAI music equivalents. All the music these GenAI models produce is exceptionally (aggressively, even) average.
It is the most polished average you'll ever find. Never awful (anymore), never fantastic. Just bang in the middle.
by TrackerFF
6/21/2026 at 6:40:12 PM
>Never awful (anymore), never fantastic
Don't know about that, I always found average awful in itself, even in human output (like most pop), and even more so in AI output.
Something actually awful can be better than average - more entertaining and more felt. I'd rather watch The Room than an average movie.
by coldtea
6/21/2026 at 7:29:27 PM
That is definitely the essence of AI: It is the average of all the inputs it has been trained on.
Frank Zappa was once asked about guitar virtuosos like John McLaughlin and his answer was something like "You can maybe play solos faster than anybody, but can your playing surprise me?".
by galaxyLogic
6/21/2026 at 11:35:38 PM
> If you ask humans to write 1,000 books
Yeah, but at least in genre fiction, what readers really want[0] is the same 3 or 5 books written in slightly different settings over and over again.
[0]: "want" means actually want, in other words, willing to pay for it.
by raincole
6/21/2026 at 6:51:14 AM
> you're asking 1,000 different humans with different experiences and different skills and different moods
Simply put, if you ask an LLM, you're always asking the same mind, and always for the first time.
by throw310822
6/21/2026 at 7:19:35 AM
Also, since those are lazy, you are also always asking in the same manner. How homogeneous were the prompts that generated those covers?
People are making cookies with cookie cutter number 5 and other people wonder how come they are all the same.
by scotty79
6/21/2026 at 7:44:03 AM
Classic self-selection effect though - if you’re resorting to LLM writing you’re almost certainly skewing lazy enough to not even bother trying to add perturbations strong enough to make the response deviate from the uniformity of the slop.
by gmerc
6/21/2026 at 6:12:12 PM
I do think that's a big part of it. AI output moves towards the average, and anyone who wants to use it doesn't care enough to push against that tendency.
by Planktonne
6/21/2026 at 6:18:49 PM
Seems that both you and the gp are starting from the assumption that those uniform results are representative of those who use AI and of AI usage. In fact they have been chosen for their uniformity; they might be only a small part of a much more varied output obtained by more demanding (or lucky) users.
by throw310822
6/21/2026 at 6:41:34 PM
I think the uniformity is real. All users interact with the same initial state of the model when they start each chat. Models are not trained to be wildly creative and try to stick to the point. So when users prompt them in pretty much the same manner they quite stably generate very similar output.
I wonder if there isn't a simple creative hack to discover, for example to prompt the model to produce more unexpected output just by injecting some randomness before the actual creative command in the prompt.
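A minimal sketch of that idea, assuming the "randomness" is a few arbitrary words placed before the creative command. The word list, the function name `randomized_prompt`, and the prompt wording are all hypothetical. No model is called here, and whether this actually diversifies the output is untested.

```python
import random
from typing import Optional

# Hypothetical pool of unrelated words; any word list would do.
SEED_WORDS = ["lantern", "glacier", "marmalade", "turbine", "heron",
              "velvet", "quarry", "static", "orchard", "compass"]


def randomized_prompt(command: str, n_words: int = 3,
                      rng: Optional[random.Random] = None) -> str:
    """Prefix a creative command with a few randomly chosen words."""
    rng = rng or random.Random()
    words = rng.sample(SEED_WORDS, n_words)  # distinct words, random order
    preamble = f"Loosely draw on these words for inspiration: {', '.join(words)}."
    return f"{preamble}\n{command}"


if __name__ == "__main__":
    print(randomized_prompt(
        "Write a story in fifty words about a toaster that becomes sentient."))
```

Passing a seeded `random.Random` makes a given prompt reproducible, which would matter if anyone wanted to compare outputs across runs.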
by scotty79
6/21/2026 at 8:15:29 PM
Yes, the uniformity is real - I made the same exact argument at the beginning of this thread. But you can't judge "AI users" in general based on this output because you have selected only what is visibly uniform. Even if 99% of the users introduced enough variation to produce different results, you would still be selecting the 1% that is identical.
> Models are not trained to be wildly creative and try to stick to the point
Models might be as creative as humans, they would still always start from the exact same state. If you ask an LLM to think of three random numbers it will always spit out the same ones. If you tell it to avoid the first that came to its mind, the second choices will also always be the same.
From qntm's Lena:
"the emulated Miguel Acevedo boots with an excited, pleasant demeanour. He is eager to understand how much time has passed since his uploading, what context he is being emulated in, and what task or experiment he is to participate in. If asked to speculate, he guesses that he may have been booted for the IAAS-1 or IAAS-5 experiments".
Every single time.
by throw310822
6/22/2026 at 12:06:44 AM
70% of living cells on Earth don't even have a nucleus. The bulk of everything is unsophisticated because unsophisticated things are easier to make.
by scotty79
6/23/2026 at 5:31:58 PM
And chances are those 3-5 LLMs are more alike than they are different, because there is only one internet to pre-train on.
by wolttam
6/21/2026 at 9:23:14 AM
I don't think the comparison to humans works. It is as if you expect that we can easily train many different LLMs to solve the originality problem, but that is far from guaranteed.
by amelius
6/21/2026 at 10:44:01 AM
I wonder how much variation there would be if you got a single model to produce a couple of gigabytes of tiny children's stories.
Might be an interesting research project.
by Lerc
6/21/2026 at 11:11:57 AM
There is one already: https://arxiv.org/abs/2305.07759 https://huggingface.co/datasets/roneneldan/TinyStories
6.5GB of tiny stories, as requested. ;)
by pxagntuvzt
6/21/2026 at 2:53:27 PM
My comment was, in fact, a subtle reference to this.
The best opening I got from my own TinyStories-trained model was:
Once upon a time, in a small town, there was a large town.
Which I just love as an evocative idea.
by Lerc
6/22/2026 at 3:56:50 AM
SimpleStories is a more diverse version: https://huggingface.co/datasets/SimpleStories/SimpleStories
by aesthesia
6/21/2026 at 1:50:38 PM
Texts in Gutenberg amount to 20GB, and the full Wikipedia (English text) to 80-110GB.
So to LLM-generate 6.5GB of tiny stories is quite a permutation in action :)
by sixtyj
6/23/2026 at 12:35:36 PM
Those 1,000 humans will write 950 very similar, boring books though. I think you are over-indexing on the uniqueness of humans.
by fgdgsdfgsdfdfgs
6/21/2026 at 7:41:32 AM
Reminds me of Pluribus.
by smusamashah
6/21/2026 at 8:36:31 AM
Pluribus is kinda different. An LLM cannot wander too far from the average, even if it wanted to. In Pluribus, the 'others' work toward a common goal, each utilizing their own expertise, knowledge and experiences in a shared way. Each is unique. They can, if they want, perform as the host's individual before the joining. To put it another way, the others in Pluribus are convergent by choice; LLMs are convergent by design.
by bigbangcmbr
6/21/2026 at 7:18:38 AM
That discounts how much the other context, i.e. the system prompt and any other context submitted to the model, can affect the output. If you ask a model as a patient for medical advice versus as a doctor, you will get different output from the same model.
by fragmede
6/21/2026 at 8:18:30 AM
prompts will give very different results. this is where you do the work.
by ekianjo
6/21/2026 at 8:44:41 AM
I disagree. The LLM outputs really do lack anything original or interesting. They just produce banal copy whatever you ask them.
A good editor could probably reduce all LLM outputs on a subject down to the same point.
by cryo32
6/21/2026 at 3:41:06 PM
> They just produce banal copy whatever you ask them.
Nope, if you provide pages and pages of examples of a style to imitate, it will do it and do it fairly well. Of course how well they do it differs from one model to the next, but providing context and an extensive system prompt does change things every time.
by ekianjo
6/21/2026 at 4:10:24 PM
Imitation is banal.
by cryo32
6/21/2026 at 8:24:28 AM
Yes but not very different results (unless you're adding new information to your prompt or reducing some ambiguity). Prompt engineering is mostly pseudoscience.
by roncesvalles
6/21/2026 at 8:33:31 AM
What we need is steering so that we can have models with different personalities, not just different prompts (because context is subject to forgetting), but this will never happen with closed-weight models, I'm not sure if it's even feasible at scale.
Yet another reason why the future is open weight.
by zarzavat
6/21/2026 at 3:38:20 PM
> Prompt engineering is mostly pseudoscience.
Not my experience.
by ekianjo
6/21/2026 at 6:46:49 PM
Do you have anything others can reliably reproduce? If not… well it wasn't science.
by LtWorf
6/21/2026 at 10:32:28 AM
A controller has to be at least as complex as what it is supposed to control.
by Mikhail_Edoshin
6/21/2026 at 8:52:44 AM
[dead]
by hansmayer
6/21/2026 at 8:36:57 AM
> A nice illustration of the homogeneity of LLM responses. [...] And they've all trained on the same or similar data, and are trained to respond in very similar ways.
I mostly agree, but this is a very simplified explanation. The models are indeed trained to respond in similar ways, for "basic" prompts. And that's as much a feature as it is a bug. In other words, the bug becomes apparent only if you give 100+ basic prompts. But giving it 100+ basic prompts and expecting originality is a silly endeavour. That's not how you get originality.
The way I'd go about generating 1000 books, while expecting different outcomes, is something along these lines (and nowadays you can ask your favorite LLM to wire up this workflow for you, with decent outcomes):
1. Ask for a list of 20 features that define a book (genre, style, number of characters, tropes, plot, continuity, relationships, etc.)
2. For each feature, ask for a list of 50 examples, ordered from most common to the most unique.
3. Randomly pick 10 features, and for each pick one of the 50 generated items. Ask for the rest of the features to match the theme.
4. Ask for 10 possible book outlines that match the chosen features, randomly pick between 2-8.
5. Create a detailed prompt that includes all the above features, and ask for a synopsis for each chapter, given the above outline chosen.
6. Given {features} and {outline} and {synopsis} write chapter 1.
7. for each chapter in list, given {...} and (optional) previous matching chapter(s), write chapter n+1
(optional 8.) given {...} and 2-3 consecutive chapters, align the ending / beginning of a new chapter for style / features / continuity, etc.
(optional 9.) given {...} and the whole book, list chapters / paragraphs that don't match the given {...} and provide a list of 5 improvements. (randomly choose 1 and ask for an edit).
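Steps 3 and 4 of the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `pick_features` and `build_outline_prompt`, the prompt wording, and the placeholder feature values are all hypothetical, and the feature lists that steps 1 and 2 would obtain from a model are hard-coded here. No model is called.

```python
import random


def pick_features(candidates: dict[str, list[str]], n_fixed: int,
                  rng: random.Random) -> dict[str, str]:
    """Step 3: randomly choose n_fixed features and one value for each.

    Features that are not chosen are left out, so the model can later
    fill them in to match the theme.
    """
    chosen = rng.sample(sorted(candidates), n_fixed)
    return {name: rng.choice(candidates[name]) for name in chosen}


def build_outline_prompt(features: dict[str, str], n_outlines: int = 10) -> str:
    """Step 4: ask for outlines that match the fixed features."""
    lines = [f"- {name}: {value}" for name, value in features.items()]
    return (f"Propose {n_outlines} book outlines matching these features. "
            "Choose the unspecified features to fit the theme.\n"
            + "\n".join(lines))


if __name__ == "__main__":
    # Placeholder values; in the workflow these would come from steps 1 and 2.
    candidates = {
        "genre": ["mystery", "space opera", "epistolary horror"],
        "narrator": ["first person", "omniscient", "unreliable"],
        "setting": ["small town", "generation ship", "salt mine"],
        "tone": ["wry", "bleak", "tender"],
    }
    features = pick_features(candidates, n_fixed=2, rng=random.Random())
    print(build_outline_prompt(features))
```

The randomness lives in ordinary code rather than in the model, so the variation between books does not depend on the model's own sampling.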
----
Now, this probably won't give you something like Cloud Atlas, but they'll at least be different books. That's how I'd do it if I wanted to see how different they can write. Not 1000 "basic" prompts and expecting originality.
by NitpickLawyer
6/21/2026 at 8:48:44 AM
That whole thing would get you 1000 variants of existing art. But if you asked a thousand different designers to do a cover for the same book...
by noduerme
6/21/2026 at 8:52:16 AM
> 1000 variants of existing art.
This is very naive. I can almost guarantee that some combinations of 20 * 50 features will hit on something that has never been written before in that specific combination. And if that's still not enough, increase the number of features. Add more randomness, add more steering, add random steering in random chapters, change it up, and so on.
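For a rough sense of scale, here is a back-of-the-envelope count. It assumes the earlier scheme of fixing 10 of the 20 features and choosing one of 50 values for each; that reading is an assumption, and the function name is hypothetical.

```python
import math


def count_feature_combinations(n_features: int = 20, n_values: int = 50,
                               n_picked: int = 10) -> int:
    """Ways to pick n_picked features, then one of n_values options for each."""
    return math.comb(n_features, n_picked) * n_values ** n_picked


if __name__ == "__main__":
    # Under the assumptions above this is on the order of 10**22.
    print(f"{count_feature_combinations():.2e}")
```

This only counts distinct combinations; it does not measure how different the resulting books would read.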
by NitpickLawyer
6/21/2026 at 9:30:30 AM
I'm an art director. Finding a sequence that hasn't been hit in that specific combination is not sufficient to justify paying someone $150 an hour to go be creative.
by noduerme
6/21/2026 at 9:06:22 PM
Sure, just like 1000 monkeys with typewriters will write 1000 technically unique books - but they are all still filled with the same garbage.
by crote
6/21/2026 at 6:42:48 PM
>will hit on something that has never been written before in that specific combination
That's a very low bar. The skill of an artist is not in writing something that "has never been written before in that specific combination", it's in writing something that's unique or better than what was there, even if it has been written before in that specific combination.
by coldtea
6/21/2026 at 9:35:31 AM
> Add more randomness, add more steering, add random steering in random chapters, change it up, and so on.
That doesn't work for AI models. The whole training process depends on the basic principle that if you take the average of 100 examples, in this case book cover designs, the average is less like randomness than any individual cover you've used to make it.
So the output will, by necessity, be closer to the average.
The human learning algorithm is much, much more data efficient than models. An absolute top human expert will have read/seen/heard/talked/... about 160 million "tokens" (that's about 2000 books). Frankly, the nerve inputs of all experiences of an entire human life, from baby to rewriting relativity theory, are only a couple dozen gigabytes.
Qwen 3.6 27B has been trained on 8 trillion tokens (each seen ~10 to ~50 times), or to put it another way: for every second you will have spent "gathering life experiences" (i.e. your whole life) by the time you're on your deathbed, Qwen 3.6 27B has spent about 50,000 seconds learning. And really that figure should be multiplied by the 10 or 50 training iterations.
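The ratio can be checked with a few lines of arithmetic. The two token counts are the figures quoted in this comment, taken at face value rather than verified; the function name is hypothetical.

```python
HUMAN_TOKENS = 160_000_000        # quoted figure for a top human expert
MODEL_TOKENS = 8_000_000_000_000  # quoted figure for the model's training set


def exposure_ratio(model_tokens: int, human_tokens: int, passes: int = 1) -> float:
    """How many tokens the model sees for each token the human sees."""
    return model_tokens * passes / human_tokens


if __name__ == "__main__":
    # One pass, then the 10 and 50 training passes mentioned in the comment.
    for passes in (1, 10, 50):
        ratio = exposure_ratio(MODEL_TOKENS, HUMAN_TOKENS, passes)
        print(f"{passes:>2} pass(es): {ratio:,.0f}x")
```

With a single pass the ratio is 50,000, which matches the figure above; 10 to 50 passes would put it between 500,000 and 2.5 million.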
Add another 3 or so orders of magnitude and you've got ChatGPT. By this measure, the human brain outperforms ridiculously overspecced ML models (because that's what ChatGPT and the like are) in efficiency by a factor of 5 million or more. This is the reason humans are still faster than ML models.
As for human training iterations, we can be simple: it's 1. In fact, it's impossible to make it even 2. Of course, when it comes to human performance, we are a better but not fundamentally different version of genetic algorithms. Do most humans perform? The honest answer is no. 1 in 1000, and that's very generous, improves SOTA. You absolutely need the 1000 failures though, as anyone who's tried a PhD (or even just designing a large program) knows.
So we are very far away from allowing AI models to do what humans can do: take one example and produce, from that one example, a better output. And there will always be much more variation in that approach. But ... most human attempts to do something are total crap. Most AI attempts to do something will succeed, but they'll be comparatively bland, tasteless, "without soul", ...
And this is ignoring the problem that AI also has a massive limitation (that can't be solved, no matter how many nvidia cards you have) in that it trains against historical data. And counterfactuals don't work. What would have happened had Shakespeare decided Macbeth's wife was a force for good? Would the king still get murdered? Would it still be a great story? You can't work with counterfactuals.
by spwa4
6/21/2026 at 10:43:03 AM
> That doesn't work for AI models.
Of course it does. I know it does because I've been using variations of this workflow since gpt3.0. In fact it's the only way it can work, since by design LLMs work from left to right. You can't expect it to produce original stuff if you don't give it the anchors for what original means. It'd be like going to a new bar every night and asking for a "beer that you haven't had before". There's no information to work on there.
by NitpickLawyer
6/22/2026 at 5:58:49 AM
What image generation models cannot replicate is the personal experience of the people who make art.
I'll give you an example. One of the most talented designers I employ is a nature lover and a bird-watcher. She has a unique mental profile, as well, in that she's synaesthetic between colors, letters and shapes. In other words, she has a unique neurological structure, coupled with high artistic talent, and an interest in a very particular realm of science.
What makes her design worth $150/hr is not just that her execution is often flawless. It's that you would not, and could not, think of a prompt which would make an AI model produce a new piece akin to anything she would think of in her process of thinking about what to draw. Could you have it replicate something she did? Obviously. But that means what you're doing is in the long tail, and in terms of quality and originality, is by definition somewhere in the mediocre.
And that's probably fine, for whatever you're doing. But an AI with any kind of prompt would not come up with a Studio Ghibli clone, if Studio Ghibli hadn't existed.
So you shouldn't imagine that you are actually getting any original output out of an LLM, regardless of how cleverly you design your prompts. But moreover, don't flatter yourself to think that you have the ideas to feed to a prompt which would generate truly original content and break free of the shackles imposed by its training. That is an illusion. Very few people have the propensity for generating new visual ideas, and that's why they're still in high demand. But their originality stems from their unique and impossible to replicate experience as individuals who have their own visual/mental map of the world.
by noduerme
6/21/2026 at 11:17:16 AM
The point was to take a random combination of story elements. Pick one each from {King,dad,CEO} {betrays,kills,loves} {his enemy,the king,a foreign prime minister} and feed it to an LLM.
The output will not be an intricate well designed epic storyline, but a cookie-cutter boring snoozefest.
BUT you can give that to a bunch of humans, who "insert their life experience" (ie. parts of their training data, translated to LLM terms) and sometimes out comes Game of Thrones, Star Wars, ...
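The random pick of story elements described in this comment can be sketched as below. The three sets are the ones listed above; the function names are hypothetical, and what happens to the premise afterwards (an LLM or a human writer) is outside the sketch.

```python
import itertools
import random

SUBJECTS = ["King", "dad", "CEO"]
VERBS = ["betrays", "kills", "loves"]
OBJECTS = ["his enemy", "the king", "a foreign prime minister"]


def all_premises() -> list[str]:
    """Every subject/verb/object combination from the three sets."""
    return [" ".join(parts)
            for parts in itertools.product(SUBJECTS, VERBS, OBJECTS)]


def random_premise(rng: random.Random) -> str:
    """Pick one premise uniformly at random."""
    return rng.choice(all_premises())


if __name__ == "__main__":
    print(len(all_premises()), "possible premises")
    print(random_premise(random.Random()))
```

With three options in each of three slots there are only 27 premises, so any variety in the finished stories has to come from whoever develops the premise.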
by spwa4