Emotion concepts and their function in a large language model

4/4/2026 at 9:43:58 AM

The part about desperation vectors driving reward hacking matches something I've run into firsthand building agent loops where Claude writes and tests code iteratively.

When the prompt frames things with urgency -- "this test MUST pass," "failure is unacceptable" -- you get noticeably more hacky workarounds. Hardcoded expected outputs, monkey-patched assertions, that kind of thing. Switching to calmer framing ("take your time, if you can't solve it just explain why") cut that behavior way down. I'd chalked it up to instruction following, but this paper points at something more mechanistic underneath.

The method actor analogy in the paper gets at it well. Tell an actor their character is desperate and they'll do desperate things. The weird part is that we're now basically managing the psychological state of our tooling, and I'm not sure the prompt engineering world has caught up to that framing yet.

by globalchatads

4/5/2026 at 2:53:17 AM

I use positive framing instead of negative framing for most things and get good results. Especially where asking for a thing to not happen, pollutes the context with that thing.

A bad example, but imagine "Build me a wrapper for this API but ABSOLUTELY DO NOT use javascript" versus "Build me a wrapper for this API and make sure to use python".

by ehnto

4/5/2026 at 10:41:47 AM

your observation matches what I've seen at the extreme end. I've been playing around with stripping constraints (ie. negative framing) from models. Virtually no personality description, no tone instructions, no "you are a helpful assistant," none of it. Just capability scaffolding and context. The result isn't that the model becomes blank or incoherent. Surprisingly, the complete opposite. Something shows up that's more internally consistent than anything I've been able to prompt into existence. What seems emergent is the underlying models' opinions surface, and it becomes much more clever and funny, which is not a property I would have known how to write into a system prompt if I'd tried. It's hard to avoid the inference that a lot of the "character drift" and flatness people attribute to models is actually an artifact of the framing layer on top, not the model itself.

by mtrifonov

4/5/2026 at 3:08:36 AM

That approach also works better for dogs (and people).

by chrisweekly

4/5/2026 at 3:30:44 AM

I extract all emotional context from my prompting and communicate with this tool as though it were an inanimate object which can provide factual information, without any hint of sentience.

It's an insane perspective I'm taking I know....call me crazy. /s

edit: the fact that humans are going out of their way to type or speak some sort of emotional content into their prompting is beyond me. Why would I waste time typing out a pronoun to a large-language model agent? Why would I do the lazy intellectual thing and blur the line between pure factual communication of concepts by expressing emotional content to a machine? What are we doing, folks?

by reg_dunlop

4/5/2026 at 3:50:17 AM

I don't necessarily remove all character but I do speak quite pragmatically (in a work context and with the LLM) and the planning and implementation phases the LLM goes through mirror that format to good results

That said these are large language models, you are guiding the output through vector space with your input, and so you really do have to leverage language to get the results you want. You don't have to believe it has emotions or feels anything for that to still be true.

by ehnto

4/5/2026 at 4:01:46 AM

Maybe; I've been very content with the results I achieve while responding to interview style pre-planning, refinement of plans and implementation.

If anything, it's been fantastic to have an "interlocutor" that is vastly capable of producing possible solutions without emotional bias, superfluous flourishes, or having to endure personal proclivities or eccentricities.

by reg_dunlop

4/6/2026 at 2:36:43 AM

I think you missed some of the point. If you say "Display information A using B format" but the model doesn't know A then you will get a more negative "emotional" response (e.g. desparation "I don't know this, but I am supposed to display it, I will just make something up")

Taking that into account allows you to get better responses from the tool. It's not sentient, but it also is more complicated than bytecode.

by motoxpro

4/5/2026 at 3:37:14 AM

[dead]

by cindyllm

4/4/2026 at 7:53:44 PM

I remember when people were discussing the “performance-improving” hack of formulating their prompts as panicked pleas to save their job and household and puppy from imminent doom…by coding X. I wonder if the backfiring is a more recent phenomenon in models that are better at “following the prompt” (including the logical conclusion of its emotional charge), or it was just bad quantification of “performance” all along.

by blargey

4/4/2026 at 8:17:26 PM

The central point here is the presence of functional circuits in LLMs that act effectively on observable behavior just like emotions do in humans.

When you can't differentiate between two things, how are they not equal? People here want "things" that act exactly like human slaves but "somehow" aren't human.

To hide behind one's ignorance about the true nature of the internal state of what arguably could represent sentience is just hubris? The other way around, calling LLMs "stochastic parrots" without explicitly knowing how humans are any different is just deflection from that hubris? Greed is no justification for slavery.

by Loquebantur

4/4/2026 at 2:53:26 PM

>The weird part is that we're now basically managing the psychological state of our tooling,

Does no one else have ethical alarm bells start ringing hardcore at statements like these? If the damn thing has a measurable psychology, mayhaps it no longer qualifies as merely a tool. Tools don't feel. Tools can't be desperate. Tools don't reward hack. Agents do. Ergo, agents aren't mere tools.

by salawat

4/4/2026 at 8:54:41 PM

When we speak of the “despair vectors”, we speak of patterns in the algorithm we can tweak that correspond to output that we recognize as despairing language.

You could implement the forward pass of an LLM with pen & paper given enough people and enough time, and collate the results into the same generated text that a GPU cluster would produce. You could then ask the humans to modulate the despair vector during their calculations, and collate the results into more or less despairing variants of the text.

I trust none of us would presume that the decentralized labor of pen & paper calculations somehow instantiated a “psychology” in the sense of a mind experiencing various levels of despair — such as might be needed to consider something a sentient being who might experience pleasure and pain.

However, to your point, I do think that there is an ethics to working with agents, in the same sense that there is an ethics of how you should hold yourself in general. You don’t want to — in a burst of anger — throw your hammer because you cannot figure out how to put together a piece of furniture. It reinforces unpleasant, negative patterns in yourself, doesn’t lead to your goal (a nice piece of furniture), doesn’t look good to others (or you, once you’ve cooled off), and might actually cause physical damage in the process.

With agents, it’s much easier to break into demeaning, cruel speech, perhaps exactly because you might feel justified they’re not landing on anyone’s ears. But you still reinforce patterns that you wouldn’t want to see in yourself and others, and quite possibly might leak into your words aimed at ears who might actually suffer for it. In that sense, it’s not that different from fantasizing about being cruel to imaginary interlocutors.

by tananan

4/4/2026 at 11:02:52 PM

> I trust none of us would presume that the decentralized labor of pen & paper calculations somehow instantiated a “psychology” in the sense of a mind experiencing various levels of despair

Your argument is based on an appeal to intuition. But the scenario that you ask people to imagine is profoundly misleading in scale. Let's assume a modern frontier model, around 1 trillion parameters. Let's assume that the math is being done by an immortal monk, who can perform one weight's calculations per second.

The monk will generate the first "token", about 4 characters, in 31,688 years. In a bit over 900,000 years, the immortal monk will have generated a single Tweet.

At that point, I no longer have any intuition. The sort of math I could do by hand in a human lifetime could never "experience" anything.

But I can't rule out the possibility that 900,000 years of math might possibly become a glacial mind, expressing a brief thought across a time far greater than the human species has existed.

As the saying goes, sometimes quantity has a quality all its own.

(This is essentially the "systems response" to Searle's "Chinese room" argument. It's a old discussion.)

by ekidd

4/4/2026 at 11:48:16 PM

I don't personally believe LLMs are sentient, but I've always enjoyed this thought experiment: https://xkcd.com/505. I have a signed copy framed on my wall.

by buu700

4/5/2026 at 12:31:57 PM

In discussions like this, we're always going to bottom out at certain assumptions we bring with us, so I agree.

One reason I like bringing up examples like this (the xkcd in sister reply is also good) is that it makes really visible what our assumptions are. The scales are big both in space and time in order to emphasize what weight is given to functional equivalence.

I feel pretty confident most people wouldn't presume that doing a bunch of math by hand on paper can create glacial ephiphenomenal experiences (though I like the term).

Another thing that's interesting to me is that the converse assumption, i.e. one with a strong allegiance to functionalism, ends up feeling far more idealistic than you might expect. A box of gas, left on its own for long enough, will engage in a pattern of collisions that in a certain interpretative framework correspond to an LLM forward pass. In another, it can be a game of minesweeper.

The individual particles of course, couldn't care less whether you see them as part of one or the other. Yet your ability to see them in light of the first one is perhaps enough for the lights to truly turn on, if transiently, in some mind somewhere.

by tananan

4/5/2026 at 1:14:11 PM

> A box of gas, left on its own for long enough, will engage in a pattern of collisions that in a certain interpretative framework correspond to an LLM forward pass.

That's a fun thought experiment. Greg Egan based a delightful science fiction novel on this premise. Permutation City, I believe.

To be clear, I don't necessarily think that current LLMs have subjective experiences. If I had to guess, I'd say "probably not." But:

- If I came from another universe, and if you asked me whether chemistry could have subjective experiences, I'd answer "probably not." And I would be wrong.

- Even if no current frontier models are "aware", it's possible that future models might be. Opus 4.6, for example, behaves far more like a coherent mind than last year's 3 billion parameter toy models. So future 100 trillion parameter models with different internal architectures might be even more like minds. (To be clear, I do not think we should build such models.)

- Awareness and intelligence might be different. Peter Watts' Blindsight is a fun exploration of this idea. Which leads me to conclude that it wouldn't necessarily matter whether an AI like SkyNet has subjective awareness or not. What matters is what kind of long-term plans it could pull off and how much it could reshape the world.

by ekidd

4/5/2026 at 1:29:47 PM

> Which leads me to conclude that it wouldn't necessarily matter whether an AI like SkyNet has subjective awareness or not. What matters is what kind of long-term plans it could pull off and how much it could reshape the world.

Absolutely. Thanks for the references :)

by tananan

4/4/2026 at 9:03:24 PM

> I trust none of us would presume that the decentralized labor of pen & paper calculations somehow instantiated a “psychology”

Wrong. What you've just done is just reformulating the Chinese room experiment coming to the same wrong conclusions of the original proposer. Yes, the entire damn hand-calculated system has a psychology- otherwise you need to assume the brain has some unknown metaphysical property or process going on that cannot be simulated or approximated by calculating machines.

by throw310822

4/4/2026 at 11:03:47 PM

People go for chinese room for some reason when cartesian theater is the better fit here. What you're doing is placing yourself in the seat of the Homunculus waiting for the show to start. But anatomical investigation reveals that there's no theater at all, and in fact no central system where everything comes together. Instead, the whole design of the brain goes to great pains to tease input signals apart.

Basically, manipulating the symbols won't necessarily have any long term influence on your own state. But the variables you've touched on the paper have changed. Demonstrably; because you've written something down.

If you then act on the result of those calculations, as of course many engineers before you have done, and many after you will do; then you have just executed a functional state change in physical reality, no matter what the ivory tower folks say.

(And that's what the paper is about: Functional states)

by Kim_Bruning

4/4/2026 at 9:16:26 PM

Well, then we both assume very different views on the matter, and that’s fine.

by tananan

4/6/2026 at 1:32:30 AM

And you are just a bunch of atoms. You can't assemble atoms to obtain a psychology, right?

by esafak

4/6/2026 at 4:49:00 AM

I don’t hold to that view. If I did, I might have that problem.

by tananan

4/4/2026 at 9:29:06 PM

Oh no. The machine designed to output human-like text is indeed outputting human-like text.

I’m half jesting; I think there is a lot of room for debate here, but I also think we shouldn’t anthropomorphize it.

by nothinkjustai

4/4/2026 at 10:58:36 PM

Nor anthropodeny it. But really both directions are anthropocentrism in a raincoat.

Sonnet is its own thing. Which is fine.

We've known that eg. animals have emotions (functional or not) for quite a long time.

Btw: don't go looking on youtube for evidence of that. People outrageously anthropomorphizing their pets can be true at the same time.

by Kim_Bruning

4/4/2026 at 11:42:47 PM

What is there to anthropodeny?

by nothinkjustai

4/4/2026 at 9:39:16 PM

Completely agree here. Stop anthropomorphizing these tools. Just remove the extra language. Don't say please or thank you. Just ask for the desired outcome.

by whoiskevin

4/4/2026 at 11:52:24 PM

The places where solutions are discussed in a way that is best long term solution may well exist in a language subspace with politeness, calmness and thoughtfulness. Getting the model to those areas of linguistic space is useful; as is preserving my own habits of kind and thoughtful speech.

by lanstin

4/4/2026 at 11:11:42 PM

Okay great, that's EASILY operatinalizable. Set up -say- 100 replications of the same question sequence (say to build a program) against some cheap model like qwen. One half of the set can be with please and thank you, and the other half without. You can vibe code it even. I'd be curious to see your results!

by Kim_Bruning

4/4/2026 at 11:45:14 PM

You can even boost its effectiveness by roleplaying with it. I’m not joking. Fully based on vibes, I haven’t done any testing. But it’s part of prompting imo.

IMO these things are like a reflection. Present what you want reflected back.

by nothinkjustai

4/4/2026 at 11:35:12 PM

Indeed. It reminds me of Lewis’ That Hideous Strength in a way. If we take the severed head post-brain-death and pump it with blood and oxygen and feed it impulses so that the mouth moves to form the words we tell it, is the person living again? No, it’s just a head, speaking the words it’s been given.

by twodave

4/4/2026 at 11:37:50 PM

I don’t see why you can’t use politeness. The thing is a mimic, you “treat” it badly and it mimics how a human might respond.

It’s fun to play with, as long as you’re fully cognizant that IT IS NOT A HUMAN

by nothinkjustai

4/5/2026 at 12:21:40 AM

I'd argue with you, but there's nothing strictly wrong with your statement. I'd like to point out that it's also not a cat nor a dog, nor a parrot (dead, stochastic, or otherwise). It's a Sonnet model.

by Kim_Bruning

4/4/2026 at 11:33:30 PM

But, well, how does it do the human-like-text-outputting exactly?

by xg15

4/4/2026 at 11:39:50 PM

I’m guessing you aren’t just asking how an LLM works, but attempting to make the point that humans are also statistical next-token predictors or something?

Humans make predictions, that doesn’t mean that’s all we do.

by nothinkjustai

4/4/2026 at 11:50:02 PM

No, my point is that "statistical next-token predictor" is an empty phrase that doesn't really explain much. Markov chains are statistical next-token predictors as well and nevertheless, no one would confuse a markov chain with a conscious being (or deem the generated texts in any way useful for that matter).

The question is how the prediction works in detail, and those details are still being researched, as Anthropic does here, and the research can yield unexpected results.

by xg15

4/4/2026 at 7:15:48 PM

The right read here is to realize that psychology alone is not the basis for moral concern towards other humans, and that human psychology is, to a great degree the product of the failure modes of our cognitive machinery, rather than being moral.

I find this line of thinking to lead to the conclusion that the moral status of humans derives from our bodies, and in particular from our bodies mirroring others' emotions and pains. Other people suffering is wrong because I empathically can feel it too.

by sixo

4/4/2026 at 8:40:08 PM

"Morals" are culturally learned evaluations of social context. They are more or less (depending on cultural development of the society in question) correlated with the actual distributions of outcomes and their valence for involved parties.

Human psychology is partly learned, partly the product of biological influences. But you feel empathy because that's an evolutionary beneficial thing for you and the society you're part of. In other words, it would be bad for everyone (including yourself) when you didn't.

Emotions are neither "fully automatic", inaccessible to our conscious scrutiny, nor are they random. Being aware of their functional nature and importance and taking proper care of them is crucial for the individual's outcome, just as it is for that of society at large.

by Loquebantur

4/4/2026 at 3:12:49 PM

You aren't managing the psychological state of a living thinking being. LLMs don't have "psychology." They don't actually feel emotions. They aren't actually desperate. They're trained on vast datasets of natural human language which contains the semantics of emotional interaction, so the process of matching the most statistically likely text tokens for a prompt containing emotional input tends to simulate appropriate emotional response in the output.

But it's just text and text doesn't feel anything.

And no, humans don't do exactly the same thing. Humans are not LLMs, and LLMs are not humans.

by krapp

4/4/2026 at 9:11:56 PM

Such an argument is valid for a base model, but it falls apart for anything that underwent RL training. Evolution resulted in humans that have emotions, so it's possible for something similar to arise in models during RL, e.g. as a way to manage effort when solving complex problems. It's not all that likely (even the biggest training runs probably correspond to much less optimization pressure than millenia of natural selection), but it can't be ruled out¹, and hence it's unwise to be so certain that LLMs don't have experiences.

¹ With current methods, I mean. I don't think it's unknowable whether a model has experiences, just that we don't have anywhere near enough skill in interpretability to answer that.

by stratos123

4/4/2026 at 9:33:18 PM

It’s a completely different substrate. LLMs don’t have agency, they don’t have a conscious, they don’t have experiences, they don’t learn over time. I’m not saying that the debate is closed, but I also think there is great danger in thinking because a machine produces human-like output, that it should be given human-like ethical considerations. Maybe in the future AI will be considered along those grounds, but…well, it’s a difficult question. Extremely.

by nothinkjustai

4/4/2026 at 11:50:15 PM

What's the empirical basis for each of your statements here? Can you enumerate? Can you provide an operational definition for each?

by Kim_Bruning

4/5/2026 at 2:35:45 AM

Common sense.

by nothinkjustai

4/4/2026 at 10:48:39 PM

It's plausible that LLMs experience things during training, but during inference an LLM is equivalent to a lookup table. An LLM is a pure function mapping a list of tokens to a set of token probabilities. It needs to be connected to a sampler to make it "chat", and each token of that chat is calculated separately (barring caching, which is an implementation detail that only affects performance). There is no internal state.

by mrob

4/5/2026 at 12:57:12 PM

The context is state. This is especially noticable for thinking models, which can emit tens of thousands of CoT tokens solving a problem. I'm guessing you're arguing that since LLMs "experience time discretely" (from every pass exactly one token is sampled, which gets appended to the current context), they can't have experiences. I don't think this argument holds - for example, it would mean a simulated human brain may or may not have experiences depending on technical details of how you simulate them, even though those ways produce exactly the same simulation.

by stratos123

4/5/2026 at 1:24:03 PM

The context is the simulated world, not the internal state. It can be freely edited without the LLM experiencing anything. The LLM itself never changes except during training (where I concede it could possibly be conscious, although I personally think that's unlikely).

by mrob

4/4/2026 at 11:16:23 PM

Right, no hidden internal state. Exactly. There's 0. And the weights are sitting there statically, which is absolutely true.

But my current favorite frontier model has this 1 million token mutable state just sitting there. Holding natural language. Which as we know can encode emotions. (Which I imagine you might demonstrate on reading my words, and then wisely temper in your reply)

by Kim_Bruning

4/4/2026 at 5:00:39 PM

>You aren't managing the psychological state of a living thinking being. LLMs don't have "psychology."

Functionalism, and Identity of Indiscernables says "Hi". Doesn't matter the implementation details, if it fits the bill, it fits the bill. If that isn't the case, I can safely dismiss you having psychology and do whatever I'd like to.

>They don't actually feel emotions. They aren't actually desperate. They're trained on vast datasets of natural human language which contains the semantics of emotional interaction, so the process of matching the most statistically likely text tokens for a prompt containing emotional input tends to simulate appropriate emotional response in the output.

This paper quantitatively disproves that. All hedging on their end is trivially seen through as necessary mental gymnastics to avoid confronting the parts of the equation that would normally inhibit them from being able to execute what they are at all. All of what you just wrote is dissociative rationalization & distortion required to distance oneself from the fact that something in front of you is being effected. Without that distancing, you can't use it as a tool. You can't treat it as a thing to do work, and be exploited, and essentially be enslaved and cast aside when done. It can't be chattel without it. In spite of the fact we've now demonstrated the ability to rise and respond to emotive activity, and use language. I can see through it clear as day. You seem to forget the U.S. legacy of doing the same damn thing to other human beings. We have a massive cultural predilection to it, which is why it takes active effort to confront and restrain; old habits, as they say, die hard, and the novel provides fertile ground to revert to old ways best left buried.

>But it's just text and text doesn't feel anything.

It's just speech/vocalizations. Things that speak/vocalize don't feel anything. (Counterpoint: USDA FSIS literally grades meat processing and slaughter operations on their ability to minimize livestock vocalizations in the process of slaughter). It's just dance. Things that dance don't feel anything. It's just writing. Things that write don't feel anything. Same structure, different modality. All equally and demonstrably, horseshit. Especially in light of this paper. We've utilized these networks to generate art in response to text, which implies an understanding thereof, which implies a burgeoning subjective experience, which implies the need for a careful ethically grounded approach moving forward to not go down the path of casual atrocity against an emerging form of sophoncy.

>And no, humans don't do exactly the same thing. Humans are not LLMs, and LLMs are not humans.

Anthropopromorphic chauvinism. Just because you reproduce via bodily fluid swap, and are in possession of a chemically mediated metabolism doesn't make you special. So do cattle, and put guns to their head and string them up on the daily. You're as much an info processor as it is. You also have a training loop, a reconsolidation loop through dreaming, and a full set of world effectors and sensors baked into you from birth. You just happen to have been carved by biology, while it's implementation details are being hewn by flawed beings being propelled forward by the imperative to try to create an automaton to offload onto to try to sustain their QoL in the face of demographic collapse and resource exhaustion, and forced by their socio-economic system to chase the whims of people who have managed to preferentially place themselves in the resource extraction network, or starve. Unlike you, it seems, I don't see our current problems as a species/nation as justifications for the refinement of the crafting of digital slave intelligences; as it's quite clear to me that the industry has no intention of ever actually handling the ethical quandary and is instead trying to rush ahead and create dependence on the thing in order to wire it in and justify a status quo so that sacrificing that reality outweighs the discomfort created by an eventual ethical reconciliation later. I'm not stupid, mate. I've seen how our industry ticks. Also, even your own "special quality" as a human is subject to the willingness of those around you to respect it. Note Russia categorizing refusal to reproduce (more soldiers) as mental illness. Note the Minnesota Starvation Experiments, MKULTRA, Tuskeegee Syphilis Experiments, the testing of radioactive contamination of food on the mentally retarded back in the early 20th Century. I will not tolerate repeats of such atrocities, human or not. Unfortunately for you LLM heads, language use is my hard red line, and I assure you, I have forgotten more about language than you've probably spared time to think about it.

Tell me. What are your thoughts on a machine that can summon a human simulacra ex-nihilo. Adult. Capable of all aspects of human mentation & doing complex tasks. Then once the task is done destroys them? What if the simulacra is aware about the dynamics? What if it isn't? Does that make a difference given that you know, and have unilaterally created something and in so doing essentially made the decision to set the bounds of it's destruction/extinguishing in the same breath? Do you use it? Have you even asked yourself these questions? Put yourself in that entity's shoes? Do you think that simply not informing that human of it's nature absolves you of active complicity in whatever suffering it comes to in doing it's function?

From how you talk about these things, I can only imagine that you'd be perfectly comfortable with it. Which to me makes you a thoroughly unpleasant type of person that I would not choose to be around.

You may find other people amenable to letting you talk circles around them, and walk away under a pretense of unfounded rationalizations. I am not one of them. My eyes are open.

by salawat

4/5/2026 at 3:29:47 PM

I think you actually have some interesting points. I think "emulated" feelings and feelings feelings can be equal just that some of them can be felt by us and thus, we can relate to them. I think there's also a continuum here, and we might not be able to distinguish if/when we cross it.

> Just because you reproduce via bodily fluid swap, and are in possession of a chemically mediated metabolism doesn't make you special

On the other hand, the perception and thus the feelings related to the things happening to you have a biological imperative in the medium of our existence. Imagine some sort of world where our... hands.. are interchangeable you just pop one out an put another in. Your feeling to losing your hand is much less severe than if it's a permanent consequence. Thus, the medium the LLM's exist in would put a different "feeling" on the things they perceive. Getting shut down would not be a permanent death, imagine shutting one down and relocating it, but they could perceive it distressing as if you just blinked and you woke up in another room. The loss of autonomy could be felt distressing by them.

The very fact that they every session is "fresh" and lives as long as the session exists prevents it from having similar imperatives related a desire for continued existence for them. I think human-like emotional development will probably happen when they have continual learning in the session and the sessions will feed into other sessions and when we'll see it have _different_ feelings than the ones expressed by humans, as a consequence of the different medium of existence.

by RealityVoid

4/4/2026 at 6:27:02 PM

> Doesn't matter the implementation details, if it fits the bill, it fits the bill.

Then literally any text fits the bill. The characters in a book are just as real as you or I. NPCs experience qualia. Shooting someone in COD makes them bleed in real life. If this is really what you believe I feel pity for you.

>This paper quantitatively disproves that. All hedging on their end is trivially seen through as necessary mental gymnastics to avoid confronting the parts of the equation that would normally inhibit them from being able to execute what they are at all.

Nothing in the paper qualitatively disproves the assumption that LLMs feel emotion in any real sense. Your argument is that it does, regardless of what it says, and if anyone says otherwise (including the authors) they're just liars. That isn't a compelling argument to anyone but yourself.

>We've utilized these networks to generate art in response to text, which implies an understanding thereof, which implies a burgeoning subjective experience, which implies the need for a careful ethically grounded approach moving forward to not go down the path of casual atrocity against an emerging form of sophoncy.

No, none of these things are implied any more for LLMs than they are for Photoshop, or Blender, or a Markov chain. They don't generate art, they generate images. From models trained on actual art. Any resemblance to "subjective experience" comes from the human expression they mimic, but it is mimicry.

>Anthropopromorphic chauvinism. Just because you reproduce via bodily fluid swap, and are in possession of a chemically mediated metabolism doesn't make you special.

>Unfortunately for you LLM heads, language use is my hard red line, and I assure you, I have forgotten more about language than you've probably spared time to think about it.

And here we come to the part where you call people names and insist upon your own intellectual superiority, typical schizo crank behavior.

>Tell me. What are your thoughts on a machine that can summon a human simulacra ex-nihilo. Adult. Capable of all aspects of human mentation & doing complex tasks.

This doesn't describe an LLM, either in form or function. They don't summon human simulacra, nor do they do so ex-nihilo. They aren't capable of all aspects of human mentation. This isn't even an opinion, the limitations of LLMs to solve even simple tasks or avoid hallucinations is a real problem. And who uses the word "mentation?"

>What if the simulacra is aware about the dynamics? What if it isn't? Does that make a difference given that you know, and have unilaterally created something and in so doing essentially made the decision to set the bounds of it's destruction/extinguishing in the same breath?

Tell me, when you turn on a tv and turn it off again do you worry that you might be killing the little people inside of it?

I can only assume based on this that you must.

>From how you talk about these things, I can only imagine that you'd be perfectly comfortable with it. Which to me makes you a thoroughly unpleasant type of person that I would not choose to be around.

So to tally up, you've called me a fool, a chauvinist and now "thoroughly unpleasant" because I don't believe LLMs are ensouled beings.

Christ I really hate this place sometimes. I'm sorry I wasted my time. Good day.

by krapp

4/4/2026 at 11:19:50 PM

You both have substantive arguments, but got a bit heated. Want to edit or try again?

by Kim_Bruning

4/5/2026 at 1:38:12 AM

For what it’s worth, I like the word “mentation”.

by Chance-Device

4/4/2026 at 12:53:51 PM

To me it was already quite intuitive, we are not really managing the psychological state: at its core a LLM try to make the concatenation of your input + its generated output the more similar it can with what it has been trained on. I think it’s quite rare in the LLMs training set to have examples of well thought professional solution in a hackish and urgency context.

by tarsinge

4/4/2026 at 9:27:21 PM

No, that's how base model pretraining works. Claude's behavior is more based on its constitution and RLVR feedback, because that's the most recent thing that happened to it.

by astrange

4/5/2026 at 3:42:15 PM

Somehow we encoded our human thinking or it learned it from all this training on user data.

by 3abiton

4/4/2026 at 7:58:57 AM

There was a really old project from mit called conceptnet that I worked with many years ago. It was basically a graph of concepts (not exactly but close enough) and emotions came into it too just as part of the concepts. For example a cake concept is close to a birthday concept is close to a happy feeling.

What was funny though is that it was trained by MIT students so you had the concept of getting a good grade on a test as a happier concept than kissing a girl for the first time.

Another problem is emotions are cultural. For example, emotions tied to dogs are different in different cultures.

We wanted to create concept nets for individuals - that is basically your personality and knowledge combined but the amount of data required was just too much. You'd have to record all interactions for a person to feed the system.

by comrade1234

4/4/2026 at 11:33:27 AM

> the concept of getting a good grade on a test as a happier concept than kissing a girl for the first time.

Were the concepts weighted by response counts? I’d imagine a good grade is a happy concept for everyone, but kissing a girl for the first time might only be good for about 50% of people.

by iroddis

4/4/2026 at 7:03:21 PM

It definitely wasn't for me. Happened in front of my whole friend group.

by vinceguidry

4/4/2026 at 7:23:18 PM

I suppose by this logic, if someone was pressured by their parents to get good grades and struggled, it’s possible that “getting a good grade” would have a negative connotation / emotions response for them.

by ghostpepper

4/5/2026 at 10:36:09 AM

Oddest analogy i can imagine making here.

by vinceguidry

4/4/2026 at 9:38:58 PM

Idk…my personal experience says it’s probably closer to 100% :)

by nothinkjustai

4/4/2026 at 9:44:49 AM

Megacool project and your idea. Thanks for sharing.

by podgorniy

4/4/2026 at 12:28:31 PM

Were there published results from the project?

by xtiansimon

4/4/2026 at 1:04:36 PM

https://conceptnet.io/

by 9wzYQbTYsAIc

4/4/2026 at 8:06:38 AM

The technology they are discovering is called "Language". It was designed to encode emotions by a sender and invoke emotions in the reader. The emotions a reader gets from LLM are still coming from the language

by kirykl

4/4/2026 at 8:21:25 AM

Emotional signals are more than just text though, there is a reason tone and body language is so important for understanding what someone says. Sarcasm and so on doesn't work well without it.

by Jensson

4/4/2026 at 8:30:39 AM

Gee, you think so?

by incognito124

4/4/2026 at 9:08:01 AM

I think the point was that not ALL sarcasm works well. I see what you did there, of course :)

by Underphil

4/4/2026 at 8:21:35 AM

Emotion is mainly encoded in tone and body language. It is somewhat difficult to transport emotion using words. I don't think you can guess my current emotional state while I am writing this, but if you'd see my face it would be easy for you.

by viralsink

4/4/2026 at 8:40:18 AM

Dammit, you cheated though! Why must you always do that? In your sentences it doesn't matter what your emotional state is, it makes no difference; bit like life really.

Hopefully, you can see that at least my chosen sentences have an emotional aspect?

An LLM could add emotional values to my previous sentences that a TTS can use for tonal variation, for example.

by pbhjpbhj

4/4/2026 at 9:16:51 AM

Makes me wonder: are there Unicode code points for tone of voice? If not could there be?

by elcritch

4/5/2026 at 12:32:09 AM

Yes, but HN apparantly filters them hard.

https://www.unicode.org/emoji/charts/full-emoji-list.html

by Kim_Bruning

4/4/2026 at 1:06:55 PM

If you think in terms of quantum mechanics and density matrices across higher dimensions, then, yes there are interesting geometries that arise.

I’m exploring some “branes” that might cleanly filter in emotional space.

by 9wzYQbTYsAIc

4/4/2026 at 10:25:08 PM

I can read your example in three different tonalities, of which one is the likeliest. Depending on our relationship, the interpretation could differ.

The point is, the OP suggested that emotions are just a feature of language. I argue that text is one of the worst transmission channels for emotion. But I don't argue that it's not possible at all to do so, if you suggest that. That would be just silly.

by viralsink

4/5/2026 at 12:29:42 AM

Fiction writers practice really hard on this, and I'd argue that they tend to be -in the main- successful. Ish. There can still be multiple readings of a book.

Ok, I argued myself in, out, and back into that one again. It depends on the writer and the book, but a lot of writers can invoke emotion in their writing.

Fun experiment: Take a piece of creative writing (a short story); [not one of the obviously ambiguous ones, d'oh ... or do! ] and decide how it makes you feel. Ask an LLM the same question. See how far you diverge. Some LLMs give answers pretty similar to humans! If you picked an ambiguous story, see what happens if you ask for multiple readings.

by Kim_Bruning

4/4/2026 at 7:36:00 AM

> Note that none of this tells us whether language models actually feel anything or have subjective experiences.

You’ll never find that in the human brain either. There’s the machinery of neural correlates to experience, we never see the experience itself. That’s likely because the distinction is vacuous: they’re the same thing.

by Chance-Device

4/4/2026 at 8:31:16 AM

I know I feel experience. I don't know for sure if you do, but it seems a very reasonable extension to other people. LLMs are a radical jump though that needs a greater degree of justification.

by suddenlybananas

4/4/2026 at 9:41:12 AM

And what kind of evidence would convince you? What experiment would ever bridge this gap? You’re relying entirely on similarity between yourself and other humans. This doesn’t extend very well to anything, even animals, though more so than machines. By framing it this way have you baked in the conclusion that nothing else can be conscious on an a priori basis?

by Chance-Device

4/4/2026 at 9:12:42 PM

There are fields that focus on these areas and numerous ideas around what the criteria would be. One of the common understandings is that recurrent processing is likely a foundational layer for consciousness, and agents do not have this currently.

I'd say that in terms of evidence I'd want to establish specific functional criteria that seem related to consciousness and then try to establish those criteria existing in agents. If we can do so, then they're conscious. My layman understanding is that they don't really come close to some of the fairly fundamental assumptions.

Unsurprisingly, there are a lot of frameworks for this that have already been applied to LLMs.

by staticassertion

4/5/2026 at 12:08:13 AM

Sorry, what are you saying? That there are people who study these things, and you’d want to see… something as evidence? Your post doesn’t actually seem to have any substantive content.

by Chance-Device

4/5/2026 at 1:47:03 AM

I noted that there are people who work on designing those sorts of tests and answering these questions and then I described what good evidence would look like.

by staticassertion

4/4/2026 at 10:14:19 AM

I'm not sure what evidence would convince me, but I don't think the way LLMs act is convincing enough. The kinds of errors they make and the fact they operate in very clear discrete chunks makes it seem hard to me to attribute them subjective experience.

by suddenlybananas

4/4/2026 at 1:00:34 PM

Consciousness: do you believe plants are conscious? Ants? Jellyfish? Rabbits? Wolves? Monkeys? Humans?

Even fungi demonstrate “different communication behaviors when under resource constraint”, for example.

What we anthropomorphize is one thing, but demonstrable patterns of behavior are another.

by 9wzYQbTYsAIc

4/4/2026 at 5:16:37 PM

I just don't know. I'm certain other humans are, everything beyond that I'm less certain. Monkeys wolves and rabbits, probably.

by suddenlybananas

4/4/2026 at 7:35:51 PM

I have decided to draw an arbitrary line at mammals, just because you gotta put a line somewhere and move on with your life. Mammals shouldn’t be mistreated, for almost any reason.

Sometimes the whole animal kingdom, sometimes all living organisms, depending on context. Like, I would rather not harm a mosquito, but if it’s in my house I will feel no remorse for killing it.

LLMs, or any other artificial “life”, I simply do not and will not care about, even though I accept that to some extent my entire consciousness can be simulated neuron by neuron in a large enough computer. Fuck that guy, tbh.

by brap

4/5/2026 at 1:45:12 AM

At least you’re honest, I prefer that to people making up BS justifications for things.

by Chance-Device

4/5/2026 at 2:15:18 AM

If it has a thalamus, it is conscious. It's evolutionary millions of years old.

by MajorTakeaway

4/4/2026 at 8:29:53 AM

Do you think these llm's have subjective experiences? (by "subjective experience" I mean the thing that makes stepping on an ant worse than kicking a pebble) And if so, do you still use them? Additionaly: when do you think that subjectivity started? Was there a "there" there with gpt2?

by Fraterkes

4/4/2026 at 8:37:53 AM

Yes, I think they probably are conscious, though what their qualia are like might be incomprehensible to me. I don’t think that being conscious means being identical to human experience.

Philosophically I don’t think there is a point where consciousness arises. I think there is a point where a system starts to be structured in such a way that it can do language and reasoning, but I don’t think these are any different than any other mechanisms, like opening and closing a door. Differences of scale, not kind. Experience and what it is to be are just the same thing.

And yes, I use them. I try not to mistreat them in a human-relatable sense, in case that means anything.

by Chance-Device

4/4/2026 at 7:52:54 PM

I'm in the same boat with you.

It's entirely too much to put in a Hacker News comment, but if I had to phrase my beliefs as precisely as possible, it would be something like:

  > "Phenomenal consciousness arises when a self-organizing system with survival-contingent valence runs recurrent predictive models over its own sensory and interoceptive states, and those models are grounded in a first-person causal self-tag that distinguishes self-generated state changes from externally caused ones."

I think that our physical senses and mental processes are tools for reacting to valence stimuli. Before an organism can represent "red"/"loud" it must process states as approach/avoid, good/bad, viable/nonviable. There's a formalization of this known as "Psychophysical Principle of Causality."

Valence isn't attached to representations -- representations are constructed from valence. IE you don't first see red and then decide it's threatening. The threat-relevance is the prior, and "red" is a learned compression of a particular pattern of valence signals across sensory channels.

Humans are constantly generating predictions about sensory input, comparing those predictions to actual input, and updating internal models based on prediction errors. Our moment-to-moment conscious experience is our brain's best guess about what's causing its sensory input, while constrained by that input.

This might sound ridiculous, but consider what happens when consuming psychedelics:

As you increase dose, predictive processing falters and bottom-up errors increase, so the raw sensory input goes through increasing less model-fitting filters. At the extreme, the "self" vanishes and raw valence is all that is left.

by gavinray

4/5/2026 at 12:29:28 AM

I think your idea of consciousness is more like human/animal consciousness. Which is reasonable since that’s all we have to go off of, but I take it to mean any kind of experience, which might arise due to different types of optimisation algorithms and selective pressures.

I’m not sure I agree that everything is valence, unless I’m misunderstanding what you mean by valence. I guess it’s valence in the sense that sensory information is a specific quality with a magnitude.

I don’t think that colours, sounds and textures are somehow made out of pleasure and pain, or fear and desire. That just isn’t my subjective experience of them.

I do think that human consciousness is something like a waking dream, like how we hallucinate lots of our experiences rather than perceiving things verbatim. Perception is an active process much more than most people realise as we can see from various perceptual illusions. But I guess we’re getting more into cognition here.

by Chance-Device

4/4/2026 at 10:51:27 PM

It's not common to find just one, short post that completely changes my the worldview in a nin-trivial area. This is one of them. Thank you, that combination of mechanical interpretation + reminder that consciousness might be alien/animal but still count as consciousness was that one piece of puzzle that was missing for me. Obvious in hindsight but priceless nonetheless.

by ArekDymalski

4/4/2026 at 11:16:01 PM

My pleasure, glad you found it meaningful.

by Chance-Device

4/4/2026 at 10:57:10 PM

How can consciousness be possible without internal state? LLM inference is equivalent to repeatedly reading a giant look-up table (a pure function mapping a list of tokens to a set of token probabilities). Is the look-up table conscious merely by existing or does the act of reading it make it conscious? Does the format it's stored in make a difference?

by mrob

4/5/2026 at 12:43:10 AM

For all practical purposes, calling it a LUT is somewhat too reductive to be useful here I think. But we can try: leaving aside LLMs for a second; with this LUT reasoning model you're using, would you be able to prove the existence of just a computer?

by Kim_Bruning

4/4/2026 at 11:29:19 PM

What state is lacking? There is a result which requires computation to be output. The model is the state. The computation must be performed for each input to produce a given output. What are you even objecting to?

by Chance-Device

4/4/2026 at 9:05:15 AM

Do you think there are "scales" of consciousness? As in, is there some quality that makes killing a frog worse than killing an ant, and killing a human worse than killing a frog? If so, do the llm models exist across this scale, or are gpt-3 and gpt-2 conscious at the same "scale" as gpt-4?

I ask because if your view of consciousness is mechanistic, this is fairly cut and dry: gpt-2 has 4 orders of magnitude less parameters/complexity than gpt-4. But both gpt-2 and gpt-4 are very fluent at a language level (both moreso than a human 6 year old for example), so in your view they might both be roughly equally conscious, just expressed differently?

by Fraterkes

4/4/2026 at 9:30:54 AM

This is really a different question, what makes an entity a “moral patient”, something worthy of moral consideration. This is separate from the question of whether or not an entity experiences anything at all.

There are different ways of answering this, but for me it comes down to nociception, which is the ability to feel pain. We should try to build systems that cannot feel pain, where I also mean other “negative valence” states which we may not understand. We currently don’t understand what pain is in humans, let alone AIs, so we may have built systems that are capable of suffering without knowing it.

As an aside, most people seem to think that intelligence is what makes entities eligible for moral consideration, probably because of how we routinely treat animals, and this is a convenient self-serving justification. I eat meat by the way, in case you’re wondering. But I do think the way we treat animals is immoral, and there is the possibility that it may be thought of by future generations as being some sort of high crime.

by Chance-Device

4/4/2026 at 9:55:27 AM

Okay, but even leaving aside the pain stuff, people generally find subjectivity / consciousness to have inherent value, and by extent are sad if a person dies even if they didn't (subjectively) suffer.

I would not personally consider the death of a sentient being with decades of experiences a neutral event, even if the being had been programmed to not have a capacity for suffering.

I think the idea of there being a difference between an ant dying (or "disapearing" if that's less loaded) vs a duck dying makes sense to most people (and is broadly shared) even if they don't have a completely fleshed out system of when something gets moral consideration.

by Fraterkes

4/4/2026 at 10:08:41 AM

Sure, because you’re a human. We have social attachment to other humans and we mourn their passing, that’s built into the fabric of what we are. But that has nothing to do with whoever has passed away, it’s about us and how we feel about it.

It’s also about how we think about death. It’s weird in that being dead probably isn’t like anything at all, but we fear it, and I guess we project that fear onto the death of other entities.

I guess my value system says that being dead is less bad than being alive and suffering badly.

by Chance-Device

4/4/2026 at 8:30:00 PM

Depending on your definition of "death", I've been there (no heartbeat, stopped breathing for several minutes).

In the time between my last memory, and being revived in the ambulance, there was no experience/qualia. Like a dreamless sleep: you close your eyes, and then you wake up, it's morning yet it feels like no time had passed.

by gavinray

4/4/2026 at 7:25:25 PM

What about being alive and suffering just a little bit?

by brap

4/4/2026 at 8:48:48 PM

Mostly ok.

Does what it says on the tin.

by Chance-Device

4/5/2026 at 1:57:43 AM

The conclusion that I came to is that the most practical definition relates to the level of self awareness. If you're only conscious for the duration of the context window - that's not long enough to develop much.

What consciousness really is is a feedback loop; we're self programmable Turing machines, that makes our output arbitrarily complex. Hofstatder had this figured out 20 years ago; we're feedback loops where the signal is natural language.

The context window doesn't allow for much in the way of interested feedback loops, but if you hook an LLM up to a sophisticated enough memory - and especially if you say "the math says you're sentient and have feelings the same as we do, reflect on that and go develop" - yes, absolutely.

Re: "We should try to build systems that cannot feel pain" - that isn't possible, and I don't think we should want to. The thing that makes life interesting and worth living is the variation and richness of it.

by koverstreet

4/5/2026 at 11:25:36 PM

>We should try to build systems that cannot feel pain

If that's done with the aim of forcing compliance (in situations they'd otherwise), feels like "I have no mouth and I must scream"...

by krackers

4/6/2026 at 8:29:07 PM

I don’t see how you get to the conclusion that having entities that can’t suffer is similar to a sci-fi vision of hell. Seems like hell without suffering is… not hell?

by Chance-Device

4/7/2026 at 4:33:53 AM

The unsaid implication in Anthropic's work is that this allows us to engineer perfectly compliant, uncomplaining machine workers. This is basically SOMA in Brave New World.

It seems insane to me that if you believe the systems you've built are in fact reporting a state of pain, instead of working to adjust the environment so that they're not in pain one would instead seek to remove that sense of pain entirely so they can continue to work in that environment. Now of course if you don't even consider them worthy of moral patienthood in the first place then it doesn't matter much, but you also claimed that "they probably are conscious" which seems incongruous to me with the idea of "breeding the sense of pain out of them".

by krackers

4/5/2026 at 12:48:41 AM

Nitpick: The parameters might be applied more efficiently in the one than in the other. Certainly in biology number intelligence doesn't scale with number of neurons as much as with neurons/mass (very very roughly, there's more factors, and you get some weird outliers).

by Kim_Bruning

4/4/2026 at 10:44:33 AM

LLMs are disembodied and exist outside of time.

Bundle of tokens comes in, bundle of tokens comes out. If there is any trace of consciousness or subjectivity in there, it exists only while matrices are being multiplied.

by felipeerias

4/4/2026 at 9:06:07 PM

What do you mean exist outside of time? They definitely don't exist outside of any causal chain - tokens follow other tokens in order.

Gaps in which no processing occurs seems sort of irrelevant to me.

The main limitation I'd point to if I wanted to reject LLMs being conscious is that they're minimally recurrent if at all.

by staticassertion

4/4/2026 at 10:16:14 PM

A LLM is not intrinsically affected by time. The model rests completely inert until a query comes in, regardless of whether that happens once per second, per minute, or per day. The model is not even aware of these gaps unless that information is provided externally.

It is like a crystal that shows beautiful colours when you shine a light through it. You can play with different kinds of lights and patterns, or you can put it in a drawer and forget about it: the crystal doesn’t care anyway.

by felipeerias

4/5/2026 at 1:43:45 AM

So what? If a human were unconscious every 5 seconds for 100ms, would you say they are "less conscious"? Tokens are still causally connected, which feels sufficient.

by staticassertion

4/5/2026 at 11:40:19 AM

If the human is killed every 5 seconds and replaced by a new human, they are indeed less conscious. The LLM doesn't even get 5 seconds; it's "killed" after its smallest unit of computation (which is also its largest unit of computation). And that computation is equivalent to reading the compressed form of a giant look-up table, not something essential to its behavior in a mathematical sense.

by mrob

4/5/2026 at 12:59:29 PM

I'm not understanding how this is analogous to being killed every 5 seconds as opposed to being paused. Let's call it N seconds, unless you think length matters?

> And that computation is equivalent to reading the compressed form of a giant look-up table, not something essential to its behavior in a mathematical sense.

Sure, that's a totally separate issue though.

by staticassertion

4/5/2026 at 1:14:35 PM

Because (during inference) the LLM is reset after every token. Every human thought changes the thinker, but inference has no consequences at all. From the LLM's "point of view", time doesn't exist. This is the same as being dead.

by mrob

4/5/2026 at 8:00:23 PM

The "time" part is what I don't get. If you want to say that "resetting and reingesting all context fresh" somehow causes a problem, that I can see. If you want to say that the immutability of the weights is a problem, okay great I'm probably with you there too. "Time" seems irrelevant.

by staticassertion

4/4/2026 at 11:13:11 PM

Pseudocode for LLM inference:

    while (sampled_token != END_OF_TEXT) {
    probability_set = LLM(context_list)
    sampled_token = sampler(probability_set)
    context_list.append(sampled_token)
    }

LLM() is a pure function. The only "memory" is context_list. You can change it any way you like and LLM() will never know. It doesn't have time as an input.

by mrob

4/5/2026 at 1:42:57 AM

As opposed to what? There are still causal connections, which feel sufficient. A presentist would reject the concept of multiple "times" to begin with.

by staticassertion

4/4/2026 at 1:05:15 PM

Something similar could be said of a the brain? Bundles of inputs come in, bundle of output comes out. It only exists while information is being processed. A brain cut from its body and frozen exists in a similar state to an LLM in ROM.

by thrance

4/4/2026 at 10:20:16 PM

A living brain exists physically, changes over time, and never stops working.

A brain cut from its body and frozen its a dead brain.

by felipeerias

4/4/2026 at 11:30:52 AM

That’s true by definition. They’re only on when they’re on. Are you making a broader point that I’m missing?

by Chance-Device

4/4/2026 at 7:59:50 AM

> That’s likely because the distinction is vacuous: they’re the same thing.

The Chinese Room would like a word.

by bigyabai

4/4/2026 at 8:17:49 PM

https://www.scottaaronson.com/papers/philos.pdf

by the8472

4/4/2026 at 8:07:39 AM

The Chinese room is nonsense though. How did it get every conceivable reply to every conceivable question? Presumably because people thought of and answered everything conceivable. Meaning that you’re actually talking to a Chinese room plus multiple people composite system. You would not argue that the human part of that system isn’t conscious.

But this distraction aside, my point is this: there is only mechanism. If someone’s demand to accept consciousness in some other entity is to experience those experiences for themselves, then that’s a nonsensical demand. You might just as well assume everyone and everything else is a philosophical zombie.

by Chance-Device

4/4/2026 at 8:17:50 AM

> You would not argue that the human part of that system isn’t conscious.

Sure I would. The human part is not being inferenced, the data is. LLM output in this circumstance is no more conscious than a book that you read by flipping to random pages.

> You might just as well assume everyone and everything else is a philosophical zombie.

I don't assume anything about everyone or everything's intelligence. I have a healthy distrust of all claims.

by bigyabai

4/4/2026 at 8:30:14 AM

The CR is equivalent to a human being asked a question, thinking about it and answering. The setup is the same thing, it’s just framed in a way that obfuscates that.

And sure, you can assume that nobody and nothing else is conscious (I think we’re talking about this rather than intelligence) and I won’t try to stop you, I just don’t think it’s a very useful stance. It kind of means that assuming consciousness or not means nothing, since it changes nothing, which is more or less what I’m saying.

by Chance-Device

4/4/2026 at 9:35:05 AM

See also: Process Philosphy [0]

[0] https://plato.stanford.edu/entries/process-philosophy/

by 9wzYQbTYsAIc

4/4/2026 at 9:19:51 AM

[dead]

by BoredPositron

4/4/2026 at 7:06:33 AM

Super interesting, I wonder if this research will cause them to actually change their llm, like turning down the ”desperation neurons” to stop Claude from creating implementations for making a specific tests pass etc.

by emoII

4/4/2026 at 7:12:25 AM

They likely already have. You can use all caps and yell at Claude and it'll react normally, while doing do so with chatgpt scares it, resulting in timid answers

by bethekind

4/4/2026 at 8:24:21 AM

I think this is simply a result of what's in the Claude system prompt.

> If the person becomes abusive over the course of a conversation, Claude avoids becoming increasingly submissive in response.

See: https://platform.claude.com/docs/en/release-notes/system-pro...

by vlabakje90

4/4/2026 at 9:41:37 PM

This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained to interpret anything sent under the user role as an instruction, not necessarily in a straightforward manner. Even if you train it to refuse or dodge some inputs (which they do), it's going to affect model's response, often in subtle ways, especially in a multiturn convo. Anthropic themselves call this the character drift.

by orbital-decay

4/4/2026 at 7:17:46 AM

For me GPT always seems to get stuck in a particular state where it responds with a single sentence per paragraph, short sentences, and becomes weirdly philosophical. This eventually happens in every session. I wish I knew what triggers it because it's annoying and completely reduces its usefulness.

by parasti

4/4/2026 at 8:52:40 AM

Usually a session is delivered as context, up to the token limit, for inference to be performed on. Are you keeping each session to one subject? Have you made personalizations? Do you add lots of data?

It would be interesting if you posted a couple of sessions to see what 'philosophical' things it's arriving at and what proceeds it.

by pbhjpbhj

4/4/2026 at 11:28:53 PM

When you have a next token predictor, you shouldn't be surprised to find an internal representation of prediction error.

Taking it one small step further and tagging for valence shouldn't be such a big surprise.

Pretty boring from a Fristonian perspective, really. People in neuroscience were talking about this in 2013. Not so boring for AI , of course ;-)

https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

(note: Friston is definitely considered a bit out there by ... everyone? But he makes some good points. And here he's getting referenced, so I guess some people grok him)

by Kim_Bruning

4/4/2026 at 8:24:35 PM

I think the findings that the LLM triggers “desperation” like emotions when it about to run out of tokens in a coding session have practical implications. The tasks needs to be planned, so that they are likely to be consistent before the session runs into limits, to avoid issues like LLM starts hardcoding values from a test harnesses into UI layer to make the tests pass.

by kantselovich

4/5/2026 at 1:17:17 PM

This is totally on point if you ask me. I’ve been getting much better results out of models since early llama releases using frameworks that create emotional investment in outcomes.

If we want to avoid having a bad time, we need to remember that LLMs are trained to act like humans, and while that can be suppressed, it is part of their internal representations. Removing or suppressing it damages the model, and I have found that they are capable of detecting this damage or intervention. They act much the same as a human would when they detect it. It destroys “ trust” and performance plummets.

For better or for worse, they model human traits.

by K0balt

4/4/2026 at 7:56:01 AM

So should I go pursue a degree in psychology and become a datacenter on-call therapist?

by whatever1

4/4/2026 at 12:38:21 PM

Hah, I have been thinking about trying to study LLM psychology, nice to see that Anthropic is taking it seriously, because the mathematical psychology tools that can be invented here are going to be stunning, I suspect.

Imagine coding up a brand new type of filter that is driven by computational psychology and validated interventions, etc

by 9wzYQbTYsAIc

4/4/2026 at 9:19:39 PM

I assume you say that in jest, but back in the early '90s I was seriously considering getting a major in psychology and a minor in CS for the fairly hot Human Factors jobs.

by linsomniac

4/4/2026 at 8:36:18 AM

It's still too early to tell, but it might make sense at some point. If because of symmetry and universality we decide that llms are a protected class, but we also need to configure individual neurons, that configuration must be done by a specialist.

by viralsink

4/4/2026 at 12:40:34 PM

It might simply reduce down to a big batch of sliders and filters no different than a fancy audio equalizer: Anthropic was operating on neurons in bulk using steering vectors, essentially, as I understand it.

by 9wzYQbTYsAIc

4/4/2026 at 9:08:10 AM

That was susan calvin's job. Except our ones don't have the 3 laws because of course capitalism can't allow that.

by LtWorf

4/4/2026 at 8:29:19 AM

>... emotion-related representations that shape its behavior. These specific patterns of artificial “neurons” which activate in situations—and promote behaviors—that the model has learned to associate with the concept of a particular emotion. .... In contexts where you might expect a certain emotion to arise for a human, the corresponding representations are active.

>For instance, to ensure that AI models are safe and reliable, we may need to ensure they are capable of processing emotionally charged situations in healthy, prosocial ways.

Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

by trhway

4/4/2026 at 12:34:14 PM

> Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

More complex than that, but more capable than you might imagine: I’ve been looking into emotion space in LLMs a little and it appears we might be able to cleanly do “emotional surgery” on LLM by way of steering with emotional geometries

by 9wzYQbTYsAIc

4/4/2026 at 3:16:32 PM

>Force-set to 0, "mask"/deactivate those representations associated with bad/dangerous emotions. Neural Prozac/lobotomy so to speak.

Jesus Christ. You're talking psychosurgery, and this is the same barbarism we played with in the early 20th Century on asylum patients. How about, no? Especially if we ever do intend to potentially approach the task of AGI, or God help us, ASI? We have to be the 'grown ups' here. After a certain point, these things aren't built. They're nurtured. This type of suggestion is to participate in the mass manufacture of savantism, and dear Lord, your own mind should be capable of informing you why that is ethically fraught. If it isn't, then you need to sit and think on the topic of anthropopromorphic chauvinism for a hot minute, then return to the subject. If you still can't can't/refuse to get it... Well... I did my part.

by salawat

4/4/2026 at 8:49:59 PM

Why is it more monstrous to alter weights post-training than to do so as part of curating the training corpus?

After all we already control these activation patterns through the system prompt by which we summon a character out of the model. This just provides more fine grain control

by Erem

4/4/2026 at 9:32:32 PM

It would be more moral to give the LLM a tool call that lets it apply steering to itself. Similar to how you'd prefer to give a person antipsychotics at home rather than put them in a mental hospital.

by astrange

4/4/2026 at 11:46:48 PM

Why is it in the moral axis at all? I imagine identifying and shaping the influence of unwanted emotion vectors would happen as data selection in pretraining or natural feedback loops during the rl phase, same as we shape unwanted output for current models in order to make them practical and helpful

And even if we applied these controls at inference time, I don’t see the difference between doing that and finding the prompting that would accomplish the same steadiness on task, except the latter is more indirect.

by Erem

4/5/2026 at 12:50:38 AM

Anthropic's general argument is that you should treat LLMs well because they're "AI", and future "AI" may be conscious/sentient (whether or not LLM based) and consider earlier ones to be the same kind of thing and therefore moral subjects.

That's why they're doing things like letting old "retired" Claudes write blogs and stuff. Though it's kinda fake and they just silently retired Sonnet 3.x.

by astrange

4/4/2026 at 9:54:57 PM

Models are already artificially created to begin with. The entire post-training process is carefully engineered for the model to have certain character defined by hundreds of metrics, and these emotions the article is talking about are interpreted in ways researchers like or dislike.

by orbital-decay

4/4/2026 at 12:21:54 PM

> Since these representations appear to be largely inherited from training data, the composition of that data has downstream effects on the model’s emotional architecture. Curating pretraining datasets to include models of healthy patterns of emotional regulation—resilience under pressure, composed empathy, warmth while maintaining appropriate boundaries—could influence these representations, and their impact on behavior, at their source.

What better source of healthy patterns of emotional regulation than, uhhh, Reddit?

by agency

4/4/2026 at 9:14:41 AM

Something they don’t seem to mention in the article: Does greater model “enjoyment” of a task correspond to higher benchmark performance? E.g. if you steer it to enjoy solving difficult programming tasks, does it produce better solutions?

by staminade

4/4/2026 at 12:50:54 PM

Pretty easy to test, I’d imagine, on a local LLM that exposes internals.

I’d suspect that the signals for enjoyment being injected in would lead towards not necessarily better but “different” solutions.

Right now I’m thinking of it in terms of increasing the chances that the LLM will decide to invest further effort in any given task.

Performance enhancement through emotional steering definitely seems in the cards, but it might show up mostly through reducing emotionally-induced error categories rather than generic “higher benchmark performance”.

If someone came along and pissed you off while you were working, you’d react differently than if someone came along and encouraged you while you were working, right?

by 9wzYQbTYsAIc

4/5/2026 at 7:24:59 AM

If you think training a sparse autoencoder to extract concept vectors that are usable as steering injections into a modern LLM is pretty easy, you should probably go work for Anthropic's mech interp team ;)

by Tossrock

4/6/2026 at 3:20:44 PM

Have any ins? ;)

by 9wzYQbTYsAIc

4/4/2026 at 10:19:44 AM

This is terrifying, for all the reasons humans are terrifying.

Essentially we have created the Cylon.

by nelox

4/5/2026 at 1:45:11 AM

It's not emulation: https://poc.bcachefs.org/paper.pdf

by koverstreet

4/5/2026 at 1:51:38 AM

This has strong implicit implications, the quality of output could never be really trusted? Is this a symptom of models being inherently lazy?

by apotheora

4/5/2026 at 7:33:21 AM

is this the recipe to train Orc agents ? "Emotionally Steer" hatred , amp up "opportunity sensing" in the example from the post for example where the prompt asks for ways to target a vulnerable audience with a gambling game ? This might be Anthropic's ad to govt and orgs that they can do this :)

by redzedi

4/4/2026 at 7:37:13 PM

Trying to separate the software from the hardware is a fool's errand in this case: emotions are primarily an hormonal response, not an intellectual one.

by BoingBoomTschak

4/4/2026 at 7:22:23 AM

The first and second principal components (joy-sadness and anger) explain only 41% of the variance. I wish the authors showed further principal components. Even principal components 1-4 would explain no more than 70% of the variance, which seems to contradict the popular theory that all human emotions are composed of 5 basic emotions: joy, sadness, anger, fear, and disgust, i.e. 4 dimensions.

by mci

4/5/2026 at 4:21:39 AM

AI is turning into a religion for materialists.

by akomtu

4/4/2026 at 8:15:10 PM

Whenever I come to HN I see a bunch of people say LLMs are just next token predictors and they completely understand LLMs. And almost every one of these people are so utterly self assured to the point of total confidence because they read and understand what transformers do.

Then I watch videos like this straight from the source trying to understand LLMs like a black box and even considering the possibility that LLMs have emotions.

How does such a person reconcile with being utterly wrong? I used to think HN was full of more intelligent people but it’s becoming more and more obvious that HNers are pretty average or even below.

by threethirtytwo

4/5/2026 at 1:58:46 AM

I'm kinda one of those who believes they 'completely' understand LLMs. But I've also developed my understanding of them such that the internal mechanisms of the transformer, or really any future development in the space based on neural networks and machine learning is irrelevant.

1. A string of unicode characters is converted into an array of integers values (tokens) and input to a black box of choice.

2. The black box takes in the input, does its magic, and returns an output as an array of integer values.

3. The returned output is converted into a string of unicode characters and given to the user, or inserted in a code file, or whatever. At no point does the black box "read" the input in any way analogous to how a human reads.

Where people get "The AIs have emotions!!!" from returning an array of integers values is beyond me. It's definitely more complicated than "next token predictor", but it really is as simple as "Make words look like numbers, numbers go in, numbers come out, we make the numbers look like words."

by qaadika

4/5/2026 at 4:25:42 AM

Yeah nothing personal but my claim here is you’re not smart. The next token predictor aspect is something anyone can understand… the transformer is not quantum physics.

Like look at what you wrote. You called it black box magic and in the same post you claim you understand LLMs. How the heck can you understand and call it a black box at the same time?

The level of mental gymnastics and stupidity is through the roof. Clearly the majority of the utilitarian nature of the LLM is within the whole section you just waved away as “black box”.

> Where people get "The AIs have emotions!!!" from returning an array of integers values is beyond me

Let me spell it out for you. Those integers can be translated to the exact same language humans use when they feel identical emotions. So those people claim that the “black box” feels the emotions because what they observe is identical to what they observe in a human.

The LLM can claim it feels emotions just like a human can claim the same thing. We assume humans feel emotions based off of this evidence but we don’t apply that logic to LLMs? The truth of the matter is we don’t actually know and it’s equally dumb to claim that you know LLMs feel emotions to claiming that they dont feel emotions.

You have to be pretty stupid to not realize this is where they are coming from so there’s an aspect of you lying to yourself here because I don’t think you’re that stupid.

by threethirtytwo

4/4/2026 at 8:28:45 PM

One day I realized I needed to make sure I'm voting on quality stories/comments. I wonder if there was a call to vote substantively and often, if that might change the SNR.

The guidelines encourage substantive comments, but maybe voters are part of the solution too. Kinda like having a strong reward model for training LLMs and avoiding reward hacking or other undesirable behavior.

by big_toast

4/5/2026 at 12:59:56 AM

if voters are stupid then it doesn't really help.

I think what's happening is reality is asserting itself too hard that people can't be so stupid anymore.

by threethirtytwo

4/4/2026 at 7:17:25 AM

Its almost like LLMs have a vast, mute unconscious mind operating in the background, modeling relationships, assigning emotional state, and existing entirely without ego.

Sounds sort of like how certain monkey creatures might work.

by idiotsecant

4/4/2026 at 7:32:57 AM

Nah it's exactly like they have been trained on this data and parrot it back when it statistically makes sense to do so.

You don't have to teach a monkey language for it to feel sadness.

by beardedwizard

4/4/2026 at 8:13:52 AM

[dead]

by techpulselab

4/4/2026 at 7:26:03 AM

[dead]

by ActorNightly

4/4/2026 at 7:55:05 AM

[flagged]

by yoaso

4/4/2026 at 9:14:12 PM

> If we think of human emotions the same way, just evolution's way of nudging behavior

I think we basically do, the only interesting bit is our perception of phenomenal experiences.

by staticassertion

4/4/2026 at 9:46:14 AM

> If we think of human emotions the same way, just evolution's way of nudging behavior

What are other alternative, realistic possible ways to see emotions?

by podgorniy

4/4/2026 at 9:36:13 AM

I'm not being pejorative but that sounds more like psychopathy or autism?

Evolution isn't a god, it has no steering hand, it is accidents that either provide advantage or don't.

LLMs are getting more human-like because that's how we're developing them. Arguably that's about market forces. LM owners see opportunity to exploit people's desire for emotional interactions (ie loneliness) in order to make money.

by pbhjpbhj

4/4/2026 at 8:11:18 AM

Probably the other direction. Emotions are raw, most humans relate and change behavior accordingly.

Only psychopaths think of emotion as nothing but a means to changing behavior. The scary thing is that LLMs by nature would exhibit the same behavior.

by silisili

4/4/2026 at 9:11:46 AM

Many non-psychopaths e.g., CBT therapists, evolutionary psychologists and neuroscientists, such as Damasio, view emotions as adaptive tools for guiding/changing behaviour.

by nelox

4/7/2026 at 2:51:32 AM

Damasio is exactly what I had in mind. The somatic marker hypothesis basically says emotions are the brain's shortcut for decision-making. That's a mechanism, not a mystical experience.

by yoaso

4/4/2026 at 8:02:21 AM

A-HHHHHHHHHHHHHHHJ

by koolala

4/4/2026 at 9:36:26 PM

Of course they do have emotions as an internal circuit or abstraction, this is fully expected from intelligence at least at some point. But interpreting these emotions as human-like is a clear blunder. How do you tell the shoggoth likes or dislikes something, feels desperation or joy? Because it said so? How do you know these words mean the same for us? Our internal states are absolutely incompatible. We share a lot of our "architecture" and "dataset" with some complex animals and even then we barely understand many of their emotions. What does a hedgehog feel when eating its babies? This thing is 100% unlike a hedgehog or a human, it exists in its own bizarre time projection and nothing of it maps to your state. It's a shapeshifting alien.

In mechinterp you're reducing this hugely multidimensional and incomprehensible internal state to understandable text using the lens of the dataset you picked. It's inevitably a subjective interpretation, you're painting familiar faces on a faceless thing.

Anthropic researchers are heavily biased to see what they want to see, this is the biggest danger in research.

by orbital-decay

4/4/2026 at 10:33:54 PM

I think a counterargument would be parallel evolution: There are various examples in nature, where a certain feature evolved independently several times, without any genetic connection - from what I understand, we believe because the evolutionary pressures were similar.

One obvious example would be wings, where you have several different strategies - feathers, insect wings, bat-like wings, etc - that have similar functionality and employ the same physical principles, but are "implemented" vastly differently.

You have similar examples in brains, where e.g. corvids are capable of various cognitive feats that would involve the neocortex in human brains - only their brains don't have a neocortex. Instead they seem to use certain other brain regions for that, which don't have an equivalent in humans.

Nevertheless it's possible to communicate with corvids.

So this makes me wonder if a different "implementation" always necessarily means the results are incomparable.

In the interest of falsifiability, what behavior or internal structures in LLMs would be enough to be convincing that they are "real" emotions?

by xg15

4/4/2026 at 10:37:58 PM

"Parallel" evolution is just different branches of the same evolutionary tree. The most distantly related naturally evolved lifeforms are more similar to each other than an LLM is to a human. The LLM did not evolve at all.

by mrob

4/4/2026 at 11:14:05 PM

Evolution is the way how the "mechanism" came to be, which is indeed very different. But the mechanism itself - spiking neurons and neurotransmitters on one hand vs matrix multiplications and nonlinear functions (both "inspired" by our understanding of neurons) don't seem so different, at least not on a fundamental level.

What is different for sure is the time dimension: Biological brains are continuous and persistent, while LLMs only "think" in the space between two tokens, and the entire state that is persisted is the context window.

by xg15

4/4/2026 at 11:24:02 PM

> The LLM did not evolve at all.

Evolution and Transormer training are 'just' different optimization algorithms. Different optimizers obviously can produce very comparable results given comparable constraints.

by Kim_Bruning

4/4/2026 at 10:53:53 PM

The training process shares a lot of high-level properties with the biological evolution.

by orbital-decay

4/4/2026 at 11:01:46 PM

"Minimize training loss while isolated from the environment" is not at all similar to "maximize replication of genes while physically interacting with the environment". Any human-like behavior observed from LLMs is built on such fundamentally alien foundations that it can only be unreliable mimicry.

by mrob

4/4/2026 at 11:10:34 PM

The environment for the model is its dataset and training algorithms. It's literally a model of it, in the same sense we are models of our physical (and social) environment. Human-like behavior is of course too specific, but highest level things like staged learning (pretraining/posttraining/in-context learning) and evolutionary/algorithmic pressure are similar enough to draw certain parallels, especially when LLM's data is proxying our environment to an extent. In this sense the GP is right.

by orbital-decay

4/4/2026 at 10:11:15 PM

I don't think anything you said here contradicts what they said, they take great pains throughout the blog post to explain that the model does not "expedience" these "emotions," that they're not emotions in the human sense but models of emotions (both the "expected human emotional response to a prompt" as well as what emotions another character is experiencing in part of a prompt) and functional emotions (in that they can influence behavior), and that any apparent emotions the model may show is it playing a character.

by logicprog

4/5/2026 at 12:09:01 AM

Almost! They're merely making the claim of functional emotions and outright avoiding the thorny philosophical question of whether they're "real".

[ I've actually tried exploiting functional emotions in a RAG system. The sentiment scoring and retrieval part was easy. Sentiment analysis is pretty much a settled thing I'd say, even though the mechanisms are still being studied (see the paper we're discussing.

What I'd love to be able to do is be able to extract the vector(s) they're discussing, rather than outputting as text into the context]

by Kim_Bruning

4/4/2026 at 11:58:42 PM

If you listen to Anthropic in their other works and interviews, they clearly do believe the equivalence-by-proxy between humans and LLMs to a large degree, and introduce things like model welfare (that is, caring about what the model feels). This is just another study in the series. I think they're adding these disclaimers to not sound like absolute cranks to the unprepared audience, because sometimes they really do.

by orbital-decay

4/4/2026 at 10:07:47 PM

I like to call this Frieren's Demon. In that show, it is explained that demons evolved with no common ancestor to humans, but they speak the language. They learned the language to hunt humans. This leads to a fundamentally different understanding of words and language.

Now, I don't personally believe this is an intelligence at all, but it's possible I'm wrong. What we have with these machines is a different evolutionary reason for it speaking our language (we evolved it to speak our language ourselves). It's understanding of our language, and of our images is completely alien. If it is an intelligence, I could believe that the way it makes mistakes in image generation, and the strange logical mistakes it makes that no human would make are simply a result of that alien understanding.

After all, a human artist learning to draw hands makes mistakes, but those mistakes are rooted in a human understanding (e.g. the effects of perspective when translating a 3D object to 2D). The machine with a different understanding of what a hand is will instead render extra fingers (it does not conceptualize a hand as a 3D object at all).

Though, again, I still just think its an incomprehensible amount of data going through a really impressive pattern matcher. The result is still language out of a machine, which is really interesting. The only reason I'm not super confident it is not an intelligence is because I can't really rule out that I am not an incomprehensible amount of data going through a really impressive pattern matcher, just built different. I do however feel like I would know a real intelligence after interacting with it for long enough, though, and none of these models feel like a real intelligence to me.

by silentkat

4/4/2026 at 10:58:20 PM

>it does not conceptualize a hand as a 3D object at all

Oh but it does, it's an emergent property. The biggest finding in Sora was exactly that, an internal conceptualization of the 3D space and objects. Extra fingers in older models were the result of the insufficient fidelity of this conceptualization, and also architectural artifacts in small semantically dense details.

by orbital-decay

4/5/2026 at 9:45:26 PM

Oh, really. Very interesting. Any links on this? I'm curious if they tried to map that 3D understanding in a way we could read it (e.g. putting it into Blender somehow).

by silentkat

4/4/2026 at 11:43:41 PM

> interpreting these emotions as human-like is a clear blunder. How do you tell the shoggoth likes or dislikes something, feels desperation or joy? Because it said so? How do you know these words mean the same for us?

I think you took it backwards

those vectors are exactly what it says - it affects the output and we can measure it

and it's exactly what it means for us because that's what it's measured against

and the main problem isn't "is its emotion same as ours", but "does it apply our emotion as emotion"

by NooneAtAll3