4/23/2025 at 2:13:13 PM
The hard part is that all the things the author says disprove LLM intelligence are failings for humans too:
* Humans tell you how they think, but it seemingly is not how they really think.
* Humans tell you repeatedly that they used a tool, when they actually did it another way.
* Humans tell you facts they believe to be true but are false.
* Humans often need to be verified by another human and should not be trusted.
* Humans are extraordinarily hard to align.
While I am sympathetic to the argument, and I agree that machines pursuing their own goals over a longer timeframe are still science fiction, I think this particular argument fails.
GPT o3 is a better writer than most high school students at the time of graduation. GPT o3 is a better researcher than most high school students at the time of graduation. GPT o3 is a better lots of things than any high school student at the time of graduation. It is a better coder than the vast majority of first semester computer science students.
The original Turing test has been shattered. We keep building progressively harder standards for what counts as human intelligence, and as soon as we settle on another one, we quickly achieve it.
The gap is elsewhere: look at Devin to see the limitation. Its ability to follow its own goal plans is the next frontier, and maybe we don't want to solve that problem yet. What if we just decide not to solve that particular problem and lean further into the cyborg model?
We don't need them to replace humans - we need them to integrate with humans.
by ebiester
4/23/2025 at 2:30:48 PM
> GPT o3 is a better writer than most high school students at the time of graduation.
All of these claims, based on benchmarks, don't hold up in the real world on real-world tasks. Which is strongly supportive of the statistical model: it is capable of answering patterns it was extensively trained on, but it quickly breaks down when you step outside that distribution.
o3 is also a significant hallucinator. I spent quite a bit of time with it last weekend and found it to be probably far worse than any of the other top models. The catch is that its hallucinations are quite sophisticated. Unless you are using it on material in which you are extremely knowledgeable, you won't know.
LLMs are probability machines, which means they will mostly produce content that aligns with the common distribution of their training data. They don't analyze what is correct, only what the probable completions of your text are given common word distributions. But when scaled to incomprehensible numbers of combinatorial patterns, that does create a convincing mimic of intelligence, and it does have its uses.
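To make "probable completion" concrete, here is a rough sketch of what one sampling step looks like. The vocabulary and logits are made up for illustration; a real model scores tens of thousands of tokens this way, and nothing in this step checks whether the sampled word is true, only how likely it is.

    # Hypothetical next-token step for "The capital of France is ..."
    # (toy vocabulary and scores, not taken from any real model)
    import numpy as np

    rng = np.random.default_rng(0)
    vocab  = ["Paris", "Lyon", "France", "the", "banana"]
    logits = np.array([4.2, 1.1, 2.3, 0.4, -3.0])  # model's raw scores per token

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax: scores -> probabilities

    next_token = rng.choice(vocab, p=probs)        # sample; correctness never consulted
    print(dict(zip(vocab, probs.round(3))), "->", next_token)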
But importantly, it diverges from the behaviors we would see in true intelligence in ways that make it inadequate for many of the kinds of tasks we are hoping to apply it to, namely its significant, unpredictable failures. There is just no way to know which query/prompt will end up operating over concepts outside the training set.
by 13years
4/23/2025 at 5:30:20 PM
I don't dispute that these are problems, but the fact that its hallucinations are quite sophisticated means, to me, that they are errors humans could also make.
I am not saying that the LLMs are better than you analyze, but rather that average humans are worse. (Well-trained humans will continue to be better alone than LLMs alone for some time. But compare an LLM to an 18 year old.)
by ebiester
4/23/2025 at 7:02:21 PM
Essentially, pattern matching can outperform humans at many tasks, just as computers and calculators can outperform humans at tasks.
So it is not that LLMs can't be better at tasks; it is that they have specific limits that are hard to discern. Pattern matching on the entire world of data is an opaque tool in which we cannot easily perceive where the walls are and where it falls completely off the rails.
Since it is not true intelligence, but at times a good mimic of it, we will continue to struggle with unexpected failures, as it simply has no understanding of the task it is given.
by 13years
4/23/2025 at 3:39:10 PM
> o3 is also a significant hallucinator. I spent quite a bit of time with it last weekend and found it to be probably far worse than any of the other top models. The catch is that its hallucinations are quite sophisticated. Unless you are using it on material in which you are extremely knowledgeable, you won't know.
At least 3/4 of humans identify with a religion, which at best can be considered a confabulation or hallucination in the rigorous terms you're using to judge LLMs. Dogma is almost identical to the doubling-down on hallucinations that LLMs produce.
I think what this shows about intelligence in general is that without grounding in physical reality, it tends to hallucinate from some statistical model of reality and confabulate further ungrounded statements unless there is a strong and active effort to ground each statement. LLMs have the disadvantage of having no real-time grounding in most instantiations, Gato and related robotics projects excepted. This is not so much a problem with transformers as with the lack of feedback tokens in most LLMs. Pretraining on ground-truth texts can give an excellent prior probability of next tokens, and I think feedback, either in the weights (continuous fine-tuning) or as real-world feedback tokens in response to outputs, can get transformers to hallucinate less in the long run (e.g. after responding to feedback when out of distribution).
by benlivengood
4/23/2025 at 7:30:54 PM
Arguing that many humans are stupid or ignorant does not support the idea that an LLM is intelligent. This argument is reductive in that it ignores the many, many diverse signals influencing the part of the brain that controls speech. Comparing a statistical word predictor and the human brain isn’t useful.
by ary
4/24/2025 at 1:47:35 AM
I'm arguing that it's natural for intelligent beings to hallucinate/confabulate in cases where ground truth can't be established. Stupidity does not apply to e.g. Isaac Newton or Kepler, who were very religious, and any ignorance wasn't due to a fault in their intelligence per se. We as humans make our best guesses at what reality is even in the cases where it can't be grounded, e.g. string theory or M-theory if you want a non-religious example.
Comparing humans to transformers is actually an instance of the phenomenon; we have an incomplete model of "intelligence" and we posit that humans have it, but our model is only partially grounded. We assume humans ~100% have intelligence, are unsure of which animals might be intelligent, and are arguing about whether it's even well-typed to talk about transformer/LLM intelligence.
by benlivengood
4/23/2025 at 8:33:31 PM
> religion which at best can be considered a confabulation or hallucination in the rigorous terms you're using to judge LLMs
Non-religious people are not exempt. Everyone has a worldview (or prior commitments, if you like) through which they understand the world. If you encounter something that contradicts your worldview directly, even repeatedly, you are far more likely to "hallucinate" an understanding of the experience that allows your worldview to stay intact than to change your worldview.
I posit that humans are unable to function without what amounts to a religion of some sort -- be it secular humanism, nihilism, Christianity, or something else. When one is deposed at a societal level, another rushes in to fill the void. We're wired to understand reality through definite answers to big questions, whatever those answers may be.
by TimTheTinker
4/23/2025 at 4:06:01 PM
> LLMs are probability machines.
So too are humans, it turns out.
by montroser
4/23/2025 at 7:06:06 PM
We are capable of much more, which is why we can perform tasks when no prior pattern or example has been provided. We can understand concepts from the rules alone, whereas LLMs must train on millions of examples. A human can play a game of chess after reading the instruction manual, without ever witnessing a single game. This is distinctly different from pattern-matching AI.
by 13years
4/23/2025 at 9:13:58 PM
Citation needed.
by goatlover
4/23/2025 at 3:05:01 PM
Were you using it with search enabled?
by KTibow
4/23/2025 at 5:26:40 PM
Humans can be deceptive, but it is usually deliberate. We can also honestly make things up and present them as fact, but that is not very common; we usually say that we don't know. And generally, lying is harder for us than telling the truth, in the sense that constructing a consistent but false narrative requires effort.
For LLMs, making stuff up is the default; one can argue that it is all they do, and that it just happens to be the truth most of the time.
And AFAIK, what I would call the "real" Turing test hasn't come close to being shattered. The idea is that the interrogator and the human subject are both experts and collaborate against the computer. They can't cheat by exchanging secrets, but anything else is fair game.
I think it is important because the Turing test has already been "won" by primitive algorithms acting clueless to interrogators who were not aware of the trick. For me, this is not really a measure of computer intelligence, more like a measure of how clever the chatbot designers were at tricking unsuspecting people.
by GuB-42
4/23/2025 at 5:46:39 PM
> we usually say that we don't know
I think this is one of the distinguishing attributes of human failures. Human failures have some degree of predictability. We know when we aren't good at something, and we then devise processes to close that gap, which can be consultations, training, process reviews, use of tools, etc.
The failures we see in LLMs are of a distinctly different nature. They often appear far more nonsensical and have a greater degree of randomness.
LLMs as a tool would be far more useful if they could indicate what they are good at, but since they cannot self-reflect on their knowledge, that is not possible. So they are equally confident in everything, regardless of its correctness.
by 13years
4/23/2025 at 6:17:47 PM
I think the last few years are a good example of how this isn't really true. Covid came around and everyone became an epidemiologist and public health expert. The people in charge of the US government right now are also a perfect example. RFK Jr. is going to get to the bottom of autism. Trump is ruining the world economy seemingly by himself. Hegseth is in charge of the most powerful military in the world. Humans pretending they know what they're doing is a giant problem.
by ragequittah
4/23/2025 at 6:45:13 PM
These are different contexts of errors. Take any of the humans in your example and give them an objective task, such as taking a piece of literal text and reliably interpreting its meaning, and they can do so.
LLMs cannot do this. There are many types of human failures, but we roughly know the parameters and contexts of those failures. Political/emotional/fear domains etc. have their own issues, but we are aware of them.
However, LLMs cannot perform purely objective tasks like simple math reliably.
by 13years
4/23/2025 at 7:47:51 PM
> Take any of the humans in your example and give them an objective task, such as taking a piece of literal text and reliably interpreting its meaning, and they can do so.
I’m not confident that this is so. Adult literacy surveys (see e.g. https://nces.ed.gov/use-work/resource-library/report/statist...) consistently show that most people can’t reliably interpret the meaning of complex or unfamiliar text. It wouldn’t surprise me at all if RFK Jr. is antivax because he misunderstands all the information he sees about the benefits of vaccines.
by SpicyLemonZest
4/23/2025 at 9:31:15 PM
Yeah, humans can be terrible. I am not sure what the argument here is. Does that make it ok to use software that can be just as terrible?
by namaria
4/24/2025 at 1:05:17 AM
Depends on the context. I've seen a lot of value from deploying LLMs in things like first-line customer support, where a suggestion that works 60% of the time is plenty valuable, especially if the bot can crank it out in 10 seconds when a human would take 5-10 minutes to get on the phone.
by SpicyLemonZest
4/24/2025 at 6:30:15 AM
I too have seen economic value collected by terrible things that, in my opinion, absolutely should not exist. Your example fits the bill.
Profitability is not the absolute measure of what should exist.
by namaria
4/24/2025 at 6:18:43 PM
I'm not sure what you're referring to, since profitability wasn't a metric I used. I agree not all profitable things should exist, but increasing the availability of customer support seems to me like a clearly good thing.
Perhaps you're thinking that profit-chasing is the only reason companies don't offer good customer support today? That's not accurate. Providing enough smart, well-resourced human beings to answer every question your customers can come up with is a huge operational challenge, unless your product is absolutely dead simple or you're small enough to make random employees help in their spare time.
by SpicyLemonZest
4/24/2025 at 6:30:55 PM
> I've seen a lot of value from deploying LLMs in things like first-line customer support, where a suggestion that works 60% of the time is plenty valuable
Valuable to whom? How so?
At that rate I would end my business with such a company.
If you claim such a terrible level of customer support is valuable, I question your judgement of value.
by namaria
4/24/2025 at 8:31:20 PM
Valuable to customers, because it allows them to get instant advice that will often solve their problem. I strongly suspect that some companies you do business with have already integrated LLMs into their customer support workflow - it's very common these days.
by SpicyLemonZest
4/25/2025 at 4:50:57 AM
Hard disagree on instant suggestions with a 40% miss rate being valuable. I want support, not half-baked guesses.
by namaria
4/23/2025 at 8:24:59 PM
> most people can’t reliably interpret the meaning of complex or unfamiliar text
But LLMs fail the most basic tests of understanding that don't require complexity. They have read everything that exists. What would even be considered unfamiliar in that context?
> RFK Jr. is antivax because he misunderstands all the information he sees about the benefits of vaccines.
These are areas where information can be contradictory. Even this statement is questionable in its most literal interpretation. Has he made such a statement? Is that a correct interpretation of his position?
The errors we are criticizing in LLMs are not areas of conflicting information or difficult to discern truths. We are told LLMs are operating at PhD level. Yet, when asked to perform simpler everyday tasks, they often fail in ways no human normally would.
by 13years
4/24/2025 at 12:57:08 AM
> But LLMs fail the most basic tests of understanding that don't require complexity.
Which basic tests of understanding do state-of-the-art LLMs fail? Perhaps there's something I don't know here, but in my experience they seem to have basic understanding, and I routinely see people claim LLMs can't do things they can in fact do.
by SpicyLemonZest
4/24/2025 at 1:15:32 AM
Take a look at this vision test - https://www.mindprison.cc/i/143785200/the-impossible-llm-vis...
It is an example that shows the difference between understanding and patterns. No model actually understands the most fundamental concept of length.
LLMs can seem to do almost anything for which there are sufficient patterns to train on. However, there aren't infinite patterns available to train on. So, edge cases are everywhere. Such as this one.
by 13years
4/24/2025 at 1:41:05 AM
I don't see how this shows that models don't understand the concept of length. As you say, it's a vision test, and the author describes how he had to adversarially construct it to "move slightly outside the training patterns" before LLMs failed. Doesn't it just show that LLMs are more susceptible to optical illusions than humans? (Not terribly surprising that a language model would have subpar vision.)
by SpicyLemonZest
4/24/2025 at 4:06:51 AM
But it is not an illusion, and the answers make no sense. In some cases the models pick exactly the opposite answer. No human would do this.
Yes, outside the training patterns is the point. I have no doubt that if you trained LLMs on this type of pattern with millions of examples, they could get the answers reliably.
The whole point is that humans do not need data training. They understand such concepts from one example.
by 13years
4/23/2025 at 4:24:11 PM
I think TA's argument fundamentally rests on two premises (quoting):
(a) If we were on the path toward intelligence, the amount of training data and power requirements would both be reducing, not increasing.
(b) [LLMs are] data bound and will always be unreliable as edge cases outside common data are infinite.
The most important observed consequences of (b) are model collapse when repeatedly fed LLM output in further training iterations; and increasing hallucination when the LLM is asked for something truly novel (i.e. arising from understanding of first principles but not already enumerated or directly implicated in its training data).
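(As an aside, the model-collapse effect is easy to see even in a toy setting. The sketch below is my own illustration, with a Gaussian standing in for the "model" and made-up sizes, not the actual LLM experiments: each generation is fitted only to samples from the previous one, and finite sampling keeps clipping the tails until the distribution degenerates.)

    # Toy model-collapse sketch (illustrative only): re-fit a Gaussian to
    # samples drawn from the previous generation's fit and watch the spread
    # drift toward zero as the tails are lost.
    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma = 0.0, 1.0   # generation 0: the "real data" distribution
    n = 50                 # synthetic corpus size per generation

    for gen in range(1, 201):
        data = rng.normal(mu, sigma, n)        # sample from the current model
        mu, sigma = data.mean(), data.std()    # train the next model on its own output
        if gen % 40 == 0:
            print(f"generation {gen:3d}: sigma = {sigma:.4f}")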
Yes, humans are capable of failing (and very often do) in the same ways: we can be extraordinarily inefficient with our thoughts and behaviors, we can fail to think critically, and we can get stuck in our own heads. But we are capable of rising above those failings through a commitment to truths (or principles, if you like) outside of ourselves, community (including thoughtful, even vulnerable conversations with other humans), self-discipline, intentionality, doing hard things, etc...
There's a reason that considering the principles at play, sitting alone with your own thoughts, mulling over a problem for a long time, talking with others and listening carefully, testing ideas, and taking thoughtful action can create incredibly valuable results. LLMs alone won't ever achieve that.
by TimTheTinker
4/23/2025 at 9:20:29 PM
But there are places where humans do follow reasoning steps, such as arithmetic and logic. The fact that we need to add RLHF to models to make them more useful to humans is also evidence that statistical reasoning is not enough for a general intelligence.
by goatlover
4/23/2025 at 3:25:19 PM
How many books or pieces of software written by recently graduated students have you read or used?
And by LLMs?
by Yeask
4/23/2025 at 7:24:26 PM
> * Humans tell you how they think, but it seemingly is not how they really think.
> * Humans tell you facts they believe to be true but are false.
"Think" and "believe" are the key words here. I'm not trying to be spiritual, but LLMs do not think or believe a thing; they only predict the next word.
> * Humans often need to be verified by another human and should not be trusted.
You're talking about trusting another human to do that though, so you trust the human that is verifying.
by nopelynopington
4/24/2025 at 3:54:54 AM
LLMs aren't as good as average humans. Most software folks like us like to believe the rest of the world is dumb, but it isn't.
My grand-dad, who only ever farmed land and had no education at all, could talk, calculate, and manage his farm land. Was he not as good as me at anything academic? Yes. But I will never be as good as him at understanding how to farm.
Most people who think LLMs are smart seem to conflate ignorance, or lack of knowledge, with being dumb.
This is a rather reductive take, and no I don't believe that's how human intelligence works.
Your dumb uncle at Thanksgiving, who might even have a lot of bad traits, isn't dumb, just likely ignorant. All the human IQ studies and movies have distorted our perception of intelligence.
A more intelligent person doesn't necessarily need to have more or better quality knowledge.
And measuring them by academic abilities like writing and maths is the dumber, more irresponsible take.
And yes, please feel free to call me dumb in response.
by minraws