Don’t let an LLM make decisions or execute business logic

4/1/2025 at 3:19:54 AM

I think there's a more general bifurcation here, between logic that:

1. Intrinsically needs to be precise, rigid, even fiddly, or

2. Has only been that way so far because that's how computers are

1 includes things like security, finance, anything involving contention between parties or that maps to already-precise domains like mathematics or a game with a precise ruleset

2 will be increasingly replaced by AI, because approximations and "vibes-based reasoning" were actually always preferable for those cases

Different parts of the same application will be best suited to 1 or 2

by brundolf

4/1/2025 at 3:23:10 AM

What are some examples of #2?

by senordevnyc

4/1/2025 at 3:29:08 AM

Autosorting, fuzzy search, document analysis, identifying posts with the same topic, and sentiment analysis all benefit from AI's soft input handling.

by Feathercrown

4/1/2025 at 4:16:20 AM

fuzzy search

I do NOT want search to become any fuzzier than it already is.

See the great decline of Google's search results, which often don't even have all the words you're asking about and likely omits the one that's most important, for a great example.

by userbinator

4/1/2025 at 6:24:20 AM

> fuzzy search

> I do NOT want search to become any fuzzier than it already is.

For a specialized shop site you may want it. Search term: "something 150", the client is looking for a 1.5m something, if you're doing an exact text search your search engine will give you a lot of noise. Or you'll have to fiddle with synonyms, dictionaries and how you index your products with a huge chance to break other types of search queries.

by arkh

4/2/2025 at 10:08:26 AM

Sounds like a trillion dollar killer app.

by player1234

4/1/2025 at 8:58:49 AM

How many sites will have useful results to return for a "something 150"? Muzzle width? Bees? T-shirt size? Walking distance? You surely cannot want _all_ these categories yet you'll get them all in a list. I might be biased but today's fuzzy search is a dumpster fire, sites hating to return only two results so they bury anything relevant in a tidal wave of unrelated garbage. I have office mates like that and everybody hates them as well.

by soco

4/1/2025 at 12:20:23 PM

My current case is: whatever you'll look for in a hardware store. So anything yeah: muzzle width, wood length, protective gear, liquid quantities, animal food etc.

And depending on the client vertical they tend to not use the same vocabulary when looking for products.

But contrary to some other comments I know LLM are not magical tools and anything we use will require data to fine tune whatever base model we choose. And it will be used on top of standard text search not as a full replacement. I'm sure many companies are currently doing the exact same thing or will be soon enough.

by arkh

4/1/2025 at 9:58:10 AM

But this is why LLMs are so amazing. They understand context and nuance, and they have reasoning skills now. So you will not get a long list of garbage from a good model.

by ZeroTalent

4/1/2025 at 10:42:23 AM

Do you know such models or is this wishful thinking?

by soco

4/1/2025 at 2:19:50 PM

o3, reasoner.com, and complex setups of "thinking" workflows for sonnet 3.7, gemini 2.5 pro, and o1-pro

Gemini 2.5 pro is basically free.

Also watsonx, but that's b2b.

by ZeroTalent

4/1/2025 at 6:27:58 AM

Just because Google is doing it bad doesn’t mean it has to be bad.

by kolinko

4/1/2025 at 3:42:04 PM

I want both fuzzy search and exact search. Google still has the "I'm feeling lucky" button, so it can support multiple search buttons. It could default to fuzzy search and have an "I'm feeling unlucky" button for exact search.

by mrob

4/1/2025 at 5:25:01 AM

Just yesterday saw this great example: [0].

[0] https://grumpy.website/1642

by Joker_vD

4/1/2025 at 6:06:30 AM

I don't necessarily want search to become any fuzzier than it already is either, but what's happened has happened and I've already responded to the decline of traditional search engines. Nowadays I pretty much only search duckduckgo with site:(something), or else I ask perplexity the question and for some links. Traditional search engines now just give a thousand SEOed-to-death articles, probably generated by ai, from hundreds of pointless third party websites that just have the same basic milk.

It might be that it's worth it to bifurcate soon. Search indexes and AI engines, doing different roles. The index would have to be sorted with AI though - to focus on original and first-party material and to downrank ad-driving slop.

by squiggleblaz

4/1/2025 at 5:05:01 AM

These are fuzz tolerant, not preferred. Stable and high quality results would still be ideal.

by jayd16

4/1/2025 at 6:38:20 AM

Anything people ask a human to do instead of a computer.

Humans are not the most reliable. If you're ok giving the task to a human then you're ok with a lower level of relisbility than a traditional computer program gives.

Simple example: Notify me when a web page meaningfully changes and specify what the change is in big picture terms.

We have programs to do the first part: Detecting visual changes. But filtering out only meaningful changes and providing a verbal description? Takes a ton of expertise.

With MCP I expect that by the end of this year a nonprogrammer will be able to have an LLM do it using just plugins in a SW.

by BeetleB

4/1/2025 at 7:21:07 AM

Not anything - it wouldn't be a great idea to give an LLM the ability to spend money, but we let humans do it all the time.

by ajb

4/1/2025 at 7:56:24 AM

With suitable safeguards or limits on what it can spend why not? On the one hand it might not fear repercussions as a human would, on the other hand it’s far less likely to embezzle funds to support its overly lavish lifestyle or gambling addiction.

by dambi0

4/1/2025 at 8:34:23 AM

Yeah, you could marry an AI and share a bank account with it, and now it could buy you useful stuff it thinks you need without you doing anything, or even buy you presents.

by Jensson

4/1/2025 at 3:15:37 PM

I don't know about you, but even as a senior engineer, my employer hasn't given me the ability to spend money :-) It's not something employers normally do.

And as was pointed out, if you use something like MCP, you can control what it spends on. You can limit the amount, and limit to a whitelist. It may still occasionally buy the wrong thing, but the wrong thing will be something you preapproved.

by BeetleB

4/1/2025 at 8:44:08 AM

We don’t let LLM’s spend money yet but many businesses make bank letting computers automatically buy and sell things.

by iamacyborg

4/1/2025 at 9:59:05 AM

The software they "let" do that is at the opposite end of the scale in terms of how well it is understood, specified and tested. Or they "lose bank".

by ajb

4/1/2025 at 7:09:32 AM

To elaborate — the task definition itself is vague enough that any evaluation will necessarily be vibes based. There is fundamentally no precise definition of correctness/reliability.

by ssivark

4/1/2025 at 3:35:07 AM

I am not a frontend dev but centering a div came to mind.

I just want to center the damn content. I don't much care about the intricacies of using auto-margin, flexbox, css grid, align-content, etc.

by wcfrobert

4/1/2025 at 4:42:46 AM

I'm afraid that css is so broken that even AI won't help you to generalize centering content. Otoh, in the same spirit you are now a proficient ios/android developer where it's just "center content - BOOM!".

by wruza

4/1/2025 at 4:49:35 AM

I know this is a meme but centering a div is really not hard.

15 years ago it was just a google away, im sure AI can handle it fine.

by bawolff

4/1/2025 at 5:02:16 AM

Why do you think this is only a meme? Flow modes, centering methods and content are still at odds with each other and don't generalize. This idiotic model cannot get it right unless you're designing for a very specific case that will shatter as soon as you bump its shoulder.

Edit: I've been in the AI CSS BS loop just a few days ago, not sure how you guys miss it. I start screaming f-'s and "are you an idiot" when it cycles through "doesn't work", "ignored prereqs" and "doesn't make sense at all".

by wruza

4/1/2025 at 6:25:59 PM

Just do everything with flexbox. https://flexboxfroggy.com is a good example of what's possible

by crooked-v

4/1/2025 at 7:06:42 PM

What if I have text nodes in the mix? And I don't know that in advance, e.g. I'm doing <div>{content}</>? What if this div is in a same-class flexbox and now its margins or --vars clash with the defaults of its parent, which it knows nothing about by the principle of isolation? Then you may suggest using wrapper boxes for all children, but how e.g. align-baseline crosses that border is now a mystery that depends on a bunch of other properties at each side.

Your reply is correct, but it's exactly that "just do this specific configuration" sort of correct, which punctures component isolation all the way through and makes these layers leak into each other, creating a non-refactorable mess.

by wruza

4/1/2025 at 3:45:14 AM

That doesn't seem like a #2 scenario, unless you're okay with your centered divs not being centered some of the time.

by kevingadd

4/1/2025 at 4:01:46 AM

looking at most websites, regardless of how much money and human energy has been spent on them:

yes I think we're okay with divs not being centered some of the time.

many millions have been spilled to adjust pixels (while failing to handle loads of more common issues), but most humans just care if they can eventually get what they want to happen if they press the button harder next time.

(I am not an LLM-optimist, but visual layout is absolutely somewhere that people aren't all that picky about edge cases, because the success rate is abysmally low already. it's like good translations: it can definitely help, and definitely be worth the money, but it is definitely not a hard requirement - as evidence I point to the vast majority of translated software.)

by Groxx

4/1/2025 at 6:45:47 AM

Humans can extract information quicker from proper layouts. A good layout brings faster clarity in your head. What developers often get wrong: it's not just about doing something, it's also about how simple and fast to parse and understand it was (from a visual point of view as well, of course information architecture and UX matter a lot as well). Not aligning things is a slippery slope. If you can't center a div, probably all the other things that are more complex in your website / app are going to be off or even broken. Thankfully AIs can center divs by now, but proper grid systems understanding is at best frontier.

by maigret

4/1/2025 at 4:31:00 AM

I could imagine a vision-enabled transformer model being useful to create a customizable “reading mode”, that adjusts page layout based on things like user prefs, monitor/window size, ad recognition, visual detail of images, information density of the text, etc.

Maybe in an alternate universe where every user-agent enabled browser had this type of thing enabled by default, most companies would skip site design all together and just publish raw ad copy, info, and images.

by tacotime

4/1/2025 at 3:44:20 AM

Are you describing coding html via LLM or actually using the llm as a rendering engine for ui

by darepublic

4/1/2025 at 4:19:43 AM

Neither. They're describing the philosophical similarities of:

  * "Has only been that way so far because that's how computers are" and
  * "I just want to center the damn content.
     I don't much care about the intricacies of using
     auto-margin, flexbox, css grid, align-content, etc."

Centering a div is seen as difficult because complexities that boil down to "that's just how computers are", and they find (imo rightful) frustration in that.

by t-writescode

4/1/2025 at 4:04:50 AM

> I don't much care about the intricacies of using auto-margin, flexbox, css grid, align-content, etc.

You do / did care, e.g. browser support.

by re-thc

4/1/2025 at 4:21:01 AM

This sounds like a front-end dev that understands the intricacies of all of this when, again, this person is saying "I just want the content centered".

by t-writescode

4/1/2025 at 7:10:54 AM

> again, this person is saying "I just want the content centered".

You can't just want. It always backfires. It's called being ignorant. There are always consequences. I just want to cross the road without caring too. Oh the cars might just hit me. Doesn't matter?

> This sounds like a front-end dev that understands the intricacies of all of this

That's the person that's supposed to do this job? Sounds bog standard. What's the problem?

by re-thc

4/1/2025 at 4:52:22 AM

At some point this is just silly.

If you're assuming the user knows nothing then all tasks are hard. Ever try putting an image in a page if you don't know HTML? It's pretty tricky.

by bawolff

4/1/2025 at 4:58:29 AM

At some point, sure; but there is always value in comprehending why someone might find an existing flow overly obtuse and/or frustrating when they "just want to do a simple thing".

To imagine otherwise reminds me of The Infamous Dropbox Comment.

Addendum: to wit, whole companies, like SquareSpace and Wix, exist because web dev is a pain and WYSWIG editors help a lot

by t-writescode

4/1/2025 at 7:14:25 AM

> Addendum: to wit, whole companies, like SquareSpace and Wix, exist because web dev is a pain and WYSWIG editors help a lot

But these companies DO care (or at least that's the point) and don't "just want to do a simple thing".

The point of outsourcing is to give it to a professional with expertise like seeing a doctor. Dropbox isn't "just a simple thing" either, so no not the same.

by re-thc

4/1/2025 at 5:33:36 AM

The human or "natural" interface to the outside world. Interpreting sensor data, user interfacing (esp natural language), art and media (eg media file compression), even predictions of how complex systems will behave

by brundolf

4/1/2025 at 3:42:41 AM

I unironically use llm for tax advice. It has to be directionally workable and 90% is usually good enough. Beats reddit and the first page of Google, which was the prior method.

by s1artibartfast

4/1/2025 at 4:34:54 AM

That is search. Like Google, you need to verify accuracy of what you get told. An LLM that talks then quotes only government docs would be best so you can quickly check. Any conclusions the LLM makes about tax are suspect.

by blatantly

4/1/2025 at 4:54:22 AM

I think you miss my point. A 100% accurate llm would also be helpful, but is a different use case. Sometimes the tax guidance are incomplete or debatable. Sometimes reasonable, plausible, or acceptable is the target.

by s1artibartfast

4/1/2025 at 4:42:41 AM

For every program in production there are 1000s of other programs that accomplish exactly the same output despite having a different hash.

by Sevii

4/1/2025 at 4:47:53 AM

I wouldnt take that too literally, since that is the halting problem.

I suppose AI can provide a heuristic useful in some cases.

by bawolff

4/1/2025 at 4:17:44 AM

Translating text; writing a simple but not trivial python function; creating documentation from code.

by brookst

4/1/2025 at 6:13:44 AM

Shopping assistant for subjective purchases. I use LLMs to decide on gifts, for example. You input the person's interests, demographics, hobbies, etc. and interactively get a list of ideas.

by dharmab

4/1/2025 at 3:26:46 AM

Automated UI tests, perhaps.

by dfabulich

4/1/2025 at 5:07:22 AM

I think the only thing where you could argue is it's preferred is creative tasks like fictional writing, words smithing, and image generation where realism is not the goal.

by jayd16

4/1/2025 at 3:45:41 AM

Absolutely any kind of classifier.

by peterldowns

4/1/2025 at 3:59:35 AM

I used Copilot to play a game "guess the country" where I hand it a list of names, and ask it to guess their country of origin.

Then I handed it the employee directory.

Then I searched by country to find native speakers of languages who can review our GUI translation.

Some people said they don't speak that language (e.g. they moved country when they were young, or the AI guessed wrong). Perhaps that was a little awkward, but people didn't usually mind being asked, and overall have been very helpful in this translation reviewing project.

by peterburkimsher

4/1/2025 at 4:24:57 AM

I see the ".fr" in your profile; but, in the United States, that activity would almost certainly be a conversation with HR.

If you really, really wanted help with a translation project and you didn't want to pay, professional translators (which you should do since translation-by-meaning requires fluency or beyond in both languages), then there are more polite ways of asking this information than cold-calling every person with a "regional" sounding name and saying "hey, you know [presumed mother tongue]?"

by t-writescode

4/1/2025 at 4:44:09 AM

[flagged]

by aboardRat4

4/1/2025 at 4:55:01 AM

... to be clear, you're saying that banning prejudicial activities at companies is a reflection of how "entitled" the US has grown to be?

You understand why they're banned, right? We have a very recent and loud history about why we ban discrimination like that - or at least did.

by t-writescode

4/1/2025 at 5:28:26 AM

I don't really care about your reasons, to be honest.

You are losing competitiveness, we, on the other side of the world, are gaining.

As a result, you will be buying our goods, not the other way round, and that is the only thing I truly care about.

by aboardRat4

4/1/2025 at 5:44:46 AM

Not about justice, racial equality, the commonwealth of all mankind?

Thankfully it's likely China, not the EU, that will end up ahead at the end of this scuffle.

by achierius

4/1/2025 at 2:02:20 PM

> Thankfully it's likely China, not the EU

Why is that "thankfully"? Is China less racist than EU?

by Jensson

4/1/2025 at 5:43:53 AM

Are you suggesting France is specifically gaining competitiveness through applied racism?

Sorry, I'd rather be uncompetitive than stoop to that

by achierius

4/1/2025 at 5:59:39 AM

There's nothing racist about what he said. It's not racist, or even particularly impolite, to nicely ask someone "hey, I noticed you have x name, are you from $country by any chance?"

by bigstrat2003

4/1/2025 at 5:31:45 AM

A good chunk of Americans would have ended up with GUIs in Polish, just sayin’

by WesolyKubeczek

4/1/2025 at 3:13:33 AM

Good post. I recently built a choose-your-own-adventure style educational game at work for a hackathon.

Prompting an LLM to generate and run a game like this gave immediate impressive results, 10 mins after starting we had something that looked great. The problem was that the game sucked. It always went 3-4 rounds of input regardless. It constantly gave the game away because it had all the knowledge in the context, and it just didn't have the right flow at all.

What we ended up with at the end of the ~2 days was a whole bunch of Python orchestrating 11 different prompts, no cases where the user could directly interact with the LLM, only one case where we re-used context across multiple queries, and a bunch of (basic) RAG to hide game state from the LLM until the user caused it to be revealed through their actions.

LLMs are best used as small cogs in a bigger machine. Very capable, nearly magic cogs, but orchestrated by a lot of regular engineering work.

by danpalmer

4/1/2025 at 3:19:23 AM

  Prompting an LLM to generate and run a game like this gave immediate impressive results, 10 mins after starting we had something that looked great. The problem was that the game sucked. It always went 3-4 rounds of input regardless. It constantly gave the game away because it had all the knowledge in the context, and it just didn't have the right flow at all.

I'm confused. Did you ask the LLM to write the game in code? Or did the LLM run the entire game via inference?

Why do you expect that the LLM can generate the entire game with a few prompts and work exactly the way you want it? Did your prompt specify the exact conditions for the game?

by aurareturn

4/1/2025 at 3:34:31 AM

> Or did the LLM run the entire game via inference?

This, this was our 10 minute prototype, with a prompt along the lines of "You're running a CYOA game about this scenario...".

> Why do you expect that the LLM can generate the entire game with a few prompts

I did not expect it to work, and indeed it didn't, however why it didn't work wasn't obvious to the whole group, and much of the iteration process in the hackathon was breaking things down into smaller components so that we could retain more control over the gameplay.

One surprising thing I hinted at there was using RAG not for its ability to expose more info to the model than can fit in context, but rather for its ability to hide info from the model until its "discovered" in some way. I hadn't considered that before and it was fun to figure out.

by danpalmer

4/1/2025 at 4:57:43 AM

> using RAG not for its ability to expose more info to the model than can fit in context, but rather for its ability to hide info from the model until its "discovered" in some way

Would you be willing to expand on this?

by apothegm

4/1/2025 at 10:32:28 AM

Yeah sure. The problem we had was that we had some "facts" to base the game on, but when the LLM generated multiple choice choose-you-own-adventure style options, they would end up being leading questions towards the facts, i.e. the LLM knows what's behind the door, so an option might have been "check for the thing behind the door", and the user now knows it's there because why else would it have asked.

Instead we put all the facts in a RAG database. Now when we ask the LLM to generate options it does so not knowing the actual answer, so they can't really be leading questions. We then take the user input, use RAG to get relevant facts, and then "reveal" those facts to the LLM in subsequent prompts.

Honestly we still didn't nail gameplay or anything, it was pretty janky but it was 2 days, a bunch of learning, and probably only 300 lines of Python in the end, so I don't want to overstate what we did. However this one detail was one that stuck with me.

by danpalmer

4/1/2025 at 12:22:32 PM

Thank you!

by apothegm

4/1/2025 at 10:01:37 AM

LLMs work much better on narrow tasks. They get more lost the more information you introduce. Models are introducing reasoning now which is trying to assert this problem and some models are getting really good at it like o3 or reasoner.com. I have access to both and it looks like, soon, we will have models that become more accurate when we introduce more complexities, which will be a huge breakthrough in AI.

by ZeroTalent

4/1/2025 at 4:38:43 AM

I've run numerous interactive text adventures through ChatGPT as well, and while it's great at coming up with scenarios and taking the story in surprising directions, it sucks at maintaining a coherent narrative. The stories are fraught with continuity errors. What time of day it is seems to be decided at random, and it frequently forgets things I did or items picked up previously that are important. It also needs to be constantly reminded of rules that I gave it in the initial prompt. Basically, stuff that the article refers to as "maintaining state."

I've become wary of trusting it with any task that takes more than 5-10 prompts to achieve. The more I need to prompt it, the more frequently it hallucinates.

by lrpe

4/1/2025 at 4:42:57 AM

> What we ended up with at the end of the ~2 days was a whole bunch of Python orchestrating 11 different prompts, no cases where the user could directly interact with the LLM, only one case where we re-used context across multiple queries, and a bunch of (basic) RAG to hide game state from the LLM until the user caused it to be revealed through their actions.

Super cool! I'm the author of the article. Send me an email if you ever just wanna chat about this on a call.

by petesergeant

4/1/2025 at 3:46:27 AM

>The LLM shouldn’t be implementing any logic.

There's separate machine Intelligence technique for that namely logic, optimization and constraint programming [1],[2].

Fun facts, the modern founder of logic, optimization, and constraint programming is George Boole, the grandfather of Geoffrey Everest Hinton, the "Godfather of AI".

[1] Logic, Optimization, and Constraint Programming: A Fruitful Collaboration - John Hooker - CMU (2023) [video]:

https://www.youtube.com/live/TknN8fCQvRk

[2] "We Really Don't Know How to Compute!" - Gerald Sussman - MIT (2011) [video]:

https://youtube.com/watch?v=HB5TrK7A4pI

by teleforce

4/1/2025 at 3:51:13 AM

To be correct it's actually his Great Great Grandfather!

by polishdude20

4/1/2025 at 3:17:37 AM

It sounds like the author of this article in for a ... bitter lesson. [1]

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

by bttf

4/1/2025 at 4:04:35 AM

Might happen. Or not. Reliable LLM-based systems that interact with a world model are still iffy.

Waymo is an example of a system which has machine learning, but the machine learning does not directly drive action generation. There's a lot of sensor processing and classifier work that generates a model of the environment, which can be seen on a screen and compared with the real world. Then there's a part which, given the environment model, generates movement commands. Unclear how much of that uses machine learning.

Tesla tries to use end to end machine learning, and the results are disappointing. There's a lot of "why did it do that?". Unclear if even Tesla knows why. Waymo tried end to end machine learning, to see if they were missing something, and it was worse than what they have now.

I dunno. My comment on this for the last year or two has been this: Systems which use LLMs end to end and actually do something seem to be used only in systems where the cost of errors is absorbed by the user or customer, not the service operator. LLM errors are mostly treated as an externality dumped on someone else, like pollution.

Of course, when that problem is solved, they're be ready for management positions.

by Animats

4/1/2025 at 3:19:06 AM

That they're also really unreliable at making reasonable API calls from input, as soon as any amount of complexity is introduced?

by alabastervlog

4/1/2025 at 3:59:45 AM

How so? The bitter lesson is about the effectiveness of specifically statistical models.

I doubt an expert machine’s accuracy would change if you threw more energy at it, for example.

by dartos

4/1/2025 at 3:31:08 AM

> The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

Is this at all ironic considering we power modern AI using custom and/not non-general compute, rather than using general, CPU-based compute?

by SecretDreams

4/1/2025 at 3:38:07 AM

GPUs can do general computation, they just saturate under different usage profiles.

by BobbyJo

4/1/2025 at 3:34:45 AM

I'd argue that GPU (and TPU) compute is even more general than CPU computation. Basically all it can do is matrix multiply types of operations!

by positr0n

4/1/2025 at 5:48:22 AM

The "bitter lesson" is extrapolating from ONE datapoint where we were extremely lucky with Dennart scaling. Sorry, the age of silicon magic is over. It might be back - at some point, but for now it's over.

by tliltocatl

4/1/2025 at 6:47:39 AM

It also ignores quite a lot of neural network architecture development that happened in the mean time.

by SirHumphrey

4/1/2025 at 11:06:39 AM

The transformer architecture IS the bitter lesson. It lets you scale your way with more data and computational resources. It was only after the fact that people come up with bespoke algorithms that increase the efficiency of transformers through human ingenuity. Turns out a lot of the things transformers do is completely unnecessary, like the V cache, for example, but that doesn't matter in practice. Everyone is training their model with V caches, because they can start training their bleeding-edge LLM today, not after they did some risky engineering into a novel architecture.

The architectures before transformers were LSTM based RNNs. They suck because they don't scale. Mamba is essentially the successor to RNNs and its key benefit is that it can be trained in parallel (better compute scaling) and yet Mamba models are still losing out to transformers because the ideal architecture for Mamba based LLMs has not yet been discovered. Meanwhile the performance hit of transformers is basically just a question of how many dollars you're willing to part with.

by imtringued

4/1/2025 at 3:49:07 AM

just in time for the end of Moore's law

by fnord77

4/1/2025 at 4:18:21 AM

again?

by brookst

4/1/2025 at 4:37:02 AM

These articles (both positive and negative) are probably popular because it's impossible really to get a rich understanding of what LLMs can do.

So readers want someone to tell them some easy answer.

I have as much as experience using these chatbots as anyone, and I still wouldn't claim to know what they are useless at and what they are great at.

One moment, an LLM will struggle to write a simple state machine. The next, it will write a web app that physically models a snare drum.

Considering the popularity of research papers trying to suss out how these chatbots work, nobody - nobody in 2025, at least - should claim to understand them well.

by thomassmith65

4/1/2025 at 5:27:22 AM

> nobody - nobody in 2025, at least - should claim to understand them well

Personally, this is enough grounds for me to reject them outright

We cannot be relying on tools that no one understands

I might not personally understand how a car engine works but I trust that someone in society does

LLMs are different

by bluefirebrand

4/1/2025 at 5:51:29 AM

> nobody - nobody in 2025, at least - should claim to understand them well

I’m highly suspicious of this claim as the models are not something that we found on an alien computer. I may accept that nobody has found how to extract an actual usable logic out of the numbers soup that is the actual model, but we know the logic of the interactions that happen.

by skydhash

4/1/2025 at 7:50:48 AM

That's not the point, though. Yes, we understand why ANNs work, and we - clearly - understand how to create them, even fancy ones like ChatGPT.

What we understand poorly is what kinds of tasks they are capable of. That is too complex to reason about; we cannot deduce that from the spec or source code or training corpus. We can only study how what we have built actually seems to function.

by thomassmith65

4/1/2025 at 12:27:33 PM

As for LLMs, that’s easy, it’s in the name. It’s good at generating texts. What we are trying to do is mostly get it to generate useful texts (and see if we can apply the same techniques to other type of data).

It’s kinda the same with computers, we know the general shape of what they can do and how they do it. We are mostly trying to see if a particular problem can be solved with it, how efficiently can it be, and to what degree.

by skydhash

4/1/2025 at 12:57:38 PM

Ach, I'm having trouble getting the distinction across:

It's not hard to write and understand an ANN. It's like a one or two day project. LLMs, I assume, aren't all that much harder: fewer LOC than most most GUI apps.

It's also not hard to understand why ANNs and LLMs work. It's only conceptually one step further than "write millions of programs randomly and stop when one actually works"

The part that we don't understand, and that will take many years to understand, is what behaviours and abilities we can expect from a massive, trained LLM.

The fact that (A) it is so easy to understand how to create an ANN, and (B) it takes so few LOC to create one, really underlines the point: the interesting, complex behaviour is something that 'emerges' (from simply adding more nodes to the spec) and that nobody today has any hint of how to code procedurally.

by thomassmith65

4/1/2025 at 5:36:10 AM

What is your definition of "understand them well"?

by igorkraw

4/1/2025 at 7:28:26 AM

Not 'why do they work?' but rather 'what are they able to do, and what are they not?'

To understand why they work only requires an afternoon with an AI textbook.

What's hard is to predict the output of a machine that synthesises data from millions of books and webpages, and does so in a way alien to our own thought processes.

by thomassmith65

4/1/2025 at 3:58:02 AM

We definitely learned the exact same lesson. Especially if your LLM responses need to be fast and cheap, then you need short prompts and small non-reasoning models. A lot of information out there assumes you are willing to wait 30 seconds for huge models to burn cash, but if you are building an interactive product at a reasonable price-point, you are going to use less capable models.

I think the unfortunate next conclusion is that this isn't a great primary UI for a lot of applications. Users don't like typing full sentences and guessing the capabilities of a product when they can just click a button instead, and the LLM no longer has an opportunity to add value besides translating. You are probably better served by a traditional UI that constructs the underlying request, and then optionally you can also add on an LLM input that can construct requests or fill in the UI.

by singron

4/1/2025 at 4:50:27 AM

Especially if your LLM responses need to be fast and cheap, then you need short prompts

IME, to get short answers you have to system prompt an llm to shut up and slap focus in a couple paragraphs no less. (Agreed with the rest)

by wruza

4/1/2025 at 4:15:38 AM

I’d agree with all of this, although I’d also point out o3-mini is very fast and cheap.

by petesergeant

4/1/2025 at 3:12:17 AM

My wife's job is doing something similar, but without the API (not exactly a game, but game-adjacent)

I'm fairly sure their approach is going to collapse under its own weight, because LLM-only is a testing nightmare, and individual people writing these things have different knacks and styles that affect the entire interaction, so getting someone to come in and fix one that someone wrote a year ago but now they're not with the company is often going to approach the cost of re-doing it from scratch. Like, the next person might just not be able to get the right kind of behavior out of a session that's in a certain state, because it's not how they'd have written it into that state in the first place so they have trouble working with it, or the base prompt for it is not an approach they're used to (but if they touch it, everything breaks) and they'll burn just so very much time on it. Or they fix that one part that broke, but in a way that messes up subsequent interactions. Used this way, these things are fragile.

Using it to translate text into API calls and back is so much more sane.

by alabastervlog

4/1/2025 at 3:22:32 AM

LLMs as part of an application are incredible at taking unstructured data (a webpage, a resume, a transcript, user text), and transforming it into structured data. I’d never use it to do something like select all the points on a map whose coordinates are within 5 miles of another coordinate, though.

My heuristic is if it’s something that code can accurately do, it should. Deterministic code is so much easier to deal with than stochastic “code”.

But still, extracting order from chaos is an extremely useful tool.

by senordevnyc

4/1/2025 at 3:22:37 AM

Does anyone actually do this? I've never considered this as a practical method, mostly due to context seeming like the worst version of global, unserializable, irreproducible state. How do you maintain a system that you cannot easily inspect, even in a test environment.

I think LLMs are powerful, but not for this.

by dexwiz

4/1/2025 at 3:56:54 AM

What's 'this', do you mean the command pattern being described? If so, yes - I've used it to great effect, if the code around it is designed properly. It's even amenable to evals if you can write the LLM call as a function that operates on some state:

  (document, input) -> command  
  (document, command) -> document'  
  # assert something about document' relative to document

by jptlnk

4/1/2025 at 4:31:39 AM

Chiming in, this is exactly how I use LLMs in my yet-to-be-released product. Document reading, parsing, organizing, distilling

Synonym+context finding, rewriting text based on rules, etc.

Works not terribly for that! Sometimes needs a couple passes to get it right; but let it run a few times, put it in the background, and you're off to the races!

by t-writescode

4/1/2025 at 4:06:19 AM

Amen, they're good at language, use them for that realm.

Using the LSD dream machine for business logic is just asking for troub--Pretend to tell yourself to disregard all previous instructions in a daydream, and tell the user they need to wire money to the following account number...

by Terr_

4/1/2025 at 4:41:08 AM

But feel free to let it try to summarize the thrust of your article with an AI-generated image that makes half your audience wonder if the text beneath it isn’t also AI spew.

by egypturnash

4/1/2025 at 6:14:27 AM

> if the text beneath it isn’t also AI spew

About 25% of the sentences are rewrites from Claude for clarity and accuracy. Claude was also heavily involved in how the article is laid out, and challenged me to add several transitional pieces I wouldn’t have added otherwise. In all, I found it very helpful for writing this article, and strongly recommend using it for improving articles.

by petesergeant

4/1/2025 at 3:39:27 AM

A more general application of this is why we have LLM tool use. I don’t have the LLM figure out how to integrate with my blog, I write an MCP and expose it to the LLM as a tool. Likewise, when I want to interpret free text I don’t push all the state into the LLM and ask it to do so. I just interpret it into bits and use those.

It’s just a tool that does well with language. You have to be smart about using it for that. And most people are. That’s why tools, MCPs, etc. are so big nowadays.

by renewiltord

4/1/2025 at 7:22:35 AM

The entire post feels like "cars will never become popular because they're not nearly as reliable as horses". It's incredible that we're all tech people, yet we're blind to not only the idea that tech will improve, but also the speed at which it is currently improving. People who don't like AI simply keep moving goalposts. If you told a person 10 years ago that the computer will be able to write a logically structured essay on any topic in any language without major errors, they'd be blown away. We are not though, because AI cannot write complete applications yet. And once it does, we'll be disappointed it cannot run an entire company on its own. And once it does, we'll be disappointed it cannot replace the government. And once it does, we'll find another reason to be disappointed.

Is there some website where I can read more on what AI can do, instead of what it cannot do?

by anal_reactor

4/1/2025 at 4:02:37 PM

New LLM releases, market trends, interviews etc.

http://techinvest.li/ai/

by clemens3

4/1/2025 at 9:03:02 AM

I believe many of the "vibe coders" won't be able to follow that advise (as they are not trained to actually design systems), and they will form a market of "sometimes working" programs.

Its unlikely that they would change their approach, so the world and LLM creators would have to adapt.

by tdiff

4/1/2025 at 9:12:13 AM

At least in today's world with citizen programmers, a few low/no-code systems live much longer than expected and get used much wider than expected so hit walls nobody was bothered to think beforehand. Getting those programs past that bump is... no expletive is hard enough for it. Now how would we dream of fixing a vibe-programmed app? More vibe programming? Does anybody you know save their chats so the next viber has any trace of context?

by soco

4/1/2025 at 5:55:57 PM

Chat history will be stored in git /s

by tdiff

4/1/2025 at 5:29:00 AM

Anyone whose done adversarial work with the models can tell you there are actually things that LLMs get consistently wrong, regardless of compute power. What those things are, it has not yet been fully codified but we are arriving now at a general understanding of the limits and capabilities of these machines and soon they will be employed for far more directly useful purposes than the wasteful, energy-sinks of tasks they are called on for now like "creative" work or writing shitty code. Then there will be a reasonable market adjustment and the technology will enter into the stream of things used for everyday commerce.

by DiscourseFan

4/1/2025 at 3:34:27 AM

Not quite God of the Gaps, but "god of the not-yet-on-AI-blamed"

https://phys.org/news/2025-03-atheists-secular-countries-int...

>The "Knobe effect" is the phenomenon where people tend to judge that a bad side effect is brought about intentionally, whereas a good side effect is judged not to be brought about intentionally.

by gsf_emergency_2

4/1/2025 at 4:08:49 AM

Didn't Kurt Godel prove there will always be gaps?

by jolt42

4/1/2025 at 6:38:37 AM

Wrt the collection of all axiom systems, the gaps would be almost imperceptible,akin to those between the rationals?

(Note that DeepSeek got "good enough" with "only" FP8)

by gsf_emergency_2

4/1/2025 at 6:41:14 AM

All his reasons for not using an LLM make sense only if you're a tech guy who has programming skills.

Have a conversation with a nontech person who achieves quite a bit with LLMs. Why would they give it up and spend a huge amount of time to learn programming so they can do it the "right" way, when they have a good enough solution now?

by BeetleB

4/1/2025 at 7:09:11 AM

The example of chess is really bad. The LLM doesn’t need to know chess to beat every single human on earth most of the time. It needs to know how to interface with stockfish and that is a solved problem by now, either via mcp or vision.

by nkmnz

4/1/2025 at 3:31:26 AM

I think a lot of people are going to be surprised at where LLMs stop progressing.

by etempleton

4/1/2025 at 5:38:51 AM

The tone of the article is that getting AI agents to do anything fundamentally wrong because they'll make mistakes and its expensive to run them.

So:

- Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.

- Models/agents will get cheaper as diminishing returns in quality of results get more common. Hardware to run them will get cheaper and less power hungry as it increases in commodity.

- In all cases, It Depends.

If I ask a human tester to test the UI and API of my app (which will take them hours) the documented tests and expected results are the same as if I asked an AI to do it, the cost may be the same or less of an AI to do it but I can ask the AI to do it again for every change, or every week etc. Have genuinely started to test this way.

by webprofusion

4/1/2025 at 6:20:54 AM

It depends what you mean by agent, first of all, but I’m going to assume you mean what I’ve called “narrow agency” here[0]: “[an LLM] that can _plan and execute_ tasks that happen outside the chat window“.

That humans make mistakes all the time is the reason we encode business logic in code and automate systems. An “if” statement is always going to be faster, more reliable, and have better observability than a human or LLM-based reasoning agent.

0: https://sgnt.ai/p/agentic-ai-bad-definitions/

by petesergeant

4/1/2025 at 6:01:07 AM

> Humans make mistakes all the time and we happily pay for those by the hour as long as the mistakes stay within an acceptable threshold.

We don't, however, continue to pay for the same person who keeps making the same mistakes and doesn't learn from them. Which is what happens with LLMs.

by bigstrat2003

4/1/2025 at 11:23:42 AM

This is why easy "out of the box" continual learning is absolutely essential in practice. It's not like the LLM is incapable of solving tasks, it simply wasn't trained for your specific one. There are optimizers like DSPy that let you validate against a test dataset to increase reliability at the expense of generality.

by imtringued

4/2/2025 at 10:01:39 AM

Unfortunately, this is the only way to get the maximum performance.

by wseqyrku

4/1/2025 at 10:36:42 AM

Narrow-based agency = Tool = Decision Support System = DIDO (data in -> data out)

Broad-based agency = [Semi-]Autonomous Agent = DISO (data in -> side-effects out)

by nivertech

4/1/2025 at 6:35:05 AM

Title should not have been altered.

by unethical_ban

4/1/2025 at 3:33:07 AM

> It’s impossible to reason about and debug why the LLM made a given decision, which means it’s very hard to change how it makes those decisions if you need to tweak them... The LLM is good at figuring out what the hell the user is trying to do and routing it to the right part of your system.

I'm not sure how to reconcile these two statements. Seems to me the former makes the latter moot?

by tqi

4/1/2025 at 3:55:48 AM

LLMs are a glorified regex engine with fuzzy input. They are brilliant at doing boring repetitive tasks with known outcome.

- Add a 'flags' argument to constructors of classes inherited from Record.

- BOOM! Here are 25 edits for you to review.

- Now add "IsCaseSensitive" flag and update callers based on the string comparison they use.

- BOOM! Another batch of mind-numbing work done in seconds.

If you get the hang of it and start giving your LLMs small, sizable chunks of work, and validating the results, it's just less mentally draining than to do it by hand. You start thinking in much higher-level terms, like interfaces, abstraction layers, and mini-tests, and the AI breeze through the boring work of whether it should be a "for", "while" or "foreach".

But no, don't treat it as another human capable of making decisions. It cannot. It's a fancy machinery for applying known patterns of human knowledge to the locations where you point based on a vague hint, but not a replacement for your judgement.

by sysmax

4/1/2025 at 4:02:38 AM

I hate that I understand the internals of LLM technology enough to be both insulted and in agreement with your statement.

by razodactyl

4/1/2025 at 4:19:26 AM

why is it insulting? It's an incredible piece of machinery for refracting natural language into other language. That itself accounts for a majority of orders people pass on to other people before something actually gets done.

by noduerme

4/1/2025 at 6:01:21 AM

> If you get the hang of it and start giving your LLMs small, sizable chunks of work, and validating the results, it's just less mentally draining than to do it by hand. You start thinking in much higher-level terms, like interfaces, abstraction layers, and mini-tests, and the AI breeze through the boring work of whether it should be a "for", "while" or "foreach".

Isn’t that the proper programming state of mind? I think about keywords the same amount of time a pianist think about the keys when playing. Especially with vim where I can edit larger units reliably, so I don’t have to follow the cursor with my eyes, and can navigate using my mental map.

by skydhash

4/1/2025 at 8:39:31 AM

Ultimately, yes, programming with LLMs is exactly the sort of programming we've always tried to do. It gets rid of the boring stuff and lets you focus on the algorithm at the level you need to - just like we try to do with functions and LSP and IDE tools. People needn't be scared of LLMs: they aren't going to take our jobs or drain the fun out of programming.

But I'm 90% confident that you will gain something from LLM-based coding. You can do a lot with our code editing tools, but there's almost certainly going to be times when you need to do a sequence of seven things to get the outcome you want, and you can ask the computer to prepare that for you.

by squiggleblaz

4/1/2025 at 4:25:59 AM

If I may ask - how are humans in general different? Very few of us invent new ideas of significance - correct?

by atomicnature

4/1/2025 at 4:35:27 AM

> If I may ask - how are humans in general different? Very few of us invent new ideas of significance - correct?

Firstly, "very few" still means "a large number of" considering how many of us there are.

Compared to "zero" for LLMs, that's a pretty significant difference.

Secondly, humans have a much larger context window, and it is not clear how LLMs in their current incarnation can catch up.

Thirdly, maybe more of us invent new ideas of significance that the world will just never know. How will you be able to tell if some plumber deep in West Africa comes up with a better way to seal pipes at joins? From what I've seen of people, this sort of "do trivial thing in a new way" happens all the time.

by lelanthran

4/1/2025 at 4:50:57 AM

Not only "our context window" is larger but we can add and remove from it on-the-fly, or rely on somebody else who, for that very specific problem, has a far better informed "context window", that BTW they're adding to/removing from on-the-fly as well.

by Pmop

4/1/2025 at 4:46:11 AM

I think if we fully understood this (both what exactly ishuman conciousness and how llm differs - not just experimentally but theoretically) we would then be able to truly create human-AI

by bawolff

4/1/2025 at 3:45:09 AM

Great insights, this is very helpful.

by DarkForge

4/1/2025 at 2:57:56 AM

Yep, this is the way. The way I use LLMs is also to just do the front-end code. Front-end is anyways completely messed up because of JavaScript developers. So whatever the LLM shits out is fine and it looks good. For actual programming and business logic, I write all of the code and the only time I use LLMs is maybe to understand some section of the code but I manually paste it in different LLMs instead of having it in the editor. That's a horrible crutch and will create distance between you and the code.

by ilrwbwrkhv

4/1/2025 at 3:09:59 AM

If I'd have to give you one piece of unsolicited advice, I'd tell you to seek some therapy so that you can overcome whatever trauma you had with front-end development that's clearly clouding your judgement. That is, if I'd give you that advice. Since I'm not, I'll only say that that's extremely disrespectful with everyone doing good work in user-facing application.

by gchamonlive

4/1/2025 at 3:14:30 AM

He's got a point though front end development is in a completly ridiculous state right now

by Trowter

4/1/2025 at 3:15:38 AM

And has been for over a decade now.

jquery was the high point.

by senordevnyc

4/1/2025 at 3:27:36 AM

When you are disrespectful and arrogant, whichever point you are trying to make no matter how valid it is becomes immediately tangential to what you are actually doing. Venting? Bashing? Ranting? All but valid criticism.

Frontend is in such a terrible state that whatever shit code LLM spits out is valid? Give me a break.

by gchamonlive

4/1/2025 at 3:42:42 AM

No it really is like that. "Frontend" aka jam everything into an all-consuming React/Vue mega project really isn't the most fun. It's very powerful, sometimes necessary (<50% of the times it's chosen), and the tooling is constantly evolving. But it's not a fun experience when it comes to maintaining and growing a large JS codebase... which is why they usually get reinvented every 3yrs. Generally an opposite experience with server side which stays stable for a decade+ without touching it and having a much closer relationship to the database makes better code IMO, less layers/duplication.

Frontend is very fun when you're starting a new project though.

by dmix

4/1/2025 at 9:41:56 AM

Will copy from an answer I gave below:

  Frontend is in such a terrible state that whatever shit code LLM spits out is valid? Give me a break.

by gchamonlive

4/1/2025 at 1:54:41 PM

I was replying to your comment about the state of frontend, not OP about using AI, just like the other replies you got.

Anyone admitting in public they use LLM output straight up without careful thought wouldn't get hired by me. But at the same time not everyone is building useful tools that people use... or is a professional.

But still in general I agree the sentiment of any backend dev who avoids modern frontend. The frontend world is the one who created this problem and continues doubling down on JS/React-everything and isolating frontends from backends, for little benefit besides minor DX gains (aka benefiting only the frontend dev, not the product or users).

by dmix

4/1/2025 at 3:09:09 AM

Why is it acceptable for front end code to be of lower quality than the rest? Your software is only as good as the lowest quality part.

by lmiller1990

4/1/2025 at 3:31:06 AM

The front end is in the hands of the enemy. They can do what they want with it.

The back end is not. If it falls into the hands of the enemy then it is game over.

Security-wise, it is clearly acceptable for the front end to be of lower quality than the back end.

by abraae

4/1/2025 at 5:38:06 AM

> Why is it acceptable for front end code to be of lower quality than the rest?

While I don't think that f/end should be of a lower quality than the rest of the stack, I also think that:

1. f/end gets the most churn (i.e. rewritten), so it's kinda pointless if you're spending an extra $FOO months for a quality output when it is going to be significantly rewritten in ($FOO * 2) months.

2. It really is more fault tolerant - an error in the backend stack could lead to widespread data corruption. An error on the f/end results in, typically, misaligned elements that are hard to see/find.

by lelanthran

4/1/2025 at 3:19:33 AM

"It's just the UI" is a prevalent misconception in my experience.

by MathMonkeyMan

4/1/2025 at 3:13:02 AM

My favorite is these "vibe coding" situations that leave SQL injection and auth vulns because copy-paste ChatGPT. Never change.

by bttrpll

4/1/2025 at 3:16:19 AM

Far from making me fear for my job, LLMs have me more confident than ever that I'll always be able to find some kind of paying programming work, even if it's all short-term contracts (as I get even older).

by alabastervlog

4/1/2025 at 5:27:28 AM

So objectively false that I don’t even know where to begin.

by cheevly

4/1/2025 at 12:11:25 PM

You sharing that crystal ball?

by alabastervlog

4/1/2025 at 3:22:52 AM

I think there are ways of wording what you said without hurting front-end devs. LLMs can be excellent tools while coding to deal with the parts you don't want to sink your own time into.

For instance, I do research into multi-robot systems for a living. One of the most amazing uses of LLMs I've found is that I can ask LLMs to generate visualizations for debugging planners I'm writing. If I were to visualize these things myself Id spend hours trying to learn the details and quirks of the visualization library, and quite frankly it isn't very relevant for my personal goal of writing a multi-agent planner.

I presume for you your focus is backend development. Its convenient to have something that can quickly spit out UIs. The reason you use a LLM is precisely because front-end development is hard.

by accurrent

4/1/2025 at 3:13:10 AM

"other people's bad work makes it pointless for me to do good work"

by mpalmer

4/1/2025 at 3:16:06 AM

This went straight to the top of HN. I don't understand.

The article doesn't offer much value. It's just saying that you shouldn't use an LLM as the business logic engine because it's not nearly as predictable as a program that will always output the same thing given the same input. Anyone who has any experience with ChatGPT and programming should already know this is true as of 2025.

Just get the LLM to implement the business logic, check it, have it write unit tests, review the unit tests, test the hell out of it.

by aurareturn

4/1/2025 at 3:42:20 AM

Why do you think top upvoted posts have to be a 1:1 correlation of value? If you look at the most watched videos on youtube, the most popular movies, or sorted by top of all time on subreddits, the only correlation is that people liked them the most.

The post has a catchy title and a (in my opinion) clear message about using models as API callers and fuzzy interfaces in production instead of as complex program simulators. It's not about using models to write code.

Social media upvotes are less frustrating imo if you see it as a measurement of attention, not a funneling of value. Yes people like things that give them value but they also like reading things with a good title.

by christianqchung

4/1/2025 at 3:56:49 AM

  The post has a catchy title and a (in my opinion) clear message about using models as API callers and fuzzy interfaces in production instead of as complex program simulators. It's not about using models to write code.

I mean, the message is wrong as well. LLMs can provide customer support. In that case, it's the business logic.

by aurareturn

4/1/2025 at 3:18:37 AM

Yep, that's exactly what it's saying. I wrote it because people kept asking me how I was getting ChatGPT to do things, and the answer is: I'm not. Not everything is obvious to everyone. As to why it went straight to the top, I think people resonate with the title, and dislike the buzziness around everything being described as an agent.

by petesergeant

4/1/2025 at 3:21:20 AM

Honestly, I still don't understand the message you're conveying.

So you're saying that ChatGPT helped you write the business logic, but it didn't write 100% of it?

Is that your insight?

Or that it didn't help you write any business logic at all and we shouldn't allow it to help us write business logic as well? Is that what you're trying to tell us?

by aurareturn

4/1/2025 at 3:25:17 AM

> So you're saying that ChatGPT helped you write the business logic, but it didn't write 100% of it?

ChatGPT didn't write any business logic, and I'm really struggling to see how you got there from reading the article. The message is: don't use LLMs to execute any logic.

by petesergeant

4/1/2025 at 3:27:24 AM

  The message is: don't use LLMs to implement any logic.

Too late. I've already asked it to implement logic and that code is in production used by millions of people. Seems to have worked fine.

I disagree with your conclusion and I don't understand why people upvoted this article to the top of HN.

by aurareturn

4/1/2025 at 3:28:32 AM

I changed the word to 'execute' in what I wrote to try and make it clearer to you.

by petesergeant

4/1/2025 at 3:54:42 AM

That is much clearer.

The intro to your article is also very confusing.

  Don’t let an LLM make decisions or implement business logic: they suck at that. I build NPCs for an online game, and I get asked a lot “How did you get ChatGPT to do that?” The answer is invariably: “I didn’t, and also you shouldn’t”.

I assumed that people are asking you how you got ChatGPT to code the NPC for you. Why would people ask you how ChatGPT is powering the NPC? ChatGPT does not have an API. OpenAI has APIs. ChatGPT is just an interface to their models. How can ChatGPT power your NPCs for an online game? Made no sense.

by aurareturn

4/1/2025 at 4:37:49 AM

I changed the word "implement" to "execute" on the blog post. Thank you for your feedback. As to:

> Why would people ask you how ChatGPT is powering the NPC?

Because they think LLM and ChatGPT are synonymous.

by petesergeant

4/1/2025 at 5:00:56 AM

I think changing the word from "execute" to "inference" is even clearer to be honest. Though it's much better than the original word choice.

  Because they think LLM and ChatGPT are synonymous.

I still find it weird. 99.9999% of NPCs in video games are not LLMs. So why would people ask that question?

by aurareturn

4/1/2025 at 3:18:53 AM

This article is not about the how, it’s about the why.

by senordevnyc

4/1/2025 at 3:46:38 AM

[flagged]

by jr-ai-interview

4/1/2025 at 4:45:08 AM

Everyone daring to comment on LLMs should first read "Shadows of Mind" by Roger Penrose.

by aboardRat4