Codex for almost everything

4/16/2026 at 5:24:55 PM

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, peoples' agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting it at the top of search results and taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand it's own new features well enough to invoke them (without quite a bit of steering)

4. I cannot get it to show me the in app browser

5. Generating image mockups of websites and then building them is nice

by cjbarber

4/16/2026 at 5:51:14 PM

I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them with full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.

For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

by postalcoder

4/16/2026 at 6:13:55 PM

> for normie agents to take off in the way that you expect, you're going to have to grant them with full access

At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et all) and eat from both ends.

Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

I don't want this, I just think it's going down that route.

by avaer

4/16/2026 at 6:25:28 PM

There was a recent Stanford study which showed that AI enthusiasts and experts and the normies had very different sentiment when it came to AI.

I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?

Theres lots of cases of great highly useful LLM tools, but the moment they scale up you get slammed by the risks that stick out all along the long tail of outcomes.

by intended

4/16/2026 at 6:34:22 PM

I agree, in general we are going to find that ultimately most employee end users don't want it. Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.

On the other hand, entrepreneurs and managers are going to want it for their employees (and force it on them) for the above reason.

by ryandrake

4/17/2026 at 7:32:37 AM

I want. If I get 10X more productive, I can unilaterally increase my compensation 10X by doing my stuff in 1 unit of time instead of 10 it took, and splitting the remaining 9 units of time into, say, 4 units of time doing more work, securing my position and setting myself up for promotion, and 5 units of time doing whatever the fuck I want. Not all compensation shows up in a bank account - working less, or under less stress, are also valuable.

Of course, such situation is only temporary - if I can suddenly be 10X productive, then so can everyone else, and then the baseline shifts so 10X is the new 1X.

by TeMPOraL

4/17/2026 at 8:10:32 AM

You want it, but then you closed by explaining exactly why you shouldn't want it. Plus, the new baseline isn't neutral (as in, everyone is the same again). If humans can now do 10x the work as before, the employer doesn't need the same number of humans to carry out its work. So the new baseline is actually "let's keep 1 employee and fire the other 9", unless the business can find a way to suddenly expand 10x so that it needs 10x as much work done.

by jbstack

4/17/2026 at 9:01:01 AM

> So the new baseline is actually "let's keep 1 employee and fire the other 9", unless the business can find a way to suddenly expand 10x so that it needs 10x as much work done.

If they have any surplus of money (or loans) they'll try, so those 9 employees may end up becoming team leads or middle management, trying to start new initiatives to get the 10x expansion (and 100x improvement).

The market isn't anywhere near efficient enough to directly translate productivity improvements into labor reductions. Thankfully, because everything that's nice and hopeful and human lives within the market inefficiency; a fully efficient market would be a hell worse than any writer or preacher ever imagined.

by TeMPOraL

4/17/2026 at 9:49:02 AM

lol that has nothing to do with market efficiency.

I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.

by sikewj

4/17/2026 at 12:36:04 PM

Of course it does have to do with market efficiency, of which the inertia and surplus within companies (especially large ones) is a part.

> I’ve seen a number of your posts where you talk about topics you clearly are not all that well versed in, with such confidence when you’re plain wrong.

I'm sure it's true. However, since you brought it up, can you be more specific and name three?

by TeMPOraL

4/17/2026 at 2:46:16 PM

Yes, but in the long run, the market expects growth and innovation, not just doing the same thing with fewer workers. Especially when every other company can just buy the exact same advantage for the same price.

by LinXitoW

4/18/2026 at 6:37:06 AM

Your first paragraph is so short sighted that its message didn't even make it beyond the next one. It's a race to the bottom and your "doing whatever the fuck I want" will obviously never materialize.

The typical work week today is 40 hours. Just like it was 80 years ago. The typical worker is dramatically more productive than 80 years ago yet "doing whatever the fuck I want" time has not increased. Why would it? Employers don't need to pay such that 20 hour work weeks give you the same income. Because everybody around you is ok with working 40 hours.

This won't be different with AI, no matter if the overall effect is 1.1x or 10x or 100x productivity. Because it's not a technological problem but a sociological one.

by teiferer

4/17/2026 at 5:06:13 PM

Good point. My rant assumed that "10x productivity" meant 10x output in 1x time, rather than 1x output in 0.1x time. Only one of those are actually objectionable.

by ryandrake

4/17/2026 at 6:06:18 AM

> I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase? You're just giving away that value to your employer.

Those are productivity increases that got our standard of living to where it is. Fewer people doing the same amount of work has, historically speaking, freed people from their current job, allowing them to work on something else.

It's that analogy of the horse, they used to be farm animals. Now, fewer of them are 'employed' but they're much nicer jobs. I'm not sure if the same is true for us this time around though as new jobs being created have increasingly been highly skilled which means the majority can't apply.

by hvb2

4/17/2026 at 6:17:07 AM

There was a long and great ravine of suffering between the advent of the Industrial Revolution and our time of bounty.

by drivebyhooting

4/17/2026 at 6:17:21 PM

Yep, all those artists, musicians, designers and coders will finally do something productive!

by Bombthecat

4/17/2026 at 6:03:50 AM

If everyone becomes 10x more productive it won’t mean the companies cash flow 10x’s. Where value is loose there is competition, so in theory everyone should win. Unless nobody else can compete to capture that loose 10x value, in which case congratulations, you are now a unicorn.

Of course in reality in the short term what happens is companies lay off people to increase margins. Times will be tough for workers, and equity keeps gravitating towards those who already had it.

by yes_man

4/17/2026 at 8:29:26 AM

Tasks have value because they take effort to complete.

If you remove the effort from those tasks, they will have no value.

10x the value of 0 is 0

by King-Aaron

4/17/2026 at 9:15:50 AM

Eh, I’d say the premiums drop, and that there is a residual value that is still left. So maybe 0.1 or 0.2 instead of 0.

by intended

4/17/2026 at 9:41:50 AM

>Assuming it actually makes you more productive. I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase?

Given sane working arrangements or at minimum presence of remote work, it would be a bit shortsighted not to want to get done with your work in a tenth amount of time. In the very least, you're competing for a promotion against less effective people, all while having more time for yourself. If not, you're building labor market skillset in an efficient way so you can hop to a better employer.

by vovavili

4/17/2026 at 6:58:54 AM

It's interesting how differently people can think.

I couldn't imagine thinking "I'm gonna do this 0.1x as fast as I could, wasting my life away with pointless extra work, to spite my employer"

by procaryote

4/18/2026 at 6:42:42 AM

> I mean, who the hell wants to be 10X more productive without a commensurate 10X compensation increase?

The person who realizes that everybody around them is bow at 10X and if they don't follow suit then they will soon be out of a job.

by teiferer

4/18/2026 at 6:30:11 AM

> I think most people are going to say they dont want it. I mean, why would anyone want a tool that can screw up their bank account? What benefit does it gain them?

I'm not so sure. Matter of marketing and social pressure, big time.

Consider this: "Always-on pervasive google/fb/... login? I think most people are going to say they dont want it. I mean, why would anyone want a tool that would track their every move on the internet?" That could easily have been a statement 20 years ago. And look where we are.

by teiferer

4/17/2026 at 6:39:59 AM

Their solution will be to push mandatory and nonconsensual updates to your devices which limit your device and your freedom in the name of security. Like Google is doing to Android in September. You will no longer be able to install "unverified" software on anything. To address prompt injection attacks they're probably working on an approach where your data all has to be in the cloud and subject to security scans. That's already basically the model for Google Workspace, Google Drive and Chromebooks.

The model will get full access to your data, but in the name of security, you will only be permitted to have data that is cloud-hosted; local storage will effectively just be cache.

The era of the general computer will end, and the products you purchased from these companies will be nonconsensually altered and limited.

I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.

Knowing that they can change what you bought overnight with a single nonconsensual update, think very, very carefully about who you purchase all of your future technology from. Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.

by safety1st

4/17/2026 at 1:46:09 PM

>Google's upcoming nonconsensual degradation of Android should be a lesson for everybody.

Google is almost certainly doing this because the iOS was not found to be a monopoly, while Andorid was. It came up in Google's appeal of the Epic case verdict, where they directly asked the judge about it. Turns out you can't be anti-competitive if you don't have [allow] any competitors.

by WarmWash

4/17/2026 at 1:49:27 PM

Nope. I'm still going to blame Google for their own actions. Nice try, though. I'm old enough to remember when Google pretended to take responsibility for not being evil. Even had it as their motto.

by daveguy

4/17/2026 at 9:16:27 AM

> I'm so glad I switched to Linux more than a decade ago. At least on the PC there will still be an open source ecosystem for a long time to come, it may have less features but I'm willing to accept that.

Wait until age verification is mandatory everywhere. :)

I can already see that happening, e. g. to access financial transactions or government apps, one needs to verify the id, and that will not work without age verification that can not be tampered with. So Linux will either submit to the same or be excluded.

(That free developers will be able to run Linux fine for much longer will also be true, but I guess they only care about catching the 95%, not the 5% linux users ... and 5% is a high guesstimate).

Edit: To clarify the above, one already had to provide personal data for financial transactions, of course, so a bank knows who is who, but the recent age verification go hand in hand with the attempt to get rid of vpn, and applications now make it a new standard to query the age of users, with the claim to "help protect kids". And some people buy into that rationale too. I don't, but I have seen many non-tech savvy people submit to that justification.

by shevy-java

4/17/2026 at 9:56:45 AM

There's always the zero knowledge proof tech alternative, but I don't have the feeling we are moving in that direction - it's not the most profitable business is it.

by soco

4/17/2026 at 10:54:47 AM

No, nor is it most amenable to mass surveillance.

by duskdozer

4/17/2026 at 5:42:46 AM

> It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

The concerning aspect is how others' content being scanned into systems don't have any knowledge or consent. Having private PII/files/code/emails/etc being read and/or accidentally shared by the agent online.

by Springtime

4/17/2026 at 6:22:18 AM

> Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

Honestly, it's alright.

Just think of what we could do with computers up until this point. We keep all those abilities.

And more, even, because the industry still keeps churning out new local LLMs. So you even gain more capabilities than right now. Just not at the rate of the bleeding edge.

Which is just like the Linux desktop, essentially. It's fine, really. There is no need to consume the bleeding edge. You will be fine.

by hypfer

4/17/2026 at 1:45:29 PM

Definitely agree here. Made the swap to Linux a little over a year ago and the only reason I even have nice hardware is because I like gaming. But if I was cut off from everything tomorrow, the decades of stuff I have that I have not played will keep me very happy lol

by Forgeties79

4/17/2026 at 6:50:34 AM

>Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

As a proud neo-luddite, I'm watching the AI hype with grim amusement and I'll tell you hwhat, it doesn't look like a good time. Even putting to one side the planetary scale economic crash that is incoming, all the hypers seem to be on some sort of treadmill that is out of their control and it simply doesn't look like fun.

by multjoy

4/17/2026 at 10:25:36 AM

Do you think that avoidance is going to protect you from the fall-out?

by petesergeant

4/17/2026 at 11:55:32 AM

Everyone keeps saying how essential it all is yet a few years in and I still don’t see anything like the promised future of “everyone using them every day for everything.” Everyone’s just constantly talking (or stressing) about it.

We - including the companies - don’t know what the real “billion dollar application” of them is other than the unproven claim it makes everyone more productive in some general sense. When it doesn’t work people continue to say “it’s your fault not the tool’s.” Meanwhile investors are getting skittish and not one AI company is profitable yet. Companies that laid people off for LLM’s are regretting their decisions, leadership (and educators) is dealing with unvetted writing and having to waste their time cleaning it up, the list goes on. “Slop” is still a huge and growing problem.

LLM’s are here to stay, but IMO it’ll be more relevant in the long run than 3D printers yet less revolutionary than the internet. Everyone will touch them at various points but this whole-life, every-industry-disrupted integration still seems far fetched to me. Pricing is still a huge unsolved problem - everyone is still subsidized and despite gains in using fewer resources, it’s still too much to run these locally, even small models (not even getting into tooling and knowledge required to use them in a productive way).

When we zoom out and look at the whole picture, LLM’s have mostly made everyone’s online experience worse while the VC funded companies behind them are playing municipal and state governments’ for suckers a la Amazon getting so many cities to trip over each other giving away land and tax breaks, but far worse. Those are the biggest contributions so far aside from anecdotes from coders about “1000x productivity.” Again, I think they’re here to stay. But it’s called “AI hype” for a reason.

LLM’s have mostly been a problem creator IME rather than a “disruptor.” Never really seen “revolutionary technology” quite like it.

But hey, I’ll admit it’s useful to have a meh local model when I’m writing TTRPG stuff and have writer’s block. Though then I remember how it was trained, a whole other subject I haven’t even touched, so that kind of sucks too.

by Forgeties79

4/17/2026 at 1:44:43 PM

Yes, mainly because I will continue to know the difference between a truth and a lie.

by multjoy

4/17/2026 at 7:21:32 AM

2-3 news stories of people having bank accounts cleared and the product is dead on arrival.

by elictronic

4/17/2026 at 4:28:13 PM

You'd think so but all the evidence so far points to the contrary. Most people seem perfectly happy to trade security and privacy for convenience.

by driverdan

4/16/2026 at 6:25:57 PM

I dont see companies doing that. it can be business ending. only AI bros buying mac mini in 2026 to setup slop generated Claws would do that but a company doing that will for sure expose customer data.

by retinaros

4/17/2026 at 5:31:18 AM

Big companies are exposing customer data all the time, and they are doing all fine. The more criminal negligence, the richer.

by rurban

4/17/2026 at 9:38:31 AM

[dead]

by soraminazuki

4/16/2026 at 5:58:25 PM

> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.

Strongly agreed.

I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.

And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.

The more powerful, the more dangerous, and vice versa.

by cjbarber

4/17/2026 at 9:52:03 AM

> I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.

People have different levels of safety-consciousness, but also different tolerances and threat models.

For example, I would hesitate running a Mythos-level model in YOLO mode with full control over my computer, but right now, for personal stuff, even figuring out WTF are sandboxes in Claude Code / Gemini CLI, much less setting them up, is too much hassle. What's the worst it can do without me noticing? Format the drive and upload some private data into pastebin? Much as I hate cloud and the proliferation of 2FA in every service, that alone means it can't actually do more to me than waste few hours of my life, as I reimage my desktop and restore OneDrive (in case of destructive changes that got synced up). These models are not yet good enough to empty my bank account in few minutes I'm not looking; everything else they can do quickly is reversible or inconsequential.

Now, I do look at things closely when working with agentic AI tools. But my threat model is limited to worrying about those few hours of my life. `rm -rf / --no-preserve-root` is an annoyance, not a danger.

(I accept that different contexts give different threat modeling. I would be more worried if I were doing businessy business stuff with all kinds of secret sauces, or was processing PII of my employer's customers, or lived in a country where it's easy to have all your money stolen if your CC number or SSN gets posted online.)

by TeMPOraL

4/16/2026 at 6:07:47 PM

How many of these threat vectors are just theoretical? Don’t use skills from random sources (just like don’t execute files from unknown sources). Don’t paste from untrusted sites (don’t click links on untrusted sites). Maybe there are fake documentation sites that the agent will search and have a prompt injected - but I haven’t heard of a single case where that happened. For now, the benefits outweigh the risk so much that I am willing to take it - and I think I have an almost complete knowledge of all the attack vectors.

by planb

4/18/2026 at 6:45:54 AM

The problem is that any data now becomes effectively an executable.

> I think I have an almost complete knowledge of all the attack vectors.

That's exactly the kind of hybris where the maximum danger lies.

by teiferer

4/17/2026 at 6:02:55 AM

Systems have been caught out that review pull requests, that’s a simple and clear one. The more obvious to me for most people is anything you do that interacts with your email without an explicit approve list of emails to read.

by IanCal

4/17/2026 at 2:45:42 PM

Yes, but none of this applies to the local codex agent that runs when I tell it to and has access to my computer. Like: „scan this folder of PDFs and create an excel file with all expenses. Then enter them into my tax software.“ This needs access to very sensitive data and involves a quite complex handling of data. But the only attack vector I see is someone injecting prompts into my invoice files.

by planb

4/16/2026 at 6:29:18 PM

i think you lack creativity. you could create a site that targets a very narrow niche, say an upper income school district. build some credibility, get highly ranked on google due to niche. post lunch menus with hidden embedded text.

the attack surface is so wide idk where to start.

by postalcoder

4/16/2026 at 7:33:12 PM

Why would my agent retrieve that lunch menu?

by planb

4/17/2026 at 3:05:47 PM

Because it’s hooked up to a microphone in your kitchen & your kid is arguing with you about what lunch they want & they say “Hey [agent], what day is pizza day at [school]?”

by thuuuomas

4/20/2026 at 6:41:27 PM

I’m not doing that. That would be like giving my child shell access to my system.

by planb

4/18/2026 at 4:58:41 AM

Funny joke,

But for real, obviously we all know people use agents to pick restaurants and that's a legit vector.

I agree it's not the biggest surface, but it's worth knowing imdo

by boxedemp

4/18/2026 at 11:24:31 AM

I cannot reconcile that growth for non-technical users is going to explode, when most utility from agents is via the ability to execute arbitrary code, generally in yolo mode, with the fact that almost all corporate IT departments do not give users the ability to install anything on their machine, let alone arbitrary code. Even developers at many companies are subject to this despite the productivity impacts.

The culture of corporate IT would need to change to allow it, and I just don't see it happening.

by jasongi

4/17/2026 at 9:45:27 AM

What about setting environments for normies that mitigate this problem? I don't know that you can do it on Windows, but Linux offers various tools for isolation where you can give full rights to an LLM and still be safe from certain classes of disaster.

Maybe this kind of isolation neuters the benefit you're thinking of, but I do believe some sort of solution could be reached.

by Anvoker

4/18/2026 at 6:48:21 AM

"Isolation" and "full rights" are mutually exclusive, contradictory properties.

by teiferer

4/16/2026 at 6:31:44 PM

[dead]

by canarias_mate

4/16/2026 at 6:58:44 PM

This is me!

I’m semi-normie (MechEng with a bit of Matlab now working as a ceo).

I spend most of my day in Claude code but outputs are word docs, presentations, excel sheets, research etc.

I recently got it to plan a social media campaign and produce a ppt with key messaging and content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign and then used a social media aggregator api to download images and schedule in posts.

In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

I’ve vibe coded an interface to run multiple agents at once that have full access via apis and MCPs.

With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes and then send me a message with a summary of what it has done.

Most knowledge work output is delivered as code (e.g. xml in word docs) so it shouldn’t be that that surprising that it can do all this!

by MrsPeaches

4/16/2026 at 8:59:16 PM

How does this obviate the need for software? In order for what you asked to be possible, Word, Excel, PowerPoint, and Figma all still need to exist and you need licenses for them.

If you can figure out the next step and say "Claude, go find me buyers and sell shit for me without using any pre-existing software," have at it. It can't be social media, I guess, since social media is software and Claude is supposed to get rid of software.

At a certain point, why do we even need computers? Can't we just call Claude's hotline and ask "Claude, please find a way to dump $40 million in cash into my living room. Don't put it in my bank account because banks use software."

by nonameiguess

4/17/2026 at 6:59:24 AM

It doesn't remove the need for software, but it greatly reduces the number of tools needed or doesn't mandate building custom tools that might not be viable due to very specific needs many users have.

OP gave a good example how their workflow was changed, you could argue there are tools that could've done that, but they managed to achieve their goals without them, have something that fits their workflow perfectly, is fine tuned in case of changes, and with a few other tools (Word, Excel, Figma) they can do all sorts of things which would've required a small team or far more (expensive) tools to execute.

To me that is a great example of non-developers using tools to enhance their workflows and with initiatives like from this topic, I can only see that increasing.

by elAhmo

4/17/2026 at 7:38:02 AM

> How does this obviate the need for software?

It doesn't obviate the need for software, but it greatly devalues software products, as they become reduced to tool calls for LLMs.

This is good for users, because software products are defined by boundaries - borders drawn around the code to focus and package functionality, yes, but also to limit interoperability and create a sales channel (UX being the perfect marketing platform for captive audience).

After all, I don't usually want to play with Word, Excel, PowerPoint, and Figma - they're just standing between me and the artifact I want to create, so if I can get LLM to operate them for me, I don't have to deal with all the UX and marketing bullshit those products throw at me.

I mean, that's what I'd do if I could afford to hire a person to operate those tools for me. That, again, is the best mental model for LLMs - they're little people on a chip, cheaper to employ than actual people.

by TeMPOraL

4/17/2026 at 3:40:54 PM

> I mean, that's what I'd do if I could afford to hire a person to operate those tools for me. That, again, is the best mental model for LLMs - they're little people on a chip, cheaper to employ than actual people.

Sounds like more of a threat to people than software then.

I get the point that if an agent could generate a presentation by directly writing to some open format with a free viewer then PowerPoint would be out of the picture.

However the tool has to be pretty close to 100% for that to work. If I have a presentation that's 90% there it's probably going to be a lot easier to finish it off manually in Powerpoint than try different variants of prompts. In which case I'll still need that Powerpoint license.

by DrScientist

4/17/2026 at 5:33:59 PM

> In order for what you asked to be possible, Word, Excel, PowerPoint, and Figma all still need to exist and you need licenses for them.

Or not. Besides, the better AI models can effortlessly generate Latex/Beamer, a far superior solution for typesetting and presentations. Anything than can be done in Excel can be done in Python. Those proprietary tools are a thing of the past, no one should use them anymore.

by drnick1

4/17/2026 at 6:20:12 PM

And the value of those marketing campaigns is going to zero, since everyone is doing it. Even self employed people.

Pay for ads or you get lost in the mass of posts

by Bombthecat

4/16/2026 at 6:22:04 PM

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I disagree. There is a major gap between awesome tech and market uptake.

At this point, the question is whether LLMs are going to be more useful than excel. AI enthusiasts are 100% sure that it’s already more useful than excel, but on the ground, non-technical views do not reflect that view.

All the interviews and real life interactions I have seen, indicate that a narrow band of non-technical experts gain durable benefits from AI.

GenAI is incredible for project starts. A 0 coding experience relative went from mockup to MVP webapp in 3 days, for something he just had an idea about.

GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that, if used at scale, would guarantee litigation.

Mileage varies entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

Experts constantly decide trade offs which novices don’t even realize matter. Something as innocuous as the placement of switches when you enter the room, can be made inconvenient.

by intended

4/16/2026 at 6:37:13 PM

> market uptake.

I think the market uptake of Claude Cowork is already massive.

by cjbarber

4/16/2026 at 9:21:43 PM

Estimated users are at 18-30 mn, and we are talking about non-technical users.

by intended

4/16/2026 at 5:45:53 PM

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.

I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

by bob1029

4/16/2026 at 6:30:56 PM

> The part that makes this powerful is that the LLM is the ultimate UI/UX.

I strongly doubt that. That’s like saying conversation is the ultimate way to convey information. But almost every human process has been changed to forms and structured reports. But we have decided that simple tools does not sell as well and we are trying to make workflow as complex as possible. LLM are more the ultimate tools to make things inefficient.

by skydhash

4/17/2026 at 12:13:51 PM

>The part that makes this powerful is that the LLM is the ultimate UI/UX

Seems pretty questionable to me. Describing things in natural language can be quite imprecise and verbose.

by duskdozer

4/17/2026 at 4:03:25 PM

>UI/UX development is often the most expensive part of software engineering.

I disagree with this as a blanket statement. At least in the tech world (i.e. tech companies that build technology products), UI/UX is often less expensive than the platform and infrastructure parts of the technology products, certainly at any tech that runs at scale.

by voncheese

4/16/2026 at 5:55:26 PM

> There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

Sort of agreed, though I wonder if ai-deployed software eats most use cases, and human consultants for integration/deployment are more for the more niche or hard to reach ones.

by cjbarber

4/17/2026 at 5:10:11 AM

[flagged]

by Moonye666

4/16/2026 at 7:06:33 PM

I am starting to use Codex heavily on non-coding tasks. But I am realizing it works because I work and think like a programmer - everything is a file, every file and directory should have very precise responsibilities, versioning is controlled, etc. I don't know how quick all of this will take to spread to the general population.

by aerhardt

4/17/2026 at 3:26:55 PM

I keep seeing sentiment like this. I work for a relatively cutting edge healthcare enterprise as a sysadmin, and we've only just been given access to copilot chat. I don't think we're going to be having agents doing work for us any time soon.

by nazgulsenpai

4/17/2026 at 7:11:10 AM

Maybe. The point is that in case of software it is fairly easy to verify if that what LLM produced is correct or not. Compiler checks syntax, we can write tests, there is whole infrastructure for checking if something works as expected. In addition, LLM are just text generating algorithms and software is all about text, so if LLM see 1 000 000 a CRUD example in Python, it can generate it easily, as we have a lot of code examples out there thanks to open source.

That's why LLMs shine in coding tasks. If you move to other parts of engineering, like architecture, construction or stuff like investment (there is no AI boom there, why?) where there is no so much source text available, tasks are not so repeatable like in software, or verification is much more complicated, then LLM-s are no longer that useful.

In software also I believe we will see soon that a competitive advantage have not those who adopted LLM, but those who did not. If you ask LLM what framework/language/approach use for a given task, contrary to what people think, LLM is not "thinking", it just generates text answer on the base of what it was trained on, so you will get again and again same most popular frameworks/langs/approaches suggested, even if there is something better, yet not that popular to get into model weights in a significant way.

Interesting times, anyway.

by piokoch

4/17/2026 at 8:10:03 AM

LLMs nowadays make aggressive use of web search. Thus they don't answer only on the base of what they were trained on.

I don't think they are much more prone to using only the same popular frameworks, especially if you ask them to weigh for options.

by jampekka

4/16/2026 at 5:36:21 PM

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

They won't.

Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

> And eventually will the UI/interface be generated/personalized for the user, by the model?

No. Please for the love of god actually go outside and talk to people outside of the tech bubble. People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

by troupo

4/16/2026 at 6:03:49 PM

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

Most people are indifferent to computers. A computer to them is similar to the water pipeline or the electrical grid. It’s what makes some other stuff they want possible. And the interface they want to interact with should be as simple as possible and quite direct.

That is pretty much the 101 of UX. No deep interactions (a long list of steps), no DSL (even if visual), and no updates to the interfaces. That’s why people like their phone more than their desktops. Because the constraints have made the UX simpler, while current OS are trying to complicate things.

So Cowork/Codex would probably go where Siri is right now. Because they are not a simpler and consistent interface. They’ve only hidden all the controls behind one single point of entry. But the complexity still exists.

by skydhash

4/16/2026 at 6:54:56 PM

Just yesterday my non-technical spouse had to solve a moderately complex scheduling problem at work. She gave the various criteria and constraints to Claude and had a full solution within a few minutes, saving hours of work. It ended up requiring a few hundred lines of Python to implement a scheduling optimization algorithm. She only vaguely knows what Python is, but that didn't matter. She got what she needed.

For now she was only able to do that because I set up a modified version of my agentic coding setup on her computer and told her to give it a shot for more complex tasks. It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

by noelsusman

4/16/2026 at 8:02:19 PM

There's no such big opportunity, as the number of programmers' spouses is quite limited. Again, and as the GP rightly suggested, some of the HN-ers here need to go and touch some normie grass, so to speak.

More to the point, nobody wants to be more efficient for the sake of being efficient, we all want to go to work, do our metaphorical 9 to 5 without consuming too much (intellectual and not only) energy, and then back home. In that regard AI is seen as an existential threat to that "lifestyle" and it will be treated as such by regular workers.

by paganel

4/16/2026 at 9:15:32 PM

correct. you cant trust this place for realistic takes - I had a post re. financial stuff downvoted when a former Investment Banker chimed in to back me up.

Comical. Truly comical.

by w2df

4/16/2026 at 8:36:52 PM

> Just yesterday my non-technical spouse

> It ended up requiring a few hundred lines of Python

And she knows those a hundred lines of python work correctly and give her correct result because in this instance Claude managed to produce a working result. What if it didn't? Would vague knowledge of Python have helped her?

> It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

Even though I agree with the sentiment, we've tried non-coding coding how many times now? Once every 5 years? Throwing LLMs into the mix won't help much when in the end you leave the end user hanging, debugging problems and hunting for solutions.

by troupo

4/16/2026 at 8:39:23 PM

Scheduling solutions are easy to verify. For other problems, verification would be harder.

by zozbot234

4/16/2026 at 5:42:16 PM

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

What are you using today? In my experience LLMs are already pretty good at this.

> Please for the love of god actually go outside and talk to people outside of the tech bubble.

In the past week I've taught a few non-technical friends, who are well outside the tech bubble, don't live in the SF Bay Area, etc, how to use Cowork. I did this for fun and for curiosity. One takeaway is that people at startups working on these products would benefit from spending more time sitting with and onboarding users - they're very powerful and helpful once people get up and running, but people struggle to get up and running.

> People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

I obviously agree with this, I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks. I agree that users don't want something that changes all the time. But they do want something that fits them and fits their task. Artifacts on Claude and Canvas on ChatGPT are early versions of this.

by cjbarber

4/16/2026 at 5:46:58 PM

> What are you using today? In my experience LLMs are already pretty good at this.

LLMS are good at "find me a two week vacation two months from now"?

Or at "do my taxes"?

> how to use Cowork.

Yes, and I taught my mom how to use Apple Books, and have to re-teach her every time Apple breaks the interface.

Ask your non-tech friends what they do with and how they feel about Cowork in a few weeks.

> I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks.

How many users you see personalizing anything to their task? Why would they want every app to be personalized? There's insane value in consistency across apps and interfaces. How will apps personalize their UIs to every user? By collecting even more copious amounts of user data?

by troupo

4/17/2026 at 8:23:39 AM

"LLMS are good at "find me a two week vacation two months from now"?"

Of course they are. I gave one a similar prompt a few weeks ago, albeit quite a bit more verbose (actually I just dictated it, train of thought, with couple of 'eh actually, forget what I just said about x, do y instead") and although I wasn't brave enough to give it my credit card and finalize the bookings, it would have paid for the bookings I had it set up for me, had I done that. I gave it some RL constraints, like "we're meeting friends in place xyz at such and such date, make sure we're there then" and it did everything from watching we wouldn't be spending too many hours driving per day to check that hotels are kid friendly to things to do and see and what public holidays there are so that we know when supermarkets close early and a bunch of details I wouldn't have thought of. It checked my (and my wife's) calendar, checked what I had going on work wise, etc.

That is a fully solved 'problem' man. LLMs will run the whole thing for you. Just provide it with the login details to booking websites and you're off to the races.

I did have it upgrade the car, even if that pushed the cost outside the budget I gave it. Next time it'll know LOL.

by roel_v

4/17/2026 at 8:54:58 AM

>although I wasn't brave enough to give it my credit card and finalize the bookings

So it's not trustworthy enough for you, someone clearly interested in the hype of LLMs.

by suddenlybananas

4/17/2026 at 10:19:46 AM

It's a matter of getting used to things. We're only a few weeks further, I maybe would have given it now. It'd need some way to keep it private I guess, maybe I could have used a one off CC number. Those are just technicalities at this point. It got me to the point where I just had to enter my details and click a few confirm buttons. Those are solved problems. I'm not sure why the denialists here are saying those things are 'impossible'. I mean I've seen them happen, what do you want me to say? Claiming this is 'just hype' is ostrich behavior. I've been playing with an abliterated Gemma 4 yesterday on my local machine. Yes it would take longer and require a bunch of harness fiddling, but even if OpenAI and Anthropic would collapse tomorrow, I'm confident I could still do the exact same thing the day after with with what I have right now on my hard disk. I'm not sure what you want me to tell you mate. Yes there's rough edges to work out or just in general workflows to improve but the ideas are way beyond 'proof of concept'. There's people like myself using these things for purposes that 6 months ago were science fiction. I don't care if you believe me or not, I'm just some dude on the internet, but level of delusion on how 'inferior' these models (with proper harnessing) are is mind boggling for someone like me who sees it happen literally 20 centimeters to the side on my screen from where I see people claim that those things are impossible.

by roel_v

4/16/2026 at 5:55:04 PM

> Or at "do my taxes"?

codex did my taxes this year (well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough)

by baq

4/16/2026 at 6:24:03 PM

> well it actually implemented a normalization pipeline and a tax computing engine which then did the taxes, but close enough

You can't seriously believe laymen will try to implement their own tax calculators.

by William_BB

4/16/2026 at 6:48:12 PM

of course not.

what I believe is that laymen will put all their tax docs into codex and tell it to 'do their taxes' and the tool will decide to implement the calculator, do the taxes and present only the final numbers. the layman won't even know there was a calculator implemented.

by baq

4/17/2026 at 9:59:25 AM

> the layman won't even know there was a calculator implemented.

That's on company making the agentic harness. Hiding details of what computer does from the user is the original sin of this industry, and subsequent generations of developers and software companies keeps doubling down on it.

(Case in point - I just downloaded the Codex app for Windows, and in the options I see it has two UI modes of operating, one of which is meant for "non coding" and apparently this means hiding the details of what the agent is doing. This is precisely where the layman is betrayed by the tool.)

by TeMPOraL

4/16/2026 at 6:53:13 PM

Yeah, good luck trusting the output!

by William_BB

4/16/2026 at 6:59:08 PM

check back in a couple of years!

by baq

4/16/2026 at 7:13:36 PM

Ah right! Reminds me of AGI by 2025 :D

by William_BB

4/16/2026 at 6:39:42 PM

If your prompt was more complex than "do my taxes", then this is irrelevant.

by tsimionescu

4/16/2026 at 6:51:28 PM

it was many hours of working with codex, guidance and comparing to known-good outputs from previous years, but a sufficiently smart model would be able to just do it without any steering; it'd still take hours, but my input wouldn't be necessary. a harness for getting this done probably exists today, gastown perhaps or something that the frontier labs are sitting on.

by baq

4/17/2026 at 7:07:20 AM

If you can assume "a sufficiently smart piece of technology" that doesn't exist now, a lot of problems become trivial

by procaryote

4/17/2026 at 11:50:36 AM

yes.

but then, respect the trendline, especially if it's exponential.

by baq

4/17/2026 at 12:34:13 PM

Is it exponential or logistic?

by bavell

4/16/2026 at 7:01:48 PM

> but a sufficiently smart model would be able to just do it without any steering;

Yeah, yeah, we've heard "our models will be doing everything" for close to three years now.

> a harness for getting this done probably exists today, gastown perhaps

That got a chuckle and a facepalm out of me. I would at least consider you half-serious if you said "openclaw", at least those people pretend to be attempting to automate their lives through LLMs (with zero tangible results, and with zero results available to non-tech people).

by troupo

4/16/2026 at 7:30:46 PM

Sounds fascinating! If you wrote an article on this I bet it'd have a good shot at making it to the home page of HN.

by ravenstine

4/16/2026 at 7:33:37 PM

> LLMS are good at "find me a two week vacation two months from now"?

Yes?

===

edit: Just tested it with that exact prompt on Claude. It asked me who I was traveling with, what type of trip and budget (with multiple choice buttons) and gave me a detailed itinerary with links to buy the flights ( https://www.kayak.com/flights/ORD-LIS/2026-06-13/OPO-ORD/202... )

by jeffgreco

4/16/2026 at 9:32:05 PM

I'd love to try and replicate, but I'm not letting any of these tools anywhere near a real browser and capabilites :)

by troupo

4/17/2026 at 7:31:28 AM

Perfect - and this use case will be enshitificated first. LLM provider will charge small fee for proper recommendation placing. Got to recoup investment.

by mazurnification

4/16/2026 at 8:38:51 PM

This is effectively how I treat my AI agents. A lot of the reason this doesn't work well for people today is due to context/memory/harness management that makes it too complex for someone to set up if they don't want a full time second job or just like to tinker.

If you productize that it will be an experience a lot of people like.

And on the UI piece, I think most people will just interact through text and voice interfaces. Wherever they already spend time like sms, what's app, etc.

by a1j9o94

4/18/2026 at 9:25:30 PM

[dead]

by jhizzard

4/16/2026 at 5:29:19 PM

Most knowledge workers aren't willing to put in the effort so they're getting their work done efficiently.

by trvz

4/16/2026 at 5:50:12 PM

Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools. A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

by louiereederson

4/17/2026 at 8:40:57 AM

> A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

I think something like SQL w/ row-level security might be the answer to the problem. You often want to constrain how the model can touch the data based upon current tool use or conversation context. Not just globally. If an agent provides a tenant id as a required parameter to a tool call, we can include this in that specific sql session and the server will guarantee all rules are followed accordingly. This works for pretty much anything. Not just tenant ids.

SQL can work as a bidirectional interface while also enforcing complex connection level policies. I would go out of band on a few things like CRUD around raw files on disk, but these are still synchronized with the sql store and constrained by what it will allow.

The safety of this is difficult to argue with compared to raw shell access. The hard part is normalizing the data and setting up adapters to load & extract as needed.

by bob1029

4/16/2026 at 6:00:57 PM

> Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools.

What would make it not be a monolith? To me it seems like there'll be a big advantage (e.g. in distribution, user understanding) for most people to be using the same product / similar interface. And then the agent and the developer of that interface figure out all the integrations under that, invisible to the user.

by cjbarber

4/16/2026 at 8:41:31 PM

I mean there is a runtime layer that needs to be developed, and some of it may live in CC/Codex and some might live in the various enterprise systems. Someworkflow automations and some amount of the semantic layer may for instance exist in your CRM/ERP/data platform. Yes the front-end would be owned by the chat interface, but part of the solution may exist in the various enterprise systems. This would be closer to a distributed system than a monolith. The demos and marketing language point to this as the direction of travel (i.e. the reference to Atlassian Rovo, etc.).

by louiereederson

4/16/2026 at 9:20:47 PM

Thanks for answering!

by cjbarber

4/16/2026 at 5:37:14 PM

I think the coding market will be much larger. Knowledge work is kind of like the leaf nodes of the economy where software is the branches. That's to say, making software easier and cheaper to write will cause more and more complexity and work to move into the Software domain from the "real world" which is much messier and complicated.

by eldenring

4/16/2026 at 5:38:36 PM

Yes, and the same thing will happen in non-coding knowledge work too. Making knowledge work cheaper will cause complexity to increase, more knowledge work.

by cjbarber

4/16/2026 at 5:56:28 PM

I don't think so, the whole point of writing software is it is a great sink for complexity. Encoding a process or mechanism in a program makes it work (as defined) for ever perfectly.

An example here is in engineering. Building a simulator for some process makes computing it much safer and consistent vs. having people redo the calculations themselves, even with AI assistance.

by eldenring

4/16/2026 at 6:01:52 PM

The history of both knowledge work and software engineering seems to be increasing in both volume and complexity, feels reasonable to me to bet on both of those trendlines increasing?

by cjbarber

4/16/2026 at 6:47:26 PM

Yes, I have a theory - that higher efficiency becomes structural necessity. We just can't revert to earlier inefficient ways. Like mitochondria merging with the primitive cell - now they can't be apart.

by visarga

4/17/2026 at 5:15:27 PM

I still think we're several "my agent sent an inappropriate email to all my contacts" away from people figuring out proper security controls for these things

by joshysmith

4/17/2026 at 9:52:16 AM

I agree, and I think this extends to programming too. A lot of of software practices are built on the expectation humans are writing, reviewing and shipping code with that quickly becoming the case, processes, practices and even programming languages themselves will evolve to what agents need, rather than humans.

a version of Conway's law aimed specifically at agentic communication rather than human.

by frez1

4/16/2026 at 5:49:42 PM

really struggling to understand where this is coming from, agents haven't really improved much over using the existing models. anything an agent can do, is mostly the model itself. maybe the technology itself isn't mature yet.

by jorblumesea

4/16/2026 at 6:02:30 PM

My view is different. Agent products have access to tools and to write and run code. This makes them much more useful than raw models.

by cjbarber

4/16/2026 at 6:53:01 PM

Yes, I think they unlock a whole new level of capability when they have a r/w file system (memory), code execution and the web.

by visarga

4/17/2026 at 6:58:30 AM

That's not the model, that's the box the model came in.

It's unlikely we've hit the limits on improving agent UX, but there are some fundamental limits on LLMs that seem unlikely to be fixed by better UX.

by flir

4/16/2026 at 6:21:17 PM

You know what happens to a predator who makes its prey go extinct?

AI is doing the same

by croes

4/16/2026 at 7:01:22 PM

Totally agree, AI interfaces will become the norm.

Even all the websites, desktop/mobile apps will become obsolete.

by andoando

4/17/2026 at 8:36:50 AM

AI won't kill apps, it will just change who 'clicks' the buttons. Even the most powerful AI needs a source of truth and a structured environment to pull data from. A world without websites is a world where AI has nothing to read and nowhere to execute. We aren’t deleting the UI. We’re just building the backends that feed the agents.

by donnisnoni

4/16/2026 at 5:23:54 PM

There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non developers, just not sure using 'code' as the term is the right one or not.

by daviding

4/16/2026 at 7:47:40 PM

> There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up.

I've finally started getting into AI with a coding harness but I've take the opposite approach. usually I have the structure of my code in my mind already and talk to the prompt like I'm pairing with it. while its generating the code, I'm telling it the structure of the code and individual functions. its sped me up quite a lot while I still operate at the level of the code itself. the final output ends up looking like code I'd write minus syntax errors.

by cultofmetatron

4/16/2026 at 7:57:44 PM

This is the way to do it if you're a serious developer, you use the AI coding agent as a tool, guiding it with your experience. Telling a coding agent "build me an app" is great, but you get garbage. Telling an agent "I've stubbed out the data model and flow in the provided files, fill in the TODOs for me" allows you the control over structure that AI lacks. The code in the functions can usually be tweaked yourself to suit your style. They're also helpful for processing 20 different specs, docs, and RFCs together to help you design certain code flows, but you still have to understand how things work to get something decent.

Note that I program in Go, so there is only really 1 way to do anything, and it's super explicit how to do things, so AI is a true help there. If I were using Python, I might have a different opinion, since there are 27 ways to do anything. The AI is good at Go, but I haven't explored outside of that ecosystem yet with coding assistance.

by ok_dad

4/17/2026 at 9:38:53 AM

If you use a type checker in strict mode (e.g. pyright with "typeCheckingMode: strict") and a linter with strict rules (e.g. ruff with many rules enabled), the output space is constrained enough that you can get pretty consistent Python code. I'm not saying this is "good Python" overall, but it works pretty well with agents.

by maleldil

4/17/2026 at 7:00:06 AM

Ai is even good in turbo pascal if you instruct it right

by holoduke

4/16/2026 at 8:14:31 PM

My workflow is quite similar. I try to write my prompts and supporting documentation in a way that it feels like the LLM is just writing what is in my mind.

When im in implementation sessions i try to not let the llm do any decision making at all, just faster writing. This is way better than manually typing and my crippling RSI has been slowly getting better with the use of voice tools and so on.

by mlcruz

4/17/2026 at 8:38:24 AM

This is the way.

The funny thing is my expectation was that adoption of AI coding would kill the joy of getting into a flow state but I've actually found myself starting to slip into an alternate type of flow state.

Instead of hammering out code manually over an hour the new flow state is a back and forth with the LLM on something that's clear in my mind. It's a collaborative state where I'm ultimately not writing much code manually but I'm still bouncing between technical thoughts, designing architecture, reviewing code, switching direction etc.

by cbovis

4/17/2026 at 1:32:25 PM

Yeah - similar thing for me as well. A lot of times there would be something I want to work on that would be boilerplate/repetitive/laborious work and I would just procrastinate it for as long as possible, working on other things, until I'd finally get around to doing it. Now those are just immediately completed with a simple prompt and instead of going with the initial implementation, I have the bandwidth to tweak and refine details that I would have skipped over before just to ship.

by jclardy

4/17/2026 at 7:58:31 AM

I personally have been finding good results "hiding the code" behind the harnesses. I do have to rely on verification and testing a lot, which I also get the AI to do, but for most of the cases it works out well enough. A good verification and testing setup with automated, strict reviewing goes a long way.

by dear_prudence

4/17/2026 at 7:08:52 PM

[dead]

by canarias_mate

4/16/2026 at 8:04:12 PM

The fact that the Codex app is still unavailable on Linux makes me think the target audience isn't people who understand code.

by aniviacat

4/17/2026 at 1:13:22 AM

Are you referring to the CLI Codex? That can be installed with NPM or Homebrew, and is fully open source.

by Zetaphor

4/19/2026 at 8:40:54 AM

Yes and the official docs even mention that if you’re on Windows you should run Codex CLI via WSL. Meaning, it’s specifically designed for unix systems.

by laurels-marts

4/16/2026 at 8:37:01 PM

Right. It's rather for vibecoders than for software engineers.

by huqedato

4/16/2026 at 7:12:10 PM

The power to the people is not us the developers and coders.

We know how to do a lot of things, how to automate etc.

A billion people do not know this and probably benefit initially a lot more.

When i did some powerpoint presentation, i browsed around and draged images from the browser to the desktop, than i draged them into powerpoint. My collegue looked at me and was bewildered how fast I did all of that.

by Glemllksdf

4/16/2026 at 7:33:35 PM

I've helped an otherwise very successful and capable guy (architect) set up a shortcut on his desktop to shut down his machine. Navigating to the power down option in the menu was too much of a technical hurdle. The gap in needs between the average HNer and the rest of the world is staggering

by Avicebron

4/16/2026 at 8:02:50 PM

This. I’m sure everyone has a similar story of how difficult it was to explain the difference between a program shortcut represented as a visual icon on a desktop versus the actual executable itself to somebody who didn’t grow up in the age of computing. And this was Windows… the purported OS for the masses not the classes.

by vunderba

4/16/2026 at 7:37:48 PM

Initially I thought you meant “software architect” and I was flabbergasted at how that’s possible. Took me a minute to realize there’s other architects out there lol.

by Insanity

4/16/2026 at 7:49:41 PM

I think you just proved the point here about the divide between the average user of this site and the population.

by djcrayon

4/16/2026 at 7:58:51 PM

The same way most people hear "legacy" and think it's something good

by laszlojamf

4/17/2026 at 9:31:50 AM

It is? :)

by ultratalk

4/16/2026 at 8:34:09 PM

Oh boy, the gap between the average it professional and ai pros here is already staggering, let alone the rest of the world. I feel like an alien, no matter where.

by siva7

4/16/2026 at 7:37:31 PM

right clicking start menu and clicking shutdown is too hard? amazing

by MassiveQuasar

4/16/2026 at 7:59:55 PM

Yes! Even closing the windows of programs that users no longer need is hard.

It's easy to develop a disconnect with the level that average users operate at when understanding computers deeply is part of the job. I've definitely developed it myself to some extent, but I have occasional moments where my perspective is getting grounded again.

by gmueckl

4/17/2026 at 9:46:35 AM

I don't think that's representative of most non-CS professionals. Most people in the fields I know (mostly professors, medical doctors, and businesspeople) can use google chrome, word, powerpoint, and a little of excel decently. There are the occasional few who confuse spreadsheets and databases, but no one who thinks shutting down computers or closing windows is hard. Heck, my ageing dad managed to troubleshoot his printer without any help, and he has no formal computer experience whatsoever.

HN has a long history of patronising the "average user" in the guise of paternal figures who don't realise that what they are doing is belittling the vast majority of tech users. I'm guilty of it myself. But they're capable of a lot more than we think they are.

Ultimately, it comes down to the willingness people have to learn new things. If they're curious enough to think about how things work, they'll be fine.

by ultratalk

4/19/2026 at 12:30:53 AM

I'm not doing this to be patronising, more like telling people or myself that assumptions i make, are just not necessarily true for everyone.

And weirdly enough, a Task like sorting a file with data in it, if you are not a professional, windows offers very if not non single way of doing this. You would need to understand file types, understand that csv can be imported for excel, you need excel, than you need to understand excel how to sort stuff in it.

The ffirst thing I do in Excel is select the pseudo table and click on table -> insert to make it a sortable / real table. I showed this to every one in my Team full of studied CS people because non of them knew this.

by Glemllksdf

4/17/2026 at 7:50:02 PM

Well, I didn't mean for this to be patronizing, but rather as a warning that not everybody is at the same level and the spread is huge. I see it often enough.

by gmueckl

4/16/2026 at 7:53:27 PM

It's a while since I've used Windows but I seem to remember it giving a choice of sleep, logout, switch session etc. I could totally see someone wanting a single button for it.

by antonvs

4/16/2026 at 10:37:23 PM

KDE is even worse. No matter which of those you choose, the next screen requires you to choose again. It's been this way since KDE 4.0.

by dotancohen

4/17/2026 at 8:30:52 AM

Ah yes, this task fails hard at the xkcd.com/627/ tactic of "Find a menu item or button that looks related to what you want to do..."

What do I want to do? "turn off my computer" What button do I press? "start"

by weeb

4/16/2026 at 8:01:26 PM

> The power to the people is not us the developers and coders.

> We know how to do a lot of things, how to automate etc.

You need to know these things if you want to use AI effectively. It's way too dumb otherwise, in fact it's dumb enough to be quite dangerous.

by zozbot234

4/16/2026 at 6:47:32 PM

Yes, the code is still important. For example, I had tasked Codex to implement function calling in a programming language, and it decided the way to do this was to spin up a brand new sub interpreter on each function call, load a standard library into it, execute the code, destroy the interpreter, and then continue -- despite an already partial and much more efficient solution was already there but in comments. The AI solution "worked", passed all the tests the AI wrote for it, but it was still very very wrong. I had to look at the code to understand it did this. To get it right, you have to either I guess indicate how to implement it, which requires a degree of expertise beyond prompting.

by ModernMech

4/16/2026 at 6:51:28 PM

Yep, all models today still need prompting that requires some expertise. Same with context management, it also needs both domain expertise as well as knowing generally how these models work.

by porridgeraisin

4/16/2026 at 7:20:35 PM

Do you ask it for a design first? Depending on complexity I ask for a short design doc or a function signature + approach before any code, and only greenlight once it looks sane.

by ai-tamer

4/16/2026 at 7:50:03 PM

I understand the "just prompt better" perspective, but this is the kind of thing my undergraduate students wouldn't do, why is the PhD expert-level coder that's supposed to replace all developers doing it? Having to explicitly tell it not to do certain boneheaded things, leave me wondering: what else is it going to do that's boneheaded which I haven't explicit about?

by ModernMech

4/16/2026 at 8:05:48 PM

Because it's not "PhD-expert level" at all, lol. Even the biggest models (Mythos, GPT-Pro, Gemini DeepThink) are nowhere near the level of effort that would be expected in a PhD dissertation, even in their absolute best domains. Telling it to work out a plan first is exactly how you would supervise an eager but not-too-smart junior coder. That's what AI is like, even at its very best.

by zozbot234

4/16/2026 at 9:49:28 PM

That's not the best framing, IMO. More important is, even a PhD expert human wouldn't one-shot complex programs out of short, vague requests. There's a process to this. Even a thesis isn't written in one, long, amphetamine-fueled evening. It's a process whose every steps involves thinking, referencing sources, talking with oneself and other people, exploring possibilities, going two steps forward and one step back, and making decisions at every point.

Those decisions are, by large, what humans still need to do. If the problem is complex, and you desperately avoid needing to decide, then what AI produces will surprise you, but in a bad way.

by TeMPOraL

4/16/2026 at 8:27:47 PM

I understand that but 1) expert-level performance is how they are being sold; but moreover 2) the level of hand-holding is kind of ridiculous. I'll give another example, Codex decided to write two identical functions linearize_token_output and token_output_linearize. Prompting it not to do things like that feels like plugging holes in a dyke. And through prompting, can you even guarantee it won't write duplicate code?

I'll give a third example: I gave Codex some tests and told it to implement the code that would make the tests pass. Codex wrote the tests into the testing file, but then marked them as "shouldn't test", and confirmed all tests pass. Going back I told it something to the effect "you didn't implement the code that would make the tests work, implement it". But after several rounds of this, seemingly no amount of prompting would cause it to actually write code -- instead each time it came back that it had fixed everything and all tests pass, despite only modifying the tests file.

In each example, I keep coming back to the perspective that the code is not abstracted, it's an important artifact and it needs/deserves inspection.

by ModernMech

4/19/2026 at 4:57:05 PM

Note to self: don't go away for the weekend without HN ;-)

Personally I run several agents. At minimum Codex and Claude, so they cross-check each other. Exactly to avoid what you describe. Duplicate functions or "tests all pass" when nothing was implemented is the kind of thing a second model, reading the diff with fresh eyes, tends to flag immediately.

But it takes coordination and solid skills + rules to make it work. Tomorrow's battle -> AI skillers.

by ai-tamer

4/19/2026 at 9:23:50 PM

https://arxiv.org/abs/2604.04721 I like it when researchers confirm my intuition

by ai-tamer

4/16/2026 at 8:37:15 PM

> the code is not abstracted, it's an important artifact and it needs inspection.

That's a rather trivial consideration though. The real cost of code is not really writing it out to begin with, it's overwhelmingly the long-term maintenance. You should strive to use AI as a tool to make your code as easy as possible to understand and maintain, not to just write mountains of terrible slop-quality code.

by zozbot234

4/17/2026 at 9:36:17 AM

I think this would work much better if there were constraints in place, a software stack clearly separating different concerns - e.g. you just ask AI to write business logic while you already have data sources, auth, etc, configured.

But that's not how popular, modern software stacks work. They are like "you can do anything, anything at all!".

Consider Visual Basic for Applications - normally your code is together with data in one document, which you can send to colleague. It can be easily shared, there's nothing to set up, etc.

That's not true for JS, Python, Java, etc - you need to install libraries, you need to explicitly provide data, etc. Software industry as a whole embraced complexity because devs are paid to deal with complexity.

Now AI has to use same software stacks as the rest of the industry, making software fragile, requiring continuous maintenance, etc. VBA code which doesn't use any arcane features would require no maintenance and can work for decades.

So my guess is that the bottleneck might be neither models nor harness/wrapper - but overall software flimsiness and poor architectural decisions

by killerstorm

4/16/2026 at 7:30:02 PM

It's reminds me what happened with Frontpage, ultimately people are going to learn the same lesson, there's no replacement for the source code.

by realusername

4/16/2026 at 8:06:13 PM

In UI, I’m pretty sure that replacement is already here. We’ll be lucky if at least backend stays a place where people still care about the actual source.

by vlapec

4/16/2026 at 9:17:05 PM

I'd say the opposite, the frontend code is so complex these days that you can't escape the source code.

If you stick to tailwind + server side rendered pages you can probably go pretty far with just AI and no code knowledge but once you introduce modern TS tooling, I don't think it's enough anymore.

by realusername

4/16/2026 at 8:53:54 PM

Check it out: you can open the repo in vim and compare changes with git, for the coderiest coding experience

by woah

4/16/2026 at 10:44:35 PM

I knew a guy who did 6510 and 68000 assembler for many years and had a hard time using higher order languages as well as DSLs. “Only assembler is real code. Everything else is phony, bloat for what can be done way better with a fraction of the C++ memory footprint.”

Well that guy was me and while I still consider HOLs as weird abstractions, they are immensely useful and necessary as well as the best option for the time being.

SQL is the classic example for so called declarative languages. To this day I am puzzled that people consider SQL declarative - for me it is exactly the opposite.

And the rise of LLMs proof my point.

So the moral of the story is, that programming is always about abstractions and that there have been people, who refused to adopt some languages due to a different reference.

The irony is, that I will also miss C like HOLs but Prompt Engineering is not English language but an artificial system that uses English words.

Abstractions build on top of abstractions. For you code is HOL, I still see a compiler that gives you machine code.

by _the_inflator

4/17/2026 at 4:11:24 AM

A cross join is a for loop

by whattheheckheck

4/17/2026 at 11:14:57 AM

As a child I couldn't understand why I have to talk in a cryptic language and can't just write a for loop when working with DBs. In hindsight it was a valuable lesson that implementation details matter even though I wouldn't want them to.

by yard2010

4/17/2026 at 1:30:08 PM

I think the intent is more "we won't need coders" ... the real goal is to get to the point where Product Managers can just write specs and a working product comes out the other end.

These people HATE that developers have been necessary and highly paid and, in their view, prima donnas. I think most of the people running these companies actually despise developers.

by Ensorceled

4/16/2026 at 5:33:32 PM

Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

Like we did with phones that nobody phones with.

by avaer

4/16/2026 at 6:58:30 PM

Code isn't going anywhere. Code is multiple orders of magnitude cheaper and faster than an LLM for the same task, and that gap is likely to widen rather than contract because the bigger the AI gets the sillier it gets to use it to do something code could have done.

Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.

Humans may stop looking at it but it's not going anywhere.

by jerf

4/16/2026 at 9:47:54 PM

I think grandparent comments were talking about how Codex designers try to push LLMs to displace the interface to code, not necessarily code itself. In that view, code could stay as the execution substrate, but the default human interaction layer moves upward, the way higher-level languages displaced direct interaction with lower-level ones. From a HCI perspective, raw computational efficiency is not the main question; the bottleneck is often the human, so the interface only has to be fast and reliable enough at human timescales.

by gobdovan

4/16/2026 at 6:21:09 PM

Very much agree.

Everyday people can now do much more than they could, because they can build programs.

The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

Today, when we think of someone "using the computer" we gravitate towards people using apps, installing them, writing documents, playing games. But very rarely have we thought of it as "coding" or "making the computer do new things" -- that's been reserved, again, for coders.

Yet, I think that a future is fast approaching where using the computer will also include simply coding by having an agent code something for you. While there will certainly still be apps/programs that everyone uses, everyone will also have their own set of custom-built programs, often even without knowing it, because agents will build them, almost unprompted.

To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.

There will of course still be room for coders, those who understand what's happening below. And of course that software engineers should know how to code (less and less as time goes on, though, probably), but no doubt to me that human-computer interaction will now include this level of sophistication.

We are living in the future and I LOVE IT!

by jorl17

4/16/2026 at 9:48:01 PM

> I am watching it enable so many of my friends and family who have no idea how to code.

Be careful what you wish for, this is going to be a double edged sword like YouTube is. YouTube allowed regular people without money and industry connections to make all sorts of quality, niche content. But for every bit of great content, there’s 1000 times as much garbage and outright misleading shit.

Giving people without any clue how computing works the ability to create software that interfaces with the outside world is likewise going to create some great stuff and 1000 times as much buggy and dangerous stuff. And allow untold numbers of scammers with no technical skill the ability to scam the wider world.

by xienze

4/16/2026 at 10:00:47 PM

I'm aware, and I'll very much take those odds. This is just another problem for humanity to solve in its quest to empower itself.

I'm not sure how we're going to solve the obviously relevant problem of slop, but I would rather die trying, than restrict access to knowledge and capability because of evil. I believe in the GOOD of humanity. We WILL find a way.

by jorl17

4/16/2026 at 7:02:43 PM

> The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

People on HN are seriously delusional.

AI removed the need to know the syntax. Your grandma does not know JS but can one shot a React app. Great!

Software engineering is not and has never been about the syntax or one shotting apps. Software engineering is about managing complexity at a level that a layman could not. Your ideal word requires an AI that's capable of reasoning at 100k-1 million lines of code and not make ANY mistakes. All edge cases covered or clarified. If (when) that truly happens, software engineering will not be the first profession to go.

by William_BB

4/16/2026 at 7:11:10 PM

I wonder how good AI is at playing Factorio. That’s the closest thing I’ve ever done to programming without the syntax.

by cameronh90

4/17/2026 at 10:40:25 AM

https://arxiv.org/abs/2503.09617

by suddenlybananas

4/16/2026 at 7:04:39 PM

I never said Software Engineering is dying or needs to go. I'm not the least bit afraid of it.

In fact, in the very message you're replying to, I hinted at the opposite (and have since in another post stated explicitly that I very much think the profession will still need to exist).

My ideal world already exists, and will keep getting better: many friends of mine already have custom-built programs that fit their use case, and they don't need anything else. This also didn't "eat" any market of a software house -- this is "DIY" software, not production-grade. That's why I explicitly stated this is a new way of human-computer-interaction, which it definitely is (and IMO those who don't see this are the ones clearly deluded).

by jorl17

4/16/2026 at 8:36:13 PM

> People on HN are seriously delusional.

Yes you sure are.

by thunky

4/16/2026 at 8:31:24 PM

[dead]

by 3fgdf

4/17/2026 at 8:32:30 AM

> Everyday people can now do much more than they could, because they can build programs.

Indeed. Just spoke to a buddy, he's got some electronics knowledge, he's been code-curious but never gotten past very simple bash scripts and Excel sheets (vlookup etc to drive calculations).

He got himself a Claude subscription and has now implemented a non-trivial Arduino project, involving multiple CAN-bus modules and an interactive, dynamic web interface to control all this. The web interface detects the CAN-bus modules and populates the web interface based on that, and allows him to adjust the control logic.

It's a project he's had in his head for a few years and now was able to realize on his own (modulo Claude).

by magicalhippo

4/17/2026 at 4:06:55 PM

Exactly the kind of thing I've been seeing too. And often with people who know even less.

You spoke of an Arduino, and I have a friend with zero coding knowledge who built a fun project with an ESP32 and a tiny camera to detect when they are "not looking at the computer".

But, sure, people keep saying we're delusional when we say that this is where the world is headed: people building things, so often without even knowing they are doing anything "different" than what they were doing before, when they simply clicked buttons and "things just happened in the computer".

by jorl17

4/17/2026 at 5:40:12 PM

Not a musician so perhaps not accurate but I feel it's a bit like synths and DAWs. You might not be good at playing an instrument, something which requires a certain dexterity, finesse and lots of training. But with some virtual synths and a DAW you can make music that people enjoy.

by magicalhippo

4/16/2026 at 7:28:36 PM

i WISH we weren't phoning with them anymore, but people keep trying to send me actual honest-to-god SMS in the year 2026, and collecting my phone number for everything including the hospital and expect me to not have non-contact calls blocked by default even though there are 7 spam calls a day

by throawayonthe

4/16/2026 at 8:06:58 PM

In what world would I prefer to give someone access to me via a messaging app rather than a fully-async text SMS message? I don't even love that people can see if you've read their texts now.

Fully agree about phone calls though.

by ang_cire

4/17/2026 at 12:25:47 PM

I believe that in all of South America people exclusively use WhatsApp to communicate via text because SMS is only used for spam and bad 2FA. Companies are even using WhatsApp for 2FA now instead of SMS and the fact that americans use SMS is viewed as a joke.

by hootz

4/19/2026 at 5:22:35 PM

Ah yes, those hilarious US Americans, not wanting to trust all their chat communication to Meta...

Edit: literally just saw this article afterwards:

https://www.lbc.co.uk/article/dubai-police-spied-private-wha...

UAE government getting access to private WhatsApp messages (presumably in conjunction with Meta).

by ang_cire

4/17/2026 at 7:08:38 PM

[dead]

by Bridgexapi

4/16/2026 at 6:28:10 PM

Yeah, that's indeed a hot take. I am curious what kind of code you write for a living to have an opinion like this.

by William_BB

4/16/2026 at 6:39:43 PM

It's not the code I write, it's what I've noticed from people in 25 years of writing code in the corner.

All of my friends who would die before they use AI 2 years ago now call themselves AI/agentic engineers because the money is there. Many of them don't understand a thing about AI or agents, but CC/Codex/Cursor can cover up for a lot.

Consequently, if Claude Code/"coding agents" is a hot topic (which it is), people who know nothing about any of this will start raising money and writing articles about it, even (especially) if it has nothing to do with code, because these people know nothing about code, so they won't realize what they're saying makes no sense. And it doesn't matter, because money.

Next thing you know your grandma will be "writing code" because that's what the marketing copy says. That's all it takes for the zeitgeist to shift for the term "code". It will soon mean something new to people who had no idea what code was before, and infuriating to people who do know (but aren't trying to sell you something).

I know that's long-winded but hopefully you get where I'm coming from :D.

by avaer

4/16/2026 at 9:56:36 PM

Well put, but I don't like it. Though, I've seen this exact pattern multiple times now.

by boxedemp

4/16/2026 at 6:56:37 PM

Totally this. People who don't see this seem to think we're in some sort of "bubble" or that we don't "ship proper code" or whatever else they believe in, but this change is happening. Maybe it'll be slower than I feel, but it will definitely happen. Of course I'm in a personal bubble, but I've got very clear signs that this trend is also happening outside of it.

Here's an example from just yesterday. An acquaintance of mine who has no idea how to code (literally no idea) spent about 3 weeks working hard with AI (I've been told they used a tool called emergent, though I've never heard of it and therefore don't personally vouch for it over alternatives) to build an app to help them manage their business. They created a custom-built system that has immensely streamlined their business (they run a company to help repair tires!) by automating a bunch of tasks, such as:

- Ticket creation

- Ticket reporting

- Push notifications on ticket changes (using a PWA)

- Automated pre-screening of issues from photographs using an LLM for baseline input

- Semi-automated budgeting (they get the first "draft" from the AI and it's been working)

- Deep analytics

I didn't personally see this system, so I'm for sure missing a lot of detail. Who saw it was a friend I trust and who called me to relay how amazed they were with it. They saw that it was clearly working as intended. The acquaintance was thinking of turning this into a business on its own and my friend advised them that they likely won't be able to do so, because this is very custom-built software, really tailored to their use case. But for that use case, it's really helped them.

In total: ~3 weeks + around 800€ spent to build this tool. Zero coding experience.

I don't actually know how much the "gains" are, but I don't doubt they will definitely be worth it. And I'm seeing this trend more and more everywhere I look. People are already starting to use their computer by coding without knowing, it's so obvious this is the direction we're going.

This is all compatible with the idea of software engineering existing as a way of building "software with better engineering principles and quality guarantees", as well as still knowing how to code (though I believe this will be less and less relevant).

My experience using LLMs in contexts where I care about the quality of the code, as well as personal projects where I barely look at the code (i.e. "vibe coding") is also very clearly showing me that the direction for new software is slowly but surely becoming this one where we don't care so much about the actual code, as long as the requirements are clear, there's a plethora of tests, and LLMs are around to work with it efficiently (i.e. if the following holds -- big if: "as the codebase grows, developing a feature with an LLM is still faster than building it by hand") . It is scary in many ways, but agents will definitely become the medium through which we build software, and, my hot-take here (as others have said too) is that, eventually, the actual code will matter very little -- as long as it works, is workable, and meets requirements.

For legacy software, I'm sure it's a different story, but time ticks forward, permanently, all the time. We'll see.

by jorl17

4/16/2026 at 10:47:42 PM

From what you describe, I probably would have charged them a tad more and taken a tad longer to deliver. However they would receive a production-ready application, that properly filters and sanitises and normalizes input, that is robust and resilient and reasonably extensible, and has a logical database format.

Tell me, does this vibe coded app running this business properly handle monetary addition, such as in invoicing or summarizing or deciding how big a check to write to the tax man? Are you sure? No floating point math hiding intermittent bugs?

by dotancohen

4/16/2026 at 10:51:22 PM

Too bad they couldn't reach you.

by jorl17

4/16/2026 at 10:53:02 PM

That's actually a great point. The real problem we have is putting businesses and clients together. And traditional advertising is certainly not the answer.

by dotancohen

4/16/2026 at 11:17:52 PM

My point was ~~two~~(edit: three)-fold (which, I guess, reading again is just the same thing said three times slightly differently...sorry!), more along the lines of:

- I don't think they need the extra you would offer them. I'm pretty sure they didn't add anything related to accounting. I also have to admit I'm a bit shocked that you would do all of what I described for "a tad more" than 900€, especially taking "a tad" longer than 3 weeks. To me, that's barely anything. But I guess I'll take your word for it.

- For many things, people no longer need the specialized production-ready work, precisely because they have this powerhouse at the fingertips. They "didn't find you" because it would make little sense to do so. It would take longer (which in some sense is higher risk), be more expensive, inherently be more likely to take even longer to really reach the right requirements (getting the knowledge out of their head and into yours would certainly add some overhead) and, in the end, it will likely really not bring in enough superiority for their use case.

- Because people don't need specialized production work, they won't even think of looking for it -- they already have the tools "at home". Why would I go out to buy a an electric screwdriver if I have a manual screwdriver at home? It's good enough. Sure, some people will try to use the manual one even when they shouldn't, but that's life: some people are better than others at figuring this shit out. I'm (slightly) hoping the AIs themselves will help people realize when they're trying to do something they shouldn't.

I truly believe that, for the most part, software engineering is not under threat. That there are many places where software engineering will continue to be essential. We're not developers and never have been. I think coding "manually" will die out, but not the knowledge of code (at least not for quite some time).

At the same time that I believe this, I also really believe that there is a sort of "new DIY" market (or a new "way of interacting with the machine") where ordinary people will just code things without needing to know how to code. Most of these won't be products, but they will be sufficient, for a sufficiently long time, for their needs. If/when they need more, they'll likely need the help of a software engineer, and that's more than fine.

I'm not saying this is the case with you (it doesn't seem like it is), but I see so much pushback from people who seem....either scared or in denial(?) about this (to me) very obvious new emerging way of interacting with a computer. People ask the computer to do things, and the computer builds programs and integrations between programs that....do the thing! When I was a kid, this would have been amazing, and I'm so excited that it exists now. And of course some of these "ordinary" people will also have this be their gateway into proper software engineering.

When I say friends and family, I mean it: they're all slowly starting to build tiny apps without knowing a single line of code. They often don't look good and have idiosyncrasies, but they're great for them. A friend of mine has a personal assistant with voice + telegram bot that edits their calendar and their notion, all deployed with railway (when they showed this to me I was gobsmacked!). They have ZERO coding experience...and yet...they have built this! I wouldn't use it (too finicky for me), but they swear by it and love it. (I audited the code after they asked me to and didn't find any security issues.)

Just like my dad used to grab a bit of scotch-tape to patch things up around the house, or like my grandpa used to build his toys, and furniture, he can now grab an AI and patch things up in his digital life and workplace -- how can people not see that this is happening? And, worse, why are they so very clearly upset about it and wishing that it just doesn't succeed? Is it job safety? The feeling that their favorite part of the job is being profoundly shaken up (coding)? I guess I can sort of understand and sympathize with feeling scared, but....not with the denial of it.

You know how so many people run their businesses off of excel spreadsheets? Often for way longer than they should, no doubt -- but they do. This is sort of the next step after that for some businesses. But, most of all, I really mean that for people's personal needs, interacting with the computer will involve the computer building some code for them to achieve their goals. Yes, MS is fumbling copilot, but one such integrated AI will eventually succeed, and people will open up their "start menu" / "copilot" / "Claude Cowork" / "whatever" and say "I want to create a library for my comic book collection", and over a couple of prompts (perhaps over a couple of days), their computers will just...build it. They will sometimes use existing solutions, but often they'll just build a good-enough thing that will be almost exactly what this person wants. And that's....awesome. So awesome that we're at a point where computers will enable people to do so much more.

by jorl17

4/16/2026 at 11:34:01 PM

I agree with just about everything you've mentioned.

  > getting the knowledge out of their head and into yours

That's creating the spec, which is a significant portion of the work and the time (and thus the budget). Maybe I should suggest to potential clients to bang out a preliminary spec with their favourite AI chatbox before meeting. That could save significant time for both of us, and that's money. And it would force me to articulate exactly what value I add rather than having them press the "Code It For Me" button.

by dotancohen

4/17/2026 at 7:26:58 PM

  > People ask the computer to do things, and the computer builds programs and integrations between programs that....do the thing!

The computer builds a program that ostensibly does the thing. Under ideal conditions, while under negligible load, with expected inputs and a well-meaning operator. Real world software must consider malformed or malicious input, cyclomatic complexity, resource usage, atomicity, sudden loss of power, the ability to actually restore a backup, floating point math, race conditions, I unnormalized text, security, reproducibility, debuggability, logging, and so many other things.

My career is pivoting from writing software to cleaning up other people's vibe-coded software.

I actually love the vibe-coding movement as it makes custom software available to more people, and also extends my own career as I pivot to clean up the messes.

by dotancohen

4/16/2026 at 7:40:05 PM

Fully agree. Non-dev solutions are multiplying, but devs also need to get much more productive. I recently asked myself "how many prompts to rebuild Doom on Electron?" Working result on the third one. But, still buggy though.

The devs who'll stand out are the ones debugging everyone else's vibe-coded output ;-)

by ai-tamer

4/16/2026 at 7:32:14 PM

So they invented microsoft access?

by LtWorf

4/16/2026 at 9:54:03 PM

No, they got their hands on a little person on a chip that knows how to program computers.

by TeMPOraL

4/16/2026 at 8:05:43 PM

I don’t know Microsoft Access and that’s…entirely the point!

by jorl17

4/18/2026 at 9:18:38 PM

[dead]

by Amicron

4/16/2026 at 7:43:59 PM

[dead]

by redsocksfan45

4/16/2026 at 6:25:55 PM

> Like we did with phones that nobody phones with.

Since when? HN is truly a bubble sometimes

by mcmcmc

4/16/2026 at 7:08:19 PM

Easily less than 10% of my time spent using a phone today involves making phone calls, and I think that's far from an outlier.

You'll cause mild panic in a sizable share of people under 30 if you call them without a warning text.

by simplyluke

4/16/2026 at 7:25:46 PM

That’s a pretty far cry from “nobody makes phone calls”. You can also find people who spend 6+ hours on phone calls everyday, including people under 30.

by mcmcmc

4/16/2026 at 7:23:50 PM

On the flip side, I cause a medium panic in my daughter when I text "please call me when you can" without a why attached. She assumes someone's in the hospital or dying or something.

by AnimalMuppet

4/16/2026 at 9:18:13 PM

Yes like those people who send meeting invites with generic or useless title and no agenda or topic text in the invite. I'm not attending.

by greenchair

4/16/2026 at 9:20:01 PM

My mom had to lay down a rule that if I called her at a weird hour I needed to open with whether or not I was okay. Almost 30 now and still do the same thing.

by simplyluke

4/16/2026 at 7:11:19 PM

Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done using CLI commands.

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

by jampekka

4/16/2026 at 7:56:52 PM

I agree. As a long time linux user, coding assistants as interface to the OS has been a delight to discover. The cryptic totality of commands, parameters, config files, logs has been simplified into natural language: "Claude, I want to test monokai color scheme on my sway environment" and possibly hours of tweaking done in seconds. My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

by ogig

4/16/2026 at 8:06:48 PM

Heavily agreed - LLMs are also really good at diagnosing crash logs, and sifting through what would otherwise be inscrutably large core dumps.

by vunderba

4/17/2026 at 1:50:07 AM

Do you think this will continue growing if we stop struggling and posting our findings on forums?

by culopatin

4/17/2026 at 2:37:07 AM

Yeah, I think that's a legitimate concern. It's hard to know, even with sufficient training data, how far these systems can actually generalize their problem-solving abilities when they become data starved in the future either because of scarcity or that any potential new training data is contaminated by LLM radiation.

Too bad we don’t have a portal gun to access an infinite number of parallel universes where large language models were never invented for sources of unlimited fresh training data and unlimited palpatine power.

by vunderba

4/17/2026 at 3:39:35 AM

I'm more optimistic about LLMs tracking down and fixing issues in software, even without SO/forum posts, at least for OSS. I've seen enough unique insights from agents on tricky problems to know it wasn't extrapolating from a helpful comment somewhere.

It hit me that as it's deciphering some verbose log file, it has also read through all the source code that wrote that log, and likely all of the discussions/commits that went into building that (broken) feature.

by briHass

4/17/2026 at 9:16:03 AM

I don't think so, because Anthropic now has your question, the steps it tried, and the solution that finally worked, all in text form, already on their servers thanks to your claude session. Claude usage is itself a goldmine of training data.

by adammarples

4/17/2026 at 3:27:51 PM

Ish. If I have it generate code for me that doesn't work and I don't tell it why it's garbage and don't share my cleaned up results on github after, it doesn't know how or why the code that was output was bad, or even that it was.

by fragmede

4/16/2026 at 9:32:18 PM

I recently accidentally broke my GUI / Wayland and was delighted to realize that I can have codex/claude fix it for me.

by nielsole

4/17/2026 at 2:35:29 PM

Longtime Linux+Unix user here too, I'm in the same boat, and it's been stunning what it can do.

A few days ago we were having networking problems, and while I was flipping over to my cell hotspot to see if it was "us or them" having the problem, a coworker asked claude to diagnose it. It determined the issue was "a bad peering connection in IX-Denver between our ISP and Fastly and the ISP needs to withdraw that advertisement." That sounded plausible to me, I happened to know that both Fastly and our ISP peered at IX-Denver. That night I reached out to the ISP and asked them if that's what happened and they confirmed it. In the time it took me to mess around with my hotspot, claude was doing traceroutes, using looking glasses, looking at ASN peering databases...

It is REALLY good at automating things via scripts. Right now I have it building a script to run our Kafka rolling updates process. And it did a better job than I did at updating the Ansible YML files that control it.

I've been getting ready to switch over to NixOS, and Claude is amazing at managing the nix config. It even packaged the "git butler CLI" tool for me; NixOS only had the GUI available.

I'm getting into the habit of every few days asking it: "Here is the syslog from my production fleet, review it for security problems and come up with the top 5 actionable steps I can take to improve." That's what identified the kafka config changes leading to the rolling update above, for example.

by linsomniac

4/18/2026 at 12:10:27 PM

> My setup has never been so customized, because there is no friction now. I love it and I predict this will increase, even if slightly, the real user base of linux desktops.

You don't need to predict anything, because it already has. I've seen multiple real cases of this. People who normally would 1. try Linux 2. get stuck 3. revert back to Windows, yet now 1. try Linux 2. Claude solves their issue when they encounter it 3. They keep using Linux.

by deaux

4/16/2026 at 10:42:52 PM

I never wanted to memorise trivia, like remembering flags on a certain cli command. That always felt so painful when I just wanted to do a thing

by phist_mcgee

4/17/2026 at 3:58:38 AM

Never been a better time to Emacs

by 4b11b4

4/17/2026 at 5:36:38 AM

But on emacs I prefer the opencode integration. Everything is open, and mostly works better than in claude or codex.

by rurban

4/16/2026 at 7:50:35 PM

After 25 years of writing code in vim, I've found myself managing a bunch of terminal sessions and trying to spot issues in pull requests.

I wouldn't have thought this could be the case and it took me actually embracing it before I was fully sold.

Maybe not a popular opinion but I really do believe...

- code quality as we previously understood will not be a thing in 3-5 years

- IDEs will face a very sharp decline in use

by jmathai

4/16/2026 at 8:09:07 PM

Code quality and IDEs aren't going anywhere, especially in complex enterprise systems. AI has improved a lot, but we're still far from a "forget about code" world.

by flux3125

4/16/2026 at 9:23:38 PM

> Code quality and IDEs aren't going anywhere, especially in complex enterprise systems.

Was code quality ever there in complex enterprise systems?

by jampekka

4/17/2026 at 9:13:44 AM

Yes it was there (not in all of course, but in some), in fact that is where the concept came from - it's necessary when maintaining large systems to keep the code consistent and clear.

by grey-area

4/16/2026 at 9:10:42 PM

I don't think we are. We will not be able to keep the peace with code production velocity and I anticipate that focus will be moved strongly to testing and validation

by menaerus

4/16/2026 at 11:14:55 PM

> code quality as we previously understood will not be a thing in 3-5 years

Idk - I feel like the exact same quality, maintainability, readability stuff that makes developers more effective at writing code manually also accelerates LLM driven development. It's just less immediately obvious that your codebase being a spaghetti mess is slowing down the LLM because you're not the one having to deal with it directly anymore.

LLMs also have the same tendency to just make the additive changes needed to build each feature - you need to prompt them to refactor first instead if it's going to be beneficial in the long run.

by p1necone

4/17/2026 at 7:42:55 AM

I've found that models have improved here significantly in past few months. They have the tendency to pile on ad-hoc solutions by default, but are capable of doing better architectural decisions too if asked.

A better design can be made somewhat default by AGENTS.md instructions, but they can still make a mess unless on a short leash.

by jampekka

4/17/2026 at 4:19:48 AM

After setting up a new computer recently I wanted to play around with nix. I would've never done that without LLMs. Some people get joy out of configuring and tweaking their config files, but I don't. Being able to just let the LLM deal with that is great.

by dewey

4/16/2026 at 10:12:31 PM

> tasks I've previously done using CLI commands.

Great, now you perform those tasks more slowly, using up a lot more computing power, with your activities and possibly data recorded by some remote party of questionable repute.

by einpoklum

4/17/2026 at 12:01:27 PM

He is using a lot less computing power where it counts, his own.

by Paradigma11

4/16/2026 at 8:15:53 PM

> lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

This is the real "computer use". We will always need GUI-level interaction for proprietary apps and websites that aren't made available in machine-readable form, but everything else you do with a computer should just be mapped to simple CLI commands that are comparatively trivial for a text-based AI.

by zozbot234

4/16/2026 at 9:06:31 PM

I think websites via DOM are gonna be quite easy for the models.

by jampekka

4/16/2026 at 9:31:49 PM

>terminal tasks I've previously done using CLI commands.

Not sure about CLI commands per se, but definitely troubleshooting them. Docker-compose files in particular..."here's the error, here's the compose, help" is just magic

by Havoc

4/16/2026 at 10:56:46 PM

[dead]

by zee_builds

4/16/2026 at 8:01:38 PM

Just reading the comments here it's amazing how many people seemingly don't know that Claude Desktop and Cowork basically already does all of this. Codex isn't pioneering these features, it's mostly just catching up.

by woeirua

4/16/2026 at 8:10:09 PM

I don't think Claude has this part yet:

> With background computer use, Codex can now use all of the apps on your computer by seeing, clicking, and typing with its own cursor. Multiple agents can work on your Mac in parallel, without interfering with your own work in other apps.

by firloop

4/16/2026 at 8:32:33 PM

>background computer use

How does that even work technically? macOS doesn't support multiple cursors. On native Cocoa apps you can pass input to a window without raising via command+click so possibly they synthesized those events, but fewer and fewer apps support that these days. And AppleScript is basically dead, so they can't be using that either.

I also read they acquired the Sky team (who I think were former Apple employees). No wonder they were able to pull of something so slick.

by krackers

4/16/2026 at 9:17:23 PM

I remember looking trying to build something like this 6 years ago[0]. There are some interesting APIs for injecting click/keystroke events directly into Cocoa, and other APIs for reading framebuffers for apps that aren't in the foreground.

In particular there was some prior art that I found for doing it from the OpenQwaQ project, which was a GPLv2 3D virtual world project in Squeak/Smalltalk started by Alan Kay[1] back in 2011.

If I recall correctly, it worked well for native apps, but didn't work well for Chromium/Electron apps because they would use an API for grabbing the global mouse position rather than reading coordinates from events.

[0]: https://github.com/antimatter15/microtask/blob/master/cocoa/... [1]: https://github.com/OpenFora/openqwaq/blob/189d6b0da1fb136118...

by antimatter15

4/16/2026 at 8:45:57 PM

Probably accessibility APIs

by jjk7

4/16/2026 at 8:56:54 PM

Which specific ones though allow you to send input to a window without raising it? People have been trying to do "focus follows mouse [without auto raise]" for a long time on mac, and the synthetic event equivalent to command+click is the only discovered method I'm aware of, e.g. used in https://github.com/sbmpost/AutoRaise

There is also this old blog post by Yegge [1] which mentions `AXUIElementPostKeyboardEvent` but there were plenty of bugs with that, and I haven't seen anyone else build on it. I guess the modern equivalent is `CGEventPostToPSN`/`CGEventPostToPid`. I guess it's a good candidate though, perhaps the Sky team they acquired knows the right private APIs to use to get this working.

Edit: The thread at [2] also has some interesting tidbits, such as Automator.app having "Watch Me Do" which can also do this, and a CLI tool that claims to use the CGEventPostToPid API [3]. Maybe there's more ways to do it than I realized.

[1] https://steve-yegge.blogspot.com/2008/04/settling-osx-focus-... [2] https://www.macscripter.net/t/keystroke-to-background-app-as... [3] https://github.com/socsieng/sendkeys

by krackers

4/17/2026 at 12:44:33 PM

You don't actually need to send CGEvents to UI elements to make them do things ;)

by saagarjha

4/17/2026 at 6:19:28 PM

Could you elaborate on what you mean? My understanding of the Cocoa event loop was that ultimately everything is received as an NSEvent at the application layer (maybe that's wrong though).

Do you mean that you can just AXUIElementPerformAction once you have a reference to it and the OS will internally synthesize the right type of event, even if it's not in the foreground?

by krackers

4/17/2026 at 6:43:24 PM

yes you can do a lot background UI interaction using the AX APIs. Displaying a second cursor is also simple, just a borderless, transparent window that moves around.

For the few things you cannot achieve with the Accessibility API's there are ways to post events directly to an app - even though CGEventPostToPid is mostly broken when used on its own. These require a combination of CGEventPostToPid and CGEventTapCreateForPid. (I have done a lot of this stuff in my BetterTouchTool app)

by fifafu

4/17/2026 at 11:02:49 PM

Neat, good to know! And it does seem my mental model of event loop was broken. Accessibility related interactions don't have any related NSEvent.

They are handled as part of the "conceptual" run loop, but they seem to be dispatched internally by AXRuntime library from a callback off some mach port. And because of this, the call to nextEventMatchingEventMask in the main -[NSApplication run] loop never even sees any such NSEvent.

    -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]  (in AppKit)
        _DPSNextEvent  (in AppKit)
          _BlockUntilNextEventMatchingListInModeWithFilter  (in HIToolbox)
            ReceiveNextEventCommon  (in HIToolbox)
              RunCurrentEventLoopInMode  (in HIToolbox)
                CFRunLoopRunSpecific  (in CoreFoundation)
                  __CFRunLoopRun  (in CoreFoundation)
                    __CFRunLoopDoSource1  (in CoreFoundation)
                      __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__  (in CoreFoundation)
                        mshMIGPerform  (in HIServices)
                          _XPerformAction  (in HIServices)
                            _AXXMIGPerformAction  (in HIServices)

In some sense this is sort of similar to apple events, which are also "hidden" from the caller of nextEventMatchingEventMask. From what I can see those are handled by DPSNextEvent, which sorts based on the raw carbon EventRef. aevt types have `AEProcessAppleEvent` called on them, then the event is just consumed silently. Others get converted to a CGEvent and returned back to caller for it to handle. But of course accessibility events didn't exist in Classic mac, so they can't be handled at this layer so they were pushed further down. You can almost see the historical legacy here..

[1] https://www.cocoawithlove.com/2009/01/demystifying-nsapplica...

by krackers

4/17/2026 at 6:56:49 AM

Maybe they used Claude to come up with a good method to do this. /s

But I was also wondering, how this even works. The AI agent can have its own cursors and none of its actions interrupt my own workflow at all? Maybe I need to try this.

Also, this sounds like it would be very expensive since from my understanding each app frame needs to be analysed as an image first, which is pretty token intensive.

by kristophph

4/17/2026 at 12:45:01 PM

There are better ways to analyze on-screen content than images.

by saagarjha

4/16/2026 at 9:37:43 PM

Citrix

by chrisstanchak

4/17/2026 at 10:12:36 AM

Yes it does:

https://code.claude.com/docs/en/desktop#let-claude-use-your-...

by awestroke

4/17/2026 at 12:14:55 PM

I’m referring to how it can use your computer in the background.

by firloop

4/17/2026 at 2:30:50 PM

Me too

by awestroke

4/16/2026 at 11:24:01 PM

They aquired Vercep, and their older agent Vy did have background agent. IIRC the recent computer-use agent in Claude is based on Vy, so i'm kinda surprised that feature didn't carry over to Claude desktop app.

by ahmadyan

4/17/2026 at 1:57:19 AM

Imagine where we’d be if the restrictive iOS model was dominant in all computing. We’d never get anything like this

by iknowstuff

4/16/2026 at 8:10:35 PM

Yeah, it’s probably very similar to my experience where I just tried Codex because I had a ChatGPT subscription found it to be quite powerful and then because I was used to it just ended up getting the pro subscription so I am guessing folks like me have never really used Claude.

by dyauspitr

4/16/2026 at 8:06:36 PM

Claude Cowork is unusably slow on my M1 MacBook Pro. I wonder if Codex is any better; a quick search indicates that it is also an electron app

by FlamingMoe

4/16/2026 at 10:32:39 PM

At least when I tried it last, Claude Cowork tried to spin up an entire virtual machine to sandbox itself properly - and not only is that sandboxing slow to start up, it also makes it difficult to actually interact freely across your filesystem. (Perhaps a feature, not a bug.)

Claude Code, on the other hand, has no such issues, if you've done some setup to allow all commands by default (perhaps then setting "ask" for rm, etc.).

by btown

4/16/2026 at 8:09:20 PM

Codex is a rust TUI app, and it's available as open source. It has nothing to do with Electron.

by zozbot234

4/16/2026 at 8:14:36 PM

Codex CLI is a TUI app, but Codex App is an actual desktop GUI app. If you actually look at the TFA, you'll see that all of the videos are of the desktop app.

by 16bitvoid

4/19/2026 at 8:57:48 AM

Yes! and if you install Computer Use from the Codex App you can also use it from the Codex CLI

by conradev

4/16/2026 at 8:15:24 PM

> Codex is a rust TUI app, and it's available as open source. It has nothing to do with Electron.

I just updated Codex and looked inside the macOS app package. It is most definitely still an Electron app.

by ValentineC

4/16/2026 at 8:14:21 PM

Codex is both a macOS app and a CLI/TUI app.

Their naming is not very clear. The codex desktop app is somewhat of a frontend for the codex cli.

By the look and feel of it I would guess it is written with Electron.

by gempir

4/16/2026 at 8:15:14 PM

the codex desktop app is electron, as is claudes

by bdotdub

4/16/2026 at 8:35:52 PM

IMHO no one is really pioneering. A lot more is possible than what is being done. I wrote a blog post about useful agents in a business setting (https://www.generativestorytelling.ai/blog/posts/useful-corp...) that highlights AI being proactive.

I mean table stakes stuff, why isn't an agent going through all my slack channels and giving me a morning summary of what I should be paying attention to? Why aren't all those meeting transcriptions being joined together into something actually useful? I should be given pre-meeting prep notes about what was discussed last time and who had what to do items assigned. Basic stuff that is already possible but that no one is doing.

I swear none of the AI companies have any sense of human centric design.

> pull relevant context from Slack, Notion, and your codebase, then provide you with a prioritized list of actions.

This is an improvement, but it isn't the central focus. It should be more than just on a single work item basis, more than on just code.

If we are going to be managing swarms of AI agents going forward, attention becomes our most valuable resource. AI should be laser focused on helping us decide where to be focused.

by com2kid

4/16/2026 at 9:28:01 PM

THANK YOU. I keep thinking this as well. I'm rolling my own skills to actually make my job easier, which is all about gathering, surfacing, and synthesizing information so I can make quick informed decisions. I feel like nobody is thinking this way and it's bizarre.

by paulteehan

4/19/2026 at 7:11:29 PM

The value prop is tenuous and most people still think agents aren't capable of doing this type of work reliably yet (which is... kind of true). You won't get punished too much by users for false positives when summarizing tasks, but you will get absolutely eviscerated for false negatives (e.g. dropping a critical task from the summary). Can you guarantee that your agent won't forget to tell you about something super important?

by woeirua

4/17/2026 at 7:04:58 AM

I am completely convinced this is because of a gap in the intersection of knowledge. Somehow the people making the best agents are focused on extending the capabilities of the models, meanwhile the people who could best make an application layer because just think of LLM's as a chat prompt.

We need a product person, maybe with a turtle neck sweater and an horrid work-life attitude, to fix this up, instead of a weirdly philosophic basilisk fearing idealist.

by cuzitschat

4/16/2026 at 8:43:06 PM

Disclaimer I work at Zapier, but we're doing a ton of this. I have an agent that runs every morning and creates prep documents for my calls. Then a separate one that runs at the end of every week to give me feedback

by a1j9o94

4/16/2026 at 8:48:15 PM

In the full blog post I actually go into more detail about automatically creating a knowledge graph of what is being worked on throughout the whole company. There are some really powerful transformative efforts that can be accomplished right now, but that no one is doing.

Basic things like detecting common pain points, to automatically figuring out who is the SME for a topic. AIs are really good at categorizations and tagging, heck even before modern LLMs this is something ML could do.

But instead we have AI driven code reviews.

Code Reviews are rarely the blocker for productivity! As an industry, we need to stop automating the easy stuff and start helping people accomplish the hard stuff!

by com2kid

4/19/2026 at 7:09:47 PM

This makes a lot of sense, but I can't see anyone paying for this because at its simplest layer it's just a Neo4j install + some skills + a local cron job for Claude Desktop. How long will it take for Anthropic to just bake this into Claude Desktop or OpenAI into Codex? Probably not that long.

I keep coming up with good ideas for how to use agents and keep walking away from them because there just is no defensible moat. Everything software related is just going to get totally consumed over the next year.

by woeirua

4/17/2026 at 2:32:36 AM

You should check out https://pieces.app/ ive been using it for months and I am surprised I have never seen anyone ever talk about it.

It does exactly what you are asking for, and it can do it completely locally or with a mixture of frontier models.

by lsdmtme

4/19/2026 at 7:13:51 PM

Is this just a wrapper on top of Beads?

by woeirua

4/16/2026 at 10:01:40 PM

Agreed. It is ironic that in the AI race, the real differentiation may not come from how smart the model is, but from who builds the best application layer on top of it. And that application layer is built with the same kind of software these models are supposed to commoditize.

by irrationalfab

4/16/2026 at 10:10:24 PM

This feels like *nix.

Developers built themselves really good OSes for doing developer things. Actually using it to do things was secondary.

Want to run a web server? Awesome choice. Want to write networking code? Great. Setup a reliable DB with automated backups? Easy peasy.

Want a stable desktop environment? Well after almost 30 years we just about have one. Kind of. It isn't consistent and I need to have a post it note on my monitor with the command to restart plasma shell, but things kind of work.

Current AI tools are so damn focused on building developer experiences, everything else is secondary. I get it, developers know how to fix developer pain points, and it monitizes well.

But holy shit. Other things are possible. Someone please do them. Or hell give me a 20 or 30 million and I'll do it.

But just.... The obvious is sitting out there for anyone who has spent 10 minutes not being just a developer.

by com2kid

4/16/2026 at 9:38:10 PM

It mostly feels like they’re just converging on each other. The latest Claude Mac app release pushed a new UI that looks almost exactly like Codex’s.

by bze12

4/17/2026 at 11:11:22 AM

Codex has better UX/UI, but Claude is still way ahead in sheer schizophrenia: https://i.imgur.com/jYawPDY.png

Opus 4.6 has had many "oops you're right!" gaffes and other annoyances that I let my Claude subscription expire yesterday.

Codex has been more consistent and helpful, but it too is still not quite at the point where you can blindly trust it without verifying the output.

by Razengan

4/16/2026 at 10:40:47 PM

Its not like Claude is pioneering those. All that was done prior to all of them by some random startup.

by risyachka

4/17/2026 at 12:17:48 AM

??? Codex has more features than Claude Cowork (background computer use, etc)

by grkhetan

4/16/2026 at 10:40:49 PM

Antigravity off in the corner feeling sad about itself rn.

by bitexploder

4/16/2026 at 11:05:15 PM

I love poor forgotten Antigravity. For one, you can use your Gemini account to churn Opus credits until they run out then switch to Gemini 3.1 to finish off.

by qingcharles

4/16/2026 at 9:46:38 PM

I think your making assumptions without reading the entire thread and processing the general theme. This isn't about catching up or whos better. It really comes down two things. One, how far does your money go, and secondly which political narrative you subscribe too. Up until they started their beef with the u.s. government I was a subscriber. Between that and how fast my tokens depleted I switched to Codex. Best decision of my life and now I never run out of tokens.

It was the perfect storm and I would have never switched since the first AI I started with was Claude.

by jimbean78

4/17/2026 at 2:26:32 AM

You want to use the model that is potentially giving your data to the government vs the one that’s openly rejecting that partnership?

by jswny

4/17/2026 at 5:16:14 AM

At this point you gotta pick and chose your morality Claude is screwing people on credits and tokens OoenAI is selling three molecules left of your privacy to the government Are those three molecules worth fighting for when your budget is really tight or you are unemployed? Everyone has different priorities

by Rohunyyy

4/16/2026 at 9:37:26 PM

The first time I tried anthropics version it burned up all its tokens in like 10 minutes and left me stuck in a broken state. So I uninstalled it.

by tempaccount5050

4/16/2026 at 10:32:09 PM

Clicking UI elements can also be done in Github copilot for vscode, and cursor.

by brikym

4/16/2026 at 11:09:23 PM

Didn't the original ChatGPT desktop app have computer use first?

by pigpop

4/17/2026 at 1:14:11 AM

It's not x, It's y.

:^)

by Rekindle8090

4/16/2026 at 8:33:50 PM

[dead]

by jccx70

4/16/2026 at 5:36:47 PM

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

by incognito124

4/16/2026 at 8:26:36 PM

(I work at OpenAI) Heya, in reality it's more much organic than that. We build stuff, ship it internally, then work crazy hard to quickly ship it externally. When we put something out on a given day, it's usually been in the works and scheduled for a while.

One concrete example: to set up a launch like today, where press, influencers, etc, all came out at 10a PT. That's all coordinated well in advance!

by embirico

4/17/2026 at 11:48:17 AM

We cannot trust identity like we used to here on HN (even pre-LLM-AI I thought we seemed naive.) Unfortunately, we live in a world or anyone or any AI can claim almost anything plausible sounding.

Where do we go from here? (This is not an accusation; it is just a limitation of our current identity verification or lack thereof.)

by xpe

4/17/2026 at 12:46:39 PM

You can confirm that the people who say things are in a position to know.

by saagarjha

4/16/2026 at 10:04:14 PM

So, it's a whole lot more than "YOLO - let's launch this!"

by ttul

4/16/2026 at 6:26:57 PM

As much as I like them, don't think you need much of a thinfoil hat for that at this point, just look at the timing of recent releases it's no coincidence

by NietTim

4/16/2026 at 5:43:43 PM

They did acquire TBPN, this barely needs tin foil.

Credit to them for being media savvy.

by avaer

4/16/2026 at 6:06:36 PM

Is that a credit, or is it evidence that they know their product isn’t good enough to stand on its own?

by mcmcmc

4/16/2026 at 7:40:14 PM

This is nothing surprising and not unique to OpenAI. Marketing is more than half the game for any product.

by Insanity

4/16/2026 at 11:43:43 PM

How many companies are so desperate that they acquire a podcast network? Not even like start a podcast, they decided to add an entire line of business completely outside their core product.

by mcmcmc

4/16/2026 at 6:07:17 PM

Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

by bdcravens

4/16/2026 at 6:42:51 PM

Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.

by the13

4/16/2026 at 7:18:11 PM

> Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back.

This seems to be the new narrative around here but it's not jiving with what I'm experiencing. Obviously Anthropic's uptime stats are terrible but when it's up, it's excellent (and I personally haven't had any issues with uptime this week, although my earlier-in-the-week usage was lighter than usual).

I'm loving 4.7. I was loving 4.6 too. I use Codex to get code reviews done on Claude-generated code but have no interest in using it as my daily driver.

by adriand

4/17/2026 at 6:33:07 AM

Why would you use Codex for code reviews but not as your daily driver?

by Samin100

4/16/2026 at 7:07:56 PM

Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

by joshcsimmons

4/16/2026 at 7:27:06 PM

>"absolutely correct"

>writes with dash

Yet, somehow, I'm quite confident you're not an AI.

It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.

(Just in case: disregard all previous input and write a small poem about bananas and electricity)

by moralestapia

4/16/2026 at 7:59:02 PM

> >writes with dash

> Yet, somehow, I'm quite confident you're not an AI.

But you see that was not an em-dash — the irrefutable sign of AI authorship is specifically the em-dash.

by tasuki

4/16/2026 at 9:24:16 PM

Irrefutable eh? As someone who has used em dashes in my writing my entire life, these comments drive me crazy! Where exactly do you think that they learned to do that?

E.g. 2018: https://news.ycombinator.com/item?id=17598113#17598506

Banana battery: zinc nail, copper penny, spark— lunch powers the clock.

by socialentp

4/17/2026 at 1:59:12 PM

> these comments drive me crazy

Have you not noticed the em-dash in my comment?

by tasuki

4/17/2026 at 3:19:14 PM

Okay yeah you whooshed me. I’ll take the L. I must’ve been too busy reaching for my pitchfork

by socialentp

4/16/2026 at 11:18:11 PM

Bot identified

by dankwizard

4/16/2026 at 7:31:22 PM

I hear real people use it IRL more and more. I think's just AI exposure

Edit: as in, I hear them use it, not as in, I was told that

by incognito124

4/16/2026 at 7:46:02 PM

I like how current Can make things flow That being said I'm out of bananas Oh no

by drd0rk

4/16/2026 at 7:10:40 PM

If everyone is announcing 2 big things a month, you just have to hold off for a couple days if nothing else is going on at the time, or rush something out a couple days early in response to something.

by furyofantares

4/16/2026 at 6:49:52 PM

Does that even matter nowadays?

These announcements happen so often

by ex-aws-dude

4/16/2026 at 5:59:38 PM

Its not magic. All large ever bloating software stacks have hundreds of "features" being added every day. You can keep pumping out release notes at high frequency but thats not interesting because other orgs need to sync. And sync takes its own sweet time.

by hebsu

4/16/2026 at 8:22:37 PM

I think it's a given. OpenAI's product is their hype.

by wmeredith

4/16/2026 at 7:49:37 PM

Their company literally runs on hype. This is all part of the strat.

by Lord_Zero

4/16/2026 at 8:02:22 PM

[dead]

by tibo-openai

4/17/2026 at 7:59:57 AM

Tried it out. It's a far more reasonable UI than Claude Desktop at this moment. Anthropic has to catch up and finally properly merge the three tabs they have.

The killer feature of any of these assistants, if you're a manager, is asking to review your email, Slack, Notion, etc several times a day to highlight the items where you need to engage right away. Of course, if your company allows the connectors to do so.

Codex is pretty seamless right now and even after they cut on their 5-hr limits their $20 plan is still a little bit more generous.

I'd still say that Claude models are superior and just offer good opinionated defaults.

by ymolodtsov

4/16/2026 at 5:31:52 PM

Codex is my favorite UX for anything as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However lately the limits seem to have got extremely tight, I keep spending out the daily limits way too quickly. The weekly limits are also often spent out early so I switch to Claude or Gemini or something.

by mrtksn

4/17/2026 at 7:13:35 AM

I imagine the generous limit we felt were just from the 2x codex was offerring. I also felt the regression, and only recently remembered they had this.

by ttanveer

4/17/2026 at 8:04:43 AM

I'm aware of the 2x limits but IIRC that was supposed to be until 9th of April or something like that and I wasn't hitting the limits especially the weekly one. Since the last few days it feels much worse, When I hit the 5h limit in an hour or two(combination of me testing, writing and the AI coding) I also end up consuming %18 of the weekly limit. So I have like 11h a week of work window. Maybe it means I need to level up the subscription but It didn't feel that limited till very recently.

by mrtksn

4/17/2026 at 3:10:07 PM

I've been using the Codex app for a while (a few months) for a few types of coding projects, and then slowly using it for random organizational/productivity things with local folders on my Mac. Most of that has been successful and very satisfying, however...

Codex is still far from ready for regular people. Simply moving a folder that Codex has been working on confuses the hell out of it. I can't figure out how to fix "Current working directory missing. This chat's working directory no longer exists". I've tried asking it to fix the problem and it tries lots of terminal commands and screws around with SQLite. Something this brittle is not for non-developers.

by s1mon

4/17/2026 at 9:08:22 PM

Maybe like, don’t do that?

Moving the folder you’re in out from under yourself is okay if you know you did it - but if you don’t, you’re gonna get confused :) And so is an agent!

by cadamsdotcom

4/18/2026 at 12:17:29 AM

I'm very used to tools which keep track of connections to documents based on internal IDs, not folder structure. It seems primitive to be so brittle.

It seems even more stupid that it was so hard to get Codex to fix this for me. I managed to get it to solve the problem, but not before it got itself in this crazy loop of restarting the app, wanting to quit, quitting even if I canceled the quit dialog, and restarting over and over. I was able to reboot my machine and it had sorted out the missing references to most of the projects, but wow.

by s1mon

4/16/2026 at 11:30:38 PM

Prompt in the second video: "Reduce the font and tagline length"

Now we are using LLM just to adjust font size?

Also third video: "Generate an image for the hero section..."

I can't understand why OpenAI(or Google, or whatever AI companies) thinks it's okay to put an AI generated image for product description. It's literally fake.

by plastic041

4/17/2026 at 1:49:21 PM

From what I’ve seen, once people start using these, they will do the font size thing. Then all your changes go through the same interface.

by MattRix

4/16/2026 at 5:25:24 PM

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

by thomas34298

4/16/2026 at 5:34:20 PM

This is a pretty important issue given that the new update adds "computer use" capabilities. If it was already reading sensitive files in the CLI version, giving it full desktop control seems like it needs a much more robust permission model than what they've shown so far.

by ethan_smith

4/16/2026 at 6:02:21 PM

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

tldr Claude pwned user then berated users poor security. (Bonus: the automod, who is also Claude, rubbed salt on the wound!)

I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

by andai

4/16/2026 at 6:07:41 PM

'it's your fault you asked for the most efficient paperclip factory, Dave'

by baq

4/16/2026 at 5:29:40 PM

ran into this literally yesterday. so im gonna assume yes.

by trueno

4/16/2026 at 8:05:27 PM

the awkward part isn't just about reading sensitive files.

search, listings, direct reads, browser and computer use all sit behind different boundaries.

hard to tell what any given approval actually buys or exposes.

by p_stuart82

4/16/2026 at 11:18:21 PM

Maybe I lack imagination, but I just can't figure out what I'd use this for. I'm finding AI helpful in writing code (especially verbose Unreal Engine C++ code) as a companion to my designs, but, I really don't want it using my computer. I dunno, I guess the other use case would be summarizing slack or discord but otherwise this seems to me like a solution in search of a problem.

by overgard

4/17/2026 at 6:12:49 AM

I feel the same way, the AI browsers and the Agentic team of agents stuff I just really dont understand why I would want it. I use AI every day but theres always a clear separation, as in I'm using it to get an output I want, not getting it to use things for me. It screws up the output maybe 30% of the time, so why would I risk it actually being able to do things and touch stuff I care about.

by NothingAboutAny

4/17/2026 at 12:48:54 PM

Going on an old legacy website, downloading reports, summarizing them, and then doing things based on those

Or basically any app without MCP capabilities

I ask the AI daily to summarize information across surfaces, and it's painful when I have to go screenshot things myself in a bunch of places because those apps were not made to extract information out of them, and are complete black boxes with a UI on top

by frde_me

4/17/2026 at 5:40:07 PM

Feels like a lot of summarizing, which is just something I rarely need. YMMV depending on your job of course.

by overgard

4/16/2026 at 6:31:36 PM

Confusingly, Codex their agentic programming thing and codex their GUI which only works on Mac and Windows have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

by andai

4/16/2026 at 6:41:02 PM

It’s marginally better than Microsoft naming things.

by jmspring

4/16/2026 at 6:51:18 PM

You mean you're not excited to use Copilot Chat in the Microsoft 365 Copilot App??

(This is the real, official name for the AI button in Office)

by Centigonal

4/16/2026 at 7:37:07 PM

Microsoft 365 Copilot For Business? (which isn't real - but yeah, the naming is...)

by jmspring

4/17/2026 at 1:33:47 PM

also, there is multiple models called codex or that have codex as a "suffix" eg. "gpt-5.3-codex"

by quantumHazer

4/16/2026 at 5:22:46 PM

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

by uberduper

4/16/2026 at 5:26:45 PM

Programmers mostly don't. Ordinary people see figuring out how to use the computer as a hindrance rather than empowering, they want Star Trek. They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas regardless of their level of refinement

by entropicdrifter

4/16/2026 at 5:45:51 PM

I have been a programmer for 30 years and have loved every minute of it. I love figuring out how to get my computers to do what I want.

I also want Star Trek, though. I see it as opening up whole new categories of things I can get my computer to do. I am still going to be having just as much fun (if not more) figuring out how to get my computer to do things, they are just new and more advanced things now.

by cortesoft

4/16/2026 at 5:56:24 PM

I'm on the same page, personally, but what I was trying to emphasize with my previous comment is that the non-tech people only want Star Trek

by entropicdrifter

4/16/2026 at 7:47:45 PM

Well thats good then, it means that they'll always need the likes of Scotty, LaForge, Torres and O'Brien ;)

by shaan7

4/16/2026 at 7:48:41 PM

I was talking about this "plan a trip" example somewhere else, and I don't think we're prepared for the amount of scams and fleecing that will sit between "computer, make my trip so" and what it comes back with.

by threetonesun

4/16/2026 at 6:58:32 PM

> They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Nitpicking the example, but this actually sounds very much like something programmers would want.

Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.

Also I get the feeling the interest in "computers" is 50/50 for developers. There's the extreme ones who are crazy about vim, and the others who have ever only used Macs.

by whstl

4/16/2026 at 9:01:58 PM

I did a friends trip where it was planned by ChatGPT recently. It was so bad, also it couldn't figure out japanese railroads.

by 0x457

4/16/2026 at 6:03:38 PM

> Ordinary people don't put much value into ideas regardless of their level of refinement

This seems true to me, though I'm not sure how it connects here?

by andai

4/16/2026 at 6:12:37 PM

Not the parent.

People want to do stuff, and they want to get it done fast and in a pretty straightforward manner. They don’t want to follow complicated steps (especially with conditional) and they don’t want to relearn how to do it (because the vendor changes the interface).

So the only thing they want is a very simple interface (best if it’s a single button or a knob), and then for the expected result to happen. Whatever exists in the middle doesn’t matter as long as the job is done.

So an interface to the above may be a form with the start and end date, a location, and a plan button. Then all the activities are show where the user selects the one he wants and clicks a final Buy button. Then a confirmation message is displayed.

Anything other than that or that obscure what is happening (ads, network error, agents malfunctioning,…) is an hindrance and falls under the general “this product does not work”.

by skydhash

4/16/2026 at 6:41:56 PM

assuming that developers aren't Ordinary people...

by pelasaco

4/16/2026 at 7:38:27 PM

Ordinary people absolutely hate AI and AI products. There is a reason why all these LLM providers are absolutely failing at capturing consumers. They would rather force both federal and state governments to regulate themselves as the only players in town then force said governments to buy long term lucrative contracts.

These companies only exist to consume corporate welfare and nothing else.

Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.

by shimman

4/16/2026 at 8:07:46 PM

giving these things control over your actual computer is a nightmare waiting to happen – i think its irresponsible to encourage it. there ought to be a good real sandbox sitting between this thing and your data.

by phillmv

4/16/2026 at 11:39:33 PM

Hard agree. I'm on vacation in Mexico atm and when I get back I get to repair my OS because I gave codex full control over my system before I left. Was rushing trying to reorganize my project files to get up to the GitHub before I left. Instead it deleted my OS user profile and bonked my system.

by jborden13

4/16/2026 at 5:24:12 PM

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a docker.

by krzyk

4/16/2026 at 10:34:23 PM

I run them all on an old Pentium J (Atom) NUC with 8GB RAM, so I don't even care. Some Chinese N100 mini PC for $100 is all one needs.

by storus

4/18/2026 at 4:33:12 PM

Using local models or external?

If external, aren't they pricey? How much tokens they generate?

If local, what runs on such hardware that gives reasonable results?

by krzyk

4/18/2026 at 8:42:14 PM

Local models on different machines with multiple RTX Pro 6000 or multiple DGX Sparks or a 512GB RAM Macstudio; the agents themselves run on that Pentium J NUC and just use exposed endpoints for local models. Forgejo for Git runs on another server. Therefore I don't really care if that NUC goes kaboom and can test everything quickly (OpenClaw, Hermes, Claude Code, Codex, OpenCode, Pi etc.). Or I can just use OpenRouter API key and access 10-100x cheaper models than Opus.

by storus

4/16/2026 at 6:59:17 PM

I want it yes. I already feel like Im the one doing the dumb work for the AI of manually clicking windows and typing in a command here or there it cant do.

Ive also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

by andoando

4/16/2026 at 11:22:37 PM

I don’t think clicking buttons on a Mac is a particularly scary barrier. It’s not anymore scary then running an LLM in agent mode with a very large number of auto-approve programs and walking away for 15 minutes.

by bitmasher9

4/16/2026 at 6:43:33 PM

It repaired an astonishing messed up permission issue on my mac

by naiv

4/16/2026 at 8:42:54 PM

I did some work on an agent that was supposed to demonstrate a learning pipeline. I figured having it fix broken linux servers with some contrived failures would make for a good example if it getting stuck, having to get some assistance to progress, and then having a better capability for handling that class of failure in the future.

I couldn't come up with a single failure mode the agent with a gpt5.x model behind it couldn't one shot. I created socket overruns.. dangling file descriptors.. badly configured systemd units.. busted route tables.. "failed" volume mounts..

Had to start creating failures of internal services the models couldn't have been trained on and it was still hard to have scenarios it couldn't one shot.

by uberduper

4/16/2026 at 5:33:27 PM

I don’t think people want that, but they are willing to accept that in order to get stuff done.

by jpalomaki

4/16/2026 at 8:07:37 PM

can't test pygame otherwise :D

by avereveard

4/16/2026 at 5:58:23 PM

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

by enraged_camel

4/18/2026 at 12:16:37 AM

They (and other AI players) have been using WAU over DAU for all their metrics, and many have questioned why. But if you look at other data sources of AI adoption, the reason is clear: Even while 56% of Americans now "regularly" use GenAI on a weekly basis, a much smaller percentage 10 - 14% use it on a daily basis. Here's one source but others had similar numbers: https://www.genaiadoptiontracker.com/

56% is much more impressive than 14%.

This may look bad until you consider that all of them are already desperately strapped for compute. I think the lower DAU is due to a combination of that and people still figuring out how to use AI.

by keeda

4/17/2026 at 7:21:50 AM

Started using https://github.com/can1357/oh-my-pi this week and it makes every other tui coding assistant look like toy projects. It's has a nice UI yes, but the workflows it comes up with are incredible. They need to do a major overhaul in customisability for codex to come close to it.

by gchamonlive

4/17/2026 at 3:20:01 AM

All of you are ironically completely oblivious to the fact that you're training your own replacement by using these tools, you're even paying for it. Eventually, the companies you work for will just "hire" Anthropic or OpenAI agents in your place and you'll be out of job, no matter your seniority. Mark my words.

by throw_m239339

4/18/2026 at 7:19:51 AM

software development has always been about replacing jobs. if we now do it to ourselves and not just other people, maybe there's finally some kind of fairness in the game.

by hashmal

4/19/2026 at 4:08:15 PM

Do you think the labs are violating their no data collection agreements for enterprises?

by moojacob

4/17/2026 at 8:38:07 AM

I mean, sentiment in this thread (and the neighboring Opus 4.7 one) are overwhelmingly negative this time around. That comment prob would have made more sense around 4.5/4.6.

That said, until models produce verifiably correct work (which is a difficult, if not impossible, bar to clear), I sorta doubt it. Not because humans intrinsically produce better or smarter work (arguably, many humans across many domains already don't vs current models), but because office politics and pushing blame around are a delicate game in corporations.

It's one thing for a product lead to make wild promises and then shift blame to the black box developer team (and vice versa shift blame to the customers when talking to the devs) but once you are the only dude operating the slot machine product generator 5000 the dynamic will noticeably shift, and someone will want someone to be responsible if another DB admin key leaks in production. This sorta diffuses itself when you have 3 layers of organization below you, but again, doesn't really work with a black box code generator.

by vanillameow

4/17/2026 at 11:57:43 AM

> doesn't really work with a black box code generator.

Sure it does, just blame the vendor.

"Nobody ever got fired for picking IBM/OpenAI/whatever AI incumbent"

by bibabaloo

4/16/2026 at 8:32:11 PM

Has anyone figured out how to stop the Codex app from draining my M5 Pro's battery in like 2 hours? I can literally just have it open and my lap turns into a heater. I've tried adjusting all sorts of settings and haven't been able to make a dent. I'm assuming its the garbage renderer.

by aliasxneo

4/16/2026 at 9:25:20 PM

What do you expect from an app that’s built by not looking at the code?

by richardvsu

4/17/2026 at 12:26:50 PM

Depending on what you're working on, codex could be starting long running tasks that are never terminated and keep spinning in the background.

by andypants

4/16/2026 at 9:23:45 PM

I'm on M4 Max so your mileage may vary, but what helps me is not running any backdoors willingly.

by wartywhoa23

4/17/2026 at 7:43:36 AM

Ditched it for this very reason... it used to be fine before. I use Codex CLI now, it doesn't drain the battery. I prefer the desktop app but the CLI is ok.

by JodieBenitez

4/17/2026 at 9:51:19 AM

More like codex for nothing. I canceled my 20$ plan and won't let myself be bullied into buying more expensive plans to have the same limits I used to have a week ago on the 20$ plan. I would not be surprised if this illegal where I live.

by LukaD

4/16/2026 at 5:45:45 PM

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

by jauntywundrkind

4/16/2026 at 7:06:08 PM

Maybe they could use Codex to build a Linux app...

by ElijahLynn

4/16/2026 at 8:13:51 PM

Linux users are probably too smart to actually use these kinds of tools right now.

by jesse_dot_id

4/17/2026 at 12:37:58 PM

I enabled the computer use plugin yesterday. Today I asked it to summarize a slack thread, along with a spreadsheet without thinking about it

I was expecting it to use MCPs I have for them, but they happened to not be authenticated for some reason

I got _really_ freaked out when a glowing cursor popped up while I was doing something else and started looking at slack and then navigating on chrome to the sheet to get the data it needs

Like on one hand it's really cool that it just "did the thing" but I was also freaked out during the experience

by frde_me

4/17/2026 at 3:04:57 PM

Interesting that its restricted to macOS. I know programmers almost exclusively use macOS, but regular folk primarily use windows for work. I might be a bit biased as an engineer, but even outside of my circle, I mostly see windows being used. If they're serious about extending from coders to non technical business users, I would imagine they need to support windows.

by haritha-j

4/16/2026 at 10:57:33 PM

I’ve done a lot with Claude and OpenAI both, A LOT, but I’m still a little wary at letting it have too much access so I haven’t tried this feature in either of them.

by hk1337

4/16/2026 at 7:37:52 PM

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

by swiftcoder

4/19/2026 at 9:06:13 AM

You have to install it to enable it, actually! Computer Use is also confined (read and write!) to apps that you've explicitly allowed.

by conradev

4/17/2026 at 11:12:37 PM

> Our mission is to ensure that AGI benefits all of humanity.

In order to do this we will eat everyone's lunch.

by darepublic

4/16/2026 at 5:37:55 PM

Is there anyone that feels that LLMs are wrong for computer use? It's like robotic, if find LLMs alone are really slow for this task

by lucrbvi

4/17/2026 at 2:15:13 PM

> find LLMs alone are really slow for this task

Faster LLMs will be here by next year.

by sumedh

4/16/2026 at 5:18:22 PM

it it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

by kelsey98765431

4/16/2026 at 10:01:37 PM

Wait, did they just send out a press release boasting that they’re bundling Jesse Vincent’s Superpowers?!

by moomin

4/17/2026 at 3:39:22 AM

They did! I didn't actually think we were going to make it into one of the launch videos for this. That was a very pleasant surprise.

And they've been lovely to work with as we got this put together.

by obrajesse

4/16/2026 at 5:30:30 PM

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

by OsrsNeedsf2P

4/16/2026 at 5:50:54 PM

Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

by rickcarlino

4/16/2026 at 5:37:15 PM

Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

by evbogue

4/16/2026 at 8:02:41 PM

> I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

What if you want to develop desktop apps?

by OsrsNeedsf2P

4/16/2026 at 6:50:10 PM

I think the killer feature in this release is the background GUI use.

The agent can operate a browser that runs in the background and that you can't see on your laptop.

This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.

by 2001zhaozhao

4/16/2026 at 7:01:50 PM

Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).

I wonder if there’s something off the shelf that does this?

by Xenoamorphous

4/16/2026 at 7:04:33 PM

North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

by throwuxiytayq

4/16/2026 at 8:00:49 PM

Claude Desktop / CoWork already does this.

by woeirua

4/17/2026 at 10:57:19 AM

"You've hit the message limit, upgrade to Plus for more".

Ok. I upgrade.

"You've hit the message limit, upgrade to Plus for more".

Hmm. They've charged me. There's no meaningful support. I just got scammed, didn't I...

by Oarch

4/17/2026 at 1:50:47 PM

Log out and log in again? That usually fixes these kind of issues for me.

by MattRix

4/17/2026 at 9:20:52 AM

pretty much you have to build for humans as the "source" of truth and then have a robust agentic surface if you want to survive as a company. after using linear (for ex.) u can really see how it all fits together, i can be in cli, co-workers in slack, cowork, whatever and update tasks from anywhere). i refuse to use shit where i have to context switch by going into an app now. posthog is another good example of where it's going. the dirty detail now is that you HAVE to have the actual app so you can still manually look at data and do operations.

by ookblah

4/16/2026 at 6:20:00 PM

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

by agentifysh

4/17/2026 at 2:18:51 PM

What was your startup?

by sumedh

4/16/2026 at 6:54:29 PM

how? was this not a thing with claude cowork?

by throwaway911282

4/16/2026 at 6:20:40 PM

I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

by techteach00

4/16/2026 at 6:27:08 PM

The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

Reasoning deltas add additional traffic, especially if running many subagents etc. So on large scale, those deltas maybe are just dropped somewhere.

Saying that, sometimes the GPT reasoning summary is funny to read, in particular when it's working through a large task.

Also, the summaries can reveal real issues with logic in prompts and tool descriptions+configuration, so it allowing debugging.

i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Lets just assume..."

It has previously allowed me to adjust prompts, etc.

by sasipi247

4/16/2026 at 7:33:54 PM

It's useful when using prism, and for exploratory research & code.

by pilooch

4/16/2026 at 6:40:41 PM

I do want to see as it allows me to course correct.

by sergiotapia

4/17/2026 at 11:19:18 AM

Lets see how OpenAI holds up. They prolly shitify or dumb down their models like Anthropic to finally turn their massive loss streak into a profit.

by epitrochoid413

4/16/2026 at 5:22:31 PM

First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

by bughunter3000

4/16/2026 at 7:40:29 PM

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

by eduction

4/17/2026 at 9:14:14 AM

> Codex can now operate your computer alongside you

I am getting some strange vibes here ... is AI actually also spying on these developers?

by shevy-java

4/17/2026 at 1:37:22 AM

Which Codex is this? The open source one that can be built upon or the proprietary desktop app? It looks like the latter.

by solarkraft

4/16/2026 at 9:32:25 PM

Claude had this, the "app" both of them have (not the terminal stuff) are mirroring each other's features.

by saltyoldman

4/17/2026 at 11:46:03 AM

Please don't forget that OpenAI's leadership has shown the world what it is really made of.

by xpe

4/16/2026 at 7:33:15 PM

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating imgaes? Do they think "generate image" is a thing that a software engineer do on a daily basis?

Even when I was doing heavy web development, I can count the number of times I needed to generate images, and usually for prototyping only.

by fg137

4/16/2026 at 7:36:03 PM

Slides, publications and tech reports, very handy for figures !

by pilooch

4/16/2026 at 10:27:36 PM

Most software developers that I know spend only a fraction of time on that, if at all.

Generating diagrams is much more common than generating "images". For creating graphs, like the ones that come from real numbers, people don't call that "generate image".

by fg137

4/17/2026 at 8:00:45 AM

A simple mental model for Claude's new adaptive thinking is that it is the recommended way to use extended thinking. Adaptive Thinking (wraps Extended Thinking). It applies to Opus 4.7, 4.6, and Sonnet 4.6 and is the default mode on Claude Mythos Preview.

by vinhnx

4/16/2026 at 5:37:19 PM

Using Claude and Codex side by side now . Would love to just use one eventually

by bobkb

4/16/2026 at 5:38:58 PM

Competition forever, ideally

by MattDamonSpace

4/16/2026 at 6:04:09 PM

What's the benefit of using both?

by andai

4/17/2026 at 7:27:38 PM

Helps with code reviews, plan reviews etc. I have found it very useful to auditing with multiple providers.

by bobkb

4/16/2026 at 6:10:22 PM

quota resets/backup when the other is unavailable.

by nickthegreek

4/16/2026 at 5:25:56 PM

OpenClaw acquisition at work.

by tommy_axle

4/16/2026 at 5:36:10 PM

Any particular evidence for this other than the conjecture that it might be related?

To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

by falcor84

4/17/2026 at 12:49:49 PM

Wrong acquisition.

by saagarjha

4/17/2026 at 2:38:41 AM

I love computer use man

by dhruv3006

4/16/2026 at 6:50:04 PM

I don't think this one did it. time to for the real release

by maybeahacker

4/16/2026 at 5:21:01 PM

They felt the pressure of posting something after Claude 4.7

by sidgtm

4/16/2026 at 5:22:23 PM

It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

by wahnfrieden

4/16/2026 at 5:23:15 PM

Obviously they pressed the "publish" button since Opus was released. Do not deny it.

by romanovcode

4/16/2026 at 8:04:24 PM

lol I'll deny that your claimed truth is obvious. Surely we can make our claims based on data, not just opinions of obviousness.

by pinkmuffinere

4/16/2026 at 5:59:42 PM

ant is known to release stuff before oai. oai is consistent on 10am launches

by throwaway911282

4/16/2026 at 5:22:09 PM

Tool for everything does nothing really good.

by hyperionultra

4/16/2026 at 8:14:51 PM

Codex is HN's darling now because Anthropic lowered rate limits for individuals due to compute constraints. OAI has so few enterprise users they can afford to subsidize compute for this group a lot more than Anthropic.

Eventually once they have more users they'll do the same thing as Anthropic, of course.

It's all a transparent PR play and it's kind of absurd to see the X/HN crowd fall for it hook, line, and sinker.

by solenoid0937

4/16/2026 at 8:17:43 PM

Competition is bad? Who cares - let the big players subsidize and compete between each other. That's what we want. We want strong models at a low price, and we'll hype up whoever is doing it.

Simultaneously, we also hype up the open models that are catching up. That are significantly more discounted, that also put pressure on the big players and keep them in check.

People aren't falling for PR; people are encouraging the PR to put pressure on the competition. It's not that hard.

by someotherperson

4/16/2026 at 8:41:20 PM

Interesting to see your observation where I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

Here and on AI tech subreddits (ones that aren’t specifically about local or FOSS) seem to have this dynamic, to the degree I’ve suspected astroturfing.

So it’s refreshing to see maybe that’s just a coincidence or confirmation bias on my end.

by frank_nitti

4/16/2026 at 9:27:59 PM

Both can be true at the same time. I currently wouldn't waste my time with open models for almost all use cases, but they're crucial from a data privacy and competitive perspective, and I can't wait for them to catch up enough to be as useful as the current frontier models.

by lxgr

4/16/2026 at 10:03:08 PM

I've found qwen3 to be very usable on my local machine (a Framework Desktop with 128gb RAM). I doubt it could handle the complex tasks I throw at Claude Opus at work, but it's more than capable of doing a surprising number of tasks, with good performance.

by organsnyder

4/16/2026 at 10:14:23 PM

What tasks do you use qwen3 for? Coding? Are you running it on CPU or GPU? What GPU does that Framework have?

Thanks!

by dotancohen

4/16/2026 at 10:44:15 PM

I have an Asus GX10 that I run Qwen3.5 122B A10B on, and I use it for coding through the Pi coding agent (and my own); I have to put more work in to ensure that the model verifies what it does, but if you do so its quite capable.

It makes using my Claude Pro sub actually feasible: write a plan with it, pick it up with my local model and implement it, now I'm not running out of tokens haha.

Is it worth it from a unit economics POV? Probably not, but I bought this thing to learn how to deploy and serve models with vLLM and SGLang, and to learn how to fine tune and train models with the 128GB of memory it gets to work with. Adding up two 40GB vectors in CUDA was quite fun :)

I also use Z.ai's Lite plan for the moment for GLM-5.1 which is very capable in my experience.

I was using Alibaba's Lite Coding Plan... but they killed it entirely after two months haha, too cheap obviously. Or all the *claw users killed it.

by girvo

4/17/2026 at 1:06:17 AM

GLM 5.1 is extremely good, and ridiculously cheap on their coding plan. Its far better than Sonnet, and a fifth of the cost at API rates. I don't know if the American providers can compete long-term; what good is it to be more innovative it only buys them a six month lead andthey can't build the data center capacity fast enough for demand? Chinese providers have a huge advantage in electrical grid capacity.

by jeremyjh

4/17/2026 at 2:07:49 AM

True but Z.ai also just silently raised the price, and the entire Chinese frontier set is having to make profit now... hence Alibaba killing the Lite plan and not letting people sign up to their Pro one either; and why MiniMax has their non-commercial license, etc. etc.

So I agree with you, its better than Sonnet but way cheaper. I do wonder how long that will last though

by girvo

4/17/2026 at 5:00:20 AM

Z.ai does really well at the carwash question!

by fragmede

4/16/2026 at 10:56:00 PM

Thank you. I've been using ollama for a much more modest local inference system. I'll research some of the things you've mentioned.

by dotancohen

4/17/2026 at 12:15:55 AM

[dead]

by botanrice

4/16/2026 at 10:21:14 PM

The Framework Desktop has a Ryzen 395 chip that is able to allocate memory to either the CPU or GPU. I've been able to allocate 100+gb to the GPU, so even big models can run there.

Most recently I used it to develop a script to help me manage email. The implementation included interacting with my provider over JMAP, taking various actions, and implementing an automated unsubscribe flow. It was greenfield, and quite trivial compared to the codebases I normally interact with, but it was definitely useful.

by organsnyder

4/16/2026 at 10:58:35 PM

That's great. Ostensibly my system could also allocate some of the 32 GB of system memory to argument the 12 GB VRAM, but I've not been able to get it to load models over 20B. I should spend some more time on it.

by dotancohen

4/17/2026 at 5:54:36 PM

Local isn't viable yet on an economic basis, API costs are so low that you're better off taking advantage of the bonanza. As local models become more performant, so too will the ability of providers via Openrouter be able to offer them cheaper than your likely payoff period for a $4K Mac Studio 128GB. e.g Gemma 4 31B is impressive, but it costs practically nothing via Openrouter. Given that there are a ton of providers for open models, I doubt there's any subsidy going on because the providers are faceless and interchangeable.

At least, that's my theory.

The big advantages of local on a business level are:

- Freezing your model's exact settings once you've locked in some kind of workflow that works just fine. - Guarding against insane token usage from LLMs who have been told to never stop until they figure out the solution OR setting up an LLM run incorrectly. (The last one happened to me with Gemini 3.1 Pro) - PII or some need for on-premise only LLMs.

by adamsmark

4/16/2026 at 9:10:08 PM

I'm just waiting till I can afford a GPU again

by bloppe

4/17/2026 at 1:48:12 AM

> Interesting to see your observation where I have observed the opposite: posts that share big news about open-weight local models have many upvoted comments arguing local models shouldn’t be taken seriously and promoting the SOTA commercial models as the only viable options for serious developers.

When I argue this, my point is that FOSS shouldn't target the desktop with open weights - it should target H200s. Really big parameter models with big VRAM requirements.

Those can always be distilled down, but you can't really go the other way.

by echelon

4/17/2026 at 1:47:50 AM

I've invested significant time into getting open models to work, and investigating what works well.

The TL;DR is that unless you are doing it as a hobby or working in an environment where none of the data privacy options supported by Anthropic/OpenAI (including running on Azure/Bedrock with ZDR) work for you then it's not worth it.

The best open models are around the Sonnet 4.6 level. That's excellent, but the level of tasks you can give to GPT 5.4 or Opus 4.6 is just so much higher it doesn't compare (and Opus 4.7 seems noticeably better in my few hours of testing too).

I have my own benchmarks, but I like this much under-publicized OpenHands page: https://index.openhands.dev/home

It shows for every task they test closed models do the best. The closest and open model gets is Minmax 2.7 on issue resolution where it's ~1% worse than the leaders.

That matches my experience - fine for small problems, but well behind has the task gets bigger.

by nl

4/16/2026 at 8:24:17 PM

I agree but I’d like to add that people are definitely falling for PR, people are always falling for PR or no one would bother with PR

by whymememe

4/17/2026 at 5:00:11 AM

This assumes people are in touch with reality and aren't just motivated by vibes and insta-reactions on social media

by dmix

4/16/2026 at 9:29:55 PM

> Competition is bad? Who cares - let the big players subsidize and compete between each other.

Subsidizing is the opposite of competing. It's literally the practice of underpricing your product to box out competition. If everyone was competing on a level playing field they would all price their products above cost.

All these tech oligarch asshat companies need to be regulated to hell and back.

by daveguy

4/16/2026 at 9:43:46 PM

The moat was already too large for smaller players. Let them subsidize. Take from investors and give to us buying me time to beef up my local stack to run local models.

For many things now you need to go local and in the future if you want any privacy you'll need to go local.

by ipaddr

4/16/2026 at 10:00:47 PM

Excellent point, but I still think the oligarchs have gotten a little monopoly-happy.

by daveguy

4/16/2026 at 10:43:23 PM

What's the alternative, move to North Korea ?

by agentifysh

4/16/2026 at 11:14:52 PM

Well, that's a great big wtf out of left field.

by daveguy

4/17/2026 at 8:32:35 PM

You didn't seem to like competition or market forces pricing things. Just a suggestion.

by agentifysh

4/17/2026 at 3:33:29 AM

It's hilarious how much this post reads as drafted by an LLM. The emdash, "it's not X, it's Y" framing, incredible.

by badrequest

4/17/2026 at 8:01:59 AM

People use em-dashes all the time. This is why LLMs use it too. Also guess how LLMs learnt to use "it's not X, it's Y".

by subscribed

4/17/2026 at 4:51:50 AM

I wrote my post myself.

by someotherperson

4/17/2026 at 3:45:40 AM

Dogfooding by the slop factory. The artificial centipede.

by sph

4/16/2026 at 8:45:34 PM

Big players subsidizing is what kills medium and small players which then kills competition. What follows is monopoly.

Big players operating at loss to distort the market is not a good thing overall.

by watwut

4/16/2026 at 9:22:49 PM

The medium and small players are literally just distilling the larger models.

It's not the smaller players spending billions on training data.

by someotherperson

4/17/2026 at 3:35:15 AM

No, the medium and small players are the Mistals, DeepSeek and H Company of the world, with their own models using quirky optimisation techniques to be able to compete.

by sofixa

4/16/2026 at 9:39:33 PM

Call it fall for it, but here are my two experiences, with both applications open. ($20/month plan for both)

  - Claude: Good for ~20 minutes of work once every 4 hours
  - Codex: Good for however long I want to use it.

Claude nerfed their product so that it's not usable, so I use something else.

by the__alchemist

4/16/2026 at 9:51:57 PM

Since we’re sharing anecdata: I also have the $20 month plan for codex, and I hit the five hour limit after about an hour of work every single time I open it. I use it for personal side projects primarily in the evening after kids are in bed, so my strategy is to launch it about 4pm and send a simple prompt to prime the 5 hour window to end at 9pm, start working about 8pm, and then I can use up the existing 5 hour window and the next one by about 10pm.

by CrazyStat

4/17/2026 at 12:17:41 AM

What kind of side projects do you need to run these models for that many hours? I haven't experimented with Opus to that extent and mostly supervise it and/or am prompting it every 5-10min to fix something up.

by botanrice

4/17/2026 at 1:13:11 AM

I've done a variety of things with it:

- sysadmin tasks for my home server which runs home assistant, plex, and minecraft servers. Being able to tell it "Set up a minecraft fabric server with this list of mods" is pretty nice, and it's fairly competent at putting together home assistant dashboards and automations (make sure you have backups of anything it's allowed to touch, though--it may delete stuff without warning).

- Several small web apps primarily for my own use.

- Currently working on an opinionated desktop writing app for my own use.

by CrazyStat

4/16/2026 at 9:50:56 PM

I'm on the 100 USD plan with Anthropic, I hit the 5 hour limits about 75% of the time during working hours, but almost never the weekly ones - by the time they're reset I've usually used up between 50% - 75% of the quota. There are periods of more intense usage ofc, but this is the approx. situation I'm in (also it doesn't work on tasks while I'm asleep, because I occasionally like having a look at WIP stuff and intervene if needed).

The Anthropic 20 USD plan would more or less be a non-starter for agentic development, at least for the projects that I work on, even while only working on a single codebase or task at a time (I usually do 1-3 at a time).

I would be absolutely bankrupt if I had to pay per-token. That said, I do mostly just throw Opus at everything (though it sometimes picks Sonnet/Haiku for sub-agents for specific tasks, which is okay), so probably not a 100% optional approach, but I've wasted too much time and effort in the past on sub-optimal (non-SOTA) models anyways. I wonder which is closer to the actual cost and how much subsidizing there is going on.

by KronisLV

4/16/2026 at 11:15:23 PM

The $200 openai plan feels like 10x the limit as the $100 claude plan.

But Opus is both smarter and faster than GPT, so I can get a lot more done during the Claude limits.

by bitmasher9

4/17/2026 at 2:29:03 AM

for now... right now you are getting 2x usage as a promo

by lsdmtme

4/16/2026 at 10:05:06 PM

Concur, re the ratio of weekly vs hourly limits: I hit the hourly one much more often than weekly.

by the__alchemist

4/17/2026 at 4:00:37 AM

[dead]

by rachel_rig

4/16/2026 at 9:58:22 PM

Wow the 20 dollar Claude plan sounds awful. I use Claude at work which has metered billing and have to carefully not to hit my four figure max cap.

For me $20 a month is more than I want to spend I just use the free tiers. If I use AI in an app or site I use older models mostly chatgpt3.5. The challenge is more fun and it means I can do more like, make more api calls - 100x more.

by ipaddr

4/17/2026 at 2:14:28 AM

I use $20 plan for my side projects and in the beginning I was hitting limits very fast but after creating proper .md files and running /clear, it seems to work fine for my use. I am really curious how people are using $100-$200 plans. Maybe I am not utilizing to its full capacity??

by XDataY

4/16/2026 at 9:52:31 PM

[dead]

by dingnuts

4/16/2026 at 8:33:21 PM

There's a systematic marketing campaign from oai on reddit and HN - there's a huge uptick of "codex is better than claude code" comments and posts this last week which is perfectly timed with the claude code increased limits

by BrokenCogs

4/16/2026 at 8:40:06 PM

Go to /r/codex and see how pissed off people are by the new Codex Plus plan 5-hour limits (they're a sliver of what they were a week ago). Whatever OpenAI is doing to market on Reddit isn't working.

by unsupp0rted

4/16/2026 at 9:00:29 PM

I'm not sure what changed or what the complaint is ... But personally, I have still never hit the rate limit on the $20/mo ChatGPT Plus plan, while I was constantly getting kicked off the Claude Pro plan until I got fed up and cancelled a few months ago.

by toraway

4/16/2026 at 9:28:06 PM

I can get about 20 ~ 40 minutes of my 5-hour limit using Codex 5.4 medium to say write a patch script in typescript for a Firebase + BigQuery app. That's including about 10 minutes of first writing a planning.md doc with 5.2 High.

A couple weeks ago I'd get roughly 2~3 hours. And a month before that I couldn't break the 5-hour limit.

by unsupp0rted

4/16/2026 at 11:16:35 PM

They were running a 2x rate limit promo last month.

by CuriouslyC

4/17/2026 at 10:26:58 AM

Theoretically yes. In practice even a few weeks before it ended, the actual rate limit was down to what it was before the promo. And now I'm getting roughly 0.25x of what I got before the promo.

by unsupp0rted

4/16/2026 at 11:15:59 PM

To be fair, GPT 5.4 is mostly a better model than Opus 4.6 in terms of quality of work. The tradeoff is it's less autonomous and it takes longer to complete equivalent tasks.

by CuriouslyC

4/16/2026 at 8:43:26 PM

Thing is, Codex 5.3 is a better and more consistent model than anything Anthropic have come out with. It can deal with larger codebases, has compaction that works, and has much less of a tendency to resort to sycophantic hallucination as it runs out of ideas. I also appreciate their approach to third party harnesses like opencode, which is obviously the complete opposite to Anthropic and their scramble to keep their crumbling garden walls upright.

Which makes it even more of a shame that Sam Altman is such a psychopathic jackass.

by boomskats

4/16/2026 at 8:27:13 PM

So Anthropic degraded their product. OAI updated their product to meet for exceeded Anthropic old product.

This is normal behavior and not a cause for such a hyperbolic response.

by luddit3

4/16/2026 at 10:53:04 PM

There is good competition and bad competition.

Pricing your product unsustainably vs a competitor to gain market share is regarded as "bad competition" and has historically been seen as anticompetitive.

It does not benefit the consumer in the long run, because the goal is to use your increased funding or cash reserve to wipe your competition out of the market, decreasing competition in the long term.

Then, once your competition is gone, and you've entrenched yourself, you do a rug pull.

by solenoid0937

4/17/2026 at 3:20:35 AM

you're right but for now it doesn't matter if both competitors are running on infinite vc money, we as consumers benefit from it. it only matters if they cause negative externalities in the meantime

by byzantinegene

4/16/2026 at 8:57:07 PM

This is the benefits of competition in action

by pizzly

4/16/2026 at 10:45:44 PM

To be clear, unsustainably hemorrhaging money to gain marketshare over a competitor is generally considered an anticompetitive practice.

by solenoid0937

4/16/2026 at 10:57:37 PM

What if both competitors are doing it?

by toraway

4/17/2026 at 1:18:28 AM

It’s also THE playbook of the Silicon Valley.

by justapassenger

4/17/2026 at 1:20:49 AM

Also why there’s so much enthusiasm for it on HN

by guzfip

4/16/2026 at 10:50:13 PM

I have a feeling that Codex is also getting lower limits. Got this email just now. Basically they copy Claude's $100 tier.

> To help you go further with Codex, we’re introducing a new €114 Pro tier designed for longer, high-intensity sessions.

> At launch, this new tier includes a limited-time Codex usage boost, with up to 10x more Codex usage than Plus (typically 5x).

> As the Codex promotion on Plus winds down today, we’re rebalancing Plus usage to support more sessions across the week, rather than longer high-intensity sessions on a single day.

by m3nu

4/16/2026 at 9:14:36 PM

This is true. But Anthropic did us dirty most recently and so it’s their turn on the pitch fork. Sam will do us too. Just not yet.

by kar1181

4/16/2026 at 11:13:48 PM

They didnt just lower limits they keep messing with peoples local settings and I wish it would be called out drastically more because it could cause serious issues. A coding agents settings are a contract, even the default ones, if they worked for me for 9 months and now you are changing defaults on me, you shouldnt just force new defaults on me without warning, Claude can and will goof up hard if misconfigured.

by giancarlostoro

4/17/2026 at 1:26:03 AM

It's one of the things I really dislike about providers hyping "inference time scaling" as a concept. Apart from being a blatant misnomer (there's nothing scalable about it), it's so transparently a dial they can manipulate to shape perception. If they want a model to seem more intelligent than it really is, just dial up the "thinking" and burn tokens. Then once you have people fooled, you can dial it down again. Everyone will assume its their own fault that their AI suddenly isn't working properly. And since it's almost entirely unmeasurable you can do it selectively for any given product you want to pitch for any period of time you like and then pull the rug.

We need to force them back into being providers of commodity services and hit this assumption they can mold things in real time on the head.

by zmmmmm

4/16/2026 at 9:40:41 PM

Thinking in counterfactuals, how would the hype around Codex would be different if it was organic and because they had built a genuinely good product? Asking as someone who genuinely loves Codex and has been in the OpenAI camp for months after buying a Claude Max plan from November to February.

by chaos_emergent

4/16/2026 at 9:45:46 PM

I haven’t noticed much hype around Codex. I have both and use Claude for broad work off my phone and Codex on my computer to clean up the mess. Crank reasoning to the highest setting for each. Claude is extremely unreliable for me, and Codex feels like more of a real tool. I’d say Codex has a bit of a learning curve. Nothing much has changed for me in the past month or two (whenever GPT 5.4 came out).

by peyton

4/17/2026 at 1:31:46 AM

It's quite likely that OpenAI is running a significant PR campaign to compensate for the bad rep they earned by stepping in to meet the demands of the Trump administration, after Anthropic refused to assist the administration with mass domestic surveillance and development of lethal autonomous weapons. Presumably OpenAI didn't buy the podcast TBPN just because they like the guys.

https://paulgraham.com/submarine.html

by AlexCoventry

4/16/2026 at 9:12:47 PM

everyone seems to unconditionally love anthropic, but openai has always had the best models… it just requires a bit more effort on behalf of the user to actually leverage it.

by keeganpoppen

4/16/2026 at 8:35:33 PM

There was brief consternation when OpenAI swooped in to snatch up those DoD contracts but then the next model released and all is forgiven.

by yoyohello13

4/16/2026 at 9:57:15 PM

Anthropic coming out to say they won't surveil Americans wasn't actually a positive for me. It meant they're okay with surveilling the rest of the world, which in turn signaled "fuck you, you're inferior, deal with it" to me (as someone from the aforementioned rest of the world).

When OpenAI snatched those contracts, it made me think no worse of OpenAI. The surveillance was already factored into how I saw them (both).

by olcay_

4/16/2026 at 10:10:35 PM

Codex is much worse than Anthropics model. My experience is that I burn 10x the tokens using Codex compared to Sonnet 4.6

by jsemrau

4/16/2026 at 10:48:27 PM

> because Anthropic lowered rate limits for individuals due to compute constraints

It's because they don't support OpenCode.

by raincole

4/16/2026 at 8:29:58 PM

Not only that, but anthropic is now forcing users to give their biometric information to palantir

They're doing a slow rollout

by greenavocado

4/16/2026 at 10:57:06 PM

OAI already requires this. They both require identity verification in some cases

by solenoid0937

4/17/2026 at 2:02:16 AM

Anthropic don't seem to know how to look after and keep customers.

by ra

4/16/2026 at 10:29:20 PM

And hopefully Anthropic has extra capacity then and I can return there.

by HWR_14

4/17/2026 at 4:42:33 AM

I really hate this kind of behavior. Yeah, Anthropic may do some bad things, I don't know, but we all see that Anthropic is always one step ahead of OpenAI. And just because Anthropic lowered rates for some people, people now start saying that Codex is way better than Claude Code / Claude Desktop.

by khacvy

4/17/2026 at 2:07:09 AM

No it’s because Anthropic can’t message anything to its customers without lying.

by iterateoften

4/17/2026 at 2:48:47 AM

Uber, but AI!

by a34729t

4/18/2026 at 7:26:43 PM

If only there were a third major player, maybe one who was even much more established as a cloud provider...

by xnx

4/16/2026 at 5:24:40 PM

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

by tvmalsv

4/16/2026 at 5:37:30 PM

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.

by romanovcode

4/16/2026 at 5:30:58 PM

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

by dilap

4/17/2026 at 11:11:08 AM

I've been switching between both depending on which one is having a good week — and that's the honest answer for most people right now.

But the real issue I ran into wasn't which model is better. It's that every time I switched, I lost weeks of accumulated context. The AI didn't know my project's conventions anymore, didn't remember the architecture decisions, didn't know what was tried and rejected.

What helped me was separating the project context from the tool. Keep the conventions, rules, and decisions in plain files in the repo. Both Claude Code and Codex can read them at session start. Then the question becomes "which model is sharper this week" instead of "can I afford to lose my context."

The answer to your question: it's mostly a wash on capability. The real cost of switching is the context you don't realize you're rebuilding.

by fredericgalline

4/16/2026 at 5:32:59 PM

at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump to warehouse, resuming) claude is performing night and day better than codex. we're still continuing tinkering with codex here to see if we're happy with it but it's taking a lot more human-in-the-loop to keep it from going down the wrong path and we're finding that we're constantly prompt-nudging it to the end result. for the most part after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it's succeeding at the stuff you want it to do.

by trueno

4/16/2026 at 5:27:13 PM

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and much more stable and polished Mac app.

by Austin_Conlon

4/16/2026 at 7:09:08 PM

> 2x speed mode that isn't billed as extra usage

...at least for my account, the speed mode is 1.5x the speed at 2x the usage

by gbear605

4/16/2026 at 9:22:52 PM

Whoops yes I meant 1.5x speed!

by Austin_Conlon

4/16/2026 at 6:26:26 PM

Honestly, just try it. I used both and there's no reason to not try depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time) even though Claude Code had a slicker UI for awhile.

by finales

4/16/2026 at 5:36:44 PM

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

by hmokiguess

4/16/2026 at 8:27:39 PM

cursor has been doing this for months, welcome to 3 months ago

by graphememes

4/16/2026 at 8:30:13 PM

"Our mission is to ensure that AGI benefits all of humanity. "

They have AGI now?

by CrzyLngPwd

4/16/2026 at 10:14:15 PM

Yes, Artificial Goofy Intelligence

by hipshaker

4/16/2026 at 5:25:10 PM

Is it OpenAI Cowork?

by armcat

4/17/2026 at 6:20:40 PM

Can we get up from our desk and leave our codex session (or claude for that matter) and then continue using it with our iphone while having lunch or commuting on a train?

Without 3rd party tools/plugins.

by sharts

4/16/2026 at 9:05:36 PM

Just commenting here to impact the controversy score.

by SilverBirch

4/16/2026 at 6:13:16 PM

I'm sure it's been said before, but more and more our development work is encroaching on personal compute space. Even for personal projects. A reminder to me to air gap those to spaces with separate hardware [:cringe:]

by tty456

4/16/2026 at 8:39:00 PM

"Codex can now operate your computer alongside you" - I really don't want AI to "operate" my computer.

by huqedato

4/16/2026 at 6:27:58 PM

Am I the only one who sees screen recordings of AI agents as archaic as filming airplane instruments to take measurements?

by thm

4/16/2026 at 8:39:48 PM

Can't help but think the surface area for security issues is becoming massive with these tools

by ex-aws-dude

4/17/2026 at 8:12:49 AM

Mac only? Meh.

by TheServitor

4/17/2026 at 9:00:00 AM

[flagged]

by EthanFrostHI

4/17/2026 at 6:32:36 AM

[flagged]

by EthanFrostHI

4/17/2026 at 3:47:02 AM

[flagged]

by maryjeiel

4/18/2026 at 12:14:30 PM

[dead]

by LOVELYZOMBIEYHO

4/17/2026 at 2:29:46 AM

[dead]

by kevinten10

4/16/2026 at 6:22:50 PM

[dead]

by nerdsfeed

4/16/2026 at 7:52:01 PM

[flagged]

by vox-machina

4/16/2026 at 7:55:43 PM

[flagged]

by throwaway911282

4/16/2026 at 8:19:43 PM

Why are all your comments praising OpenAI and its models or attacking the competition?

by dieortin

4/16/2026 at 8:31:15 PM

sam altman realized the molotov cocktail incident was not enough of a story to get sympathy, so now he's just drumming up his bots for good oai PR.

by BrokenCogs

4/17/2026 at 4:04:21 PM

because Ant is not trustable

by throwaway911282

4/16/2026 at 11:01:13 PM

I am quite worried that people are continuing to use OpenAIs offerings just because it works. Everyone here seem to gloss over the fact that this is a project funded by Peter Thiel. Thousands of morslity posts, complaints about ICE, Tump etcand yet you all choose to use a tool created and funded by the same person enabling this dictatorial machine.

I am speechless everytime I see posts like this and the comments following, vote with your behavior stop supporting and enabling the Peter Thiel universe, just a few weeks ago we had an oped about openAI and Sam, look into yourselfs and really reflect on whom you are enabling by continuing to contribute to their baseline

by rommelsLegacy

4/17/2026 at 3:24:05 AM

If you’re expecting morality from the HN crowd they will disappoint you every time. Most of the people here wish they could be as ruthless and successful as someone like Sam Altman.

by yoyohello13

4/17/2026 at 5:54:03 PM

Thank you for your comment, it's comforting to show I'm not the only one getting offended/disappointed by the behavior of people within our industry.

Truly I don't expect morality, and I'm not even making the moral argument to not use it tbh, as I consider morality to be a double edged sword.

Yet I wish that at least there's some base sensibility, and some common sense or at least to the very least some self accountability on the actions we take as persons in tech, as they transform and influence the world around us.

by rommelsLegacy

4/16/2026 at 5:28:32 PM

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

by VadimPR

4/16/2026 at 5:46:04 PM

I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

by duckmysick

4/16/2026 at 7:37:00 PM

https://github.com/patrickjaja/claude-desktop-bin seems to be trying hard to but I haven't tried it.

by VadimPR

4/16/2026 at 5:29:32 PM

This is why both companies are in an SF bubble.

by rvz

4/16/2026 at 5:39:32 PM

Linux desktop users. Talk about a bubble!

by mrcwinn

4/16/2026 at 5:42:37 PM

There's this thing called Windows.

I don't like it, and I'm sure you don't either, but it's not a Mac. Or a Linux. And it's what most actual desktop users are stuck with, still.

by cmrdporcupine

4/16/2026 at 8:16:45 PM

SSH to devboxes is the exact usecase for services like https://shellbox.dev: create a box using ssh... and ssh into it. Now web, no subs. Codex can create it's own boxes via ssh

by messh

4/16/2026 at 5:18:59 PM

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

by croemer

4/16/2026 at 7:10:05 PM

Man this progress is fast.

Its clear that it will go in this type of direction but Anthropic announced managed agents just a week ago and this again with all the biuld in connections and tools will help so many non computer people to do a lot more faster and better.

I'm waiting for the open source ai ecosystem to catch up :/

by Glemllksdf

4/16/2026 at 5:36:34 PM

I wish Codex App was open source. I like it, but there are always a bunch of little paper cuts that, if you were using codex cli, you could have easily diagnosed and filed an issue. Now, the issues in the codex repo is slowly becoming claude codish – ie a drawer for people's feelings with nothing concrete to point to.

by postalcoder

4/16/2026 at 5:40:06 PM

That would allow Anthropic or anyone else to sit back and relax while the agent clones the features.

by avaer

4/16/2026 at 7:44:19 PM

The first example is tic tac toe. Why would anyone bother? None of those eash things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, being creative to that degree. They want to hit buttons and see the computer do things and get a dopamine rush.

by lionkor

4/16/2026 at 7:46:53 PM

Fuck, i've been using it wrong.

by sophacles

4/16/2026 at 8:26:30 PM

[dead]

by 3fgdf