1/21/2026 at 9:20:46 AM
All of this might as well be Greek to me. I use ChatGPT and copy-paste code snippets. That was bleeding edge a year or two ago, and now it feels like banging rocks together when reading these types of articles. I never had any luck integrating agents, MCP, using tools, etc. If I'm not ready to jump on some AI-spiced-up special IDE, am I then just going to be left banging rocks together? It feels like some of these AI agent companies just decided, "OK, we can't adopt this into the old IDEs, so we'll build a new special IDE." Or did I just use the wrong tools? (I use Rider and VS, and I have only tried Copilot so far, but the "agent mode" of Copilot in those IDEs feels basically useless.)
by alkonaut
1/21/2026 at 10:08:23 AM
I'm so happy someone else says this, because I'm doing exactly the same. I tried to use agent mode in VS Code and the output was still bad. You read simple claims like "We use it to write tests". I gave it a very simple repository, said to write tests, and the result wasn't usable at all. I really wonder if I'm doing it wrong.
by prettygood
1/21/2026 at 10:42:23 AM
I'm not particularly pro-AI, but I struggle with the mentality some engineers seem to apply to trying. If you read someone say "I don't know what the big deal with vim is, I ran it and pressed some keys and it didn't write text at all", they'd be mocked for it.
But with these tools there seems to be an attitude of "if I don't get results straight away, it's bad". Why the difference?
by kace91
1/21/2026 at 10:47:13 AM
There isn't a bunch of managers metaphorically asking people if they're using vim enough, and not so many blog posts proclaiming vim as the only future for building software.
by Macha
1/21/2026 at 11:18:53 AM
I'd argue that, if we accept that AI is relevant enough to at least be worth checking out, then dismissing it with minimal effort is just as bad as mindlessly hyping the tech.
by kace91
1/21/2026 at 11:43:25 AM
You must be new here. "I use vim, btw" and "you don't use vim, you use Visual Studio, your opinion doesn't count" are a thing in programming circles.
by dist-epoch
1/21/2026 at 8:27:11 PM
Internet commenters, sure. It never broke into the workplace the way measuring AI use among your employees has. Nobody's asked me in a performance review how I've used vim keybinds to improve the company's growth.
by Macha
1/21/2026 at 12:01:32 PM
I don't understand how to get even bad results. Or any results at all. I'm at a level where I'm going, "This can't just be me not having read the manual." I get the same change applied multiple times, the agent has some absurd method of applying changes that conflict with what I tell it, like some git merge from hell, and so on. I can't get it to understand even the simplest of contexts.
It's not really that the code it writes might not work. I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is the problem yet.
by alkonaut
1/21/2026 at 1:50:35 PM
> I don't understand how to get even bad results. Or any results at all. I'm at a level where I'm going "This can't just be me not having read the manual".
> I get the same change applied multiple times, the agent having some absurd method of applying changes that conflict with what I tell it like some git merge from hell and so on. I can't get it to understand even the simplest of contexts etc.
That is weird. Results have a ton of variation, but not that much.
Say you get a Claude subscription, point it at a relatively self-contained file in your project, hand it the command to run the relevant tests, and tell it to find quick-win refactoring opportunities, making sure that the business outcome of the tests is maintained even if mocks need to change.
You should get relevant suggestions for refactoring, you should be able to have the changes applied reasonably, you should have the tests passing after some iterations of running and fixing by itself. At most you might need to check that it doesn't cheat by getting a false positive in a test or something similar.
Is such an exercise not working for you? I'm genuinely curious.
by kace91
1/21/2026 at 12:42:19 PM
> I'm at a level where I'm going "This can't just be me not having read the manual".
Sure it can, because nobody is reading manuals anymore :).
It's an interesting exercise to try: take your favorite tool you use often (that isn't some recent webshit, devoid of any documentation), find a manual (not a man page), and read it cover to cover. Say, GDB or Emacs or even coreutils. It's surprising just how many powerful features good software tools have, and how much you'll learn in a short time, that most software people don't know is possible (or worse, decry as "too much complexity") just because they couldn't be arsed to read some documentation.
> I just can't get past the actual tool use. In fact, I don't think I'm even at the stage where the AI output is even the problem yet.
The tools are a problem because they're new and a moving target. They're both dead simple and somehow complex around the edges. AI, too, is tricky to work with, particularly when people aren't used to communicating clearly. There are a lot of surprising problems (such as "absurd methods of applying changes") that come from the fact that AI is solving a very broad class of problems, everywhere at the same time, by virtue of being a general tool. It still needs a bit of hand-holding if your project/conventions stray away from what's obvious or popular in a particular domain. But it's getting easier and easier as the months go by.
FWIW, I too haven't developed a proper agentic workflow with CLI tools for myself just yet; depending on the project, I either get stellar results or garbage. But I recognize this is only a matter of time investment: I haven't had much time to set aside to do it properly.
by TeMPOraL
1/21/2026 at 11:08:38 AM
I agree to a degree, but I am in that camp. I subscribe to AlphaSignal, and every morning there are 3 new agent tools, two new features, and a new agentic approach, and I am left wondering: where is the production stuff?
by neumann
1/21/2026 at 11:46:32 AM
So just like in the JavaScript world?
by dist-epoch
1/21/2026 at 10:58:03 AM
Well, one could say that since it's AI, AI should be able to tell us what we're doing wrong, no? AI is supposed to make our work easier.
by galaxyLogic
1/21/2026 at 3:49:11 PM
Certainly, every tool is supposed to make our work easier or more productive, but that doesn't mean that every tool is intuitive or easy to learn to use effectively, or even to use at all.
by Nekobai
1/21/2026 at 10:45:24 PM
Certainly, but aren't AI tools supposed to be intuitive and easy to use because we can communicate with them in natural language? With vim or Emacs I am supposed to know what Ctrl-X does. But with AI tools (ideally) I should be able to ask the AI (in English) to edit the document for me?
Maybe the reason we can't do it that way is that "we're not there yet"?
by galaxyLogic
1/21/2026 at 11:20:33 AM
Doing wrong with respect to what? If you ask for A, how would any system know that you actually wanted to ask for B?
by kace91
1/21/2026 at 11:46:00 AM
Honestly, IMO it's more that I ask for A but don't strongly enough discourage B, and then I get A, B, and maybe C, generally implemented poorly. The base systems need more focus and doubt built in before they'll be truly useful for anything aside from greenfield apps or generating maintainable code.
by walt_grata
1/21/2026 at 9:48:21 PM
Some people just shouldn't be engineers in the first place, I guess.
by chewz
1/21/2026 at 10:10:55 AM
You didn't actually just say "write tests" though, right? What was the actual prompt you used? I feel like that matters more than the tooling at this point.
I can't really understand letting LLMs decide what to test or not; they seem to completely miss the boat when it comes to testing. Half the tests are useless because they duplicate what they test, and the other half don't test what they should be testing. So many shortcuts. LLMs require A LOT of hand-holding when writing tests, more so than for other code, I'd wager.
by embedding-shape
1/21/2026 at 1:22:18 PM
There are a lot of comments on HN and other places breathlessly gushing about agents totally doing everything end to end, so I couldn't blame someone new to this space for naively assuming that agents would be able to handle a well-bounded problem such as test coverage reasonably well.
by Balinares
1/21/2026 at 2:08:39 PM
> naively assuming that agents would be able to handle a well-bounded problem such as test coverage reasonably well.
We haven't figured out a way for humans to do that well :P I still see people arguing that "80% test coverage is obviously better than 70%" and similar dumb sentiments that completely miss the point.
But I agree with the first part: LLMs are massively oversold, and it's hard to blame users for believing the hype. Tempered expectations, as always, win.
by embedding-shape
1/21/2026 at 1:58:23 PM
No, that was an exaggeration. The prompt was decent. I explained the point of the repository, that I wanted full test coverage, and that it could keep going until it worked. Maybe that was still not enough. With how others talk about it, I must be missing something.
by prettygood
1/21/2026 at 2:07:00 PM
For tests, you need to be precise about what it should test, how it should test it, and what the assertions should be; otherwise you'll mostly get trash. They're exceptionally horrible at writing tests. Which makes sense, most programmers are too, but given the importance of correct tests, it's probably the part that needs the most human hand-holding right now.
by embedding-shape
1/21/2026 at 1:39:35 PM
"Write tests" may not be enough; provide it with a test harness, and instruct it to "write tests until they pass". The next step would be "your feature isn't complete without N% coverage". These require the 'agentic' piece, which at its simplest is some prompts run in a loop until an exit condition is met.
by threecheese
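That "prompts in a loop until an exit condition is met" idea can be sketched in a few lines. This is an illustrative harness, not any particular tool's implementation: `run_agent` and `check` are hypothetical stand-ins for a real model invocation (e.g. a subprocess call to a CLI) and a real exit condition (e.g. the test suite passing).

```python
# Illustrative sketch of "prompts run in a loop until an exit condition is
# met". `run_agent` is a stand-in for a real model invocation; `check` is
# the exit condition, such as the tests passing or a coverage threshold
# being reached.
from typing import Callable


def agent_loop(run_agent: Callable[[str], None],
               check: Callable[[], bool],
               prompt: str,
               max_iters: int = 10) -> bool:
    """Re-prompt the agent until `check` passes or we give up."""
    for _ in range(max_iters):
        if check():        # exit condition met (tests pass, coverage reached, ...)
            return True
        run_agent(prompt)  # otherwise, ask the agent to try again
    return check()         # one last check after the final attempt
```

Real agent CLIs bundle this loop (plus tool use and context management) for you; the point is only that the 'agentic' part is, at its core, this simple.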
1/21/2026 at 1:05:19 PM
> I gave it a very simple repository, said to write tests, and the result wasn't usable at all. Really wonder if I'm doing it wrong.
I think so. The humans should be writing the spec. The AI can then (try to) make the tests pass.
by tasuki
1/21/2026 at 11:07:26 AM
No, you're having a similar experience to a lot of people. LLMs just fail (hallucinate) in less well-known fields of expertise.
Funny: today I asked Claude to give me the syntax for running Claude Code, and its answer was totally wrong :) So you go to the documentation… and parts of it are obsolete as well.
LLM development is in the "move fast and break things" style.
So in a few years there will be many repos full of gibberish code, because "everybody is a coder now", even basketball players or taxi drivers (no offense, of course; just an example).
It is like giving an F1 car to me :)
by sixtyj
1/21/2026 at 10:30:42 AM
You need to write a test suite to check its test generation (soft /s).
by agumonkey
1/21/2026 at 9:31:37 AM
Yeah, if you've not used Codex/agent tooling yet, it's a paradigm shift in the way of working, and once you get it, it's very, very difficult to go back to the copy-pasta technique. There's obviously a whole heap of hype to cut through here, but there is real value to be had.
For example yesterday I had a bug where my embedded device was hard crashing when I called reset. We narrowed it down to the tool we used to flash the code.
I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
There is absolutely no way I'd have been able to achieve that speed of resolution myself.
by CurleighBraces
1/21/2026 at 12:09:36 PM
> We narrowed it down to the tool we used to flash the code.
> I downloaded the repository, jumped into codex, explained the symptoms and it found and fixed the bug in less than ten minutes.
Change the second step to: "I downloaded the repository, explained the symptoms, copied the relevant files into Claude Web, and 10 minutes later it had provided me with the solution to the bug."
Now I definitely see the ergonomic improvement of Claude running directly in your directory, saving you copy/paste twice. But in my experience the hard parts are explaining the symptoms and deciding what goes into the context.
And let's face it, in both scenarios you fixed a bug in 10-15 minutes which might have taken you a whole hour/day/week before. It's safe to say that LLMs are an incredible technological advancement. But the discussion about tooling feels like vim vs emacs vs IDEs. Maybe you save a few minutes with one tool over the other, but that saving is often blown out of proportion. The speedup I gain from LLMs (on some tasks) is incredible. But it's certainly not due to the interface I use them in.
Also I do believe LLM/agent integrations in your IDE are the obvious future. But the current implementations still add enough friction that I don't use them as daily drivers.
by Bewelge
1/21/2026 at 12:23:53 PM
I agree with your statement, and perhaps my example is bad/too specific in this case. Once I started working this way, however, I found myself starting to adapt to it.
It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.
I also entirely agree that there's going to be a lot of innovation here.
IDEs, IMO, will change to become increasingly focused on reading/reviewing code rather than writing it, and in fact might look entirely different.
by CurleighBraces
1/21/2026 at 1:13:22 PM
> It's not unusual now to find myself with at least a couple of simultaneous coding sessions, which I couldn't see myself doing with the friction that using Claude Web/Codex web provides.
I envy you for that. I'm not there yet. I also notice that actually writing the code helps me think through problems, and now I sometimes struggle because you have to formulate problems up front. Still have some brain rewiring to do :)
by Bewelge
1/21/2026 at 1:38:38 PM
I think DHH said it best recently when he stated: "I can literally feel competence draining out of my fingers."
by CurleighBraces
1/22/2026 at 8:55:49 AM
Why would you copy files anywhere? My daily process is like this:
Claude plans (Opus 4.5)
Claude implements (Opus at work, Sonnet at home - I only have the $20 plan personally :P )
After implementation, the relevant files are staged
Then I start a Codex tab and tell it to review the changes in the staged files
I read through the review; if it seems valid or has critical issues ->
Clear context on Claude, give it the review, and ask it to evaluate whether it's valid
Contemplate the diff of both responses (Codex is sometimes a bit pedantic, or doesn't get the wider context of things) and tell Claude what to fix
If I'm at home and Claude's quota is full, I use ampcode's free tier to implement the fix.
by theshrike79
1/21/2026 at 9:30:10 AM
> I never had any luck integrating agents
What exactly do you mean by "integrating agents", and what did you try?
The simplest approach (and what I do) is not "integrating them" anywhere, but just replacing "copy-paste code + write prompt + copy output to code" with "write prompt > agent reads code > agent changes code > I review and accept/reject". Not really "integration" so much as a workflow change.
by embedding-shape
1/21/2026 at 12:05:26 PM
I installed the Copilot extension in my IDE and switched on agent mode. I don't really get how the workflow is supposed to work, but I think it's mostly due to how the tool is made. It has some sort of "change stack" similar to git commits/staging, which keeps conflicting with anything I manually edit.
Perhaps it's just this particular implementation (the Copilot integration in VS) that's bad, and others are better? I have extreme trouble trying to feed it context and handling suggested AI changes without completely corrupting the code, even for small changes.
by alkonaut
1/22/2026 at 6:08:44 AM
Copilot in VS Code is definitely trash. That aside, the workflow is simple. If you are familiar with the code base, then make sure to refer to the files where a newcomer would have to look if you were assigning the task to them. Tell it to ask questions. Usually, framing the spec as a conversation will make things clearer in your own mind.
by kaycey2022
1/21/2026 at 12:17:51 PM
Hm, yeah, maybe. I've tried Cursor once, but the entire experience was so horrible, and it was really hard to know what was going on. The workflow I have right now is something like what I put before, and I do it with Codex and Claude Code; both work the same. Maybe try out one of those, if you're comfortable with the terminal? It basically opens up a terminal UI, can read the current files, you enter a prompt, wait, then review the results with git or whatever VCS you use.
But I'm also never "vibe-coding". I'm reviewing every single line, and I mercilessly ask the agent to refactor whenever the code isn't up to my standards. I also restart the agent after each prompt finishes, as they get really dumb as soon as more than 20% of their max context is used.
by embedding-shape
1/21/2026 at 7:48:34 PM
If you're dumping the context after every prompt, that might be why you're not happy with the results of Cursor. I can run a dozen or two prompts before the context gets polluted enough that it's worth compacting. If you clear its context every time, it's not going to get a holistic enough view of the problem to deliver a good feature. That's been my experience. You have to work them up to the big ask.
by ikidd
1/22/2026 at 9:00:01 AM
Instead of "asking to refactor", you might get better results by defining your standards in a ... standard way.Give the agent tools to determine whether code is up to your standards, an executable or script it can run that checks for code style and quality. This way it won't stop the agent loop until the checks pass - saving you time.
by theshrike79
1/21/2026 at 12:15:25 PM
Make sure you're clicking "Keep" to "approve" the changes. It's annoying, but I don't think there is a way around having to do that. Then if you manually edit something, you can mention it in your next chat message, e.g., "I made a few changes to <file>. <Next instruction>"
by songodongo
1/21/2026 at 12:51:31 PM
Correct. Of the various ways to work, I find the in-IDE chat to be the worst. I rarely use it for anything other than "help me understand this line". Try one of the CLIs; that's the good stuff right now. Claude Code (or similar) in your shell. Don't worry about agentic patterns, skills, MCP, orchestrators, etc. Just the CLI is plenty.
by ctmnt
by ctmnt
1/21/2026 at 7:43:46 PM
Copilot is a dumpster fire, and I can understand why you're down on agents from that experience. Splurge on the $20 for Cursor and install their IDE. Start with a simple project, more because it helps you see how it works than because Cursor can't handle more. Give it specific instructions and not too big a problem at one time, so you can tailor the prompt. If it's niche, consider changing the model to Opus 4.5 long enough for it to get a handle on the codebase. Use Plan mode to start, adjust the plan, then let it go. Every time it makes changes, it can be reverted to the state at previous prompts. Use git liberally.
I'm just a dumb farmer who quit programming 20 years ago, and I use it to build stuff that works IRL for my operation constantly. A dev should be able to wrap their head around it.
by ikidd
1/21/2026 at 9:24:22 AM
I feel like you should just use Claude Code. That's it. Use it and you get a feel for it. Everyone is overcomplicating this. It's like learning to code itself: you need flight hours.
by hahahahhaah
by hahahahhaah
1/21/2026 at 10:01:04 AM
This is something that continues to surprise me. LLMs are extremely flexible and already come prepackaged with a lot of "knowledge"; you don't need to dump hundreds of lines of text to explain to them what good software development practices are. I suspect these frameworks/patterns just fill up the context with unnecessary junk.
by cobolexpert
1/21/2026 at 10:11:47 AM
You get 80% of the way there (numbers pulled out of the air) by just telling it to do things. You do need more to get from 80% to 90%+.
How much more depends on what you're trying to do and in what language (e.g., a "favourite" pet peeve: Claude occasionally likes to use instance_variable_get() in Ruby instead of adding accessors; it's a massive code smell). But there are some generic things, such as giving it instructions on keeping notes, and giving it subagents to farm out repetitive tasks, to prevent the individual task completions from filling up the context for tasks that are truly independent (in which case, for Claude Code at least, you can also tell it to do multiple in parallel).
But, indeed, just starting Claude Code (or Codex; I prefer Claude, but it's a "personality thing": try tools until you click with one) and telling it to do something is the most important step up from a chat window.
by vidarh
1/21/2026 at 10:32:55 AM
I agree about the small tweaks like the Ruby accessor thing; I also have some small notes like that myself, to nudge the agent in the right direction.
by cobolexpert
1/21/2026 at 1:54:01 PM
There's no such thing as universal "good software development practices"; there are only lots of opinions. Some are more popular, some less; some are language-, domain-, or company-specific (or even tool-specific: see various webshit frameworks whose idiosyncrasies spill over to significantly alter coding style), and many exist only as historical baggage, but they all largely conflict with each other. And LLMs have seen them all.
Consider as an example that "Clean Code" used to be gospel; now it's mostly considered a book of antipatterns, and many developers prefer to follow Ousterhout instead of Uncle Bob. LLMs have "read" both Clean Code and A Philosophy of Software Design, but without prompting they won't know which way you prefer things, so they'll synthesize something more or less in between these two near-complete opposites, mostly depending on the language they're writing code in.
The way I think about it is: "You are a staff software engineer with 15 years of experience in <tech stack used in the project>" does 80% of the job, by pulling in the specific regions of the latent space associated with good software engineering. But the more particular you are about style, or the more your project deviates from the most popular practice along any dimension (whether code style or folder naming scheme or whatnot), the more you need to describe those deviations in your prompt; otherwise you'll be fighting the model. And then it's helpful to describe any project-specific knowledge, such as which tools you're using (VCS, testing framework, etc.) and where the files are located, so the model doesn't have to waste tokens discovering it on its own.
Prompts are about latent space management. You need to strengthen the associations you want and suppress the ones you don't. It can get wordy at times, for the same reason that explaining a complex thought to another person often takes a lot of words. The first sentence may do 90% of the job, but the remaining 20 sentences are needed to narrow down to a specific idea.
by TeMPOraL
1/21/2026 at 2:40:16 PM
Maybe my initial message was overly harsh; I mostly agree with your points here. I think the point of disagreement is exactly _how much_ extra prompting is necessary to approach 100% of the job, but this is quite hard to measure (obviously). Your point about latent space management is a good mental model to have, IMO.
by cobolexpert
1/21/2026 at 11:21:22 AM
I think avoiding filling the context up with too much pattern information is partially where agent skills come from, the idea being that each skill has a set of triggers, and the main body of the skill is only loaded into context if that trigger is hit. You could still overload it with too many skills, but it helps at least.
by raesene9
1/21/2026 at 10:15:41 AM
> I suspect these frameworks/patterns just fill up the context with unnecessary junk.
That's exactly the point. Agents have their own context.
Thus, you try to leverage them by combining ad-hoc instructions for repetitive tasks (such as reviewing code or running a test checklist) without polluting your conversation/context.
by epolanski
1/21/2026 at 10:35:40 AM
Ah, do you mean sub-agents? I do understand that if I summon a sub-agent and give it, e.g., code-reviewing instructions, it will not fill up the context of the main conversation. But my point is that giving the sub-agent the instruction "review this code as if you were a staff engineer" (literally those words) should cover most use cases (though I can't prove this, unfortunately).
by cobolexpert
1/21/2026 at 11:50:38 AM
I do think you're right that you should be cautious about writing overly convoluted sub-agents. I'd rather use more of them that are brief and specialized than try to over-correct by having a single agent "remember" too many rules. Not really because the description itself will eat too much context, but because having the sub-agent work for too long will accumulate too much context and dilute your initial instructions anyway.
by vidarh
1/21/2026 at 10:49:55 AM
If I don't instruct it to in some way, the agent will not write tests, will not conform to the linter standard, will not correctly figure out the command to run a subset of tests, etc.
by Macha
1/21/2026 at 12:13:04 PM
I'm stuck with the Copilot tools. Again, I don't think this is a problem with the models but with the tooling. I can't switch to Claude Code (for work, that is), and while I don't mind using more command-line tools, I don't want to run multiple IDEs. But it's good to hear that it's not me being completely dumb; it's the Copilot agent mode tooling that is?
by alkonaut
1/21/2026 at 10:55:22 AM
It's not that simple. That's how I started as well, but now I have hooked up Gemini and GPT 5.2 to review code and plans and then do consensus on design questions.
And then there's Ralph, with cross-LLM consensus in a loop. It's great.
by _zoltan_
1/21/2026 at 9:52:28 AM
I used to do it the way you are doing it. A friend went to a hackathon where everyone was using Cursor, and insisted that I try it. It lets you set project-level "rules" that are basically prompts for how you want things done. It has access to your entire repo. You tell the agent what you want to do, it does it, and it allows you to review the result. It's that simple, although you can take it much further if you want or need to. For me, this is a massive leap forward on its own. I'm still getting up to speed with reproducible prompt patterns like TFA mentions, but it's okay to work incrementally towards better results.
by tmountain
1/21/2026 at 10:44:35 AM
I'm doing the same. My reason is not the IDE; I just can't let AI agent software onto my machine. I have no trust at all in it, or in the companies who make this software. I trust them neither with file integrity nor with keeping secrets secret, and I do have to keep secrets like API keys on my file system. Am I right in assuming that the people who use AI agent software run it in confined environments like VMs under tight version control?
Then it makes sense, but the setup is not worth the hassle for me.
by jonathanstrange
1/21/2026 at 10:57:15 AM
I recently pasted an error I found into Claude Code and asked who broke this. It found the commit, and also found that someone else had fixed it in their branch.
You should use Claude Code.
by ramraj07
1/21/2026 at 2:52:25 PM
If your org has a relationship with MS/OpenAI (many do!), you can also use OpenCode with GPT-5.2 for some pretty impressive results. Once you see what is currently possible with this technique, you will understand that programming as a field is doomed, or at the very least is becoming something almost unrecognizable.
by JeremyNT
1/21/2026 at 12:53:36 PM
You don't even need to paste. Connect it to your IDE (which should be as easy as installing the Claude plugin in your IDE and typing `/ide` in Claude Code) and it'll automatically pull in whatever you have selected.
by ctmnt
1/21/2026 at 11:02:00 AM
There's no reason this should not be possible in other IDEs, except for the vendor lock-in.by bojan
1/21/2026 at 9:30:06 AM
The idea is to produce such articles, not read them. Do not even read them as the agent is spitting them out; simply feed them straight into another agent to verify.
1/21/2026 at 10:19:58 AM
Present it at the next team/management meeting to seem in the loop, and hope nobody asks any questions.
by 63stack
1/21/2026 at 10:58:10 AM
No questions. It will be pasted into their AI tool. And things will be great. For a few weeks at least, until something breaks and nobody will know what.
1/21/2026 at 10:44:48 AM
I also sympathize with that approach, and have sometimes found it better than agents. I believe some of the agentic IDEs are missing a "contained mode":
let me select lines in my code which you are allowed to edit in this prompt and nothing else, for those "add a function that does x" requests, without it starting to run amok.
by breppp
1/21/2026 at 12:08:41 PM
Yes. And some way of using an instructions file, because interacting with an agent in a tiny plugin window, without the use of "agents.md" or some sort of persistent prompt you can adjust, retry, etc., is horrible.
Now it's "please add one unit test for Foobar()", and it goes away and thinks for 2 minutes and does nothing. Then I point it to where FooBar() is, which it didn't find, and it adds a test method. Then I change the name to one I like better, but now the AI change wasn't "accepted"(?), so the thing is borked...
I think the UX for agents is important, and... this can't be it.
by alkonaut
1/21/2026 at 9:28:30 AM
I don't think so; it seems the aspiration of these tools is that it'll be agents all the way down. A high-level task is given, and out pops a working solution.
A) If you can't program and you're just happy to have something working you're safe.
B) If you're an experienced programmer and can specify the structure of the solution you're safe.
In between is where it seems people will struggle. How do you get from A to B?
by rustyhancock
1/21/2026 at 10:57:25 AM
You just didn't drink enough Kool-Aid and have an intact brain.
1/21/2026 at 10:19:58 AM
I am on the other side; I have given complete control of my computer to Claude Code: YOLO mode, sudo. It just works. My servers run the same. I SSH in, run Claude Code there, and let it do whatever work it needs to do. So my 2 cents: use Claude Code, in YOLO mode. Use it. Learn with it.
Whenever I post something like this I get a lot of downvotes. But well... by the end of 2026 we will not use computers the way we use them now. Claude Code in Feb 2025 was the first step; now, in Jan 2026, CoWork (Claude Code for everyone else) is here. It is just a much, much more powerful way to use computers.
by franze
1/21/2026 at 12:02:22 PM
> end of 2026 we will not use computer the way we use them now.
I think it will take much longer than that for most people, but I disagree with the timeline, not with where we're headed.
I have a project now where the entirety of the project falls into these categories:
- A small server that is geared towards making it easy to navigate the reports the agents produce. This server is 100% written by Claude Code - I have not even looked at it, nor do I have any interest in looking at it as it's throwaway.
- Agent definitions.
- Scripts written by the agents for the agents, to automate away the parts where we (well, the agents mostly) have found a part of the task is mechanical enough to either take Claude out of the loop entirely, or produce a script that does the mechanical part interspersed with claude --print for smaller subtasks (and then systematically trying to see if Sonnet or Haiku can handle them). Eventually I may get to the point of optimising it to use APIs for smaller, faster models where they can handle the tasks well enough.
The goal is for an increasing proportion of the project to migrate from the second part (agent definitions) to the third part, and we do that in "production" workflows (these aren't user facing per se, but third parties do see the outputs).
That is, I started with a totally manual task I was carrying out anyway, defined agents to take over part of the process and produce intermediate reports, had it write the UI that lets me monitor the agents progress, then progressively I'd ask the agent after each step to turn any manual intervention into agents, commands, and skills, and to write tools to handle the mechanical functions we identified.
For each iteration, more stuff first went into the agent definitions, and then as I had less manual work to do, some of that time has gone into talking to the agent about which sub-tasks we can turn into scripts.
I see myself doing this more and more, and often "claude" is now the very first command I run when I start a new project whether it is code related or not.
by vidarh
1/22/2026 at 10:01:26 AM
Depending on your threat model, I'd lean more into building permanent tooling that's not dependent on an external AI API provider.

The more you can offload to deterministic tools (scripts), the easier it will be to move to local LLMs when the AI bubble bursts =)
by theshrike79
1/21/2026 at 11:11:05 AM
Claude Code and agents are the hot new hammer, and they are cool; I use CC and like it for many things. But they currently suffer from "hot new hammer" hype, so people tend to think everything is a nail the LLM can handle. You still need a screwdriver for screws, even if you can hammer them in.
by darkwater
1/21/2026 at 9:45:06 PM
That "hot new hammer" hype is a good thing given a general enough tool, and LLMs very much are that. We did the same with smartphones, the Internet, personal computers, and even electricity.

Some 150 years ago, humanity collectively decided to try to redo everything, but with electricity. In some cases it was a clear success - e.g. lights. It enabled further progress - see e.g. computers, MRI machines, etc. In other cases it was a failure - see e.g. cars, which still rely on ICEs despite electric cars coming first, because until recently batteries just weren't there. And in many cases the adoption was partial - see e.g. power tools, which are usually electric, but in professional/industrial use there are lots of hydraulic and pressurized-air variants.
All the above took people trying things out, "throwing shit at the wall to see what sticks". We're at that stage with LLMs now.
by TeMPOraL
1/21/2026 at 10:32:52 AM
Don't say "we" when talking about yourself.
by jangxx
1/21/2026 at 10:42:35 AM
I already do. And yes, it is a hypothesis about the future. Claude Code was just a first step. It will happen to the rest of computer use as well.
by franze
1/21/2026 at 12:13:41 PM
[dead]
by njhnjh
1/21/2026 at 10:32:05 AM
Copilot's agent mode is a disaster. Use better tools: try Claude Code or OpenCode (my favorite).

It's a new ecosystem with its own (atrocious!) jargon that you need to learn. The good news is that it's not hard to do so. It's not as complex or revolutionary as everyone makes it out to be. Everything boils down to techniques and frameworks for collecting context/prompts before handing them over to the model.
by photios
1/21/2026 at 11:08:15 AM
Yep, basically this. In the end it helps to have the mental model that (almost) everything related to agents is just a way to send the upstream LLM better, more specific context for the task you need to solve at that specific time. E.g. Claude Code "skills" are simply a markdown file in a subdirectory with a specific name, which translates to a `/SKILL_NAME` command in Claude, plus a prompt that is injected each time that skill is mentioned or Claude decides it needs to use it, so it doesn't forget the specific way you want that specific task handled.
by darkwater
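For reference, a skill on disk is roughly the following shape. The path and frontmatter fields below are a sketch from memory and may differ between Claude Code versions; e.g. `.claude/skills/code-review/SKILL.md`:

```markdown
---
name: code-review
description: How to review changes in this repo. Use when asked to review code.
---

When reviewing code in this repository:
1. Run the linter first and report any failures.
2. Check that new functions have tests.
3. Summarise findings as a bullet list, most severe first.
```

The frontmatter tells Claude when the skill applies; the body is the prompt that gets injected when it does.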
1/22/2026 at 10:05:15 AM
Skills can (and IMO should) also contain scripts custom-made for that skill. For example, a code review skill would have scripts that read the actual code.
by theshrike79
1/22/2026 at 1:57:50 PM
Thanks for the correction, TIL.
by darkwater
1/21/2026 at 12:09:59 PM
Sadly, we have a partnership that means it's Copilot or nothing.
by alkonaut
1/24/2026 at 11:00:30 PM
Give Copilot CLI a try if you haven't in a while! The team's been working really hard to improve the harness, and we're taking as much community feedback as we can get! Let me know if you run into any problems :)
by ryanhecht
1/21/2026 at 5:40:49 PM
I feel your pain. I used to work for a bank, and the security team only approved Copilot use.

OpenCode can use Copilot natively: https://opencode.ai/docs/providers/#github-copilot
I got Claude Code running with Copilot APIs via the LiteLLM proxy, but it was a pain in the butt. Just use OpenCode.
by photios
1/21/2026 at 2:54:10 PM
If you can use the Copilot CLI, it's highly likely you can use OpenCode with the same API key. It's worth doing a little research.

The CLI tool matters. If you're not using opencode/claude, you're missing out. But the latest OpenAI models are really quite good.
by JeremyNT
by JeremyNT
1/24/2026 at 11:03:02 PM
The Copilot CLI team has been making great strides towards improving our agentic harness! I'm curious: what have you found are the biggest shortcomings with it these days?
by ryanhecht
1/22/2026 at 6:04:39 AM
I'm not as behind as that, but I can't figure out this loop thing. We have engineers here saying they are reviewing 100k lines of code a day, slinging 10 agents simultaneously. I just cannot figure out how that is humanly possible.

Agentic coding has come a long way, though. What you are describing sounds like a trust issue more than a skill issue. Some git scumming should fix that. Maybe what I'm going through is also a trust issue.
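"Git scumming" here just means checkpointing with a commit before each agent run and rolling back when the result is bad. A minimal sketch (file names and commit identity are made up):

```shell
#!/bin/sh
# "Git scumming": commit a checkpoint before letting the agent loose,
# then throw the agent's changes away if you don't like them.
cd "$(mktemp -d)" && git init -q .
echo 'known-good version' > app.txt
git add . && git -c user.name=me -c user.email=me@example.com \
  commit -qm 'checkpoint before agent run'

# ...an agent edits files here...
echo 'agent output you do not trust' > app.txt

# Not happy? Restore the checkpoint and try again with a better prompt.
git checkout -- app.txt
cat app.txt   # back to the known-good version
```

Because the checkpoint is cheap, you can afford to let the agent attempt a change several times and keep only the run you actually like.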
by kaycey2022