This is a very information-dense post. It took some time to read in detail. Here are my thoughts.

> OpenAI published a concept in February 2026 that captured what we'd been doing. They called it harness engineering: the primary job of an engineering team is no longer writing code. It is enabling agents to do useful work. When something fails, the fix is never "try harder." The fix is: what capability is missing, and how do we make it legible and enforceable for the agent?
This is what I've at least suspected for a while from working on my personal projects. Thanks for laying it out in clear terms.
> A production system needs to be stable, reliable, and secure. You need a system that can guarantee those properties when AI writes the code. You build the system. The prompts are disposable.
I agree, and the implication is that the primary bottleneck in any engineering project today is AI workflow design, more than the project work itself: the right AI workflow/scaffold/process lets you 10x the productivity of everything else on the project while keeping things production-ready, and keeping things production-ready is really hard.
> The Product Management Bottleneck
> The QA Bottleneck
So now, not only do devs become software architects who design dev processes and set high-level direction rather than doing the development themselves; PM and QA also need to become PM/QA architects who design PM/QA processes and product direction to stay relevant. lol.
> The Headcount Bottleneck
I think it's still an unsolved problem whether AI will reduce the cooperation bottleneck between people (through new cooperation technologies like knowledge consolidation, and AI-driven performance measurement that's harder to game) or increase it (through deep individual knowledge becoming more important, since everyone is an architect). I'd guess the latter in the short term and possibly the former in the long term.
> I had to unify all the code into a single monorepo. One reason: so AI could see everything.
I wonder whether it's better for Git history cleanliness purposes to do one of the following instead:
- Use a "hub" monorepo that uses Git submodules to link to all the other repos in the project. The hub repo contains documentation and AI agent configurations, but the individual project files stay in their respective repos.
- Use an agent harness system that natively wraps over multiple repositories. (More precisely, it would make a temp folder and put the worktrees of multiple repos in that folder. Perhaps it can unpack some documentation and AI agent configs in the root too, with the root repository simply gitignore-ing the individual repo folders instead of using submodules.)
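That second option can be sketched in a few lines. Everything here is illustrative: the repo paths (`backend`, `frontend`) and the `AGENTS.md` file are stand-ins I made up, and a real harness would add cleanup via `git worktree remove`.

```python
# Sketch: build a throwaway workspace holding a detached worktree of each
# repo, plus shared agent docs at the root. Nothing at the root is a git
# repo, so no submodule bookkeeping pollutes anyone's history.
# All repo names/paths below are hypothetical.
import subprocess, tempfile
from pathlib import Path

def make_workspace(repo_paths: list[str], agent_doc: str) -> Path:
    ws = Path(tempfile.mkdtemp(prefix="agent-ws-"))
    for repo in repo_paths:
        name = Path(repo).name
        # Detached worktree: the original checkout stays untouched.
        subprocess.run(
            ["git", "-C", repo, "worktree", "add", "--detach",
             str(ws / name)],
            check=True,
        )
    # Shared instructions for the agent, tracked by no repo at all.
    (ws / "AGENTS.md").write_text(agent_doc)
    return ws
```

The upside over submodules is that there's no hub repo whose history fills up with pointer-bump commits; the workspace is disposable by construction.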
> Every pull request triggers three parallel AI review passes using Claude Opus 4.6:
Pass 1: Code quality. Logic errors, performance issues, maintainability.
Pass 2: Security. Vulnerability scanning, authentication boundary checks, injection risks.
Pass 3: Dependency scan. Supply chain risks, version conflicts, license issues.
I agree that automated PR review with AI agents is very important. Good list of topics; I think this will help with my own implementation.
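The fan-out itself is simple to sketch. `complete` here is any callable that sends a prompt to an LLM and returns text (a thin wrapper over your provider's SDK); the prompt wording below is my guess at the three passes, not the post's actual prompts.

```python
# Sketch: run three independent review passes over a PR diff in parallel.
from concurrent.futures import ThreadPoolExecutor

# Hypothetical prompt templates, one per pass from the post's list.
PASSES = {
    "quality":    "Review this diff for logic errors, performance issues, "
                  "and maintainability problems:\n\n{diff}",
    "security":   "Review this diff for vulnerabilities, authentication "
                  "boundary violations, and injection risks:\n\n{diff}",
    "dependency": "Review dependency changes in this diff for supply-chain "
                  "risk, version conflicts, and license issues:\n\n{diff}",
}

def review_pr(diff: str, complete) -> dict[str, str]:
    """Run all passes concurrently; return {pass_name: review_text}."""
    with ThreadPoolExecutor(max_workers=len(PASSES)) as ex:
        futures = {name: ex.submit(complete, tmpl.format(diff=diff))
                   for name, tmpl in PASSES.items()}
        return {name: f.result() for name, f in futures.items()}
```

Keeping each pass as a separate request (rather than one mega-prompt) also means a failure or timeout in one pass doesn't sink the others.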
> One hour later, the triage engine runs. It clusters production errors from CloudWatch and Sentry, scores each cluster across nine severity dimensions, and auto-generates investigation tickets in Linear. Each ticket includes sample logs, affected users, affected endpoints, and suggested investigation paths.
This is cool, advanced stuff. Though I kind of think that instead of Linear, we need an AI-centric ticketing system designed from the ground up to make it easier for AIs to handle the tickets and for the humans to monitor said AIs. I've used some AI coding kanban board tools and found them to be very helpful (compared to using a separate Forgejo kanban board + AI agent), and maybe a more general AI-powered ticket management tool would be the next step.
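For anyone wanting to try the clustering step, a toy version is easy: group raw error events by a normalized message "signature", then score each cluster. The real system scores nine severity dimensions; the two shown here (volume, affected users) are illustrative stand-ins I chose, not the post's actual dimensions.

```python
# Toy sketch of error-triage clustering. Input events are dicts like
# {"message": str, "user_id": str}; everything else is made up.
import re
from collections import defaultdict

def signature(message: str) -> str:
    # Collapse volatile parts (numbers, hex ids) so "timeout for user 42"
    # and "timeout for user 97" land in the same cluster.
    return re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<N>", message.lower())

def cluster_errors(events):
    """Group events by signature and score each cluster."""
    clusters = defaultdict(list)
    for e in events:
        clusters[signature(e["message"])].append(e)
    scored = [{
        "signature": sig,
        "volume": len(evs),                               # how often
        "affected_users": len({e["user_id"] for e in evs}),  # how wide
        "samples": evs[:3],   # sample logs to attach to the ticket
    } for sig, evs in clusters.items()]
    # Worst clusters first; a real scorer would weight more dimensions.
    return sorted(scored,
                  key=lambda c: (c["affected_users"], c["volume"]),
                  reverse=True)
```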
> Each tool handles one phase. No tool tries to do everything.
I think the key is to have separate agents handling each phase. They could all be in the same tool. I agree that having one AI agent handle the entire thing isn't going to be enough for the kind of reliability one is looking for here.
> Graphite's merge queue rebases, re-runs CI, merges if green.
This is a tool I hadn't heard of before and the merge queue seems like a very useful concept. I wonder if it handles automatically resolving trivial rebase conflicts with AI. The stacked PR feature sounds pretty good too.
> People assume we're trading quality for speed. User engagement went up. Payment conversion went up. We produce better results than before, because the feedback loops are tighter. You learn more when you ship daily than when you ship monthly.
Obviously these are lofty claims, but intuitively I think this is possible. AI output isn't perfect, but current engineering teams are far from perfect either. And I think AI is more amenable to process design than people are, simply because you can change an AI prompt instantly (and perhaps even A/B-test it with an LLM as judge?) while people need time to train on a new process.
> At CREAO, we pushed AI-native operations into every function:
Product release notes: AI-generated from changelogs and feature descriptions.
Feature intro videos: AI-generated motion graphics.
Daily posts on socials: AI-orchestrated and auto-published.
Health reports and analytics summaries: AI-generated from CloudWatch and production databases.
Using AI for public-facing announcements is a bit of a minefield to be honest. I think it's valuable to have knowledgeable humans do most of this. But maybe AI can be acceptable if you clearly label that it's AI and you genuinely don't have the human bandwidth to do it anymore.
> I believe one-person companies will become common. If one architect with agents can do the work of 100 people, many companies won't need a second employee.
Oh boy.
> the CTO working 18-hour days
This is actually the least believable part of the post to me. I'd somewhat believe it if you said 14-16 hours, but working an 18-hour day seems like a straight-up bad idea. Even assuming you value absolutely nothing else in life besides work, you'd get more done in 14-16 hours with more leisure and sleep than in 18 without them.