4/3/2026 at 5:29:17 AM
I think this is especially problematic (from Part 4 at https://isolveproblems.substack.com/p/how-microsoft-vaporize...): "The team had reached a point where it was too risky to make any code refactoring or engineering improvements. I submitted several bug fixes and refactoring, notably using smart pointers, but they were rejected for fear of breaking something."
Once you reach this stage, the only escape is to first cover everything with tests and then meticulously fix bugs, without shipping any new features. This can take a long time, and cannot happen without full support from management, who do not fully understand the problem nor are incentivized to understand it.
by branko_d
4/3/2026 at 6:05:54 AM
This isn't incentivized in a corporate environment. Noticed how "the talent left after the launch" is mentioned in the article? Same problem. You don't get rewarded for cleaning up a mess (despite lip service from management) nor for maintaining the product after the launch. Only big launches matter.
The other corporate problem is that it takes time before the cleanup produces measurable benefits and you may as well get reorged before this happens.
by praptak
4/3/2026 at 6:26:12 AM
This is the root of the issue. For something like Azure, people are not fungible. You need to retain them for decades, and carefully grow the team, training new members over a long period until they can take on serious responsibilities.

But employees are rewarded for showing quick wins and changing jobs rapidly, and employers are rewarded for getting rid of high earners (i.e. senior, long-term employees).
by InsideOutSanta
4/3/2026 at 6:58:39 AM
> For something like Azure, people are not fungible

What I've learned from a decade in the industry is that talent is never fungible in low-demand areas. It's surprisingly hard to find people that "get it" and produce something worthwhile together.
by delusional
4/3/2026 at 8:47:23 AM
I would say "systems design" rather than low-demand. People who can "reduce" a big system to build on a few simple concepts are few and far between. Most people just add more stuff instead.
by silvestrov
4/3/2026 at 10:27:33 AM
I think those people are around, they are just not rewarded by this kind of system. They can propose plans and fixes, they just don't get implemented.
by aeonik
4/3/2026 at 10:59:19 AM
“Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better.” - Edsger Wybe Dijkstra
by srirangr
4/3/2026 at 3:31:00 PM
When things become too complicated, no one dares to make new systems. And if you don't make new systems, of course you have to learn system design the other way around: by fixing every bug of existing systems.
by markus_zhang
4/3/2026 at 6:08:28 PM
Simple ain’t Easy- Rich Hickey
by jimbokun
4/3/2026 at 3:04:16 PM
[dead]
by Whyachi
4/3/2026 at 11:46:50 AM
There are often retention problems with lean budgets, and after training staff they often do just leave for a more lucrative position. Loyalty will often not be rewarded, as most have seen companies purge decade-long senior staff a year before going public.
It is very easy to become cynical about the mythology of silicon valley. =3
by Joel_Mckay
4/3/2026 at 8:29:06 AM
What is a low-demand area?
by auggierose
4/3/2026 at 9:00:00 AM
A geographic area where there's not abundant opportunity for software developers. Usually everywhere outside the major metro areas. It was primarily meant to discount experiences from SF or Seattle, where I'm sure finding talent is easy enough, assuming you are willing to pay.
by delusional
4/3/2026 at 12:14:55 PM
I thought of this not as geographic but in terms of what’s sexy vs not. Low Demand = not
by grvdrm
4/3/2026 at 5:31:09 PM
Right, like running a sanitation department for a city. Who wants to do that? No one, but it's pretty important and everyone will raise hell and almost riot when it's not working.
by chasd00
4/3/2026 at 9:26:34 PM
Totally. I’m in insurance. So much is unsexy but critical. And that’s where you see a lot of folks churning on core systems, processes, etc. that make insurance actually work, vs any headline tech/investment/AI stuff. Don’t get me wrong - wins there too. But 22-year-old Harvard grads aren’t going for underwriting assistant jobs (to use an example).
by grvdrm
4/3/2026 at 3:28:41 PM
This is a human problem. We humans praise the doctors who can keep patients with terminal illnesses alive for extended periods, but ignore those who tell us the principles to prevent getting those illnesses in the first place. We throw flowers and money at doctors who treat cancers, but do we do the same for the ones who tell us how to avoid cancers? No.

The same goes for MSFT or any other similar problem. Humans only care when the house is on fire (under modern capitalism, that means the stock going down 50%), and only then will they have the will to make changes.
That’s also why reforms rarely succeed, and the ones that succeed usually follow a huge shitstorm when people begged for changes.
by markus_zhang
4/3/2026 at 4:51:32 PM
> Humans only care when the house is on fire

In a corporate context it's because that's, in theory, an effective use of resources:
If 20 teams are constantly saying "there is a huge risk of fire", a lot of mental energy is wasted figuring out how to stack-rank those 20 and how real each fire risk is. If instead you wait until there is a real fire, you can get the 15 teams actually fixing that one.
In practice, you've probably noticed that the most politics-playing & winning teams are the teams which are really effective at:
1) faking fires
2) exaggerating minor fires
3) moving fast & breaking things on purpose (or at least as a nice side effect) to create more fires in their area of ownership*, and getting rewarded with more visibility & headcount to fix those fires.
* As long as they have a firm grip on that area... If they don't, they risk having it re-orged to another team.
by 72f988bf
4/5/2026 at 3:52:45 AM
> If instead you wait when there is a real fire, you can get the 15 teams actually fixing that one.

In this case, with Microsoft's really amazing revenue stream, a charismatic management team can distort reality for quite some time and convince the right people within the company that there is no fire.
by replyifuagree
4/3/2026 at 6:30:18 PM
Yeah, the more "honest" side at least tried to fix it after the fire. The demagogue ones like to fake fires and move fast.
by markus_zhang
4/4/2026 at 1:22:03 PM
This is a capitalism problem. If you treat people well and give them the means to survive without trying to wring every red cent you can out of them, they'll be more likely to stick around and keep providing value.
by estimator7292
4/3/2026 at 3:12:44 PM
[dead]
by salemh
4/3/2026 at 4:45:11 PM
> You don't get rewarded for cleaning up mess (despite lip service from management) nor for maintaining the product after the launch

I have never worked at a shop or on a codebase where "move fast & break things, then fix it later" ever got to the "fix it later" part. I've worked at large orgs with large old codebases where the % of effort needed for BAU / KTLO slowly climbs to 100%. Usually it's some combination of tech debt accumulation, staffing reduction, and scale/scope increases pushing the existing system to its limits.
This is related to a worry I have about AI. I hear a lot of expectations that we're just going to increase code velocity 5x, from people who have never maintained a product before.
So moving faster & breaking more things (accumulating more tech debt) will probably have more rapid catastrophic outcomes for products in this new phase. Then we will have some sort of butlerian jihad or agile v2.
by steveBK123
4/3/2026 at 5:03:51 PM
People are still trying to figure out how to use AI. Right now the meme is that it's used by juniors to churn out slop, but I think people will start to recognize it's far more powerful in the hands of competent senior devs.

It actually surprised me that you can use AI to write even better code: tell it to write a test to catch the suspected bug, then tell it to fix the bug, then have it write documentation. Maybe also split out related functionality into a new file while you're at it.
I might have skipped all that pre-AI, but now all of it takes 15 minutes. And as a bonus, creating more understandable code allows AI to fix even more bugs. So it could actually become a virtuous cycle of using AI to clean up debt to understand more code.
In fact, right now, we're selling technical debt cleanup projects that I've been begging for for years as "we have to do this so the codebase will be more understandable by AI."
by asdfman123
4/4/2026 at 1:23:00 PM
Having worked on many long-lived projects for 5+ years at big firms, I think there's an aspect of project management being a dark art that will conflict with the hopes & dreams of AI.

Developer productivity is notoriously difficult to measure. Even feature velocity, cadence, or volume improvements are rarely noticed & acknowledged by users for long. They will always complain about speed and somehow notice slowdowns (and invent them in their heads as well).
I once joined a team that was in crisis; they couldn't ship for 6 months due to outages. We stabilized production, put in tests, introduced a better SDLC, and started shipping every 1-2 weeks. I swear to you that it was not more than a few months before stakeholders were whinging about velocity again. You JUST had zero, give me a break.
If you get a 3x one-off boost by adopting AI and then that’s the new normal, you’ll be shocked how little they pat you on the back for it. Particularly if some of that 3x is spent on tickets to “make the code easier for AI to understand”, testing, and low priority tickets in the backlog no one had bothered doing previously (seen a lot of these anecdotes). And god help you if your velocity slips after that 3x boost, they will notice the hell out of that.
by steveBK123
4/8/2026 at 8:50:50 AM
The problem is that if you want to be a serious cloud provider, you have to do exactly that. I am slowly moving my apps off of any Microsoft services, because they tend to be slow and buggy.

Also, they too often remove features from their products, and I have no desire to migrate working stuff just because MS wants to move people to other products.
And these problems have tended to get worse in recent times. PowerAutomate is exemplary of that for me: theoretically a neat tool that is well integrated into the cloud landscape, but practically you cannot implement reliable workflows with it, for numerous reasons.
> If you’re running production workloads on Azure or relying on it for mission-critical systems, this story matters more than you think.
Well, it doesn't explode, but I really question how reliable some of these systems really are. In my experience, not at all. There was, or is, some genuinely good engineering below some of these systems, but I think all the buggy fluff built upon it really introduces friction.
by raxxorraxor
4/6/2026 at 6:42:00 AM
Perhaps an important question is: why is it not incentivized in corporate environments?

I think, however, that perhaps I'm asking in the wrong arena. Unless there are people here reading this who work at the level of a corporate environment at which those decisions are made, it would really amount to guessing and stereotypes. Generally, I like to think that just about anyone can grasp that a well-made product will sell better due to its nature. So there must be some kind of mutual disconnect between the two sides, where one continues to see improvements as important, and the other fundamentally does not (or does not have a functional means to measure and verify them).
by registeredcorn
4/3/2026 at 7:24:18 PM
Meanwhile, failure to clean up this particular mess was a key factor in losing a trillion dollars in market cap, according to the author.
by jimbokun
4/4/2026 at 2:34:37 PM
It’s also a customer problem.

In a product where a customer has to apply (or be aware of) updates, it’s easier to excite them about new features than bug fixes.
Especially for winning over new customers.
If the changelog for a product’s last 5 releases is only bug fixes (or worse, “refactoring” that isn’t externally visible), most will assume either that development is dead or that the product is horribly bug-ridden - a bad look either way.
by BobbyTables2
4/3/2026 at 6:48:58 AM
It's a cool talent filter, though: if you're hiring people, the set of people that quit on doomed projects, and how fast they quit, is a really great indicator of technology-evaluation skills.
by cineticdaffodil
4/3/2026 at 8:27:52 PM
> This isn't incentivized in corporate environment.

'Course it is. But only by the winners who reward the employees who do the valuable work. Microsoft has all sorts of stupid reasons why they have lots of customers - all basically proxies for their customers' IT staff being used to administrating Microsoft-based systems - but if they mess up the core reasons to use a cloud enough, they will fail.
by philipallstar
4/3/2026 at 4:53:10 PM
You do, but you then make a career out of it: you become the fixer (and it can be a very good career, either technical or managerial).
by Agingcoder
4/3/2026 at 8:30:30 AM
No joke, I worked at a place where, in our copy of the system headers, we had to #define near and far to nothing. That was because (despite not having supported any systems where this was applicable for more than a decade) there was a set of files considered too risky to change that still had DOS-style near and far pointers, which we had to compile for a more sane linear address space. https://www.geeksforgeeks.org/c/what-are-near-far-and-huge-p...

Now, I'm just a simple country engineer, but a sane take on risk management probably doesn't prefer de facto editing files by hijacking keywords with preprocessor magic over, you know, just making the actual change, reviewing it, and checking it in.
by monocasa
4/3/2026 at 5:51:27 AM
Once you reach this stage, the only escape is to jump ship. Either mentally or, ideally, truly.

You're in an unwinnable position. Don't take the brunt of management's mistakes. Don't try to fix what you have no agency over.
by gherkinnn
4/3/2026 at 6:37:44 AM
Unfortunately, what you will find is that unless you get lucky, the next ship is more of the same.

The system/management style is ingrained in the corporate culture of large-ish companies (I would say that if it has more than 2 layers of management between you and someone owning the equity of the business and calling the shots, it's "large").
It stems from the fact that when an executive is bestowed the responsibility of managing a company by the shareholders, the responsibility is diluted, and the principal-agent problem rears its ugly head. When several more layers of this start growing in a large company, the divergence grows, and the path of least resistance is to have zero trust in the "subordinates", lest they make a choice that is contrary to what their managers want.
The only way to make good software is to have a small, nimble organization, where the craftsman (doing the work) makes the call, gets the rewards, and suffers the consequences (if any). That aligns the agent and the principal.
by chii
4/3/2026 at 6:55:05 AM
Hierarchy is the enemy of succeeding projects and of information flow. The more important and complex the hierarchy in a culture, the less likely it is to have a working software industry. Germany's and Japan's endless "old vs young, seniority vs new, internal vs external, company-wide management vs project-local management" come to mind. It's guerrilla vs army, startup vs company all over.
by cineticdaffodil
4/3/2026 at 12:25:23 PM
As someone in the DACH space, the internal/external divide goes to the extreme of externals not being allowed to use any company infrastructure used by the internals, including some basic stuff like the coffee machine or the canteen.

I had team lunches that only happened because, naturally, the team couldn't care less about the regulations and found workarounds, like meeting by "chance" at the same place, where apparently no other set of tables was available.
by pjmlp
4/4/2026 at 2:59:07 AM
> I would say if it has more than 2 layers of management from you to someone owning the equity of the business and calling the shots, it's "large"

By that metric, my 50-employee company is "large".
by bigstrat2003
4/4/2026 at 6:58:11 AM
Well, does this company have more than 2 layers of management? Why do you need that much for only 50 people, instead of empowering those people to make choices (after training and providing guidance on what makes for a good choice in various circumstances)?
by chii
4/3/2026 at 1:09:10 PM
I was once in such a position. I persuaded management to first cover the entire project with an extensive test suite before touching anything. It took us around 3 months to have "good" coverage, and then we started refactoring the parts that were 100% covered. 5 months in, the shareholders got impatient and demanded "results". We were not ready yet, and in their mind we were doing nothing. No amount of explanation helped; they thought we were just adding superficial work ("the project worked before and we were shipping new features! Maybe you are just not skilled enough?"). Eventually they decided to scrap the whole thing. The project was killed and the entire team sacked.
by varispeed
4/3/2026 at 6:28:23 PM
I’m a developer, and if a team spent five months only refactoring with zero features added, I would fire you too.

Refactoring and quality improvements must happen incrementally and in parallel with shipping new features and fixing bugs.
by jimbokun
4/4/2026 at 2:35:47 AM
I'm a director and one of our teams just spent 8 months doing just that and it was totally justified. They're finally coming up for air and the foundation is significantly improved.There's nuance here. Every project/team/org is different.
by bmurphy1976
4/4/2026 at 6:52:46 AM
Welcome to Microsoft! Enjoy the ever-growing backlog of bugs to fix!
by eviks
4/3/2026 at 11:21:21 AM
> first cover everything with tests

Beware this goal. I'm dealing with the consequences of TDD taken way too far right now. Someone apparently had this same idea.
> management who do not fully understand the problem nor are incentivized to understand it
They are definitely incentivized to understand the problem. However the developers often take it upon themselves to deceive management. This happens to be their incentive. The longer they can hoodwink leadership, the longer they can pad their resume and otherwise play around in corporate Narnia.
It's amazing how far you can bullshit leaders under the pretense of how proper and cultured things like TDD are. There are compelling metrics and it has a very number-go-up feel to it. It's really easy to pervert all other aspects of the design such that they serve at the altar of TDD.
Integration testing is the only testing that matters to the customer. No one cares if your user service works flawlessly with fake everything plugged into it. I've never seen it not come off like someone playing SimCity or Factorio with the codebase in the end.
by bob1029
4/3/2026 at 4:19:48 PM
Customers don’t care about your testing at all. They care that the product works.

Like most things, the reality is that you need a balance. Integration tests are great for validating complex system interdependencies. They are terrible for testing code paths exhaustively. You need both integration and unit testing to properly evaluate the product. You also need monitoring, because your testing environment will never 100% match what your customers see. (If it does, your system is probably trivial, and you don’t need those integration tests anyway.)
by dpark
4/3/2026 at 5:55:11 PM
Integration tests (I think we call them scenario tests in our circles) also tend to test only the happy paths. There are no guarantees that your edge cases, and anything unusual such as errors from other tiers, are covered. In fact the scenario tests may just be testing mostly the same things as the unit tests, but from a different angle. The only way to be sure everything is covered is through fault injection and/or single-stepping, but it’s a lost art. Relying only on automated tests gives a false sense of security.
by axelriet
4/3/2026 at 2:52:38 PM
Unit tests are just as important as integration tests, as long as they're tightly scoped to business logic and aren't written just to improve coverage. Anything can be done badly, especially if it is quantified and used as a metric of success (Goodhart's law applies).

Integration tests can be just as bad in this regard. They can be flaky, take hours, give you a false sense of security, and not even address the complexity of the business domain.
I've seen people argue against unit tests because they force you to decompose your system into discrete pieces. I hope that's not the core concern here, because a well-decomposed system is easier to maintain and extend, as well as to write unit tests for.
by caoilte
4/3/2026 at 4:53:47 PM
The problem with unit tests these days is that AI writes them entirely, and does a great job at it. That defeats the purpose of unit tests in the first place, since the human doesn't have the patience to review the reams of over-mocked test code produced by AI.

The end result of this is things like the code leak of Claude Code, presumably caused by AI-generated CI/CD packaging code nobody bothered to review, since the attitude is: who reviews test or CI/CD code? If they break, big deal; AI will fix it.
by bwfan123
4/3/2026 at 3:38:16 PM
“Premature abstraction” forced by unit tests can make systems harder to maintain.
by senderista
4/3/2026 at 6:17:48 PM
It can, but more often it’s the opposite.

Code that’s hard to write tests for tends to be code that’s too tightly coupled and lacking proper interface boundaries.
by jimbokun
4/3/2026 at 4:34:50 PM
The problem is people make units too small. A unit is not an isolated class or function (it can be, but usually isn't). A unit is one of those boxes you see on architecture diagrams.
by bluGill
4/3/2026 at 4:15:53 PM
Inability to unit test is usually either a symptom of poor system structure (e.g. components are inappropriately coupled) or an attempt to shoehorn testing into the wrong spot.If you find yourself trying to test a piece of code and it’s an unreasonable effort, try moving up a level. The “unit” you’re testing might be the wrong granularity. If you can’t test a level up, then it’s probably that your code is bad and you don’t have units. You have a blob.
by dpark
4/3/2026 at 9:17:44 PM
If you're writing the tests after writing the code, you're not doing TDD, though.
by carols10cents
4/3/2026 at 8:36:18 AM
> Once you reach this stage, the only escape is to first cover everything with tests and then meticulously fix bugs

The exact same approach is recommended in the book "Working Effectively with Legacy Code" by Michael Feathers, with several techniques on how to do it. He describes legacy code as "code with no tests".
by hikarudo
4/3/2026 at 4:08:16 PM
"Show me the incentives, and I will show you the outcomes" - Charlie Munger

I once worked in a shop where we had high and inflexible test-coverage requirements. Developers eventually figured out that you could run a bunch of random scenarios and then assert true in the finally clause of the exception handler. Eventually you'd be guaranteed to cover enough to get by that gate.
Pushing back on that practice led to a management fight about feature velocity and externally publicized deadlines.
by coredog64
4/3/2026 at 3:34:37 PM
It is so hard to test those codebases too. A lot of the time there's IO and implicit state changes through the code. Even getting testing in place, let alone good testing, is often an incredibly difficult task. And no one will refactor the code to make testing easier because they're too afraid to break the code.
by staticassertion
4/3/2026 at 6:33:04 AM
> I submitted several bug fixes and refactoring, notably using smart pointers, but they were rejected for fear of breaking something.

And that, my friends, is why you want a memory-safe language with as many static guarantees as possible, checked automatically by the compiler.
by dbdr
4/3/2026 at 12:53:04 PM
Language choices won't save you here. The problem is organizational paralysis. Someone sees that the platform is unstable. They demand something be done to improve stability. The next management layer above them demands they reduce the number of changes made to improve stability.
by sidewndr46
4/3/2026 at 3:16:04 PM
Usually this results in approvals to approve the approval to approve making the change. Everyone signed off on a tower of tax forms about the change, no way it can fail now! It failed? We need another layer of approvals before changes can be made!
by teeray
4/3/2026 at 1:51:00 PM
Yeah, I've seen that move pulled. Funnily enough, by an ex-Microsoft manager.
by cogman10
4/3/2026 at 9:59:35 AM
Hence the rewrite-it-in-Rust initiative, presumably. Management were aware of this problem at some level but chose a questionable solution. I don't think rewriting everything in Rust is at all compatible with their feature timelines or severe shortages of systems programming talent.
by mike_hearn
4/3/2026 at 3:01:40 PM
In a rewrite you can smuggle in a quality lift.
by cineticdaffodil
4/3/2026 at 1:40:25 PM
I had a memory management problem, so I introduced GC/ref counting, and now I have a non-deterministic memory management problem.
by CoolGuySteve
4/5/2026 at 6:54:35 AM
Ref counting is deterministic. Rust memory management is also deterministic: the memory is freed exactly when the owner of the data goes out of scope (and the borrow checker guarantees at compile time that there is no use after that).
by dbdr
4/5/2026 at 2:19:35 PM
Cool, now use the reference on another thread.
by CoolGuySteve
4/5/2026 at 10:21:32 PM
If you used Rust, you would know that problem is solved too.

Rust solves a lot of problems, and introduces others. The promiscuous package management, chiefly: it is not unusual for building a small programme in Rust to bring in 200+ crates, from unknown authors on the Internet...

What could possibly go wrong?
by worik
4/3/2026 at 10:45:33 AM
They could have started with simple Valgrind sessions before moving to Rust, though. A massive number of agents means microservices, and microservices are suitable for profiling/testing like that.
by bayindirh
4/3/2026 at 12:31:54 PM
Visual Studio has had quite some tooling similar to it, and you can have static analysis turned on all the time. SAL also originated with the XP SP2 issues.
Just like there have been tons of tools trying to fix C's flaws.
However, the big issue with opt-in tooling is exactly that it is optional, and apparently Microsoft doesn't enforce it internally as much as we thought.
by pjmlp
4/3/2026 at 12:38:40 PM
> However the big issue with opt-in tooling is exactly it being optional

That's true, and that's a problem.
> and apparently Microsoft doesn't enforce it internally as much as we thought.
But this, in my eyes, is a much bigger problem. It's baffling considering what Microsoft does as their core business: operating systems, high-impact software.
> Visual Studio has had quite some tooling similar to it, and you can have static analysis turned on all the time.
Eclipse CDT, which is not as capable as VS, is nonetheless not a toy and has the same capability: always-on static analysis plus Valgrind integration. I used both without any reservation, and this habit paid dividends at every level of development.
I believe in learning the tool and the craft more than the tool itself, because you can always hold something wrong. Learning the capabilities and limits of whatever you're using is a force multiplier, and considering how fierce the competition is within companies, leaving that kind of force multiplier on the table is unfathomable from my PoV.
Every tool has limits and flaws. Understanding them and being disciplined enough to check your own work is indispensable. Even if you're using something which prevents a class of footguns.
by bayindirh
4/3/2026 at 3:36:18 PM
I think the core business of MSFT has always been building a platform, grabbing everyone in, and seeking rent. Bill figured this out back in 1975, so it has been super successful.

The OS was that platform, but in Azure it is just the lowest layer, so maybe management just doesn't see it, as long as the platform works and the government contracts keep coming in. Then you have a bunch of yes-man engineers (I'm so surprised that any principal engineer, who should be financially free, could push out plans like the ones described by the author in this series) who give management false hopes.
by markus_zhang
4/3/2026 at 3:54:55 PM
One reason why Windows is a mess is that Satya sees Azure as an actual "Azure OS", Windows' version of OS/360.
Just two months ago,
https://blogs.windows.com/windowsexperience/2026/02/26/annou...
by pjmlp
4/3/2026 at 6:10:09 PM
It’s org-dependent. On Windows, SAL and OACR are kings, plus any contraption MSR comes up with that they run on checked-in code, filing bugs on you out of the blue :) Different standards.
by axelriet
4/3/2026 at 6:38:40 AM
I was waiting for that comment :) Remember that everybody, eventually, calls into code written in C.
by axelriet
4/3/2026 at 8:10:00 AM
If 90% of the code I run is in safe Rust (including the part that's new and written by me, and therefore most likely to introduce bugs) and 10% is in C or unsafe Rust, are you saying that has no value?

Il meglio è l'inimico del bene. Le mieux est l'ennemi du bien. Perfect is the enemy of good.
by dbdr
4/3/2026 at 8:48:07 AM
That is an unexpected interpretation. Use the best tool for the job, also factoring in what you (and your org) are comfortable with.
by axelriet
4/4/2026 at 11:47:17 AM
[flagged]
by RyujiYasukochi
4/3/2026 at 9:54:18 AM
Depends on which OS we are talking about. I know a few where that doesn't hold, including some still being paid for in 2026.
by pjmlp
4/3/2026 at 8:28:40 AM
If you're sufficiently stubborn, it's certainly possible to call directly into code written in Verilog, held together with inscrutable Perl incantations.

High-level languages like C certainly have their place, but the space seems competitive these days. Who knows where the future will lead.
by tux3
4/3/2026 at 10:15:11 AM
If you want something extra spicy, there are devices out there that implement CORBA in silicon (or at least FPGA), exposing a remote object accessible using CORBA.
by p_l
4/3/2026 at 8:45:55 AM
You didn’t miss the smiley, did you? :)
by axelriet
4/3/2026 at 12:41:31 PM
I didn't miss the smiley =)
by tux3
4/3/2026 at 2:58:59 PM
It’s worse than that. Eventually everybody calls into code that hits hardware. That is the level at which the compiler (ironically?) can no longer make guarantees. Registers change outside the scope of the currently running program all the time. Reading a register can cause other registers on a chip to change. Random chips with access to a shared memory bus can modify memory that the compiler deduced was static. There be dragons everywhere at the hardware layer, and no compiler can ever reason correctly about all of them, because, guess what, rev 2 of the hardware could swap in a footprint-compatible chip clone with undocumented behavior. So even if you gave all your board information to the compiler, the program could only be verifiably correct for one potential state of one potential hardware rev.
by milesvp
4/3/2026 at 3:31:05 PM
Sure, but eliminating bugs isn't a binary where you either eliminate all of them or it's a useless endeavor. There's a lot of value in eliminating a lot of bugs, even if it's not all of them, and I'd argue that empirically Rust does actually make it easier to avoid quite a large number of bugs that are often found in C code, in spite of what you're saying.

To be clear, I'm not saying that I think it would necessarily be a good idea to try to rewrite an existing codebase that a team apparently doesn't trust they actually understand. There are a lot of other factors that would go into deciding to do a rewrite than just "would the new language be a better choice in a vacuum", and I tend to be somewhat skeptical that rewriting something that's already widely being used is possible in a way that doesn't risk breaking something for existing users. That's pretty different from "the language literally doesn't matter because you can't verify every possible bug on arbitrary hardware", though.
by saghm
4/3/2026 at 6:05:26 PM
The hardware only understands addresses and offsets, aka pointers :)
by axelriet
4/3/2026 at 6:06:57 PM
All the more reason to have memory safety on top.
by mlsu
4/3/2026 at 10:52:59 AM
Did you miss the part that writes about the "all new code is written in Rust" order coming from the top? It also failed miserably.
by flohofwoe
4/3/2026 at 12:34:40 PM
That was quite interesting, and now I will take another point of view on the stuff I shared previously.
However, given how anti-anything-not-C++ the Windows team has been, it is not surprising that it actually happened like that.
by pjmlp
4/3/2026 at 6:17:46 PM
It came from the top of Azure and for Azure only. Specifically, the mandate covered all new code that cannot use a GC, i.e. no more new C or C++.
I think the CTO was very public about that at RustCon and other places where he spoke.
The examples he gave were contrived, though, mostly tiny bits of old GDI code rewritten in Rust as success stories to justify his mandate. Not convincing at all.
Azure node software can be written in Rust, C, or C++; it really does not matter.
What matters is who writes it: it should be seen as “OS-level” code requiring the same focus as actual OS code given its criticality, and should therefore probably be written by the Core OS folks themselves.
by axelriet
4/3/2026 at 6:44:43 PM
I have followed it from the outside, including talks at Rust Nation.
However, the on-the-ground reality you describe is quite different from e.g. the Rust Nation UK 2025 talks, or those being given by Victor Ciura.
It seems more in line with the rejections that took place against previous efforts regarding Singularity, Midori, the Phoenix compiler toolchain, Longhorn, ... only to be redone with WinRT and COM, in C++ naturally.
by pjmlp
4/6/2026 at 5:41:29 PM
Because neither C nor C++ creates friction.
The whole memory-safety chapter is a human problem first and foremost.
Some humans haven’t written a memory-safety bug in decades, but it requires a discipline the recent hire never acquired.
I always advocated fixing issues at their root. Humans write bugs, fix the humans. Somehow this was always regarded as taboo ever since I started at Microsoft in 2013.
by axelriet
4/3/2026 at 8:43:12 PM
May I ask, what kind of training do new joiners of the kernel team (or any team that effectively writes kernel-level code) get? Especially if they haven't written kernel code professionally -- or do they ONLY hire people who have written a non-trivial amount of kernel code?
by markus_zhang
4/5/2026 at 10:48:08 PM
There is no formal training (like a bootcamp or classes), but the larger org has extensive documentation (osgwiki) and you are expected to learn and ramp up by yourself.
I don’t think there is any kernel-code-writing experience requirement, but the hiring bar is sky-high: you have to demonstrate that you are a programmer.
by axelriet
4/3/2026 at 9:17:33 AM
Once you reach this stage, honestly the only escape is real escape. Put your papers in and start looking for a job elsewhere, because when they go down, they will go down hard and drag you with them. It's not like you didn't try.
by neya
4/4/2026 at 5:56:47 AM
Though this doesn't make much sense on its surface: a bug means something is already broken, and he tells of millions of crashes per month, so it was visibly broken. A 100% chance of being broken (the bug) > some chance of breakage from fixing it (sure, the value of the current and potential bugs isn't accounted for here, but then neither is it in "afraid to break something, do nothing").
by eviks
4/4/2026 at 6:16:26 AM
I've experienced a nearly identical scenario, where a large fleet of identical servers (Citrix session hosts) were crashing at a "rate" high enough that I had to "scale up" my crash dump collection scripts with automated analysis, distribution into about a hundred buckets, and then per-bucket statistical analysis of the variables. I had to compress, archive, and then simply throw away crash dumps because I had too many.
It was pure insanity: the crashes were variously caused by things like network drivers so old and vulnerable that "drive-by" network scans by malware would BSOD the servers. Alternatively, successful virus infections would BSOD the servers because the viruses were written for desktop editions of Windows and couldn't handle the differences in the server edition, so they'd just crash the system. On and on. It was a shambling zombie horde, not a server farm.
I was made to jump through flaming hoops backwards to prove beyond a shadow of a doubt that every single individual critical Microsoft security patch a) definitely fixed one of the crash bugs and b) didn't break any apps.
I did so! I demonstrated a 3x improvement in overall performance -- which by itself is staggering -- and that BSODs dropped by a factor of hundreds. I had pages written up on each and every patch, specifically calling out how they precisely matched a bucket of BSODs exactly. I tested the apps. I showed that some of them that were broken before suddenly started working. I did extensive UAT, etc.
"No." was the firm answer from management.
"Too dangerous! Something could break! You don't know what these patches could do!" etc, etc. The arguments were pure insanity, totally illogical, counter to all available evidence, and motivated only by animal fear. These people had been burned before, and they're never touching the stove again, or even going into the kitchen.
You cannot fix an organisation like this "from below" as an IC, or even a mid-level manager. CEOs would have a hard time turning a ship like this around. Heads would have to roll, all the way up to CIO, before anything could possibly be fixed.
by jiggawatts
4/4/2026 at 6:25:06 AM
Yeah, long periods of total dysfunction get ingrained.
Though just to ref my original point:
> burned before, and they're never touching the stove again
Except they are sitting on the stove with their asses burning, which cuts all the needed cooling off their heads!
by eviks
4/4/2026 at 9:43:21 AM
The better analogy is that they ran out of the kitchen in a panic, and left the pots on the burners. Some time later there is smoke curling up from under the kitchen door, but they’re used to the burning smell by now so it’s “not that big a deal”.
by jiggawatts
4/4/2026 at 2:02:33 PM
> Once you reach this stage, the only escape is to first cover everything with tests and then meticulously fix bugs, without shipping any new features.
Isn't this where Oracle is with their DB? Wasn't HN complaining about that?
by bombcar
4/3/2026 at 5:39:57 AM
Or to simplify the product and rebuild.
by idorosen
4/3/2026 at 12:26:19 PM
“Rebuild” is a four-letter word at this stage too, though. The customer has a panel of knob-and-tube wiring and aluminum paper-wrapped wire in the house. They want a new hot tub. They don’t want some electrician telling them they need to completely rewire their house first at huge expense, such that they cannot afford the hot tub anymore. They’ll just throw the electrician out and get some kid in a pickup truck (“You’re Absolutely Right Handyman LLC”) to run a lamp cord to their new hot tub. Once the house burns to the ground, the new owners will wire their new construction correctly.
by teeray
4/3/2026 at 5:41:50 AM
Exactly. But he’s right about management: first the problem must be acknowledged, and that may make some people look bad.
by axelriet
4/3/2026 at 9:34:39 AM
Writing tests and then meticulously fixing bugs does not increase shareholders' value.
by egorfine
4/4/2026 at 8:30:56 PM
Dave Cutler and his team are a clear counter-example. They famously shipped Windows NT with zero known bugs, which clearly brought enormous shareholder value.
The problem, of course, is that this sort of thing doesn’t bring value next quarter.
by branko_d
4/3/2026 at 7:23:19 AM
Once you reach this stage, the only escape is to give up on it and move on.
Some things are beyond your control and capabilities.
by rk06
4/3/2026 at 6:37:03 AM
If the service is so shitty, why are people paying so much fucking money for it?
Is Microsoft committing accounting fraud?
by doctorpangloss
4/3/2026 at 10:03:01 AM
I worked at a startup that was using Azure. The reason was simple enough: it had been founded by finance people who were used to Excel, so Windows+Office was the non-negotiable first bit of IT they purchased. That created a sales channel Microsoft used to offer generous startup credits. The free money created a structural lack of discipline around spending. Once the startup credits ran out, the company was faced with a huge bill and had difficulty motivating people to conserve funds.
At the start I didn't have any strong opinion on what cloud provider to use. I did want to do IT the "old fashioned way": rent a big-ass bare-metal server or cloud VM, issue UNIX user accounts on it, and let people do dev/test/ad hoc servers on that. Very easy to control spending that way, very easy to quickly see what's using the resources and impose limits, link programs to people, etc. I was overruled as obviously old fashioned and not getting with the cloud programme. They ended up bleeding a million dollars a month and the company wasn't even running a SaaS!
I ended up with a very low opinion of Azure. Basic things like TCP connections between VMs would mysteriously hang. We got MS to investigate, they made a token effort and basically just admitted defeat. I raged that this was absurd as working TCP is table stakes for literally any datacenter since the 1980s, but - sad to say - at this time Azure's bad behavior was enabled by a widespread culture of CV farming in which "enterprise" devs were all obsessed with getting cloud tech onto their LinkedIn. Any time we hit bugs or stupidities in the way Azure worked I was told the problem was clearly with the software I'd written, which couldn't be "cloud native", as if it was it'd obviously work fine in Azure!
With attitudes like that completely endemic outside of the tech sector, of course Microsoft learned not to prioritize quality.
We did eventually diversify a bit. We needed to benchmark our server software reliably and that was impossible in Azure because it was so overloaded and full of noisy neighbours, so we rented bare metal servers in OVH to do that. It worked OK.
by mike_hearn
4/3/2026 at 8:26:02 PM
"Basic things like TCP connections between VMs would mysteriously hang"
This is like a car that can't even get you two blocks from home. Amazing.
by jrl
4/3/2026 at 12:44:13 PM
I have had bad experiences across all major vendors.
The main reason I used to push for Azure instead in recent years was the friendliness of their Web UIs, and having the VS Code integration (it started as an Azure product after all).
by pjmlp
4/3/2026 at 2:47:53 PM
Friendliness?
VS Code integration out of the box, that I can understand. But I have a really hard time calling the Azure UI "friendly". Everything is behind layers of nested pointy-clicky chains with opaque or flat-out misleading names.
To make things worse, their APIs also follow the same design. Everything you actually would want to do is behind a long sequence of pointer-chasing across objects and service/resource managers. Almost as if their APIs were built to directly reflect their planned UI action sequences.
by bostik
4/3/2026 at 2:52:59 PM
Yes, some of us grew out of the 1970s approach to the command line, unless there is no other way.
GCP is the worst: some options are only available on the CLI, without any visual feedback on the dashboard.
by pjmlp
4/3/2026 at 12:18:35 PM
Corporate inertia. A sibling comment uses the term "hostage situation", which I admit is pretty apt.
Microsoft is an approved vendor in every large enterprise. That they have been approved for desktop productivity, Sharepoint, email, and on-prem systems does not enter the picture. That would be too nuanced.
Dealing with a Large Enterprise[tm] is an exercise in frustration. A particular client had to be deployed to Azure because their estimate was that getting a new cloud vendor approved for production deployments would be a gargantuan 18-to-24 month org-wide and politically fraught process.
If you are a large corp and have to move workloads to the cloud (because let's be honest: maintaining your own data centres and hardware procurement pipelines is a serious drag) then you go with whatever vendor your organisation has approved. And if the only pre-approved vendor with a cloud offering is Microsoft, you use Azure.
by bostik
4/3/2026 at 7:15:35 AM
The US government’s experts called Azure “a pile of shit”; they got overruled.
https://www.propublica.org/article/microsoft-cloud-fedramp-c...
by rawgabbit
by rawgabbit
4/3/2026 at 7:42:05 AM
Because Azure customers are companies that still, in 2026, only use Windows. Anyone else uses something else. Turns out, companies like that don't tend to have the best engineering teams. So moving an entire cloud infrastructure from Azure to, say, AWS is probably either really expensive, really risky, or too disruptive for the type of engineering team that Azure customers have. I would expect MS to bleed from this slowly for a long time until they actually fix it. I seriously doubt they ever will, but stranger things have happened.
by hunterpayne
4/3/2026 at 9:59:17 AM
Turns out that outside of companies shipping software products and aspiring to be the next Google or Apple, most companies working outside the software industry also need software to run their business, and they couldn't care less about the HN technology cool factor.
They use whatever they can to ship their products into trucks, outsourcing their IT and development costs, and that is about it.
by pjmlp
4/3/2026 at 11:56:18 AM
Agreed, though only up to a point.
Companies that need software to run their business need that software to run. When your operations are constantly hampered by Azure outages and your competitors' are not, you're not going to last if your market is at all competitive. Thankfully for many companies, a lot of markets aren't, I suppose, at least for the actors who have established a successful rent and no longer need to care how their business operations are going.
by Balinares
4/3/2026 at 3:28:12 PM
I have worked at two retail companies where AWS was a no-no. They didn't want to have anything depending on a competitor (Amazon), so they went the Azure route.
by MyHonestOpinon
4/3/2026 at 8:20:41 AM
CFOs love it because Microsoft does bundle pricing with Office. Plus they love to give large credits to bootstrap lock-in.
by bradleyjg
4/3/2026 at 3:36:00 PM
You’re assuming the alternatives don’t have just as many issues. There’s been exactly one “whistleblower”, who is probably tiptoeing the line of a lawsuit. Just because there isn’t a similar disgruntled GCP or AWS engineer doesn't mean they don't have similar problems.
by tw04
4/3/2026 at 4:08:24 PM
This made me look into how cloud hypervisors actually work at the HW level: they all offload it to custom hardware (smart NICs, FPGAs, DPUs, etc.). The CPU does almost nothing except tenant work. AWS -> Nitro, Azure -> FPGA, NVIDIA sells DPUs.
Here is an interactive visual guide if anyone wants to explore: https://vectree.io/c/cloud-virtualization-hardware-nitro-cat...
by functional_dev
4/5/2026 at 7:10:20 AM
VM management does not run on the FPGA; it’s regular Win32 software on Windows, with aspirations to run some equivalent, someday, on the SoC next to the FPGA on the NIC. The programmable hardware is used for network paths and PCIe functions, where it can project NICs and NVMe devices into VMs to bypass the software-based, VMBus-backed virtual devices, all of which end up being serviced on the host that controls the real hardware. Look up SR-IOV for the bypass. So yes, that’s I/O bypass/offload, but the VM management stack offload is a distinct thing that does not require an FPGA, just a SoC.
by axelriet
4/3/2026 at 9:10:06 AM
Most of the upper management of the companies who use them don't have the technical competence to see it (e.g. banks, supermarket chains, manufacturing companies).
Once they are in, no one likes to admit they made a mistake.
by miyuru
4/3/2026 at 3:56:10 PM
Depending on the space you work in, you have almost no choice at all. If you're building for government then you're going to use Microsoft, almost "end of story".
by staticassertion
4/3/2026 at 6:49:21 AM
It’s more of a hostage situation.
by fxtentacle
4/5/2026 at 1:31:45 AM
Yeah, it’s entirely business people and executives who make these decisions in most companies, not the ones who use it or implement on it.
by llama052
4/3/2026 at 9:14:42 AM
Because the alternatives are also in a similar state.
AWS and GCP are pretty crap too. Use any of them and you'll hit just enough rough edges. The whole industry is just grinding out slop; quality is not important anywhere.
I work with AWS on a daily basis, and I'm not really impressed. (Nor did GCP impress me in the short encounter I had with it.)
by fodkodrasz
4/3/2026 at 12:48:12 PM
I don't know about AWS or the rest of GCP, but in terms of engineering, my experience of GCE was at least an entire order of magnitude better than what the article alleges about Azure. Security and reliability were taken extremely seriously, and the quality of the engineering was world-class. I hope it has stayed like this since then. It was a worthwhile thing to experience.
by Balinares
4/3/2026 at 3:57:15 PM
This isn't it at all. AWS does not have the same sorts of insane cross-tenancy exploits that Azure has had, for example.
The reason that Azure has so many customers is very simply because Azure is borderline mandated by the US government.
by staticassertion