2/2/2026 at 10:02:47 PM
Looks like Azure as a platform just killed the ability to perform VM scale operations, due to a change to a storage account ACL that hosted VM extensions. Wow... We noticed when GitHub Actions went down, then our self-hosted runners, because we can't scale anymore.
Information
Active - Virtual Machines and dependent services - Service management issues in multiple regions
Impact statement: As early as 19:46 UTC on 2 February 2026, we are aware of an ongoing issue causing customers to receive error notifications when performing service management operations - such as create, delete, update, scaling, start, stop - for Virtual Machines (VMs) across multiple regions. These issues are also causing impact to services with dependencies on these service management operations - including Azure Arc Enabled Servers, Azure Batch, Azure DevOps, Azure Load Testing, and GitHub. For details on the latter, please see https://www.githubstatus.com.
Current status: We have determined that these issues were caused by a recent configuration change that affected public access to certain Microsoft‑managed storage accounts, used to host extension packages. We are actively working on mitigation, including updating configuration to restore relevant access permissions. We have applied this update in one region so far, and are assessing the extent to which this mitigates customer issues. Our next update will be provided by 22:30 UTC, approximately 60 minutes from now.
by llama052
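The failure mode described above - a storage ACL change breaking extension downloads, which in turn breaks scale operations - can be sketched in a toy model. Everything below is an assumption for illustration (Azure's real ACL evaluation is not public); the point is just how flipping a default action from "Allow" to "Deny" breaks reads from any source not explicitly allow-listed:

```python
# Toy model of a storage-account network ACL (illustrative only, not
# Azure's actual logic): requests from allow-listed CIDRs always pass,
# everything else falls through to the default action.
import ipaddress

def acl_allows(source_ip: str, allowed_cidrs: list, default_action: str) -> bool:
    """Return True if the ACL would admit a request from source_ip."""
    ip = ipaddress.ip_address(source_ip)
    if any(ip in ipaddress.ip_network(c) for c in allowed_cidrs):
        return True
    return default_action == "Allow"

allowed = ["10.0.0.0/8"]  # hypothetical internal ranges only

# Before the change: open by default, so VM hosts everywhere can pull
# extension packages.
assert acl_allows("52.1.2.3", allowed, default_action="Allow")

# After the change: the very same request is now denied, so extension
# installs - and therefore scale-out of new VMs - start failing.
assert not acl_allows("52.1.2.3", allowed, default_action="Deny")
```

Note that nothing about the running VMs changed; only requests for new capacity hit the denied path, which matches the "existing workloads fine, scaling broken" pattern in the incident.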
2/2/2026 at 11:07:01 PM
They've always been terrible at VM ops. I never get weird quota limits and errors in other places. It's almost as if Amazon wants me to be a customer and Microsoft does not.
by bob1029
2/2/2026 at 11:47:48 PM
Amazon isn't much better there. Wait until you hit an EC2 quota limit and can't get anyone to look at it quickly (even under paid enterprise support), or they just say no. I've also had a few instance types that won't spin up in some regions/AZs recently. I assume this is a capacity issue.
by dgxyz
2/3/2026 at 12:15:45 PM
Quota limits are much less stupid than this.
by direwolf20
2/3/2026 at 1:52:38 AM
The cloud isn't some infinite thing. There's a bunch of hardware, and they can't run more servers than they have hardware. I don't see a way around that.
by paulddraper
2/3/2026 at 6:16:25 AM
Indeed, but many people were led to believe so.
by kavalg
2/4/2026 at 2:46:14 AM
I guess account limits would be surprising then :)
by paulddraper
2/3/2026 at 4:38:51 AM
I was surprised when I hit one of these limits once, but it wasn't as if they were 100% out of servers; I just had to pick a different node type. I don't think they would ever post their numbers, but some of the more exotic types definitely have fewer machines in the pool.
by ApolloFortyNine
2/3/2026 at 8:50:17 AM
If you work at AWS in a technical role, you can check the capacity of each pool in each AZ using an internal tool. Previously the main reasons for pool exhaustion were automated jobs at the start of each working day, as well as instance slotting issues (releasing a 4xl but only re-allocating an l means you now cannot slot another 4xl).
by theMMaI
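The slotting issue described above can be illustrated with a toy first-fit allocator. This assumes, purely for illustration, that a 4xl occupies 4 contiguous capacity units on a host and an l occupies 1 - real placement logic is internal to AWS and certainly more sophisticated:

```python
# Toy model of instance "slotting" on one host: freeing a big instance
# and backfilling with a small one fragments the free space.
from typing import List, Optional

SIZES = {"l": 1, "4xl": 4}  # capacity units per size (assumption)

class Host:
    def __init__(self, units: int = 8):
        self.slots: List[Optional[str]] = [None] * units  # None = free

    def allocate(self, size_name: str) -> Optional[int]:
        """First-fit: find a run of free units big enough; return its start."""
        need = SIZES[size_name]
        run = 0
        for i, s in enumerate(self.slots):
            run = run + 1 if s is None else 0
            if run == need:
                start = i - need + 1
                for j in range(start, start + need):
                    self.slots[j] = size_name
                return start
        return None  # pool exhausted for this size

    def release(self, start: int, size_name: str) -> None:
        for j in range(start, start + SIZES[size_name]):
            self.slots[j] = None

host = Host(units=8)
a = host.allocate("4xl")   # fills units 0-3
b = host.allocate("4xl")   # fills units 4-7; host is now full
host.release(a, "4xl")     # the first 4xl leaves; units 0-3 free again
host.allocate("l")         # an l lands at unit 0, splitting the free run
# Only 3 contiguous units remain, so another 4xl no longer fits:
print(host.allocate("4xl"))  # None
```

The host still has free capacity in aggregate, but not in a shape a 4xl can use - which is exactly why releasing a 4xl and re-allocating only an l loses a 4xl slot.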
2/4/2026 at 12:24:41 AM
Yeah, heard of this happening once too - I think someone at work was trying to spin up a few of some really old instance type.
by jamesfinlayson
2/3/2026 at 10:25:12 AM
I really prefer Hetzner in this sense, because they actually talk about limits. I recently got myself a Hetzner account (after shilling it for so long and hearing so much positivity, I felt it was time for me to discover it myself). Out of frugality I wanted to try the cheapest option, and that one was actually limited - but kudos to them for mentioning that those servers have limits - so no worries, I just picked the 5.99 euro option instead of the 3.99 euro one.
They also have a limits page in the settings, IIRC, which shows you all the imposed limits in a transparent manner. My account is young, so I can't request limit increases yet, but after some time one definitely can.
Essentially I love this idea, because the cloud is just someone else's hardware and there is no infinity to it. But I feel it can come pretty close with Hetzner (I have also heard some great things about OVH, and have had a good personal experience with a netcup VPS, though netcup's payments were a real PITA to set up).
by Imustaskforhelp
2/3/2026 at 12:21:02 PM
Hetzner is a dedicated-server company (meaning monthly contracts, a one-month setup fee, and up to a week of delivery time) that branched out into cloud, so it's not that surprising they treat cloud a bit like that. While Amazon wants you to think they have an infinite capacity pool, and any failure to get a server is an unexpected error, Hetzner doesn't hide that they have a finite number of servers in a finite number of racks, since that's how their main business works.
by direwolf20
2/3/2026 at 4:03:54 PM
I guess it's understandable now why Amazon might want to do this. Similar to Hetzner - I haven't used OVH, but does it also have limits, or how do they handle it?
Out of pure curiosity, is there anything aside from the three-hyperscaler trifecta that doesn't show limits too?
by Imustaskforhelp
2/3/2026 at 5:52:55 PM
Nobody really shows their global limits, Hetzner included. But Hetzner doesn't, like, call it a secret internal error when they run out of capacity for a type.
by direwolf20
2/2/2026 at 11:12:29 PM
Agreed... I've been waiting for months now to increase my quota for a specific Azure VM type by 20 cores. I get an email every two weeks saying my request is still backlogged because they don't have the physical hardware available. I haven't seen an issue like this with AWS before...
by arcdigital
2/2/2026 at 11:23:46 PM
We've run into that issue as well, and ended up having to move regions entirely because nothing was changing in the current one. I believe it was westus1 at the time. It's a ton of fun to migrate everything over! That was years ago; wild to see they have the same issues.
by llama052
2/3/2026 at 12:22:30 PM
Can someone explain the point of cloud like I'm a 60-year-old grumpy Unix admin? Because you could just get a real server from another company by now. If the whole point is unlimited capacity, but you don't have unlimited capacity and you're paying through the nose, then why? Compliance?
by direwolf20
2/3/2026 at 2:01:30 PM
Compliance and tooling are a big part of it, but where the big public cloud providers shine is the PaaS offerings that you don't need to write yourself.
In Azure, for example, it's possible to use Entra as your Active Directory, along with the fine-grained RBAC built in to the platform. On a host that just gives you a VPS/DS, you have to run your own AD (and secondary backups). Likewise with things like webservers (IIS) and SQL Server, which both have PaaS offerings with SLAs and all the infra management tasks handled for you in an easily auditable way.
If you just need a few servers at the IaaS level, the big cloud platforms don't look like a great value. But, if you do a SOC2, for example, you're going to have to build all the documentation and observability/controls yourself.
by briHass
2/4/2026 at 12:26:27 AM
At my day job, serverless stuff is great because in a small team with limited budget we don't need extra people to deal with patching, failovers, etc.
by jamesfinlayson
2/3/2026 at 8:54:04 AM
Is your mental model that they are running FCFS or priority allocation?
by PeterStuer
2/2/2026 at 11:20:57 PM
It's awful. Any other Azure service that relies on the core systems seems to have issues depending on them; I feel for those internal teams.
I ran into an issue upgrading an AKS cluster last week. It completely stalled and broke the entire cluster in a way that left our hands tied, as we can't see the control plane at all...
I submitted a severity-A ticket, and 5 hours later I was told there was a known issue with the latest VM image that would break the control plane, leaving any cluster updated in that window to essentially kill itself and require manual intervention. Did they notify anyone? Nope. Did they stop anyone from killing their own clusters? Nope.
It seems like every time I'm forced to touch the Azure environment I'm basically playing Russian roulette hoping that something's not broken on the backend.
by llama052
2/3/2026 at 6:48:18 AM
It's nice to buy responsibility when it's upheld; else you're just trading your money for the inability to fix things.
by lillecarl
2/3/2026 at 1:53:41 AM
How is Azure still having faults that affect multiple regions? Clearly their region definition is bollocks.
by everfrustrated
2/3/2026 at 4:20:35 AM
All 3 hyperscalers have vulnerabilities in their control planes: they're either a single point of failure, like AWS with us-east-1, or global, meaning a faulty release can take them down entirely. And they take AZ resilience to mean that existing compute will continue to work as before, while allocation of new resources might fail in multi-AZ or multi-region ways.
It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
by ragall
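The "statically allocate with enough slack" approach above amounts to sizing the fleet for peak plus headroom up front, so a control-plane outage (when scaling is impossible) doesn't matter. A minimal sketch - all numbers and the sizing rule itself are illustrative assumptions, not anyone's real capacity math:

```python
# Static fleet sizing: provision for peak load plus slack plus spares,
# then never depend on auto scaling during an incident.
import math

def static_fleet_size(peak_rps: float, rps_per_vm: float,
                      slack: float = 0.3, redundancy: int = 1) -> int:
    """VMs to pre-allocate: peak demand scaled up by the slack factor,
    divided by per-VM capacity (rounded up), plus redundancy spares."""
    base = math.ceil(peak_rps * (1 + slack) / rps_per_vm)
    return base + redundancy

# e.g. 12,000 req/s at peak, 500 req/s per VM, 30% slack, 1 spare:
# ceil(12000 * 1.3 / 500) + 1 = 32 + 1 = 33 VMs held at all times
print(static_fleet_size(12_000, 500))  # 33
```

The cost of this is exactly the thread's point: you pay for the slack 24/7, which is why it "sounds oddly similar to owning hardware."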
2/3/2026 at 4:30:38 AM
> It means that any service designed to survive a control plane outage must statically allocate its compute resources and have enough slack that it never relies on auto scaling. True for AWS/GCP/Azure.
That sounds oddly similar to owning hardware.
by tbrownaw
2/3/2026 at 5:06:50 AM
In a way. It means that you can get new capacity most of the time, but the transition windows where a service gets resized (or mutated in general) have to be minimised and carefully controlled by ops.
by ragall
2/3/2026 at 4:50:18 AM
This outage looks like a VM control plane failure (it mentions stop not working) across multiple regions.
AWS has never had this type of outage in 20 years. Yet Azure constantly has them.
This is a total failure of engineering and has nothing to do with capacity. Azure is a joke of a cloud.
by everfrustrated
2/3/2026 at 5:02:54 AM
AWS had an outage that blocked all EC2 operations just a few months ago: https://aws.amazon.com/message/101925/
by mirashii
2/4/2026 at 12:28:15 AM
Yeah, I remember one maybe four years ago? Existing workloads were fine, but I had to go and tell my marketing department to not do anything until it was sorted, because auto-scaling was busted.
by jamesfinlayson
2/3/2026 at 6:11:17 AM
This was the largest AWS outage in a long, long time, and it was still constrained to a single AWS region. Which is my point.
The same fault on Azure would be a global (all-regions) fault.
by everfrustrated
2/3/2026 at 5:04:49 AM
I do agree that Azure seems to be a lot worse: its control plane(s) seem to be much more centralized than the other two's.
by ragall
2/2/2026 at 11:56:17 PM
Their AI probably hallucinated the configuration change.
by flykespice