5/21/2025 at 8:11:14 PM
Make sure you set GOMAXPROCS when the runtime is cgroup-limited. I once profiled a slow Go program running on a node with 168 cores, but cpu.max was 2 cores for the cgroup. The runtime defaults GOMAXPROCS to the number of visible cores, which was 168 in this case. Over half the runtime was the scheduler bouncing goroutines across 168 logical processors despite cpu.max allowing only 2 CPUs.
The JRE is smart enough to figure out that it is running in a resource-limited cgroup and make sane decisions based on that, but Go has no such mechanism.
by __turbobrew__
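A minimal sketch of the fix the comment above describes, assuming cgroup v2 with cpu.max mounted at /sys/fs/cgroup; the file path, parsing, and rounding here are illustrative assumptions, and in practice the go.uber.org/automaxprocs package mentioned later in the thread does this more robustly:

```go
// gomaxprocs_cgroup.go: cap GOMAXPROCS at the cgroup v2 CPU quota.
// Assumes cgroup v2 mounted at /sys/fs/cgroup; real deployments would
// typically just blank-import go.uber.org/automaxprocs instead.
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

func cgroupCPULimit() (int, bool) {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu.max")
	if err != nil {
		return 0, false // not cgroup v2, or not running in a container
	}
	fields := strings.Fields(string(data)) // e.g. "200000 100000" or "max 100000"
	if len(fields) != 2 || fields[0] == "max" {
		return 0, false // no quota set
	}
	quota, err1 := strconv.ParseFloat(fields[0], 64)
	period, err2 := strconv.ParseFloat(fields[1], 64)
	if err1 != nil || err2 != nil || period == 0 {
		return 0, false
	}
	// Round a fractional quota up so e.g. 2.5 CPUs still gets 3 Ps.
	limit := int(quota/period + 0.999)
	if limit < 1 {
		limit = 1
	}
	return limit, true
}

func main() {
	if limit, ok := cgroupCPULimit(); ok && limit < runtime.NumCPU() {
		runtime.GOMAXPROCS(limit)
	}
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
}
```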
5/21/2025 at 8:17:05 PM
Relevant proposal to make GOMAXPROCS cgroup-aware: https://github.com/golang/go/issues/73193
by xyzzy_plugh
5/22/2025 at 5:42:41 AM
Looks like it was just merged, btw.
by robinhoodexe
5/21/2025 at 8:42:51 PM
This should be automatic these days (for the basic scenarios). https://github.com/golang/go/blob/a1a151496503cafa5e4c672e0e...
by yencabulator
5/21/2025 at 10:31:42 PM
This is probably going to save quadrillions of CPU cycles by making an untold number of deployed Go applications a bit more CPU efficient. Since Go is the "lingua franca" of containers, many ops people assume the Go runtime is container-aware - it's not (well, not in any released version, yet). If they'd now also make the GC respect memory cgroup limits (i.e. automatic GOMEMLIMIT), we'd probably be freeing up a couple of petabytes of memory across the globe.
Java has been doing these things for a while; even OpenJDK 8 has had those patches since probably before COVID.
by formerly_proven
5/22/2025 at 8:56:40 AM
While I admit that respecting the cgroup setting is a good thing, I am not sure it's really quadrillions. Or is it? That needs some calculation.
by kunley
5/22/2025 at 3:26:53 PM
I would've expected it to be either way too much or way too little, but after doing the math it could be sorta in the right ballpark, at least cosmically speaking. Let's go with three quadrillion cycles (a quadrillion apparently being 10^15) and assume a server CPU runs at 3 GHz (3×10^9 cycles per second): that's 10^6 seconds of one core's time. A day is about 100k seconds, so roughly ten days. But of course we're only saving cycles. I've seen throughput increase by about 50% when setting GOMAXPROCS on bigger machines, but in most of those cases we're looking at containers with fractional cores. On the other hand, there are many containers. So...
by formerly_proven
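A back-of-the-envelope version of that arithmetic, just to make the units explicit; the three-quadrillion figure and the 3 GHz clock are the assumptions from the comment above, not measurements:

```go
// envelope.go: sanity-check the "quadrillions of cycles" estimate above.
package main

import "fmt"

func main() {
	const (
		cyclesSaved   = 3e15    // assumed total: three quadrillion cycles
		clockHz       = 3e9     // assumed server core clock: 3 GHz
		secondsPerDay = 86400.0
	)
	coreSeconds := cyclesSaved / clockHz
	fmt.Printf("%.0f core-seconds ~= %.1f core-days\n",
		coreSeconds, coreSeconds/secondsPerDay)
	// Prints: 1000000 core-seconds ~= 11.6 core-days
}
```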
5/23/2025 at 9:18:34 AM
Nice reasoning, thanks. But what did you have in mind with regard to bigger machines? I think we're talking here about lowering GOMAXPROCS to get, in effect, less context switching of the OS threads. While that can bring a good result, my gut feeling is that it'd hardly be 50% faster overall; is your scenario the same, then?
by kunley
5/21/2025 at 11:24:01 PM
GOMEMLIMIT is not as easy: you may have other processes in the same container/cgroup also using memory.
by mappu
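A minimal sketch of the automatic-GOMEMLIMIT idea from earlier in the thread, with the shared-cgroup caveat from this comment in mind. It assumes cgroup v2 with memory.max at /sys/fs/cgroup and that this process is the dominant memory consumer in its cgroup; the 10% headroom is an arbitrary illustrative choice:

```go
// gomemlimit_cgroup.go: derive GOMEMLIMIT from the cgroup v2 memory limit.
// Assumes cgroup v2 at /sys/fs/cgroup and that this process is the main
// memory consumer in its cgroup (see the shared-cgroup caveat above).
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/memory.max")
	if err != nil {
		return // no cgroup v2 limit visible; keep the default (effectively unlimited)
	}
	s := strings.TrimSpace(string(data))
	if s == "max" {
		return // cgroup has no memory limit
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return
	}
	// Leave ~10% headroom for non-heap memory (goroutine stacks, runtime
	// overhead, cgo) so the GC works harder before the OOM killer fires.
	softLimit := limit * 9 / 10
	debug.SetMemoryLimit(softLimit)
	fmt.Println("GOMEMLIMIT set to", softLimit, "bytes")
}
```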
5/21/2025 at 8:48:17 PM
Uh, isn't that change 3 hours old?
by jasonthorsness
5/21/2025 at 8:59:37 PM
Oh, heh, yes it is. I just remembered the original discussion from 2019 (https://github.com/golang/go/issues/33803) and grepped the source tree for cgroup to see whether that got done or not, but didn't check when it got done. As said in 2019, import https://github.com/uber-go/automaxprocs to get the functionality ASAP.
by yencabulator
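For reference, the automaxprocs package linked above is used via a blank import; its init function reads the container's CPU quota and adjusts GOMAXPROCS before main runs:

```go
// main.go: the usage suggested above; the blank import's init() reads the
// container's CPU quota and adjusts GOMAXPROCS before main() executes.
package main

import (
	"fmt"
	"runtime"

	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
}
```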
5/21/2025 at 10:50:52 PM
I honestly can't count on my fingers and toes how many times something precisely relevant to me was brought up or sorted out hours to days before I looked it up. And more than once, by people I personally knew! Always a weird feeling; it's a small world.
by williamdclt
5/21/2025 at 9:16:02 PM
A super-weird coincidence, but welcome; I have been waiting for this for a long time!
by jasonthorsness
5/22/2025 at 2:17:15 AM
Trying to see if Rust and Tokio have the same problem. I don't know enough about cgroups to be sure. Tokio at this line [1] ends up delegating to `std::thread::available_parallelism` [2], which says:
> It may overcount the amount of parallelism available when limited by a process-wide affinity mask or cgroup quotas and sched_getaffinity() or cgroup fs can't be queried, e.g. due to sandboxing.
[1] https://docs.rs/tokio/1.45.0/src/tokio/loom/std/mod.rs.html#...
[2] https://doc.rust-lang.org/stable/std/thread/fn.available_par...
by 01HNNWZ0MV43FF
5/22/2025 at 9:34:53 AM
Probably not? The fundamental issue comes down to background GC and CPU quotas in cgroups.
If your number of worker threads is too high, GC will eat up all the quota.
by nvarsj