3/9/2026 at 10:36:48 AM
One thing I'm curious about here is the operational impact. In production systems we often see Python services scaling horizontally because of the GIL limitations. If true parallelism becomes common, it might actually reduce the number of containers/services needed for some workloads.
But that also changes failure patterns — concurrency bugs, race conditions, and deadlocks might become more common in systems that were previously "protected" by the GIL.
It will be interesting to see whether observability and incident tooling evolves alongside this shift.
by devrimozcay
3/9/2026 at 3:02:45 PM
This is surely why Facebook was interested in funding this work. It is common to have N workers or containers of Python because you are generally restricted to one CPU core per Python process (you can get a bit higher if you use libs that unlock the GIL for significant work). So the only scaling option is horizontal, because vertical scaling is very limited. The main downside of this was memory usage. You would have to load all of your code and libraries N times, and in-process caches would become less effective. So by being able to vertically scale a Python process much further you can run fewer processes and save a lot of memory.

Generally speaking, the optimal amount of horizontal scaling is as little as you have to. You may want a bit of horizontal scaling for redundancy and geo distribution, but past that, vertically scaling to fewer, larger processes tends to be more efficient, easier to load balance, and has a handful of other benefits.
by kevincox
3/9/2026 at 5:25:02 PM
> The main downside of this was memory usage. You would have to load all of your code and libraries N times and in-process caches would become less effective.

You can load modules and then fork child processes. Children will share memory with each other (if they need to modify any shared memory, they get copy-on-write pages allocated by the kernel) and you'll save quite a lot on memory.
by philsnow
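The preload-then-fork pattern described above can be sketched in a few lines. This is a minimal illustration using `os.fork()` directly; `json` stands in for whatever heavy modules a real application would import before forking:

```python
import os
import sys

import json  # stand-in for a real app's heavy imports


def worker(worker_id: int) -> None:
    # The child starts with every module the parent imported, sharing
    # those pages copy-on-write instead of re-importing per process.
    assert "json" in sys.modules
    os._exit(0)


def main() -> None:
    pids = []
    for i in range(4):
        pid = os.fork()
        if pid == 0:
            worker(i)          # child: never returns (os._exit above)
        pids.append(pid)       # parent: remember the child pid
    for pid in pids:
        os.waitpid(pid, 0)     # reap the children


if __name__ == "__main__":
    main()
```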
3/9/2026 at 5:29:17 PM
Yes, this can help a lot, but it definitely isn't perfect. Especially since CPython uses reference counting, it is likely that many pages get modified relatively quickly as they are accessed. Many other GC strategies are also pretty hostile to CoW memory (for example mark bits, moving, ...). Additionally this doesn't help for lazily loaded data and caches in code and libraries.
by kevincox
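For what it's worth, CPython has a partial mitigation for the GC side of this: `gc.freeze()` (available since 3.7, contributed largely for this preload-then-fork use case). It moves everything currently tracked into a permanent generation that the cyclic collector never scans, so collections after `fork()` stop dirtying those pages. It does not help with reference-count writes, which still break CoW as objects are merely accessed:

```python
import gc

gc.disable()       # avoid a collection between freeze() and fork()
# ... import modules and build caches here, in the parent ...
gc.freeze()        # existing objects become invisible to the collector

# Every object allocated so far is now in the permanent generation.
print(gc.get_freeze_count() > 0)
# ... fork() workers here; they can re-enable gc for new objects ...
```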
3/9/2026 at 12:32:04 PM
For big things the current way works fine. Having a separate container/deployment for celery, the web server, etc. is nice so you can deploy and scale separately. Mostly it works fine, but there are of course some drawbacks: things like Prometheus scraping of processes that aren't able to run a web server in parallel are clunky to work around.

And for smaller projects it's such an annoyance. Having a simple project running, and having to muck around to get cron jobs, background/async tasks etc. to work in a nice way is one of the reasons I never reach for Python in these instances. I hope removing the GIL makes it better, but I'm also afraid it will expose a whole can of worms where lots of apps, tools and frameworks aren't written with this possibility in mind.
by matsemann
3/9/2026 at 5:22:41 PM
> observability tooling for Python evolving

As much as I dislike Java the language, this is somewhere where the difference between CPython and JVM languages (and probably BEAM too) is hugely stark. Want to know if garbage collection or memory allocation is a problem in your long running Python program? I hope you're ready to be disappointed and need to roll a lot of stuff yourself. On the JVM the tooling for all kinds of observability is immensely better. I'm not hopeful that the gap is really going to close.
by rpcope1
3/9/2026 at 5:33:17 PM
> If true parallelism becomes common, it might actually reduce the number of containers/services needed for some workloads

Not by much. The cases where you can replace processes with threads and save memory are rather limited.
by fiedzia
3/9/2026 at 5:50:35 PM
Citation needed? Tall tasks are standard practice to improve utilization and reduce hotspots by reducing load variance across tasks.
by aoeusnth1
3/9/2026 at 7:45:21 PM
I would have thought most of those would have been moved to async Python by now.
by influx
3/9/2026 at 8:40:04 PM
async Python still uses a single thread for the main loop; it just hides non-blocking IO.
by LtWorf
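A small demonstration of that point: under asyncio, "concurrent" tasks all run on the same OS thread, so pure-Python CPU work in one coroutine blocks all the others; only the I/O waits overlap:

```python
import asyncio
import threading

thread_ids = set()


async def cpu_task(n: int) -> int:
    # Record which OS thread actually runs this coroutine.
    thread_ids.add(threading.get_ident())
    return sum(range(n))  # pure-Python work: holds the event loop


async def main() -> None:
    # gather() interleaves coroutines, but does not parallelize them.
    await asyncio.gather(cpu_task(100_000), cpu_task(100_000))


asyncio.run(main())
print(len(thread_ids))  # → 1: both "concurrent" tasks shared one thread
```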
3/9/2026 at 1:18:00 PM
A lot of that has already been solved by scaling workers to cores, along with techniques like greenlets/eventlets that support concurrency without true multithreading, to take better advantage of CPU capacity.
by apothegm
3/9/2026 at 9:15:02 PM
That's great for concurrency, but doesn't improve parallelism. Unless you mean you have multiple worker processes (or GIL-free threads).
by Sohcahtoa82
3/9/2026 at 3:00:17 PM
But you are still more or less limited to one CPU core per Python process. Yes, you can use that core more effectively, but you still can't scale up very far.
by kevincox
3/9/2026 at 3:59:21 PM
But Python can fork itself and run multiple processes in one single container. Why would there be a need to run several containers to run several processes? There's even the multiprocessing module in the stdlib to achieve this.
by LtWorf
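A minimal sketch of that stdlib route: one entrypoint (one container) spreading CPU-bound work over several forked worker processes with `multiprocessing.Pool`:

```python
from multiprocessing import Pool


def square(x: int) -> int:
    return x * x


if __name__ == "__main__":
    # Four worker processes inside the same container; work is farmed
    # out to them over IPC and results are collected in order.
    with Pool(processes=4) as pool:
        results = pool.map(square, range(8))
    print(results)  # → [0, 1, 4, 9, 16, 25, 36, 49]
```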
3/9/2026 at 5:21:45 PM
Threads are cheap: you can do N pieces of work simultaneously with N threads in one process, without serialization, IPC, or process creation overhead. With multiprocessing, processes are expensive and each unit of work hogs a whole process. You must serialize data twice for IPC, which is expensive and time consuming.
You shouldn't have to break out multiple processes, for example, to do some simple pure-Python math in parallel. It doesn't make sense to use multiple processes for something like that because the actual work you want to do will be overwhelmed by the IPC overhead.
There are also limitations: only some data can be sent to and from multiple processes. Not all of your objects can be serialized for IPC.
by heavyset_go
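A quick way to see that restriction is to try the serialization step multiprocessing performs (pickle) on something unpicklable, such as a lambda:

```python
import pickle

square = lambda x: x * x  # fine to pass to a thread, but...

try:
    # ...this is what multiprocessing would have to do to ship it
    # to a worker process, and it fails for lambdas, open sockets,
    # locks, and many other objects.
    pickle.dumps(square)
    sendable = True
except (pickle.PicklingError, AttributeError, TypeError):
    sendable = False

print(sendable)  # → False: a lambda can't be serialized for IPC
```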
3/9/2026 at 8:58:48 PM
It makes sense to me that a program currently written using multiple processes would now be re-written to use multiple truly parallel threads. But it seems very odd to suggest (as your grandparent comment does) that a program currently run in multiple containers would likely be migrated to run on multiple threads.

In other words, I imagine anyone who cares about the overhead from serialization, IPC, or process creation would already be avoiding (as much as possible) using containers to scale in the first place.
by connorboyle
3/9/2026 at 5:42:59 PM
I think you have a good point on IPC, but process creation in Linux is almost as fast as thread creation. Unless the app would constantly be creating and killing processes, the process creation overhead would not be that much, but IPC is the killer.
And also your types aren't picklable or whatever, and now you gotta change a lot of stuff to get it to work lol.
by akdev1l
3/9/2026 at 4:41:43 PM
Forking and multithreading do not coexist. Even if one of your transitive dependencies decides to launch a thread that's 99% idle, it becomes unsafe to fork.
by kccqzy
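The usual mitigation, for what it's worth, is to avoid bare `fork()` entirely: multiprocessing's "spawn" start method launches children as fresh interpreters instead of forking a copy of a possibly-threaded parent (and since Python 3.12, CPython emits a DeprecationWarning when `fork()` happens while other threads are alive). A minimal sketch:

```python
import multiprocessing as mp


def work(x: int) -> int:
    return x + 1


if __name__ == "__main__":
    # "spawn" execs a fresh interpreter per child, so none of the
    # parent's threads, locks, or half-mutated state is inherited.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # → [2, 3, 4]
```

The trade-off is startup cost: spawned children re-import your modules instead of inheriting them copy-on-write.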
3/9/2026 at 5:28:16 PM
I'm curious as to the downvotes on this. It's absolutely true, and when I was maintaining a job runner daemon that ran hundreds of thousands of who-knows-what Python tasks/jobs a day on some shared infra with arbitrary code for a certain megacorp from 2016-2020 or so, this was one of the most insidious and ugly failure modes to go debug and handle. The docs really make it sound like you can mix threading and multiprocessing, but you can never really completely ensure that threading and then a bare fork will be safe, period. It's really irritating that the docs would have you believe that this is OK or safe, but it is in keeping with the Python philosophy of trying to hide the edge of the blade you're using until it's too late and you've cut the shit out of yourself.
by rpcope1
3/9/2026 at 5:44:07 PM
Why is it unsafe?
by akdev1l
3/9/2026 at 8:30:27 PM
In general only the thread calling fork() gets forked, so unless you call exec() soon after, there are a lot of complications with signals and shared memory.
by LtWorf
3/9/2026 at 9:02:56 PM
What are the complications? A single thread with its own process sandbox with everything from the parent is exactly what I'd expect coming from C land. Are the complications you refer to specific to the Python VM or more general?
by fc417fc802
3/9/2026 at 10:57:21 PM
Even treating the process as read-only after forking is potentially fraught. What if a background thread is mutating some data structure? When you fork, the data structure might be internally inconsistent because the work to finish the mutation might not be completed. Imagine there are locks held by various threads at the moment of the fork: trying to lock those in the child might deadlock, or even worse. There are tons of these types of gotchas.
by grogers
3/9/2026 at 10:26:26 PM
If you have multiple threads, you almost certainly have mutexes. If your fork happens when a non-main thread holds a mutex, your main thread will never again be able to hold that mutex.

An imperfect solution is to require every mutex created to be accompanied by some pthread_atfork, but libraries don't do that unless forking is specifically requested. In other words, if you don't control the library you can't fork.
by kccqzy
3/9/2026 at 5:26:00 PM
Fork-then-thread works, does it not?
by philsnow
3/9/2026 at 5:40:26 PM
If you have enough discipline to make sure you only create threads after all the forking is done, then sure. But having such discipline is harder than just forbidding fork or forbidding threads in your program. It turns a careful analysis of timing and causality into just banning a few functions.
by kccqzy
3/9/2026 at 8:40:08 PM
Can't you check what threads are active at the time you fork?
by josefx
3/9/2026 at 10:30:52 PM
And what do you do with that information? Refuse to fork after you detect more than one thread running? I haven't seen any code that gracefully handles the unable-to-fork scenario. When people write fork-based code, especially in Python, they always expect forking to succeed.
by kccqzy
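Such a check is easy to write (`safe_fork` here is a hypothetical helper, not a stdlib function), though as noted above it just turns the fork into a failure the caller has to handle somehow:

```python
import os
import threading


def safe_fork() -> int:
    # Best-effort guard: refuse to bare-fork once any other thread
    # is alive. It cannot see threads started by C extensions that
    # bypass the threading module.
    if threading.active_count() > 1:
        raise RuntimeError(
            f"refusing to fork with {threading.active_count()} threads alive"
        )
    return os.fork()


# A background thread (here parked on an Event that never fires)
# is enough to trip the guard.
t = threading.Thread(target=threading.Event().wait, daemon=True)
t.start()
try:
    safe_fork()
    forked = True
except RuntimeError:
    forked = False
print(forked)  # → False: a live background thread blocked the fork
```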
3/9/2026 at 5:30:26 PM
But not the reverse: if it's a bare fork and the code isn't strictly mutex- and shared-resource-free (which is hard), there are little or no warning lights to indicate that this is a terrible idea that fails in really unpredictable and hard-to-debug ways.
by rpcope1
3/9/2026 at 8:33:43 PM
I'm replying to a person who scales Python by running several containers instead of one container with several Python processes.
by LtWorf