4/5/2025 at 4:13:12 AM
Every large-scale cloud/SaaS system has been using parallelism for decades [well, almost two decades, give or take]. Requests run in parallel on millions of servers, and each server processes many requests in parallel. In that sort of setup, the old-school approach of optimizing one program by doing things in parallel is somewhat less relevant, because everything is by its nature massively parallel, and complex operations are often broken down into pieces that all run in parallel. In that sort of environment, over-parallelizing can actually be a pessimization: there is usually some cost to breaking the work up to run in parallel, and your little piece of this massively parallel system only gets some fraction of the machine's resources anyway.
Not to say there's no value in knowing "old-school" parallelization (including SIMD, GPUs, or whatnot), but for many types of applications that's not where I'd focus.
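To make the split-cost point concrete, here's a minimal Go sketch (workload size and shard count are made up for illustration) where fanning small work out to goroutines can lose to just doing it inline, because the goroutine startup and synchronization cost exceeds the work saved:

    package main

    import (
    	"fmt"
    	"sync"
    	"time"
    )

    func sum(xs []int) int {
    	total := 0
    	for _, x := range xs {
    		total += x
    	}
    	return total
    }

    func main() {
    	xs := make([]int, 10000) // small enough that split overhead can dominate
    	for i := range xs {
    		xs[i] = i
    	}

    	start := time.Now()
    	_ = sum(xs)
    	fmt.Println("inline:    ", time.Since(start))

    	start = time.Now()
    	const shards = 8
    	chunk := len(xs) / shards
    	partial := make([]int, shards) // distinct index per goroutine, no race
    	var wg sync.WaitGroup
    	for s := 0; s < shards; s++ {
    		wg.Add(1)
    		go func(s int) { // goroutine startup + scheduling is the overhead
    			defer wg.Done()
    			partial[s] = sum(xs[s*chunk : (s+1)*chunk])
    		}(s)
    	}
    	wg.Wait()
    	fmt.Println("fanned out:", time.Since(start))
    }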
by YZF
4/5/2025 at 4:11:45 PM
Parallelism has different constraints when you've got a bunch of tasks with their own deadlines versus a bunch of tasks with a single common deadline. With a common deadline it matters a lot more that the tasks are roughly the same size, for instance. For either scenario you may have to parallelize single tasks anyway if the variance is too wide.
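To put hypothetical numbers on the common-deadline case: in a fork-join, completion time is the slowest shard, so skewed shards leave the other workers idle:

    package main

    import "fmt"

    // makespan of a fork-join: completion is gated by the slowest shard.
    func makespan(shards []float64) float64 {
    	m := 0.0
    	for _, t := range shards {
    		if t > m {
    			m = t
    		}
    	}
    	return m
    }

    func main() {
    	equal := []float64{2, 2, 2, 2}  // 8 units of work, 4 workers
    	skewed := []float64{1, 1, 1, 5} // same 8 units, badly balanced

    	fmt.Println(makespan(equal))  // 2 -> full 4x speedup
    	fmt.Println(makespan(skewed)) // 5 -> barely beats serial (8)
    }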
by hinkley
4/5/2025 at 5:57:14 PM
It's also about resource constraints. If your parallel request workload (let's take S3 as an example) is already using all the available resources, then parallelizing a single task in the hope of making things more efficient or faster is just going to make things slower, because of the overhead of, e.g., moving data around, synchronization, etc., at least at the higher end of request concurrency. Of course, if a machine just happens to be processing a single request, that can be a win, but it also means you're not utilizing your hardware. The way to make those heavily parallel request-workload systems go faster is simply to do less work, i.e. optimize them, not parallelize them. The exception is if you want some "quality of service" controls within this setup, which is fairly unusual in the types of systems I'm thinking about.
I've seen this happen in practice and it's a common anti-pattern. An engineer tries to get requests to go faster, parallelizes some of the pieces of a single request, only to find the system can now handle fewer requests/s. The reason is easy to see: they're now doing more work on the same amount of resources.
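A back-of-the-envelope model of that anti-pattern (all numbers invented): at full utilization, fanning each request out adds per-shard overhead, so total CPU per request grows and throughput drops, even though a lone request's latency would improve:

    package main

    import "fmt"

    func main() {
    	const cores = 32.0
    	const work = 10.0 // ms of CPU per request, handled serially

    	// Serial handling: throughput = cores / CPU-per-request.
    	fmt.Printf("serial:     %.0f req/s\n", cores/work*1000) // 3200 req/s

    	// Fan each request out 4 ways, paying 1 ms split/merge overhead
    	// per shard. A lone request's latency improves (10 -> 10/4+1 =
    	// 3.5 ms), but CPU per request grows from 10 to 10 + 4*1 = 14 ms.
    	const shards = 4.0
    	const overhead = 1.0 // ms per shard
    	cpuPerReq := work + shards*overhead
    	fmt.Printf("fanned out: %.0f req/s\n", cores/cpuPerReq*1000) // ~2286 req/s
    }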
This is very different than a single task running on a multi-core machine with 31 cores sitting idle and one doing all the work.
I think your statement mostly applies in the second case, where you want to chop things up into more or less equal-size pieces. Otherwise you'll bottleneck on the piece that takes the longest.
by YZF