Data Processing Benchmark Featuring Rust, Go, Swift, Zig, Julia etc.

2/1/2026 at 12:03:34 AM

I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems, and to optimise for memory footprint more than performance. Also, no heap size is set, which means it's hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice-versa. It would be meaningful if an appropriate GC were used (Parallel, for this batch job) and with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.

Don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.

Of course, the most recent JDK version should be used (I guess the most recent compiler version for all languages).
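
A minimal sketch of how one might check what a Java run is actually measuring: it reports which collectors the JVM picked and the maximum heap it will use. The class name `GcReport` is made up for illustration, and launching it with something like `java -XX:+UseParallelGC -Xmx8g GcReport` is an assumption about how a batch job under an 8GB limit could be run, not the benchmark's actual configuration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the benchmark repository.
class GcReport {
    // Names of the collectors the running JVM selected (depends on flags and machine).
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // Upper bound on the heap in MiB, as set by -Xmx or chosen by the JVM's defaults.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("max heap (MiB): " + maxHeapMiB());
    }
}
```

Printing this at the start of a benchmark run makes it visible when the defaults (which vary with the machine's core count and RAM) are not what the author intended.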

by pron

2/1/2026 at 1:47:05 AM

It’s so hard to actually benchmark languages because so much depends on the dataset. I am pretty sure that with simdjson and some tricks I could write C++ (or Rust) that could top the leaderboard (see some of the techniques from the billion row challenge!).

tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to jit warmup etc.

It’s hard to do benchmarks right. For example, are you testing IO performance? Are OS caches flushed between language runs? What kind of disk is used? Performance does not exist in a vacuum of just the language or algorithm.

by rockwotj

2/1/2026 at 3:46:08 AM

> due to jit warmup

I think this harness actually uses JMH, which measures after warmup.

by pron

2/1/2026 at 5:02:04 PM

[dead]

by clawsyndicate

2/1/2026 at 2:49:51 AM

Why are you surprised? Java always suffers from abstraction penalty for running on a VM. You should be surprised (and skeptical) if Java ever beats C++ on any benchmark.

by KerrAvon

2/1/2026 at 3:43:00 AM

The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler), is the warmup time of waiting for the JIT.

by pron

2/1/2026 at 11:03:10 AM

The true penalty of Java is that product types have to be heap-allocated, as there is no mechanism for stack-allocated product types.

by xigoi

2/1/2026 at 4:36:06 PM

> product types have to be heap-allocated

Conceptually, that’s true, but a compiler is free to do things differently. For example, if escape analysis shows that an object allocated in a block never escapes the block, the optimizer can replace the object by local variables, one for each field in the object.

And that’s not theoretical. https://www.bettercodebytes.com/allocation-elimination-when-..., https://medium.com/@souvanik.saha/are-java-objects-always-cr... show that it (sometimes) does.
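
A small sketch of the kind of code where this can apply, assuming a hypothetical `Vec` record: every `new Vec` is logically a heap allocation, but none of the temporaries leaves `sumOfDots`, so the JIT is allowed (not required) to replace them with plain local values.

```java
// Illustrative only; the names are made up and nothing here forces the optimisation.
class EscapeDemo {
    // A small immutable product type.
    record Vec(double x, double y) {
        Vec plus(Vec o) { return new Vec(x + o.x, y + o.y); }
        double dot(Vec o) { return x * o.x + y * o.y; }
    }

    // a, b and the result of plus() never escape this method, which makes them
    // candidates for scalar replacement once the method is compiled and inlined.
    static double sumOfDots(int n) {
        double total = 0;
        for (int i = 0; i < n; i++) {
            Vec a = new Vec(i, 1);
            Vec b = new Vec(2, i);
            total += a.plus(b).dot(a);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOfDots(1_000_000));
    }
}
```

Whether the allocations are actually removed depends on the JVM, its flags, and inlining decisions, which is the "(sometimes)" above.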

by Someone

2/1/2026 at 12:14:58 PM

You're right that Java lacks inline types (although it's getting them really soon, now), but the main cost of that isn't because of stack allocation (because heap allocations in Java don't cost much more than stack allocations), but because of cache misses due to objects not being inlined in arrays.
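
To make the array point concrete, here is a rough sketch with made-up names: a `Point[]` holds references to separately allocated objects, while the hand-flattened `double[]` keeps the same data contiguous. It only illustrates the two layouts; it is not a measurement.

```java
// Hypothetical example of the two layouts; not taken from the benchmark.
class LayoutDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // Array of references: each element is a pointer to an object elsewhere on the heap.
    static double sumX(Point[] pts) {
        double s = 0;
        for (Point p : pts) s += p.x;
        return s;
    }

    // Manual flattening: x0, y0, x1, y1, ... stored back to back in one array.
    static double[] flatten(Point[] pts) {
        double[] xy = new double[pts.length * 2];
        for (int i = 0; i < pts.length; i++) {
            xy[2 * i] = pts[i].x;
            xy[2 * i + 1] = pts[i].y;
        }
        return xy;
    }

    // Same sum over the contiguous layout; memory is read sequentially.
    static double sumXFlat(double[] xy) {
        double s = 0;
        for (int i = 0; i < xy.length; i += 2) s += xy[i];
        return s;
    }
}
```

Today the second form has to be written by hand in Java, at the cost of losing the `Point` abstraction inside the array.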

by pron

2/2/2026 at 12:40:21 AM

P.S.

Even for flattened types, the "abstraction penalty", or, more precisely, its converse, the "concreteness penalty", in Java will be low, as you don't directly pick when an object is flattened. Instead, you declare whether a class cares about identity or not, and if not, the compiler will transparently choose whether and when to flatten the object, depending on how it's used.

by pron

2/1/2026 at 6:52:12 AM

It's a statement of our times that this is getting downvoted. JIT is so underrated.

by andersmurphy

2/1/2026 at 9:35:38 AM

in my opinion, this assertion suffers from the "sufficiently smart compiler" fallacy somewhat.

https://wiki.c2.com/?SufficientlySmartCompiler

by stefs

2/1/2026 at 12:17:19 PM

No, Java's existing compiler is very good, and it generates as good code as you'd want. There is definitely still a cost due to objects not being inlined in arrays yet (this will change soon) that impacts some programs, but in practice Java performs more-or-less the same as C++.

In this case, however, it appears that the Java program may have been configured in a suboptimal way. I don't know how much of an impact it has here, but it can be very big.

by pron

2/1/2026 at 2:13:19 PM

Even benchmarks that allow for jit warmup consistently show java roughly half the speed of c/c++/rust. Is there something they are doing wrong? I've seen people write some really unusual java to eliminate all runtime allocations, but that was about latency, not throughput.

by galangalalgol

2/1/2026 at 3:47:02 PM

> Is there something they are doing wrong?

Yes. The most common issues are heap misconfiguration (which is more important in Java than any compiler configuration in other languages) and that the benchmarks don't simulate realistic workloads in terms of both memory usage and concurrency. Another big issue is that the effort put into the program is not the same. Low-level languages do allow you to get better performance than Java if you put significant extra work to get it. Java aims to be "the fastest" for a "normal" amount of effort at the expense of losing some control that could translate to better performance in exchange for significantly more work, both at initial development time, but especially during evolution/maintenance.

E.g. I know of a project at one of the world's top 5 software companies where they wanted to migrate a real Java program to C++ or Rust to get better performance (it was probably Rust because there's some people out there who really want to try Rust). Unsurprisingly, they got significantly worse performance (probably because low-level languages are not good at memory management when concurrency is at play, or at concurrency in general). But they wanted the experiment to be a success, so they put in a tonne of effort - I'm talking many months - hand-optimising the code, and in the end they managed to match Java's performance or even exceed it by a bit (but admitted it was ultimately wasted effort).

If the performance of your Java program doesn't more-or-less match or even exceed the performance of a C++ (or other low level language) program then the cause is one of: 1. you've spent more effort optimising the other program, 2. you've misconfigured the Java program (probably a bad heap-size setting), or 3. the program relies on object flattening, which means the Java program will suffer from costly cache misses (until Valhalla arrives, which is expected to be very soon).

by pron

2/1/2026 at 9:47:08 PM

In my experience, if your C++ or Rust code does not perform as well as Java, it's probably because you are trying to write Java in C++ or Rust. Java can handle a large number of small heap-allocated objects shared between threads really well. You can't reasonably expect to meet its performance in such workloads with the rudimentary tools provided by the C++ or Rust standard library. If you want performance, you have to structure the C++/Rust program in a fundamentally different way.

I was not familiar with the term "object flattening", but apparently it just means storing data by value inside a struct. But data layout is exactly the thing you should be thinking about when you are trying to write performant code. As a first approximation, performance means taking advantage of throughput and avoiding latency, and low-level languages give you more tools for that. If you get the layout right, efficient code should be easy to write. Optimization is sometimes necessary, but it's often not very cost-effective, and it can't save you from poor design.

by jltsiren

2/1/2026 at 11:00:32 PM

> it's probably because you are trying to write Java in C++ or Rust

Well, sure. In principle, we know that for every Java program there exists a C++ program that performs at least as well because HotSpot is such a program (i.e. the Java program itself can be seen as a C++ program with some data as input). The question is can you match Java's performance without significantly increasing the cost of development and especially evolution in a way that makes the tradeoff worthwhile? That is quite hard to do, and gets harder and harder the bigger the program gets.

> I was not familiar with the term "object flattening", but apparently it just means storing data by value inside a struct. But data layout is exactly the thing you should be thinking about when you are trying to write performant code.

Of course, but that's why Java is getting flattened objects.

> As a first approximation, performance means taking advantage of throughput and avoiding latency, and low-level languages give you more tools for that

Only at the margins. These benefits are small and they're getting smaller. More significant performance benefits can only be had if virtually all objects in the program have very regular lifetimes - in other words, can be allocated in arenas - which is why I think it's Zig that's particularly suited to squeezing out the last drops of performance that are still left on the table.

Other than that, there's not much left to gain in performance (at least after Java gets flattened objects), which is why the use of low-level languages has been shrinking for a couple of decades now and continues to shrink. Perhaps it would change when AI agents can actually code everything, but then they might as well be programming in machine code.

What low-level languages really give you through better hardware control is not performance, but the ability to target very restricted environments with not much memory (as one of Java's greatest performance tricks is the ability to convert RAM to CPU savings on memory management) assuming you're willing to put in the effort. They're also useful, for that reason, for things that are supposed to sit in the background, such as kernels and drivers.

by pron

2/2/2026 at 8:39:10 AM

> The question is can you match Java's performance without significantly increasing the cost of development and especially evolution in a way that makes the tradeoff worthwhile?

This question is mostly about the person and their way of thinking.

If you have a system optimized for frequent memory allocations, it encourages you to think in terms of small independently allocated objects. Repeat that for a decade or two, and it shapes you as a person.

If you, on the other hand, have a system that always exposes the raw bytes underlying the abstractions, it encourages you to consider the arrays of raw data you are manipulating. Repeat that long enough, and it shapes you as a person.

There are some performance gains from the latter approach. The gains are effectively free, if the approach is natural for you and appropriate to the problem at hand. Because you are processing arrays of data instead of chasing pointers, you benefit from memory locality. And because you are storing fewer pointers and have less memory management overhead, your working set is smaller.

by jltsiren

2/2/2026 at 1:22:10 PM

What you're saying may (sometimes) be true, but that's not why Java's performance is hard to beat, especially as programs evolve (I've been programming in C and C++ since before Java even existed).

In a low-level language, you pay a higher performance cost for a more general (abstract) construct. E.g. static vs. dynamic dispatch, or the Box/Rc/Arc progression in Rust. If a certain subroutine or object requires the more general access even once, you pay the higher price almost everywhere. In Java, the situation is opposite: You use a more general construct, and the compiler picks an appropriate implementation per use site. E.g. dispatch is always logically dynamic, but if at a specific use site the compiler sees that the target is known, then the call will be inlined (C++ compilers sometimes do that, too, but not nearly to the same extent; that's because a JIT can perform speculative optimisations without proving they're correct); if a specific `new Integer...` doesn't escape, it will be "allocated" in a register, and if it does escape it will be allocated on the heap.
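
A hedged illustration of the dispatch case, with invented `Shape` types: the call to `area()` below is logically virtual on every iteration, and a JIT that only ever observes `Square` at that call site may speculate on it and inline the body. Nothing in the code guarantees that; it just shows the shape of code this applies to.

```java
// Made-up types for illustration; whether the call is inlined is up to the JIT.
class DispatchDemo {
    interface Shape { double area(); }

    record Square(double side) implements Shape {
        public double area() { return side * side; }
    }

    record Circle(double r) implements Shape {
        public double area() { return Math.PI * r * r; }
    }

    // One call site, written in the most general way (through the interface).
    static double totalArea(Shape[] shapes) {
        double total = 0;
        for (Shape s : shapes) total += s.area();
        return total;
    }

    public static void main(String[] args) {
        Shape[] onlySquares = new Shape[1_000];
        for (int i = 0; i < onlySquares.length; i++) onlySquares[i] = new Square(i);
        // If only Square is ever seen here, the site stays monomorphic at runtime.
        System.out.println(totalArea(onlySquares));
    }
}
```

If a `Circle` later reaches the same call site, the speculation no longer holds and the compiled code has to be revisited, which is the trade-off between guaranteed and speculative optimisation described here.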

The problem with Java's approach is that optimisations aren't guaranteed, and sometimes an optimisation can be missed. But on average they work really well.

The problem with a low-level language is that over time, as the program evolves and features (and maintainers) are added, things tend to go in one direction: more generality. So over time, the low-level program's performance degrades and/or you have to rethink and rearchitect to get good performance back.

As to memory locality, there's no issue with Java's approach, only with a missing feature of flattening objects into arrays. This feature is now being added (also in a general way: a class can declare that it doesn't depend on identity, and the compiler then transparently decides when to flatten it and when to box it).

Anyway, this is why it's hard, even for experts, to match Java's performance without a significantly higher effort that isn't a one-time thing, but carries (in fact, gets worse) over the software's lifetime. It can be manageable and maybe worthwhile for smaller programs, but the cost, performance, or both suffer more and more with bigger programs as time goes on.

by pron

2/2/2026 at 2:55:21 PM

From my perspective, the problem with Java's approach is memory, not computation. For example, low-level languages treat types as convenient lies you can choose to ignore at your own peril. If it's more convenient to treat your objects as arrays of bytes/integers (maybe to make certain forms of serialization faster), or the other way around (maybe for direct access to data in a memory-mapped file), you can choose to do that. Java tends to make solutions like that harder.

Java's performance may be hard to beat in the same task. But with low-level languages, you can often beat it by doing something else due to having fewer constraints and more control over the environment.

by jltsiren

2/2/2026 at 4:27:57 PM

> or the other way around (maybe for direct access to data in a memory-mapped file), you can choose to do that. Java tends to make solutions like that harder.

Not so much anymore, thanks to the new FFM API (https://openjdk.org/jeps/454). The verbose code you see is all compiler intrinsics, and thanks to Java's aggressive inlining, intrinsics can be wrapped and encapsulated in a clean API (i.e. if you use an intrinsic in method bar which you call from method foo, usually it's as if you've used the intrinsic directly in foo, even though the call to bar is virtual). So you can efficiently and safely map a data interface type to chunks of memory in a memory-mapped file.
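
A rough sketch of what that looks like, assuming JDK 22 or later (where the FFM API is final). The class `MappedInts`, the temp file, and the "file of native-order 32-bit ints" format are all invented for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical example: treat a memory-mapped file as an array of ints.
class MappedInts {
    // Creates a temp file holding the ints 1..count in native byte order.
    static Path writeSample(int count) {
        try {
            Path file = Files.createTempFile("ints", ".bin");
            try (Arena arena = Arena.ofConfined();
                 FileChannel ch = FileChannel.open(file,
                         StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                MemorySegment seg = ch.map(FileChannel.MapMode.READ_WRITE,
                        0, (long) count * Integer.BYTES, arena);
                for (int i = 0; i < count; i++) {
                    seg.setAtIndex(ValueLayout.JAVA_INT, i, i + 1);
                }
                seg.force(); // flush the mapped pages to the file
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Maps the file read-only and sums its first `count` ints without copying them.
    static long sumInts(Path file, int count) {
        try (Arena arena = Arena.ofConfined();
             FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MemorySegment seg = ch.map(FileChannel.MapMode.READ_ONLY,
                    0, (long) count * Integer.BYTES, arena);
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += seg.getAtIndex(ValueLayout.JAVA_INT, i);
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The accessors are bounds-checked, and the mapping is released when the arena is closed, so this stays within Java's safety guarantees while reading the file's bytes in place.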

> But with low-level languages, you can often beat it by doing something else due to having fewer constraints and more control over the environment.

You can, but it's never free, rarely cheap (and the costs are paid throughout the software's lifetime), and the gains aren't all that large (on average). The question isn't "is it possible to write something faster" but "can you get sufficient gains at a justifiable cost", and that's already hard and getting harder and harder.

by pron

2/2/2026 at 8:17:07 AM

> Java in C++ or Rust.

This criticism always forgets that Java is how most folks used to program in C++ARM, 100% of all the 1990's GUI frameworks written in C++, and that the GoF book used C++ and Smalltalk, predating Java by a couple of years.

by pjmlp

2/1/2026 at 5:25:00 PM

Has anyone done a fork of the benchmark game or plb2 to demonstrate the impacts of jit warmup and heap settings?

by galangalalgol

2/1/2026 at 11:51:02 PM

I don't know what plb2 is, but the benchmark game can demonstrate very little, because the benchmarks are small and uninteresting compared to real programs (I believe there's not a single one with concurrency, plus there's no measure of effort in such small programs) and they compare different algorithms against each other.

For example, what can you learn from the Java vs. C++ comparison? In 7 out of 10 benchmarks there's no clear winner (the programs in one language aren't faster than all programs in the other) and what can you generalise from the 3 where C++ wins? There just isn't much signal there in the first place.

The Techempower benchmarks explore workloads that are probably more interesting, but they also compare apples to oranges, and like with the benchmark game, the only conclusion you could conceivably generalise (in an age of optimising compilers, CPU caches, and machine-learning branch predictors, all affected by context) is that C++ (or Rust) and Java are about the same, as there are no benchmarks in which all C++ or Rust frameworks are faster than all Java ones or vice-versa, so there's no way of telling whether there is some language advantage or particular optimisation work done that helps a specific benchmark (you could try looking at variances, but given the lack of a rigorous comparison, that's probably also meaningless). The differences there are obviously within the level of noise.

Companies that care about and understand performance pick languages based on their own experience and experiments, hopefully ones that are tailored to their particular program types and workloads.

by pron

2/1/2026 at 9:54:33 AM

The linked article makes a specific carveout for Java, on the grounds that its SufficientlySmartCompiler is real, not hypothetical.

by sswatson

2/1/2026 at 9:49:19 AM

c++ certainly also has and needs a similarly sufficiently smart compiler to be compiled at all…

by remexre

2/1/2026 at 3:12:00 AM

For the most naive code, if you're calling "new" multiple times per row, maybe Java benefits from out of band GC while C++ calls destructors and free() inline as things go out of scope?

Of course, if you're optimizing, you'll reuse buffers and objects in either language.
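
As a small Java illustration of the reuse idea (the row format and method names are invented): the first version allocates a new `StringBuilder` per row and leaves the cleanup to the GC, the second resets a single builder.

```java
// Hypothetical per-row processing; both methods compute the same result.
class ReuseDemo {
    // Naive: one fresh builder per row, reclaimed later by the collector.
    static int totalLengthNaive(String[][] rows) {
        int total = 0;
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (String field : row) sb.append(field).append(',');
            total += sb.length();
        }
        return total;
    }

    // Reuse: one builder for all rows, cleared with setLength(0) between rows.
    static int totalLengthReused(String[][] rows) {
        int total = 0;
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.setLength(0);
            for (String field : row) sb.append(field).append(',');
            total += sb.length();
        }
        return total;
    }
}
```

How much the second form helps depends on the collector and heap settings; with a large young generation the naive version can be cheaper than it looks.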

by woooooo

2/1/2026 at 5:33:58 PM

> maybe Java benefits from out of band GC

benchmarks game uses BenchExec to take 'care of important low-level details for accurate, precise, and reproducible measurements' ….

BenchExec uses the cgroups feature of the Linux kernel to correctly handle groups of processes and uses Linux user namespaces to create a container that restricts interference of [each program] with the benchmarking host.

by igouy

2/2/2026 at 3:43:19 AM

I'm talking about memory management in-process, I don't think cgroups would affect that?

by woooooo

2/1/2026 at 8:50:53 AM

In the end, even Java code becomes machine code at some point (at least the hot paths).

by cryptos

2/1/2026 at 9:44:35 AM

yes, but that's just one part of the equation. machine code from compiler and/or language A is not necessarily the same as the machine code from compiler and/or language B. the reasons are, among others, contextual information, handling of undefined behavior and memory access issues.

you can compile many weakly typed high level languages to machine code and their performance will still suck.

java's language design simply prohibits some optimizations that are possible in other languages (and also enables some that aren't in others).

by stefs

2/1/2026 at 12:19:13 PM

> java's language design simply prohibits some optimizations that are possible in other languages (and also enables some that aren't in others).

This isn't really true - at least not beyond some marginal things that are of little consequence - and in fact, Java's compiler has access to more context than pretty much any AOT compiler because it's a JIT and is allowed to speculate optimisations rather than having to prove them.

by pron

2/1/2026 at 2:22:00 PM

It can speculate whether an optimization is performant. Not whether it is sound. I don't know enough about java to say that it doesn't provide all the same soundness guarantees as other languages, just that it is possible for a jit language to be hampered by this. Also c# aot is faster than a warmed up c# jit in my experience, unless the warmup takes days, which wouldn't be useful for applications like games anyway.

by galangalalgol

2/1/2026 at 4:50:24 PM

> Not whether it is sound.

Precisely right, but the entire point is that it doesn't need to. The optimisation is applied in such a way that when it is wrong, a signal triggers, at which point the method is "deoptimised".

That is why Java can and does aggressively optimise things that are hard for compilers to prove. If it turns out to be wrong, the method is then deoptimised.

by pron

2/1/2026 at 5:21:24 PM

But how can it know the optimization violated aliasing or rounding order or any number of usually silent ub?

by galangalalgol

2/1/2026 at 11:58:38 PM

There's no aliasing in the messy C sense in Java (and no pointers into the middle of objects at all). As for other optimisations, there are traps inserted to detect violation if speculation is used at all, but the main thrust of optimisation is quite simple:

The main optimisation is inlining, which, by default, is done to the depth of 15 (non-trivial) calls, even when they are virtual, i.e. dispatched dynamically, and that's the main speculation - that a specific callsite calls a specific target. Then you get a large inlined context within which you can perform optimisations that aren't speculative (but proven).

If you've seen Andrew Kelley's talk about "the vtable boundary"[1] and how it makes efficient abstraction difficult, that boundary does not exist in Java because compilation is at runtime and so the compiler can see through vtables.

But it's also important to remember that low-level languages and Java aim for different things when they say "performance". Low-level languages aim for the worst-case. I.e., some things may be slower than others (e.g. dynamic vs. static dispatch) but when you can use the faster construct, you are guaranteed a certain optimisation. Java aims to optimise something that's more like the "average case" performance, i.e. when you write a program with all the most natural and general constructs, it will be the fastest for that level of effort. You're not guaranteed certain optimisations, but you're not penalised for a more natural, easier-to-evolve, code either.

The worst-case model can get you good performance when you first write the program. But over time, as the program evolves and features are added, things usually get more general, and low level languages do have an "abstraction penalty", so performance degrades, which is costly, until at some point you may need to rearchitect everything, which is also costly.

[1]: https://youtu.be/f30PceqQWko

by pron

2/3/2026 at 11:53:27 AM

I mostly do dsp and control software, so number heavy. I am excited at the prospect of anything that might get me a performance boost. I tried porting a few smaller tests to java and got it to c2 some stuff, but I couldn't get it to autovectorize anything without making massive (and unintuitive) changes to the data structures. So it was still roughly 3x slower than the original in rust. I'll be trying it again though when Valhalla hits, so thanks for the heads up.

by galangalalgol

2/1/2026 at 4:11:12 AM

I was very surprised to see the results for common lisp. As I scrolled down I just figured that the language was not included until I saw it down there. I would have guessed SBCL to be much faster. I checked it out locally and got: Rust 9ms, D: 16ms, and CL: 80ms.

Looking at the implementation, only adding type annotations, there was a ~10% improvement. Then switching the tag-map to use vectors as values, which is more appropriate than lists (imo), gave a 40% improvement over the initial version. By additionally cutting a few allocations, the total time is halved. I'm guessing other languages will have similar easy improvements.

by XJ6w9dTdM

2/1/2026 at 12:35:58 AM

D gets no respect. It's a solid language with a lot of great features and conveniences compared to C++ but it barely gets a passing mention (if that) when language discussions pop up. I'd argue a lot of the problems people have with C++ are addressed with D but they have no idea.

by jhack

2/1/2026 at 8:01:16 AM

Ecosystem isn't that great, and much of it relies on the GC. If you're going to move out of C++, you might as well go all in on a GC language (Java, C#, Go) or use Rust. D's value proposition isn't enough to compete with those languages.

by maleldil

2/1/2026 at 9:00:03 AM

D has a GC and it’s optional. Which should be the best of both worlds in theory.

Also D is older than Go and Rust and only a few months younger than C#. So the question then becomes “why weren’t people using D when your recommended alternatives weren’t an option?” Or “why use the alternatives (when they were new) when D already exists?”

by hnlmorg

2/1/2026 at 9:58:46 AM

> D has a GC and it’s optional.

This is only true in the most technical sense: you can easily opt out of the GC, but you will struggle with the standard library, and probably most third-party libraries too. It's the baseline assumption after all, hence why it's opt-out, not opt-in. There was a DConf talk about the future of Phobos which indicated increased support for @nogc, but this is a ways away, and even then, if you're opting out of the GC, you are giving up a lot. And honestly, if you really don't want the GC, you may be better off with Zig.

by Defletter

2/1/2026 at 9:56:19 AM

Garbage collection has never been a major issue for most use cases. However, the Phobos vs. Tango and D1 vs. D2 splits severely slowed D’s adoption, causing it to miss the golden window before C++11, Go, and Rust emerged.

by chenzhekl

2/1/2026 at 2:32:15 AM

Could say the same for Nim.

But popularity/awareness/ecosystem matter.

by rsyring

2/1/2026 at 7:05:30 AM

That's the great thing about LLMs.

Especially with Nim it's so easy to make quality libraries with a Codex/ClaudeCode and a couple hours as a hobby.

Especially when they run fast. I just made Metal bindings and got 120 FPS demos with SDF bitmaps running yesterday while eating Saturday brunch.

by elcritch

2/1/2026 at 2:29:17 PM

I don't really get the idea that LLMs lower the level of familiarity one needs to have with a language.

A standup comedian from Australia should not assume that the audience in the Himalayas is laughing because the LLM the comedian used 20 minutes before was really good at translating the comedian's routine.

But I suppose it is normal for developers to assume that a compiler translated their Haskell into x86_64 instructions perfectly, then turned around and did the same for three different flavors of Arm instructions. So why shouldn't an LLM turn piles of oral descriptions into perfectly architected Nim?

For some reason I don't feel the same urgency to double-check the details of the Arm instructions as I feel about inspecting the Nim or Haskell or whatever the LLM generated.

by freeopinion

2/1/2026 at 2:39:15 AM

If the difference in performance between the target language and C++ is huge, it's probably not the language that's great, but some quirk of implementation.

by Ygg2

2/2/2026 at 8:24:00 AM

Tiny community, even tinier than when Andrei Alexandrescu published the D book (he is now back to C++ at NVidia), lack of direction (it is always trying the next big thing that might attract users, leaving others behind not fully done), since 2010 other alternatives with big corp sponsoring came up, others like Java and C# gained AOT and improved their low level programming capabilities.

Thus, it makes very little sense to adopt D versus other managed compiled languages.

The language and community are cool, sadly that is not enough.

by pjmlp

2/1/2026 at 1:32:53 PM

The study seems to be “solve this the obvious way, don’t think too hard about it”. Then the systems languages (C, Zig, C++) are pretty close, the GC languages are around an order of magnitude slower (C#, Java doing pretty good at ca. 3x), and the scripting languages around two orders of magnitude slower.

But note the HO-variants: with better algorithms, you can shave off two orders of magnitude.

So if you’re open to thinking a bit harder about the problem, maybe your badly benchmarking language is just fine after all.

by debois

2/1/2026 at 5:08:25 PM

D is a GC language too so the pattern does not hold that well.

by dadoum

2/1/2026 at 7:01:36 PM

> Then the systems languages (C, Zig, C++) are pretty close

I'm sorry, I don't see C among the results.

by dvfjsdhgfv

2/1/2026 at 9:53:15 PM

My mistake, sorry. Same for D above.

Point stands, though: if your language is too far down the list, better algorithms might be enough.

by debois

2/1/2026 at 1:28:03 AM

C# is very fast (see multicore rating). Implementation based on simd (vector), memory spans, stackalloc, source generators and what have you — modern C# allows you to go very low-level and very fast.

Probably even faster under .net 10.

Though using stopwatch for benchmark is killing me :-) Wonder if multiple runs via benchmarkdotnet would show better times (also due to jit optimizations). For example, Java code had more warm-up iterations before measuring
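
For comparison, a bare-bones sketch in Java of what "warm up, then take several timed runs" means when done by hand. The workload and names are placeholders, and a real harness such as JMH or BenchmarkDotNet handles many more pitfalls than this does.

```java
import java.util.Arrays;

// Hand-rolled timing loop for illustration; not a replacement for a benchmark harness.
class WarmupTimer {
    // Stand-in workload.
    static long work(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) acc += (long) i * i % 7;
        return acc;
    }

    // Runs `warmup` untimed iterations so the JIT can compile the hot code,
    // then returns the median of `runs` timed iterations, in nanoseconds.
    static long medianNanos(int warmup, int runs, int n) {
        long sink = 0; // accumulate results so the work is not optimised away
        for (int i = 0; i < warmup; i++) sink += work(n);

        long[] samples = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += work(n);
            samples[r] = System.nanoTime() - start;
        }
        if (sink == Long.MIN_VALUE) System.out.println(sink); // keeps `sink` observable

        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        System.out.println("cold-ish: " + medianNanos(0, 1, 1_000_000) + " ns");
        System.out.println("warmed:   " + medianNanos(50, 21, 1_000_000) + " ns");
    }
}
```

A single stopwatch reading corresponds to the first line; the second is closer to what a harness reports.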

by piskov

2/1/2026 at 1:43:48 AM

This entire benchmark is frankly a joke. As other commenters have pointed out, the compiler flags make no sense, they use pretty egregious ways to measure performance, and ancient versions are being used across the board. Worst of all, the code quality in each sample is extremely variable and some are _really_ bad.

by von_lohengramm

2/1/2026 at 11:59:56 AM

Some of the rules seem very arbitrary too

> Must: Represent tags as strings

Provided the correct result is generated I don't get the rationale for this one. As long as you obey the other rule for UTF-8 compatibility, why would it be a problem to represent as bytes (or anything else)?

Seems like it would put e.g. GC'ed languages where strings are immutable at a big disadvantage

by dwroberts

2/2/2026 at 2:56:17 AM

Totally agree. I found the results surprising because a bunch of languages are faster than C++. Then I looked closer. The requirements are self-conflicting: no SIMD, but must be production-ready. No one would use the unoptimized version in production. Also, looking at the C++ implementation, it is not optimized at all. This makes this benchmark literally pointless.

by elisbce

2/1/2026 at 5:01:00 AM

Quality does vary wildly because the languages vary wildly in terms of language constructs and standard libraries. Proficiency in every.single.language. used in the benchmark perhaps should not be taken for granted.

But it is a GitHub repository and the repository owner appears to accept PR's and allows people to raise an issue to provide their feedback, or… it can be forked and improved upon. Feel free to jump in and contribute to make it a better benchmark that will not be «frankly a joke» or «_really_ bad».

by inkyoto

2/1/2026 at 7:44:27 AM

I'm completely alright with just having fun and hosting your own little sandboxes online, but what good does it do to post and share this with others in its current state? The picture it paints is certainly not representative, and this sort of thing has been done a million times over with much better consistency. Again, I think it's great to hack around in every language and document your journey all the way, but sharing this is borderline misinformation. It's certainly not my duty to right the wrongs of this benchmark.

by von_lohengramm

2/1/2026 at 11:49:08 AM

About the C++ version: You have to be an absolute weirdo to (sometimes) put the opening brace of functions on the same line, but on the next line for if and for bodies.

by ahartmetz

2/1/2026 at 2:28:20 PM

I think there was a name for that brace style? It seems silly, but leaving c++ development after decades for a variety of reasons, it turned out a standard formatting tool was one of my favorite features.

by galangalalgol

2/1/2026 at 3:12:28 PM

For mixing styles like that?

  int myFunc(int foo){
      if (foo > 42)
      {
          frobnicate();
      }
  }

by ahartmetz

2/1/2026 at 5:20:03 PM

I was getting it confused with GNU style, which indents braces for control flow but not for functions.

by galangalalgol

2/1/2026 at 1:55:58 AM

I mean this is only meant to be an iteration, if I understand correctly. It's not like someone is going around citing this benchmark and yelling "rewrite everything in Julia / D". Imo this is a good starting point if you are doubtful or fall into the trap of "Java is not fast". For most workloads we can clearly see that Java trades off the control of C++ for "about the same speed" and a much, much larger and well-managed ecosystem. (Except for the other day, when someone's OpenJDK PR was left hanging for a month, and I am not sure why.)

by another_twist

2/1/2026 at 9:14:57 AM

If you get the same speeds for C++ and Java, I'd like to point out that the C++ implementation is likely very sub-optimal.

This can obviously be true for toy problems, but tends not to generalize.

by nnevatie

2/1/2026 at 9:15:00 AM

The fact that Julia “highly optimized” is 30x faster than the normal Julia implementation, yet still fails to reach for some pretty obvious optimizations, and uses a joke package called “SuperDataStructures” tells me that maybe this benchmark shouldn’t be taken all that seriously.

Benchmarks like this can still be fun and informative

by jakobnissen

2/1/2026 at 12:05:40 AM

This is really interesting. Julia is a beast compared to python.

Nowadays, whenever I see benchmarks of different languages, I compare them to benjdd.com/languages or benjdd.com/languages2.

Ended up creating a visualization of this data if anybody's interested

https://serjaimelannister.github.io/data-processing-benchmar...

(Credit is given to both sources in the description of the repo.)

(Also, fair disclosure: it was generated just out of curiosity about how this benchmark data might look in benjdd's UI, and I used LLMs for prototyping. The result looks pretty similar imo, so full credit to benjdd's awesome visualization. I just wanted to see this data in that format for myself, but ended up having it open source on GitHub Pages.)

I think benjdd's on Hacker News too, so hi Ben! Your website's really cool!

by Imustaskforhelp

2/1/2026 at 1:53:53 AM

Someone replied to me in an old comment that for fast Python you have to use numpy. In the folder there is a program in plain python, another with numpy and another with numba. I'm not sure why only one is shown in the data.

Disclaimer: I used numpy and numba, but my level is quite low. Almost as if I just type `import numpy as np` and hope the best.

by gus_massa

2/1/2026 at 6:24:42 AM

For what it's worth, I've ported a lot of heavily optimized numpy code to Julia for work, and consistently gotten 10x-100x speedups, largely due to how much easier it is to control memory allocations and parallelize more effectively.

by SatvikBeri

2/1/2026 at 1:57:52 AM

> Almost as if I just type `import numpy as np` and hope the best.

As do we all. If you browse through deep learning code a large majority is tensor juggling.

by another_twist

2/1/2026 at 5:16:08 AM

Go being beaten by C# in multicore is quite hard to believe. Also Zig and Odin doing so "poorly" in single core is strange.

by gethly

2/1/2026 at 8:32:23 AM

The quality of the benchmark code is... not great. This seems like Zig written by someone who doesn't know Zig or asked Claude to write it for them. Hell, actually Claude might do a better job here.

In short, I wouldn't trust these results for anything concrete. If you're evaluating which language is a better fit for your problem, craft your own benchmark tailored for that problem instead.

by osmsucks

2/1/2026 at 9:06:52 AM

So far, the best benchmark seems to be https://plummerssoftwarellc.github.io/PrimeView/

Although it is a very single-thread-biased test.

by gethly

2/1/2026 at 5:17:06 PM

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

by igouy

2/1/2026 at 11:50:26 AM

Modern C# has many low-level knobs (still in a safe way, though it also supports unsafe) for zero allocation, hardware intrinsics, devirtualization of calls at runtime, and so on: SIMD (Vector), memory spans, stackalloc, source generators (which help with very efficient JSON), etc.

Most of all: C# has a very nice framework and tooling (Rider).

by piskov

2/1/2026 at 11:06:34 AM

Go is constantly beaten by C# in both the Benchmarks Game and the TechEmpower benchmarks.

by DeathArrow

2/1/2026 at 3:44:48 PM

I don't know why this is downvoted, because the statement is not wrong (https://benchmarksgame-team.pages.debian.net/benchmarksgame/...). Times have changed, modern .NET is very fast and is getting faster still (https://devblogs.microsoft.com/dotnet/performance-improvemen...).

by mrsmrtss

2/1/2026 at 2:25:29 PM

It's not really surprising given the implementations. The C# stdlib just exposes more low-level levers here (quick look, correct me if I'm wrong):

For one, the C# code is explicitly using SIMD (System.Numerics.Vector) to process blocks, whereas Go is doing it scalar. It also uses a read-only FrozenDictionary which is heavily optimized for fast lookups compared to a standard map. Parallel.For effectively maps to OS threads, avoiding the Go scheduler's overhead (like preemption every few ms) which is small but still unnecessary for pure number crunching. But a bigger bottleneck is probably synchronization: The Go version writes to a channel in every iteration. Even buffered, that implies internal locking/mutex contention. C# is just writing to pre-allocated memory indices on unrelated disjoint chunks, so there's no synchronization at all.
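The synchronization point can be sketched in a few lines. This is a Python illustration of the two shapes described above, not the repo's Go or C# code; `score`, `via_queue` and `via_disjoint_chunks` are hypothetical names. Because of CPython's GIL it will not show the same speedup, only the structural difference: one version funnels every result through a shared, locked queue, the other lets each worker write into its own index range of a pre-allocated list.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def score(x):
    # Placeholder for the real per-item work (hypothetical).
    return x * x


def via_queue(items, workers=4):
    """Every result goes through one shared, internally locked queue."""
    out = queue.Queue()

    def work(start):
        for i in range(start, len(items), workers):
            out.put((i, score(items[i])))  # takes the queue's lock each time

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(workers)))  # wait and surface exceptions

    results = [None] * len(items)
    while not out.empty():
        i, value = out.get()
        results[i] = value
    return results


def via_disjoint_chunks(items, workers=4):
    """Each worker owns a contiguous index range of a pre-allocated list."""
    results = [None] * len(items)
    chunk = (len(items) + workers - 1) // workers

    def work(w):
        lo, hi = w * chunk, min((w + 1) * chunk, len(items))
        for i in range(lo, hi):
            results[i] = score(items[i])  # no other worker touches index i

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, range(workers)))
    return results
```

Both functions return the same list; the second just never needs a lock, because the index ranges do not overlap.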

by kdps

2/1/2026 at 2:39:28 PM

In other words the benchmark doesn't even use the same hardware for each run?

by freeopinion

2/1/2026 at 7:37:47 PM

If you're referring to the SIMD aspect (I assume the other points don't apply here): It depends on your perspective.

You could say yes, because the C# benchmark code is utilizing vector extensions on the CPU while Go's isn't. But you could also say no: Both are running on the same hardware (CPU and RAM). C# is simply using that hardware more efficiently here because the capabilities are exposed via the standard library. There is no magic trick involved. Even cheap consumer CPUs have had vector units for decades.

by kdps

2/1/2026 at 3:38:00 PM

C# is great, but look at the implementations. The JVM is set up wrong, so Java could perform better than what is benchmarked. Hell, with Python you'd probably use Celery or numpy or ctypes to do this much faster.

So overall the benchmarks are kind of useless.

by Quothling

2/2/2026 at 8:19:20 PM

Zig's being compiled in "ReleaseSafe", so there is lots of bounds checking going on.

by dnautics

2/1/2026 at 6:36:36 PM

> Rules:

> MUST

> Support up to 100 tags

> Represent tags as strings

That doesn’t require the strings that represent the tags to be the tag strings. So one can bend the rules by representing tags by single-character strings or, alternatively, by using fixed strings of length 0 through 99, and then doing the tag comparisons only on the first character of each string or, alternatively, on the length of the string (if obtaining that is fast).

Especially when tags have large common prefixes, that could speed up things tremendously.

In languages that support string interning (https://en.wikipedia.org/wiki/String_interning), I suspect that also could be used to bend the rules.
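A rough sketch of the single-character variant, in Python. The post layout (a dict with a `"tags"` list) and the names `build_tag_codes`, `encode` and `shared_tags` are assumptions made for illustration, not taken from the benchmark. Each distinct tag gets a one-character string once, and every later comparison touches only those short strings, so long common prefixes no longer cost anything.

```python
def build_tag_codes(posts):
    """Assign each distinct tag a one-character string, in first-seen order."""
    codes = {}
    for post in posts:
        for tag in post["tags"]:
            if tag not in codes:
                # 100 tags fit easily; the offset just avoids control characters.
                codes[tag] = chr(0x100 + len(codes))
    return codes


def encode(posts, codes):
    """Replace each post's tag list with a frozenset of one-character codes."""
    return [frozenset(codes[t] for t in post["tags"]) for post in posts]


def shared_tags(a, b):
    """Number of tags two encoded posts have in common."""
    return len(a & b)


# Hypothetical data with long common prefixes.
posts = [
    {"tags": ["programming-languages", "programming-benchmarks"]},
    {"tags": ["programming-benchmarks", "programming-tools"]},
    {"tags": ["cooking"]},
]
codes = build_tag_codes(posts)
encoded = encode(posts, codes)
```

The tags are still "represented as strings", just not as the original ones. In Python, `sys.intern` gives a built-in form of the interning idea: interned equal strings are the same object, so they can be compared by identity.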

by Someone

2/1/2026 at 5:31:18 AM

Why is there no C benchmark? The C++ benchmark appears to be "modern C++" which isn't a substitute.

by hgs3

2/1/2026 at 9:48:46 AM

For comparison here's one from Dec '25

https://niklas-heer.github.io/speed-comparison

Certainly does "look" very interesting.

by stu2421

2/1/2026 at 4:07:27 PM

This one doesn’t even have warmup for Java, which makes the results complete nonsense.

Those benchmarks should be just forbidden for their misleading nature.

by boroboro4

2/1/2026 at 5:11:02 PM

How much difference does it make for tiny programs?

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

by igouy

2/1/2026 at 8:39:06 PM

It's not an issue of warmup time, it's an issue of jit compilation.

On my server (AMD EPYC 7252):
1) the base time of the Java program from the repo is 3.23s (which is ~2x worse than the one on the linked page, so I assume my CPU is about 2x slower, and the corresponding best C++ result would be ~450ms)
2) if you count from inside the Java program you get 3.17s (so about 60ms of overhead)
3) but if you run it 10 times (inside the same Java program) you cut this time to 1570ms

It's still much slower than the C++ version, but it's between Rust and Go. And this is not me optimizing something, it's only measuring things correctly.

update: running the vector version of the Java code from the same repo brings the runtime to 392ms, which is literally the fastest out of all solutions, including C++.

update2: ran the C++ version on the same hardware, it takes 400ms, so I would say it's fair to say C++ and vectorized Java are on par (and given the "allows vectorization" comment in the cpp code I assume that's the best one can get out of it).
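The "run it 10 times inside the same program" measurement can be sketched generically. This is a Python outline of the harness shape only, not the Java code that produced the numbers above; `time_runs` and `workload` are hypothetical names. On stock CPython the gap between the first and the best run will usually be small, whereas on a JIT-compiled runtime such as the JVM that gap is what the comment measured.

```python
import time


def time_runs(fn, repeats=10):
    """Run fn `repeats` times in this process; return each run's seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def workload():
    # Stand-in for the real benchmark body (hypothetical).
    return sum(i * i for i in range(200_000))


runs = time_runs(workload)
print(f"first run: {runs[0] * 1000:.1f} ms")
print(f"best run:  {min(runs) * 1000:.1f} ms")
```

Reporting the first run and the best run separately makes it visible how much of a single-shot measurement is startup and compilation rather than steady-state work.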

by boroboro4

2/3/2026 at 8:16:15 PM

Sorry, now I remember past performance variation with that program seemingly caused by switching the order of flip*= and sum+=

Not enough program to care about.

by igouy

2/2/2026 at 1:38:56 AM

> the java program

Which java program?

by igouy

2/2/2026 at 1:41:38 AM

This https://github.com/niklas-heer/speed-comparison/blob/master/... and this https://github.com/niklas-heer/speed-comparison/blob/master/... from parent comment

by boroboro4

2/1/2026 at 4:17:01 AM

I wrote a script (now an app basically haha) to migrate data from EMR #1 to EMR #2 and I chose Nim because it feels like Python but it's fast as hell. Claude Code did a fine job understanding and writing Nim especially when I gave it more explicit instructions in the system prompt.

by sergiotapia

2/1/2026 at 2:24:50 AM

Isn't that measuring the speed of json encoding instead?

by aatd86

2/1/2026 at 2:52:07 AM

Genuine question: Are GitHub workflows stable enough to be used for benchmarking? Like, is CPU time quantum scheduling guaranteed to be the same from run to run?

by KerrAvon

2/1/2026 at 5:11:31 AM

No, it’s sloppy benchmarking

by vlovich123

2/1/2026 at 2:08:05 AM

I see some questions around the methodology of the testing. But is this representative of Ruby? Several minutes total when most finish under a second?

by matthewfcarlson

2/1/2026 at 5:12:25 PM

fyi

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

by igouy

2/1/2026 at 3:39:00 AM

What's up with the massive jump from 20k to 60k for nearly all languages?

by jasonjmcghee

2/1/2026 at 3:44:41 AM

My guess would be cache related. 5k probably fits in L1-L2 cache, whereas 20k might put you into L3.

by foota

2/1/2026 at 4:44:06 PM

on HN 2 years ago

https://news.ycombinator.com/item?id=37848571

? unchanged from 7 months ago

by igouy

2/1/2026 at 12:21:20 AM

So in the D vs Zig vs Rust vs C fight - learn D if speed is your thing?

by Vaslo

2/1/2026 at 2:40:50 AM

That only applies in an apples-to-apples comparison, i.e., same data structures, same algorithm, etc. You can't compare sorting in C and Python, but use bubble sort in C and radix sort in Python.

In here there are different data structures being used.

> D[HO] and Julia [HO] footnote: Uses specialized datastructures meant for demonstration purposes
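To make the sorting analogy concrete, here is a small Python sketch with both algorithms in the same language (the function names are illustrative). The two return identical results, but one does O(n^2) comparisons and the other O(n * digits) bucket passes, so timing one against the other says more about the algorithm than about the language running it.

```python
def bubble_sort(values):
    """O(n^2) comparison sort; returns a new sorted list."""
    out = list(values)
    for end in range(len(out) - 1, 0, -1):
        swapped = False
        for i in range(end):
            if out[i] > out[i + 1]:
                out[i], out[i + 1] = out[i + 1], out[i]
                swapped = True
        if not swapped:  # already sorted, stop early
            break
    return out


def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers, O(n * digits)."""
    out = list(values)
    if not out:
        return out
    largest = max(out)
    place = 1
    while largest // place > 0:
        # Stable bucketing on the current digit.
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v // place) % base].append(v)
        out = [v for bucket in buckets for v in bucket]
        place *= base
    return out
```

Note that the radix sort here only handles non-negative integers, which is itself an example of a specialized structure buying speed by narrowing the problem.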

by Ygg2

2/1/2026 at 8:59:57 AM

You're right of course, but it also depends on how long you want to spend on it. If Python gives you radix sort directly, and the C implementation you can have in the same time is bubble sort because you spent so much time setting up the project and finding the right libs, it kinda makes sense.

by makapuf

2/1/2026 at 4:40:59 PM

Python doesn't come with Radix sort, and Julia doesn't come with

     [[deps.SuperDataStructures]]
     git-tree-sha1 = "7222b821efcee6dcdc9e652455da09c665d8afc1"
     repo-rev = "main"
     repo-subdir = "SuperDataStructures.jl"

by Ygg2

2/1/2026 at 2:21:21 PM

Don't know about D but C, Zig and Rust use LLVM so there should be no difference.

by hiccuphippo

2/1/2026 at 9:42:08 PM

Depends on the D compiler. The reference compiler optimizes for compilation speed. LDC is backed by LLVM and GDC by GCC.

by vips7L

2/1/2026 at 1:19:09 AM

Data processing benchmark but somehow R is not even mentioned?

by ekianjo

2/1/2026 at 2:26:34 AM

It would be the slowest language result on the list.

by mcdermott

2/1/2026 at 4:27:11 AM

Slower than Python? I seriously doubt that

by ekianjo

2/1/2026 at 8:37:43 AM

Port the script to R, benchmark and report your results. Python is slow, but R is generally much slower.

by mcdermott

2/1/2026 at 3:03:00 PM

I will have a look, but R has much better data structures than Python for data processing (everything is a vector in R)

EDIT: they have one script, related.R, in their repo, which is 3 years old and uses jsonlite, a package that is notoriously slow. Using a package such as yyjsonr yields 10x performance, so something tells me that whoever wrote this piece of code has never heard of R before.

by ekianjo

2/1/2026 at 1:27:40 AM

That’s odd, Zig concurrent got slower

by pyrolistical

2/1/2026 at 1:56:37 AM

Contention overhead, likely. Performance is more than just the language.

by another_twist

2/1/2026 at 3:45:32 AM

Also 3 years old. Zig has been rewritten in that time

by pyrolistical

2/2/2026 at 6:08:18 AM

If people don't find their preferred language on top, they will claim the benchmark is flawed. They will find a condition that is not satisfied by the benchmark. But if we operate outside of the benchmark's assumptions, all benchmarks are flawed, since they cannot satisfy all possible conditions.

by DeathArrow