3/30/2025 at 2:53:33 PM
A more meaningful adventure into microbenchmarking than my last. I look at why we no longer need to P/Invoke memcmp to efficiently compare arrays in C# / .NET.
Old stackoverflow answers are a dangerous form of bit-rot. They get picked up by well-meaning developers and LLMs alike and recreated years after they are out of date.
by xnorswap
3/30/2025 at 3:44:53 PM
For loop regression in .NET 9, please submit an issue at dotnet/runtime. It’s yet another loop tearing miscompilation caused by suboptimal loop lowering changes if my guess is correct.
by neonsunset
3/30/2025 at 4:59:54 PM
No problem, I've raised the issue as https://github.com/dotnet/runtime/issues/114047 .
by xnorswap
3/30/2025 at 5:13:47 PM
Thanks!
by neonsunset
3/31/2025 at 12:41:44 PM
19 hours in and that PR already has hands on it from multiple people at MS. Incredible.
by jve
3/31/2025 at 12:59:04 AM
UPD: For those interested, it was an interaction between the microbenchmark algorithm and tiered compilation, and not a regression.
https://github.com/dotnet/runtime/issues/114047#issuecomment...
by neonsunset
3/31/2025 at 4:49:33 AM
This is a ten line function that takes half a second to run.
Why do you have to call it more than 50 times before it gets fully optimized?? Is the decision-maker completely unaware of the execution time?
by Dylan16807
3/31/2025 at 4:33:36 PM
Long-running methods (like the one here) transition mid-execution to more optimized versions, via on-stack replacement (OSR), after roughly 50K iterations. So you end up running optimized code either if the method is called a lot or loops frequently.
The OSR transition happens here, but between .NET 8 and .NET 9 some aspects of loop optimizations in OSR code regressed.
by andyayers
3/31/2025 at 6:31:40 PM
So there actually was a regression and it wasn't an intentional warmup delay?
by Dylan16807
3/31/2025 at 6:44:14 PM
There indeed is a regression if the method is only called a few times. But not if it is called frequently.
With BenchmarkDotNet it may not be obvious which scenario you intend to measure and which one you end up measuring. BDN runs the benchmark method enough times to exceed some overall "goal" time for measuring (250 ms I think). This may require many calls or may just require one.
by andyayers
3/31/2025 at 3:44:24 PM
> Why do you have to call it more than 50 times before it gets fully optimized?? Is the decision-maker completely unaware of the execution time?
If you read the linked conversation, you'll notice that there are multiple factors at play.
Here's the document that roughly outlines the tiered compilation and DPGO flows: https://github.com/dotnet/runtime/blob/main/docs/design/feat...
Note that it may be slightly dated, since the exact tuning is subject to change between releases.
by neonsunset
3/31/2025 at 7:51:17 AM
The optimiser doesn't know how long optimisation will take or how much time it will save before starting the work, therefore it has to hold off on optimising functions that are not called frequently.
There are also often multiple concrete types that can be passed in; optimising for one will not help if the function is also getting called with other concrete types.
by lozenge
3/31/2025 at 8:16:39 AM
> The optimiser doesn't know how long optimisation will take or how much time it will save before starting the work, therefore it has to hold off on optimising not frequently called functions.
I don't buy that logic.
It can use the length of the function to estimate how long it will take.
It can estimate the time savings by the total amount of time the function uses. Time used is a far better metric than call count. And the math to track it is not significantly more complicated than a counter.
by Dylan16807
3/31/2025 at 12:34:36 PM
> It can use the length of the function to estimate how long it will take.
Ah, yes, because a function that defines and then prints a 10,000 line string will take x1,000 longer to run than a 10 line function which does matrix multiplication over several billion elements.
by gavinray
3/31/2025 at 1:21:35 PM
I think he meant how long it will take to optimize it.
It is naive either way.
by high_na_euv
3/31/2025 at 6:30:55 PM
It's naive but it's so so much better than letting a single small function run for 15 CPU seconds and deciding it's still not worth optimizing it yet because that was only 30 calls.
by Dylan16807
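The two heuristics being argued about can be compared in a toy sketch, using the 30 half-second calls mentioned in the thread. Everything here is an assumption made for illustration: the thresholds, the `MethodStats` shape, and the function names are invented, and this is not how the .NET runtime decides when to re-JIT a method (it also ignores on-stack replacement entirely).

```typescript
// Toy model only: thresholds and names are hypothetical and do not
// reflect the real tiering logic of any runtime.

interface MethodStats {
  calls: number;   // how many times the method has been invoked
  totalMs: number; // cumulative time spent inside the method
}

const CALL_THRESHOLD = 50;      // hypothetical call-count trigger
const TIME_THRESHOLD_MS = 1000; // hypothetical cumulative-time trigger

// Record one finished call; the time-based metric costs one extra addition.
function recordCall(stats: MethodStats, elapsedMs: number): MethodStats {
  return { calls: stats.calls + 1, totalMs: stats.totalMs + elapsedMs };
}

function shouldOptimizeByCalls(stats: MethodStats): boolean {
  return stats.calls >= CALL_THRESHOLD;
}

function shouldOptimizeByTime(stats: MethodStats): boolean {
  return stats.totalMs >= TIME_THRESHOLD_MS;
}

// 30 calls of roughly 500 ms each, i.e. 15 seconds of CPU time in total.
let stats: MethodStats = { calls: 0, totalMs: 0 };
for (let i = 0; i < 30; i++) {
  stats = recordCall(stats, 500);
}

console.log(shouldOptimizeByCalls(stats)); // false: only 30 calls so far
console.log(shouldOptimizeByTime(stats));  // true: 15000 ms already spent
```

The sketch only shows that the bookkeeping for a time-based trigger is small; it says nothing about how expensive the optimization itself would be, which is the other half of the argument.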
3/30/2025 at 5:27:32 PM
The number of times I've caught developers wholesale copying stack overflow posts, errors and all, is far too high.
by timewizard
3/30/2025 at 5:33:43 PM
Indeed, the problems of LLMs are not new. We just automated what people who have no idea what they are doing were doing anyway. We... optimized incompetence.
by guerrilla
3/30/2025 at 6:31:30 PM
The problem with the LLM equivalent is that you can't see the timestamp of the knowledge it's drawing from. With stack overflow I can see a post is from 2010 and look for something more modern. That due diligence is no longer available with an LLM, which has little reason to choose the newest solution.
by SketchySeaBeast
3/30/2025 at 6:12:44 PM
This is a bit elitist, isn’t it? It highly depends on the type of code copied, and it’s a huge part of software engineers’ bullish approach to LLMs compared to most other professions.
Regardless of how competent as a programmer you are, you don’t necessarily possess the knowledge/answer to “How to find open ports on Linux” or “How to enumerate child pids of a parent pid” or “what is most efficient way to compare 2 byte arrays in {insert language}” etc. A search engine or an LLM is a fine solution for those problems.
You know that the answer to that question is what you’re after. I’d generally consider that knowing the right question to ask is all that matters. The answer is not interesting. It’s most likely a deeply nested piece of knowledge about how the Linux networking stack works, or how process management works on a particular OS. If that was the central point of the software we’re building (like for example we’re a Linux Networking Stack company) then by all means. It’s silly to find a lead engineer in our company who is confused about how open ports work in Linux.
by eddythompson80
3/30/2025 at 6:38:14 PM
Read the license. CC BY-SA.
Copying code and breaking the license is a liability many companies don’t want, so they block SO in the office.
I’ve seen upvoted answers to questions with stuff that purposefully has a backdoor in it (one character away from being a correct answer, so you are vulnerable only if you actually copied and pasted).
I think S.O. is great, and LLMs too, but any “lead” engineer would try to learn and refute the content.
BTW: my favorite thing to do after an LLM gives a coding answer: now fix the bug.
The answers are hilarious. Oh, I see the security vulnerabilities. Or oh, this won’t work in an asynchronous environment. Etc, etc. Sometimes you have to be specific with the type of bug you spot (looking at you, Sonnet 3.7). It’s worth adding to your Cursor rules or similar.
by sroussey
3/30/2025 at 6:48:04 PM
All my 24-year career has been spent among 4 “very large” software companies and 1 startup. 3 out of the 4 had a culture of “// https://stackoverflow.com/xxxxx” type comments on top of any piece of code that someone learned about from stackoverflow. There was one where everyone made a big fuss about such things in code reviews. They’d ask “we don’t have any functions in this project that use this Linux syscall. How do you know this is what needs to be called???” And you had 2 ways of answering. You could link a kernel.org url saying “I looked through Linux sources and learned that to do X you need to call Y api” and everyone would reply “cool”, “great find”, etc. You could also say “I searched for X and found this stackoverflow response”, to which everyone would reply “stackoverflow is often wrong”, “do we have the right license to use that code”, “don’t use stackoverflow”, “please reconsider this code”.
by eddythompson80
3/30/2025 at 7:07:10 PM
> There was one where everyone made a big fuss about such things in code reviews.
There's always dumb morons... sigh.
Even if you don't copy code from SO, it still makes sense to link to it if there is a decent explanation on whatever problem you were facing. When I write code and I hit some issue - particularly if it's some sort of weird ass edge case - I always leave a link to SO, and if it's something that's also known upstream but not fixed yet (common if you use niche stuff), I'll also leave a TODO comment linking to the upstream issue.
Code should not just be code, it should also be a document of knowledge and learning to your next fellow coder who touches the code.
(This also means: FFS do not just link stackoverflow in the git commit history. No one is looking there years later)
by mschuster91
3/30/2025 at 7:04:47 PM
Or just put the link in the code as the license requires.
Then… you could have a bot that watches for updates to the post in case it was wrong and someone points it out.
by sroussey
3/30/2025 at 6:34:03 PM
> This is a bit elitist isn’t it.
Damn straight. Understand what you're doing or don't do it. Software is bad enough as it is. There's absolutely no room for the incompetent in this game. That science experiment has been done to death and we're certain of the results.
by guerrilla
3/30/2025 at 6:50:17 PM
It's hardly unreasonable to expect your peers to at least _try_ to understand what they are doing. Copypaste coding is never conducive to a good codebase.
by knome
3/30/2025 at 6:59:34 PM
I do expect them to understand the code they are copying/pasting, though only to an extent. I understand they would test the code. They would try different inputs to the code and check its results. I’d also understand they would test that code across all the different “Linux distros” we use, for example. After all, that code basically calls a Linux syscall, so I understand that’s very stable.
Then I learn that this particular syscall depends on this kernel build flag that Debian passes, but not alpine. You can get it in alpine if you set that other flag. “What are you, a caveman, not knowing that `pctxl: true` is the build flag to enable this feature?”
by eddythompson80
3/30/2025 at 6:15:59 PM
In this case it was code to generate an "oauth2 code_challenge" and then correctly URLEncode it. Instead of using replaceAll the example used replace. So only the first matching character in the string was getting converted.
When pressed the developer said they thought their code was "too fast for the oauth server" and that's why it failed about 25% of the time.
The level of disappointment I had when I found the problem was enough to be memorable, but to find the post he flat out copied on stack overflow, along with a comment below it highlighting the bug AND the fix, nearly brought me to apoplexy.
by timewizard
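A minimal sketch of that class of bug, assuming the encoding step was a base64 to base64url conversion (the thread does not show the original code, and both helper names here are hypothetical). With a string pattern, `String.prototype.replace` swaps only the first match, while `replaceAll` (ES2021) swaps every match.

```typescript
// Hypothetical helpers; the real code from the thread is not available.

// Buggy: with a string pattern, replace() only changes the FIRST occurrence.
function toBase64UrlBuggy(b64: string): string {
  return b64.replace("+", "-").replace("/", "_").replace("=", "");
}

// Fixed: replaceAll() changes every occurrence.
function toBase64UrlFixed(b64: string): string {
  return b64.replaceAll("+", "-").replaceAll("/", "_").replaceAll("=", "");
}

const oneOfEach = "ab+cd/ef=";  // at most one '+', '/', '=' each
const repeated = "a+b+c/d/e=="; // repeated special characters

console.log(toBase64UrlBuggy(oneOfEach)); // "ab-cd_ef"   (happens to be right)
console.log(toBase64UrlBuggy(repeated));  // "a-b+c_d/e=" (second matches survive)
console.log(toBase64UrlFixed(repeated));  // "a-b-c_d_e"
```

Inputs with at most one of each special character still come out right in the buggy version, so a bug like this can fail on only some inputs rather than all of them.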
3/30/2025 at 6:34:53 PM
To me “.replace()” vs “.replaceAll()” (in JS at least) is a perfect example to evaluate a developer on. Any JS developer would know that replace()’s main gotcha is that it’s not replaceAll(). I used C# professionally for years before using JS. And “.Replace()” in C# works the same way “.replaceAll()” does in JS. It was one of the first things I learned about JS and how I needed to reevaluate all my code in JS.
In interviews, I’d often ask the interviewee “what is your background” and “do you know that in JS .replace() is unlike .replace() in Java or .Replace() in .NET”. That statement should make perfect sense to any developer who realizes the word “replace” is somewhat ambiguous. I would always argue that the behavior of Java and .NET is the right behavior, but it’s an ambiguous word nonetheless.
by eddythompson80
3/31/2025 at 2:48:21 AM
What's even worse is when you catch someone copying from the questions instead of the answers!
by jayd16