alt.hn

4/25/2025 at 4:01:24 PM

Differential Coverage for Debugging

https://research.swtch.com/diffcover

by todsacerdoti

4/25/2025 at 9:16:40 PM

https://www.debuggingbook.org/html/StatisticalDebugger.html

A related method. It's not quite as straightforward as running with and without the failing test and comparing coverage reports. Instead, this technique collects coverage from many test runs and identifies the lines associated only, or most often, with failing runs.
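A minimal sketch of that idea, assuming each run is recorded as a set of covered (file, line) pairs plus a pass/fail flag. The score used here is the Tarantula formula, one well-known suspiciousness metric, and not necessarily what the linked chapter uses by default. The function name `rank_suspicious_lines` and the sample data are hypothetical.

```python
from collections import Counter


def rank_suspicious_lines(runs):
    """Rank covered lines by Tarantula suspiciousness, most suspicious first.

    runs: list of (covered, passed) pairs, where covered is a set of
    (file, line) tuples and passed is a bool.
    """
    total_pass = sum(1 for _, passed in runs if passed)
    total_fail = len(runs) - total_pass

    pass_hits, fail_hits = Counter(), Counter()
    for covered, passed in runs:
        # Each run counts a line at most once, because covered is a set.
        (pass_hits if passed else fail_hits).update(covered)

    scores = {}
    for line in set(pass_hits) | set(fail_hits):
        fail_ratio = fail_hits[line] / total_fail if total_fail else 0.0
        pass_ratio = pass_hits[line] / total_pass if total_pass else 0.0
        denom = fail_ratio + pass_ratio
        scores[line] = fail_ratio / denom if denom else 0.0

    # Sort by descending score, then by location for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: two passing runs and one failing run.
runs = [
    ({("m.py", 1), ("m.py", 2)}, True),
    ({("m.py", 1), ("m.py", 2), ("m.py", 4)}, True),
    ({("m.py", 1), ("m.py", 3)}, False),
]
ranked = rank_suspicious_lines(runs)
```

Line 3 is covered only by the failing run, so it ranks first with a score of 1.0. Line 1 is covered by every run and scores 0.5. Lines seen only in passing runs score 0.0.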

by Jtsummers

4/25/2025 at 9:54:34 PM

I had no idea this had (or was worthy of) a name.

That's the whole point of coverage diffs.

The tough ones are the tests that sometimes fail and give you the same coverage results - the problem is not in the code under test! And the lazy/common things to do are re-run the test or add a sleep to make things "work."

by drewcoo

4/26/2025 at 8:32:04 AM

What about just doing a git diff? Wouldn't that show that the method was not called before?

by ossusermivami

4/26/2025 at 5:04:22 PM

If it were a longstanding bug, rather than one just created for this exercise, a git diff may not help much. Imagine you've found some edge case in your code that just hadn't been properly exercised before. It could have been there for years. Now how do you isolate the erroring section? This technique (or the one I mentioned in my other comment, which is very similar but uses more data) can help isolate the problem section.
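As a rough illustration of isolating such a section, here is a sketch that records which lines of one function run for passing inputs and for a failing input, then subtracts the two sets. It uses Python's `sys.settrace`. The function `describe` and its bug are hypothetical stand-ins for a long-dormant edge case, and a real project would more likely use a coverage tool than a hand-rolled tracer.

```python
import sys


def lines_executed(func, *args):
    """Run func(*args) and return the line offsets executed inside func itself."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace into other functions
        if event == "line":
            # Store offsets relative to the "def" line so results are portable.
            hit.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    except Exception:
        pass  # a failing run still yields useful coverage
    finally:
        sys.settrace(previous)
    return hit


def describe(n):
    if n < 0:
        return "negative"
    if n == 0:
        scale = 100 // n  # hypothetical bug: only reached when n == 0
        return f"zero ({scale})"
    return "positive"


passing = lines_executed(describe, 5) | lines_executed(describe, -3)
failing = lines_executed(describe, 0)
only_failing = failing - passing
```

Here `only_failing` contains just the offset of the `scale = 100 // n` line, which is the only statement the failing input reaches that the passing inputs do not.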

git diffs can definitely help with newer bugs or if you can show that it's a regression (didn't error before commit 1234abcd but did after).

by Jtsummers

5/4/2025 at 12:10:09 AM

"Wait, what did I change that impacted that test?"

An intersection between code coverage and a git diff once answered that handily for me.
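A sketch of how such an intersection might be computed, assuming the diff comes from `git diff -U0` (zero context lines, so every hunk range covers only changed lines) and that per-file covered line numbers for the affected test are already available from a coverage tool. The helper names and the sample data are hypothetical.

```python
import re

# Matches unified-diff hunk headers such as "@@ -10 +10,2 @@".
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@")


def changed_lines(diff_text):
    """Parse `git diff -U0` output into {path: set of new-side line numbers}."""
    changed, path = {}, None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            target = line[4:]
            # Deleted files have no new side to intersect with.
            path = None if target == "/dev/null" else target.removeprefix("b/")
        elif path is not None and (m := HUNK.match(line)):
            start = int(m.group(1))
            count = int(m.group(2) or 1)  # an omitted count means one line
            changed.setdefault(path, set()).update(range(start, start + count))
    return changed


def changed_and_covered(changed, covered):
    """Return the changed lines that the test actually executed, per file."""
    result = {}
    for path, lines in changed.items():
        both = lines & covered.get(path, set())
        if both:
            result[path] = sorted(both)
    return result


# Hypothetical diff and coverage data.
sample_diff = "\n".join([
    "diff --git a/pricing.py b/pricing.py",
    "--- a/pricing.py",
    "+++ b/pricing.py",
    "@@ -10 +10,2 @@",
    "-    total = price",
    "+    total = price",
    "+    total *= 2",
    "@@ -40,0 +42 @@",
    "+    log(total)",
])
covered_by_test = {"pricing.py": {8, 9, 10, 11, 12}}
suspects = changed_and_covered(changed_lines(sample_diff), covered_by_test)
```

In this example the diff touches lines 10, 11, and 42 of `pricing.py`, but the test only executes 10 and 11, so those are the changes worth inspecting first.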

by dllthomas

4/26/2025 at 2:10:10 AM

[dead]

by anougaret

4/26/2025 at 2:52:09 AM

Are you saying that LLMs will generate shitty code and then fix that by using your LLM? That seems... inconsistent...

by godelski

4/26/2025 at 3:22:35 AM

we don't do the LLM part per se

we instrument your code automatically, which is a compiler-like approach under the hood, then we aggregate the traces

this lets us engineer the context into the most exhaustive & informative prompt for LLMs to debug with

now if they still fail to debug at least we gave them all they should have needed

by anougaret

4/26/2025 at 5:02:34 AM

Okay, this sounds better. But aren't there other continuous debuggers out there? It doesn't seem hard to roll my own. I can definitely get vim to run pdb in a buffer every time I save my file (or on whatever condition). But this does seem quite expensive for minimal benefit. Usually people turn to print statements because it's easier than the debugger. Is it iterative, so you don't do the full trace and only roll back your stack trace to where the breach occurs? That's much more complex.

And critically, why are you holding my code for 48 hrs? Why is anything leaving my machine at all?

by godelski

4/26/2025 at 2:37:47 AM

Why do you need to store a copy of my code to support what seems to be a time traveling debugger?

by saagarjha

4/26/2025 at 3:20:25 AM

valid concerns of course

- we are planning a hosted AI debugging feature that can aggregate multiple traces & code snippets from different related codebases and feed it all into one LLM prompt; that benefits a lot from having it all centralized on our servers

- for now the rewriting algorithms are quite unstable, so it helps me debug them to have the failing code files in sight

- we only store your code for 48 hours as I assume it's completely unnecessary to store it for longer

- a self-hosted version will be released for users that cannot accept this for valid reasons

by anougaret