alt.hn

3/29/2025 at 8:43:42 PM

Atop 2.11 heap problems

https://openwall.com/lists/oss-security/2025/03/29/1

by baggy_trough

3/29/2025 at 11:03:41 PM

Hey guys, we commented on another thread a few days ago about our tool Bismuth finding the bug (along with a SHA of our reproducer script for proof): https://news.ycombinator.com/item?id=43489944

After disclosing to and corresponding with Gerlof, and judging from his post above, it looks like we did in fact nail it. I've just shared our write-up on how we got it.

HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

Edit: Here's our reproducer and we've added it to the post too: https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...

by ianbutler

3/30/2025 at 7:41:25 AM

> HN post detailing how we got it: https://news.ycombinator.com/item?id=43519522

I don't see any details there. Is there some link missing here, or is it the wrong link?

I'd be interested to read how your tool found it.

by hannob

3/30/2025 at 10:29:18 AM

It's just "we asked our LLM and it found the bug", as I understand it.

by stavros

3/29/2025 at 11:08:43 PM

What is that a hash of?

by saagarjha

3/29/2025 at 11:08:54 PM

As noted, our reproducer script

by ianbutler

3/29/2025 at 11:10:39 PM

Right, but where’s the script?

by saagarjha

3/29/2025 at 11:17:49 PM

https://gist.github.com/kallsyms/3acdf857ccc5c9fbaae7ed823be...

From my co-founder's account

by ianbutler

3/29/2025 at 11:39:56 PM

Cool, thanks for adding it. It would also be nice if you posted how you generated the hash :) I’m not trying to be annoying but this is a critical part of how these hashes work; you post the hash early to indicate you have some information early and then later you demonstrate that by actually presenting the artifact with that hash. If you don’t publish the artifact so people can check that it is actually what you claim it is then your hash is worthless (as nobody can prove it’s not, like, the hash of a cat photo). And you’d generally want to demonstrate how you generated the hash just so people don’t have to figure out whether to md5 or sha1sum it.

by saagarjha

3/29/2025 at 11:53:24 PM

Hey yeah got caught up in the excitement of finding it :)

It's a SHA256 - `shasum -a 256 server.py`

by kallsyms
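
For anyone who wants to check the gist against the earlier hash without `shasum`, the same commit-then-reveal verification can be sketched in Python. This is only a minimal sketch, assuming the revealed artifact is a local file; the helper names `sha256_of_file` and `matches_commitment` are made up for illustration.

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_commitment(path: str, published_hex: str) -> bool:
    """Check a revealed artifact against a hash that was published earlier."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Usage would look like `matches_commitment("server.py", "<hash from the earlier comment>")`.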

3/29/2025 at 11:29:40 PM

This doesn't seem nearly as nefarious as the post from earlier this week indicated... I had expected a full supply chain compromise or something that bad based on the earlier post.

by geerlingguy

3/30/2025 at 3:04:29 AM

Yeah, my first thought was that this is an unrelated find, a result of the extra eyeballs from the recent focus.

by barotalomey

3/30/2025 at 1:23:03 AM

Yeah being taciturn was really the worst thing you could do

by f33d5173

3/30/2025 at 3:58:21 AM

I was bit by atop a few years back and swore it off. I would get perfectly periodic 10m hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric gathering technique with a 10m cron job that would hang any large memory process on the system. It was one of those “no freaking way” revelations after 3 days troubleshooting everything.

Interesting reading through the related submission comments and seeing other hard to troubleshoot bugs. I don’t think atop devs are to blame, my guess is that what you have to do to make a tool like atop work means you are hooking into lots of places that have potential to have unintended consequences.

by cullenking

3/29/2025 at 9:53:45 PM

It's unfortunate that Unix sockets aren't being used for local connections like this.

by unsnap_biceps

3/29/2025 at 11:18:08 PM

It's more unfortunate a proper RPC library is not being used. People rolling their own buggy parsers in C is an endless source of bugs.

by charcircuit

3/30/2025 at 10:45:02 AM

The whole code is horrible: https://github.com/Atoptool/atop/commit/542b7f7ac52926ca2721...

Inconsistent usage of braces, no clear memory ownership or life-cycles, zero tests.

by ahoka

3/30/2025 at 3:45:23 PM

Can you please provide an example of good C code?

I agree that the absence of tests isn't great, and it is very common in C-based projects. But the rest of your comment reads like "ooh, it's C, disgusting!". I hope I'm wrong.

by the-lazy-guy

3/30/2025 at 6:10:55 PM

sqlite3 is the canonical example of a mature, well-structured, excellently tested C codebase. I would also submit cURL/libcURL as a strong example.

by woodruffw

3/30/2025 at 10:13:06 PM

Thank you. These two are well known, as are plenty of others. But I wanted to see an answer from the author of the comment I replied to. Apart from tests (of which both sqlite and curl have plenty, and that is obviously good), I don't see any meaningful difference in the sqlite or curl code in the aspects mentioned in their comment (namely, style and ownership). I'd like to see what they think is reasonable C code.

by the-lazy-guy

3/30/2025 at 12:45:42 AM

> People rolling their own buggy parsers in C

I'd like to believe this isn't common anymore for new projects?

by timcobb

3/30/2025 at 3:19:13 AM

I don't want to ruin your weekend.

by worthless-trash

3/29/2025 at 10:26:53 PM

Meh. This isn't a technology choice problem. Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

It's true you could use a privileged spot in the filesystem and set things up to use that by writing some simple extra software, but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1], and atop didn't.

[1] It's not mentioned in the linked email, but I assume the core problem here (and the reason it got a CVE number) is that the atop binary is setuid?

by ajross

3/29/2025 at 11:15:59 PM

> Routine unix sockets are just some file in /tmp which an attacker could likewise open by racing against the daemon in the same way.

So put the socket in /run instead of /tmp?

I'm no expert, but this appears to be where they belong, and it appears to solve the problem. From https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.htm... : "System programs that maintain transient UNIX-domain sockets must place them in this directory or an appropriate subdirectory as outlined above." ... "/run should not be writable for unprivileged users; it is a major security problem if any user can write in this directory."

by adrianmonk

3/30/2025 at 12:49:06 AM

Putting them in /run if you're not already root requires a little extra software be written though. Locking down a TCP socket isn't much harder. I'm not saying "don't use Unix domain sockets", I'm saying that treating this bug as the result of technology choice is bad security analysis.

by ajross

3/30/2025 at 9:38:57 AM

The real problem is the buggy parser, and that it is enabled by default, even if you aren't showing anything related to the GPU and haven't launched the daemon.

by Zardoz84

3/30/2025 at 3:54:08 PM

> if you're not already root

Hmm, good point. I think we made opposite assumptions about that.

If the daemon does run as root, then no extra software is required. For Unix domain sockets, you can trivially create your socket in /run, and for TCP, you can trivially use a port below 1024.

If it doesn't, then some extra software or configuration is required in either case.

I tried looking it up, and I think it does run as root[1]. But I also found that the daemon uses a Python library to get GPU stats, and root might or might not be required depending on how the GPU software is configured[2]. So it could have gone either way.

---

[1] That's how I read this: https://github.com/Atoptool/atop/blob/master/atopgpu.service

[2] See https://github.com/gpuopenanalytics/pynvml/issues/19

by adrianmonk

3/29/2025 at 11:56:55 PM

These days Unix sockets for system daemons should be placed under /run with permissions that only a particular daemon can access for binding. With systemd service and socket units it is trivial to do.

by fpoling

3/30/2025 at 2:53:50 AM

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

How, actually? With UNIX sockets it can be a matter of setting file ownership and mode (at worst, a chmod and a chown).

What's the equally simple way to restrict access to a locally listening tcp socket?

by 3np
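
To make the "a chmod and a chown" point concrete, here is a minimal Python sketch of a daemon binding a Unix socket that only its own user can reach. The directory layout, the socket name `daemon.sock`, and the helper `bind_private_unix_socket` are assumptions for illustration, not anything atop or atopgpud actually does.

```python
import os
import socket

def bind_private_unix_socket(directory: str, name: str = "daemon.sock") -> socket.socket:
    """Bind a Unix socket that only the owning user can connect to."""
    # Lock down the directory first so nobody else can race to create the path.
    os.makedirs(directory, mode=0o700, exist_ok=True)
    os.chmod(directory, 0o700)
    path = os.path.join(directory, name)
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    # On Linux, connect() requires write permission on the socket file.
    os.chmod(path, 0o600)
    server.listen(1)
    return server
```

Granting access to a group instead would be a matter of an `os.chown` to that group plus mode `0o660`, again as an assumption about how one might set it up.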

3/29/2025 at 10:47:54 PM

> but it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

Can you educate me? I'm familiar with SO_PEERCRED that returns the user/group/pid on the other end. Would you then checksum the exe of the pid from /proc?

by johnmaguire

3/29/2025 at 11:13:20 PM

SO_PEERCRED is only for Unix domain sockets though, it's not going to work for TCP.

For TCP, your only easy option is to have port <1024 - but that requires root. If you want a dedicated user, then TCP requires hacks - like creating a cookie file in some protected location, like XAuthority does.

But if you have a protected location, why even bother with all this? Just create a UNIX socket there directly. After all, the difference is only in the connect call; the read/write loop is the same. And as an extra bonus there is much better visibility, and zero chance of someone accidentally grabbing your magic number.

Unix sockets are really underappreciated.

by theamk
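
For reference, this is roughly what reading SO_PEERCRED looks like. It is a minimal, Linux-only Python sketch, and as noted above it only works on Unix domain sockets; the helper name `peer_credentials` is made up for illustration.

```python
import os
import socket
import struct

def peer_credentials(conn: socket.socket) -> tuple[int, int, int]:
    """Return (pid, uid, gid) of the peer of a connected Unix domain socket."""
    # struct ucred on Linux is three C ints: pid, uid, gid.
    size = struct.calcsize("3i")
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack("3i", raw)

if __name__ == "__main__":
    # Both ends of a socketpair belong to this process, so the peer is us.
    left, right = socket.socketpair()
    print(peer_credentials(left), os.getpid())
    left.close()
    right.close()
```

A server would call this on the socket returned by `accept()` and compare the uid or gid against whatever it is willing to talk to.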

3/31/2025 at 3:48:38 PM

Sorry to be pedantic, but this doesn't really allow you to lock down the socket to "a specific process" does it? You're talking about restricting it to root, or another particular user/group.

I'm interested in this as I've been working on a problem myself where I'm trying to restrict access to a specific process (or a specific application), without much care for which user is running that process. On mobile, there are lots of solutions for protected locations (as you suggest) that allow sharing files across applications within a publisher, for example.

by johnmaguire

3/31/2025 at 8:24:21 PM

Correct, this is for specific user/group.

Restricting use to "specific application for any user" sounds pretty dodgy, security-wise. Linux makes no guarantees that processes are protected from the executing user, so it is entirely possible your process has the right name, but runs different code. LD_PRELOAD and ptrace immediately come to mind, but I am sure there are other methods too.

That's why Android makes a unique UID per app - this turns insecure "restrict by process name" problem into well-supported "restrict by UID/GID".

(And if there is no need for a security boundary, and you only want a convenience check to avoid non-malicious mistakes? Then just hardcode a magic string in your app and check it as a part of the protocol.)

by theamk

3/29/2025 at 10:52:22 PM

You can check socket credentials, indeed. You can set up filtering rules to match on UID using nftables. You can do things like put a cookie somewhere else to exchange and authenticate the connection a-la xauth. You could use TLS and check the host key vs. a public key stored at install time. There are many ways to do this, none of which require more than a few dozen lines of code/config.

But really the simplest thing would just be to use a port <1024 so that only root can open it. That's literally what the feature was for. You can still be "attacked", but only by someone who already has local root.

by ajross
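
The xauth-style cookie idea mentioned above can be sketched in a few lines of Python. This is a sketch under assumptions: the cookie lives in a file only its owner can read, the client sends it as the first bytes on the connection, and the helper names `create_cookie` and `cookie_is_valid` are hypothetical.

```python
import hmac
import os
import secrets

def create_cookie(path: str, size: int = 32) -> bytes:
    """Write a fresh random cookie to a file that only the owner can read."""
    cookie = secrets.token_bytes(size)
    # O_EXCL refuses to reuse a path that someone else created first.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(cookie)
    return cookie

def cookie_is_valid(path: str, presented: bytes) -> bool:
    """Compare a client-supplied cookie with the stored one in constant time."""
    with open(path, "rb") as handle:
        expected = handle.read()
    return hmac.compare_digest(expected, presented)
```

Note that this still depends on a protected filesystem location, which is the point made elsewhere in the thread about ending up back at filesystem permissions.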

3/30/2025 at 4:22:49 AM

None of that (save for running as root, which is very crude, much less granular, and requires promoting privileges of the process in question to root) is "about the same amount of work" as using a unix socket directly.

by 3np

3/30/2025 at 4:41:42 PM

If the daemon isn't running as root it can't put the socket in a secure location, requiring more code. That code isn't complicated, but neither are any of the suggestions above.

Once more: people wanting to make this security bug about the specific socket family in use are doing bad security analysis. There's nothing wrong with TCP, the app just did it wrong and failed to recognize the security boundary being crossed.

by ajross

3/31/2025 at 3:50:42 PM

This is all well and good if you want to restrict access to root users, but I thought we were trying to restrict access "to a specific process" (i.e. a specific client application.)

by johnmaguire

3/31/2025 at 5:02:19 PM

Open the socket and drop privilege before launching the daemon. I mean, come on: inetd could do this back in 4.3BSD on a VAX.

I remain absolutely dumbfounded how people in this subthread are going to the mattresses trying to explain why Unix sockets are great and TCP isn't, when they both suck in exactly the same way and the correct answer is "validate your input" and not "use a different API".

by ajross

3/31/2025 at 7:36:13 PM

I'm not trying to explain why Unix sockets are great and TCP isn't... I'm trying to solve a real-world problem along a similar vein myself. FWIW, I agree that you should use Unix sockets for local-machine access - you can't accidentally expose them off the box like you can a TCP socket. But that's neither here nor there.

You seem to be misunderstanding the scenario I'm describing: I have a daemon that runs in a privileged context (as root.) I have a client that connects to the daemon, as any user on the box. The client cannot be run as root because the user does not have permission to do so.

I want to ensure that only my client can connect to the daemon. I can't use user/group permissions, because I don't care what user/group has access. I want to make sure a specific process (or a specific binary/executable) has access. To quote the comment I initially responded to:

> it's equally true that you could lock down a TCP socket to a specific process with about the same amount of work.

On a Unix machine, this is often done by creating a group to use for access (e.g. a docker group.) This works to lock down a TCP socket to a specific group but not to a specific process. Using shared secrets stored elsewhere on the box also doesn't help here, since any other process could access those secrets.

The best I know of is using something like XPC on macOS, using SO_PEERCRED and checksumming the executable at /proc/<pid>/exe, or perhaps using some other platform-specific code signing API.

I was excited to hear that it was easy. I'm disappointed now.

by johnmaguire

3/30/2025 at 12:25:26 PM

> Bottom line is that you need to validate your input from outside the process if you're running in a privileged context[1]

Why this "if" qualifier? You need to validate all input from outside the process. Whether the process is privileged or not is, frankly, not really relevant.

(I submitted a blog post a few days ago explaining "Parse, Don't Validate" in plain C, but it didn't get any traction).

by lelanthran

3/31/2025 at 2:23:41 PM

> Why this "if" qualifier? You need to validate all input from outside the process.

Not all tools are designed to accept input from outside a security boundary. Obviously atop isn't one, but the world is filled with software that misbehaves on bad input. Ever DDoS your build system by misconfiguring something? Crash a running program by removing a cache directory (or unpacking a tarball on top of it)?

It's very rarely a bad idea to validate input. But it's for sure not always a requirement either.

And to be blunt, it's not really possible either. You write "insecure" parsers/interpreters/whatever probably every day, we all do. And you "know" when it's safe and when it's not, I'm sure. But my point is that if that knowledge isn't based on at least a little bit of rigor ("crossing a privilege boundary" in this case), you're probably going to do it wrong.

by ajross

3/30/2025 at 6:59:32 AM

It is. But even with Unix sockets, the client should never blindly trust the bytes received and should parse them defensively.

by eptcyka
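
As a rough illustration of "parse defensively", here is a minimal Python sketch. The wire format `<count>@<v1> <v2> ...` is invented for this example and is not the actual atopgpud protocol; the limits and the function name `parse_stats_message` are likewise assumptions.

```python
MAX_MESSAGE_BYTES = 4096  # hypothetical limits, chosen for illustration
MAX_FIELDS = 64

def parse_stats_message(raw: bytes) -> list[int]:
    """Parse '<count>@<v1> <v2> ...' and reject anything that does not match exactly."""
    if len(raw) > MAX_MESSAGE_BYTES:
        raise ValueError("message too long")
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError as exc:
        raise ValueError("message is not ASCII") from exc
    header, sep, body = text.partition("@")
    if not sep or not header.isdigit():
        raise ValueError("missing or malformed field count")
    count = int(header)
    if not 1 <= count <= MAX_FIELDS:
        raise ValueError("field count out of range")
    fields = body.split(" ")
    # The declared count is only a claim by the sender; check it against
    # what actually arrived before using it for anything.
    if len(fields) != count:
        raise ValueError("field count does not match payload")
    if not all(f.isdigit() and len(f) <= 10 for f in fields):
        raise ValueError("field is not a bounded decimal number")
    return [int(f) for f in fields]
```

The design choice is that every malformed input ends in one well-defined error path, regardless of which socket family delivered the bytes.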

3/30/2025 at 10:43:38 AM

> The vulnerability is caused by the fact that atop always tries to connect to the TCP port of 'atopgpud' during initialization. When another local program has been started (instead of 'atopgpud') that listens to this TCP port, atop connects to that program. Such program is able then to send unexpected strings that may lead to parsing failures in atop. These failures result in heap problems and segmentation faults.

Okay, so, if I have a shell and the rights to listen on a host, I can crash the "atop" of other users? That's it? I could also create a fork bomb, fill up the disk, use all CPU and memory, etc...

by Galanwe

3/30/2025 at 3:43:00 PM

Not the same thing at all if atop runs as root and you are a user on that system that has no root access. With a well-prepared exploit you could achieve code execution as root. That's a bit more than a simple Denial of Service by filling up the disk.

by TonyTrapp

3/30/2025 at 3:26:41 PM

I think the concern is for privilege escalation.

by bitbasher

3/29/2025 at 9:39:07 PM

Ah, there's the other shoe:)

> optional sources, that have to be activated explicitly.

So only locally exploitable, and you have to enable an optional feature? That's ... honestly better than I was worried that it might be

by yjftsjthsd-h

3/29/2025 at 9:55:09 PM

No. Local, but it always tries to connect, and the daemon to which it tries to connect is optional, which means that the default is attackable. An attacker can run their own program on the port and send bad strings that will cause an overflow.

by dgacmu

3/30/2025 at 12:36:53 AM

Oh, I see, thanks.

> Therefore, the default behavior of atop is now not to connect to the TCP port at all.

I missed that now it defaults to not connecting.

by yjftsjthsd-h

3/29/2025 at 9:50:50 PM

The fix is to make it optional.

But yeah, I was anticipating something quite a bit worse.

by MattPalmer1086

3/29/2025 at 9:43:39 PM

> always tries to connect

by immibis

3/29/2025 at 11:15:11 PM

Right, the post on “rachelbythebay” was hinting at something much worse.

by xyst

3/30/2025 at 8:02:39 AM

How so? It was pretty clear from her second post that it's a local privilege escalation. And that is what it is, and a fairly easily exploitable one at that.

by brazzy

3/30/2025 at 4:21:18 PM

well, the first post opened with "You might want to stop running atop" and followed with "Right now, I think it's probably best if you uninstall atop. I don't mean just stopping it, but actually keep it from being executed."

Which does indeed hint at something much worse IMO.

To be clear: I value Rachel's opinion and contributions greatly. Maybe these days I'm just a little grouchy about panicky security people making us spend hours in the middle of the week uninstalling atop from hundreds of systems that wouldn't have been at risk from something like this.

by natebc

3/29/2025 at 9:57:20 PM

Did you stop reading at that sentence?

by mvdtnz

3/30/2025 at 12:41:12 AM

Unlikely, since the use of a local TCP port was later than the quoted sentence. Granted, I did skim, but after having it clarified and rereading, I think that introduction is misleadingly phrased and would benefit from clearer delineation of the previous vulnerable behavior and the fixed behavior.

by yjftsjthsd-h

3/30/2025 at 5:55:16 AM

So what was the point of Rachel's vagueposting? Was there any kind of NDA or a good reason to be so vague?

by mvdtnz

3/30/2025 at 8:03:09 AM

Responsible disclosure?

by brazzy

3/29/2025 at 10:39:59 PM

I have a semi-related question. For someone whose main job is not maintaining or running full Linux servers but who would like information about processes and their RAM/CPU usage, etc., what would be a good tool that is easy to parse and has good defaults?

by stiild

3/29/2025 at 11:44:00 PM

The tool btop was suggested in the other thread to replace atop and htop.

by edoceo

3/30/2025 at 4:57:58 AM

Seconding btop++, been running it as my main top for a few years now, and switched from htop. I didn't have a single complaint about htop, did what it said on the tin and did it well in my experience, but personally I prefer btop's ux/ui.

by 0manrho

3/30/2025 at 3:21:56 AM

If you are writing software to parse it, don't use third-party tooling. Read the kernel outputs directly (/proc, /sys, etc.).

While they come with no guarantee not to change, if they do change, any tool whose output you are parsing will break too.

by worthless-trash
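
As an example of reading the kernel's output directly, here is a minimal Python sketch that parses `/proc/meminfo`. It assumes the usual `Key:   value kB` layout of that file; the function names `parse_meminfo` and `read_meminfo` are made up for illustration.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style 'Key:   value [kB]' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip anything that does not look like 'Key: <number> ...'.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the live file (Linux only)."""
    with open(path, encoding="ascii") as handle:
        return parse_meminfo(handle.read())
```

Something like `read_meminfo()["MemAvailable"]` would then give the available memory in kB on kernels that report that field.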

3/30/2025 at 4:31:31 PM

I recommend.. atop, now that it has been updated to address this issue.

by ezekiel68

3/29/2025 at 11:15:43 PM

Node exporter is a good start, or you could look at Netdata

by candiddevmike

3/30/2025 at 12:51:32 AM

htop is a decent curses processes manager that's a few miles better than top

by calvinmorrison

3/30/2025 at 9:41:17 AM

I recommend nmon

by Zardoz84

3/29/2025 at 10:09:28 PM

Is it just me or does this seem like a bad design where a TCP port is exposed to share information?

by zitterbewegung

3/29/2025 at 10:15:18 PM

Yes. Any local process can connect to a TCP port (unless special care is taken) so it should be a last-resort option. Additionally the server either needs to be run as root to bind a privileged port or any application can race over binding that port. UNIX sockets are a much better option as they can be protected by filesystem permissions including who can bind the socket and who can connect to it.

This can be mitigated by having authentication inside the socket, but now your authentication code is an attack surface and how are you going to share the secrets? On the filesystem? You are basically back to a UNIX socket with extra steps.

by kevincox

3/29/2025 at 10:11:52 PM

As long as you bind to localhost it's fine in theory. Though any network code still needs to be rigorously hardened.

by marginalia_nu

3/29/2025 at 10:30:34 PM

> As long as you bind to localhost it's fine in theory

But only if you assume that the data being transferred is public, right?

With the described method, any non-privileged user could access the data from the TCP socket, right?

by echoangle

3/29/2025 at 11:30:35 PM

Information in top isn't much of a secret though.

by marginalia_nu

3/30/2025 at 11:15:45 AM

That sounds less bad than expected

by Havoc

3/29/2025 at 11:43:43 PM

So, as https://www.cve.org/CVERecord?id=CVE-2025-31160 says:

* CWE-617 Reachable Assertion

* affected from 0 through 2.11.0

... can we assume these will be updated to the actual vulnerability (CWE-940, CWE-120?), and vulnerable versions (2.4.0 through 2.11.0)? Or was the vaguepost about an entirely different vulnerability? Does anyone yet know what specific issue the vaguepost was alluding to?

by amiga386

3/30/2025 at 9:36:04 AM

omg... Why a TCP port instead of a UNIX socket?

by Zardoz84

3/29/2025 at 11:27:50 PM

> the parsing of the strings is improved to avoid that heap problems can occur.

Tell me what language you’re using without telling me what language you’re using…

by taspeotis

3/29/2025 at 10:25:37 PM

atop freaks out if it isn't talking to the thing it thinks it's talking to... who would have thunked it... I feel like a lot of programs have that issue.

by nubinetwork

3/29/2025 at 11:02:17 PM

It's acceptable to freak out by crashing. It's even acceptable to crash via explicit assertion failure if the developers don't want to write proper error handling. It's not acceptable to crash via segmentation fault.

by kccqzy

3/30/2025 at 3:02:36 AM

It's to an extent even acceptable to crash via segmentation fault (more specifically, doing whatever unsafe exploitable things may come of the source of the issue) if it takes the same amount of privileges to cause the crash as the thing crashing has.

And that's the important thing violated here, atop being rather reasonably run by root to examine root processes, whereas the exploiter just needs the ability to host a thing on a specific port.

by dzaima

3/30/2025 at 11:37:48 AM

A segmentation fault is perfectly fine as long as an attacker can not cause any other action before it (but I guess this is the case here).

by uecker

3/30/2025 at 2:09:12 PM

Ah, but will it always segmentation fault?

It can be difficult to prove that an out-of-bounds memory reference triggered by malformed input will always result in a segmentation fault instead of a read or write of an "interesting" memory location.

by Polizeiposaune

3/31/2025 at 12:24:50 PM

That depends. In this case, I guess the issue is that there is some OOB memory reference. But a null pointer dereference resulting in a segmentation fault, for example, is not (necessarily) a security problem.

by uecker