What does HackerNews think of oomd?

A userspace out-of-memory killer

Language: C++

Is systemd having an OOM killer really needed given that the kernel already does this? And if you have systemd doing OOM killing, would that also be competing with the kernel doing the same thing?

https://www.kernel.org/doc/gorman/html/understand/understand...

The kernel's OOM killer can be annoying and complicated sometimes, but I do like it since I can easily change /proc/${PID}/oom_score_adj to prevent it from killing something important.
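For anyone who hasn't touched that knob, here is a minimal sketch of what adjusting it looks like. The function name `set_oom_score_adj` and the `proc_root` parameter are made up for illustration (the latter only exists so the code can be exercised against a scratch directory). The valid range of -1000 to 1000 is the kernel's; -1000 exempts the process from the OOM killer entirely, and lowering the value normally requires CAP_SYS_RESOURCE.

```python
from pathlib import Path

# Range accepted by the kernel for /proc/<pid>/oom_score_adj.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid, value, proc_root="/proc"):
    """Write an OOM score adjustment for a process (hypothetical helper).

    proc_root is an illustrative parameter so the function can be pointed
    at a scratch directory instead of the real /proc.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be in [-1000, 1000], got {value}")
    path = Path(proc_root) / str(pid) / "oom_score_adj"
    path.write_text(f"{value}\n")
    return path
```

Something like `set_oom_score_adj(os.getpid(), 500)` would make the current process a more likely victim, which needs no special privileges.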

EDIT: Per the docs:

    systemd-oomd is a system service that uses cgroups-v2 and pressure stall information (PSI) to monitor and take corrective action before an OOM occurs in the kernel space.
https://www.freedesktop.org/software/systemd/man/systemd-oom...
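To make the PSI part concrete: the kernel exposes pressure data as text in files such as /proc/pressure/memory, with a "some" line and a "full" line that each carry avg10, avg60, avg300 (percentages) and total (microseconds). The sketch below only parses that text. `parse_psi` is a name chosen for illustration, and this is not how systemd-oomd itself is implemented.

```python
def parse_psi(text):
    """Parse the contents of a PSI file into nested dicts (illustrative helper).

    Expected input looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
        full avg10=0.00 avg60=0.00 avg300=0.00 total=0
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is a microsecond counter; the avg* fields are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result
```

On a kernel with PSI enabled, `parse_psi(open("/proc/pressure/memory").read())["full"]["avg10"]` gives the share of the last 10 seconds in which all non-idle tasks were stalled on memory.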

I'd be really interested in a more thorough explanation of the justification for it. The description makes it sound like a bad thing that this happens in the kernel, but memory management is the kernel's job, and doing it there should have less overhead than in userspace. Personally, I think OOMScoreAdjust=, which I believe systemd has provided for a long time, is the best approach here, since it doesn't duplicate functionality and gives you a more user-friendly way to manage the OOM score without messing around with /proc.

Another fun fact: it also seems to kill based on what it determines to be excessive swap usage.

https://askubuntu.com/a/1423840
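As a rough illustration of what "excessive swap usage" could be measured against, the sketch below computes the fraction of swap in use from /proc/meminfo-style text. The helper name `swap_used_fraction` is hypothetical, and any threshold you compare the result to is your own assumption; this is not taken from systemd-oomd's source.

```python
def swap_used_fraction(meminfo_text):
    """Return used swap as a fraction of total swap (illustrative helper).

    meminfo_text is the content of /proc/meminfo, where values are
    reported in kB, e.g. "SwapTotal:  2097148 kB".
    """
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    if total == 0:
        # No swap configured, so nothing can be "used".
        return 0.0
    return (total - fields.get("SwapFree", 0)) / total
```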

Facebook also wrote its own OOM daemon, which systemd's was forked from:

https://github.com/facebookincubator/oomd

https://www.phoronix.com/news/Ubuntu-22.04-Systemd-OOMD

The Linux OOM killer is inflexible and is as likely to kill the wrong thing as the right thing when there's memory pressure on the system. There's a reason why third-party userspace ones [0] are used in lieu of the built-in one. You can tweak the default implementation, but you can never fully rely on it to behave deterministically.

Whether this systemd implementation is any better is, of course, another question entirely.

[0] https://github.com/facebookincubator/oomd

OOMKiller has a bunch of issues. Its heuristics don't apply well across the wide range of workloads Linux runs (mobile/Android? web server? database server? build server? desktop client? gaming machine?), each of which would require its own tuning. (More background at https://lwn.net/Kernel/Index/#Memory_management-Out-of-memor...)

That's why some orgs implemented their own solutions to keep OOMKiller from having to enter the picture, like Facebook's userspace oomd [1] or Android's LMKD [2].

[1] https://github.com/facebookincubator/oomd

[2] https://source.android.com/devices/tech/perf/lmkd

I think the distinction is that MemoryMax= is just an interface to the cgroupv2 setting, i.e., that rule is implemented inside the kernel and invokes the kernel's OOM killer within a cgroup. The manpage for systemd-oomd says, "systemd-oomd is a system service that uses cgroups-v2 and pressure stall information (PSI) to monitor and take action on processes before an OOM occurs in kernel space."

It looks like systemd-oomd is related to (based on? from the same people as?) Facebook's oomd (https://github.com/facebookincubator/oomd), whose documentation gives a bunch of reasons why you would prefer a userspace oomd that takes in PSI data and can be configured to proactively kill misbehaving processes instead of just letting the kernel OOM killer handle it. The major reason is time to recovery: a misbehaving process can put a system under so much pressure that the kernel OOM killer takes a long time to flush things out, whereas a userspace component can respond in advance with more configurable rules (and more flexibility, since the kernel doesn't believe you're at capacity yet).
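The "respond in advance with configurable rules" idea can be sketched as a simple rule: act once pressure has stayed above a limit for some duration. Everything here is an assumption for illustration (the function name `pressure_sustained`, the sample format, and the example limit and duration); it is not the rule language of oomd or systemd-oomd.

```python
def pressure_sustained(samples, limit, duration):
    """Return True if pressure stayed above `limit` for at least `duration` seconds.

    samples: iterable of (timestamp_seconds, pressure_percent) pairs in
    chronological order, e.g. periodic readings of a PSI avg10 value.
    """
    streak_start = None
    for timestamp, pressure in samples:
        if pressure > limit:
            if streak_start is None:
                streak_start = timestamp
            if timestamp - streak_start >= duration:
                return True
        else:
            # Pressure dropped back under the limit, so the streak resets.
            streak_start = None
    return False
```

A daemon built on this would poll periodically, and only when the check returns True would it go looking for a victim, which is where the policy flexibility comes in.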

I've heard that this problem is caused by Linux's overcommit strategy. Basically, the initial memory allocation never fails (unless you set special flags), but no memory is actually allocated on the spot. Memory is only allocated when it is accessed. And if Linux runs out of memory when a program accesses a piece of yet-to-be-allocated memory, it will try really _really_ hard to free up memory so that the access can succeed.

That's what's causing the lockups.
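The allocate-on-access behaviour described above is easy to observe from userspace. The following is a small sketch, not a benchmark: it reserves an anonymous mapping, touches only a few pages, and (where /proc is available) reports resident memory before and after. The helper names are invented for this example.

```python
import mmap


def resident_bytes():
    """Return this process's resident set size in bytes, or None without /proc."""
    try:
        with open("/proc/self/statm") as statm:
            resident_pages = int(statm.read().split()[1])
    except (OSError, ValueError, IndexError):
        return None
    return resident_pages * mmap.PAGESIZE


def reserve_and_touch(reserve_bytes, pages_to_touch):
    """Map reserve_bytes of anonymous memory but write to only a few pages.

    Returns (pages_written, rss_before, rss_after). The mapping reserves
    address space; physical pages are only needed for the pages written.
    """
    region = mmap.mmap(-1, reserve_bytes)  # -1 requests an anonymous mapping
    try:
        before = resident_bytes()
        for i in range(pages_to_touch):
            region[i * mmap.PAGESIZE] = 1  # first write faults the page in
        after = resident_bytes()
        written = sum(region[i * mmap.PAGESIZE] for i in range(pages_to_touch))
    finally:
        region.close()
    return written, before, after
```

Reserving a large region and touching a handful of pages should grow resident memory by far less than the reserved size.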

Sounds to me like this would be difficult to fix without breaking backward compatibility.

In the meantime, you can probably improve your quality of life quite a bit by using something like: https://github.com/facebookincubator/oomd

The oom killer already exists on servers and already can kill programs.

If you want to turn off overcommit and have the system panic (and optionally reboot) when it runs out of memory, the kernel allows that via the vm.overcommit_memory and vm.panic_on_oom sysctls.
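For reference, the overcommit policy is a single integer in /proc/sys/vm/overcommit_memory. Here is a tiny sketch that turns it into words; the helper name and the wording of the descriptions are mine, while the meaning of modes 0, 1 and 2 follows the kernel documentation.

```python
# Descriptions paraphrase the kernel's documented overcommit modes.
OVERCOMMIT_MODES = {
    0: "heuristic: obviously excessive allocations are refused",
    1: "always: allocations are never refused for lack of memory",
    2: "strict: commit is limited to swap plus a share of RAM",
}


def describe_overcommit(raw):
    """Describe the value read from /proc/sys/vm/overcommit_memory."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")
```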

Whatever knob they add will certainly be configurable, and Ubuntu desktop can configure it one way while Ubuntu server configures it the other, if it turns out people would prefer that.

In practice, people running servers seem to want OOM killers to kick in before the server barfs. One example of this is Facebook's oomd [1]. I assure you, they're running that on their servers, not their web-browser machines.

[1]: https://github.com/facebookincubator/oomd

Digging through the LKML thread, this appears to be the corresponding userland component for the OOM use case:

https://github.com/facebookincubator/oomd

There was also a more minimal proof-of-concept example posted by the Endless OS guys:

https://gist.github.com/dsd/a8988bf0b81a6163475988120fe8d9cd