What does HackerNews think of earlyoom?

earlyoom - Early OOM Daemon for Linux

Language: C

It is less about zram, and more about block I/O congestion on Linux in general[1]. The machine thrashes and becomes unresponsive under memory pressure as I/O requests flood the disks, whether for swap, for paging out and re-reading file-backed storage (open shared libraries, etc.), or simply for evicting frequently accessed files from the page cache.

I run my personal workstations and laptops without swap, and with earlyoom[2], which results in applications getting killed before the machine reaches an unresponsive state. I can only afford that because I trust my tools (vim, emacs, firefox, but most likely firefox) not to lose my session if they shut down unexpectedly. I turn earlyoom off when I play games where I know memory usage will grow suddenly, but the game won't reach the limits of my machine. You can also whitelist specific applications in earlyoom, if I recall correctly.
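The "whitelist" mentioned above corresponds to earlyoom's `--avoid` flag (with `--prefer` as its counterpart). A sketch of a configuration, using the `/etc/default/earlyoom` file the Debian/Ubuntu packaging reads; flags are as described in the earlyoom README, but check `earlyoom --help` for your version:

```shell
# /etc/default/earlyoom -- illustrative, not a recommendation.
#   -m 10       consider killing when available memory drops below 10%
#   -s 10       ...and free swap is also below 10%
#   --avoid     regex of process names never to pick (the "whitelist")
#   --prefer    regex of process names to pick first
EARLYOOM_ARGS="-m 10 -s 10 --avoid '^(Xorg|sshd|systemd)$' --prefer '^(firefox|chromium)$'"
```

Note that earlyoom only acts when both the memory and swap thresholds are crossed, so on a swapless machine the `-s` value is effectively moot.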

Some people claim success configuring the kernel to use different I/O schedulers, but I haven't tried that yet.

[1]: https://lwn.net/Articles/682582/

[2]: https://github.com/rfjakob/earlyoom

> The system is not supposed to 'lock up' when you run out of physical RAM. If it does, something is wrong. It might become slower as pages are flushed to disk but it shouldn't be terrible unless you are really constrained and thrashing. If the Kernel still can't allocate memory, you should expect the OOMKiller to start removing processes. It should not just 'lock up'. Something is wrong.

I don't know why, but locking up has been my usual experience on desktop Linux across many years and distros, and I remember seeing at least one article explaining why. The only real solution is invoking the OOM killer early, either with a daemon or via SysRq.

> It should not take minutes. Should happen really quickly once thresholds are reached and allocations are attempted. What is probably happening is that the system has not run out of memory just yet but it is very close and is busy thrashing the swap. If this is happening frequently you may need to adjust your settings (vm.overcommit, vm.admin_reserve_kbytes, etc). Or even deploy something like EarlyOOM (https://github.com/rfjakob/earlyoom). Or you might just need more RAM, honestly.
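The tunables named above (`vm.overcommit*`, `vm.admin_reserve_kbytes`) live in sysctl. A sketch of what adjusting them might look like; the values here are illustrative, not recommendations, and the semantics are from the kernel's vm sysctl documentation:

```shell
# /etc/sysctl.d/90-memory.conf -- illustrative values only.
vm.overcommit_memory = 2          # mode 2: strict accounting, malloc fails early
vm.overcommit_ratio = 80          # with mode 2: commit limit = swap + 80% of RAM
vm.admin_reserve_kbytes = 131072  # hold ~128 MiB back so root can still log in
```

Apply with `sysctl --system` (or reboot). Strict overcommit (mode 2) trades OOM kills for allocation failures, which not all desktop software handles gracefully.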

Yeah. Exactly. But as the thread says, why aren't those things set up automatically?

Swap is indeed supposed to prevent this, AFAIK. You can, though, try a tool like EarlyOOM and see if it helps: https://github.com/rfjakob/earlyoom

Probably not enough RAM. The page isn't doing anything crazy, but it includes a dozen or so embedded examples, which in turn can push Linux into an out-of-memory situation that freezes the system (technically it's still working, just really slowly). I ran into that issue a lot when browsing with 8GB RAM; upgrading helped. Installing earlyoom[1] is another workaround.

[1] https://github.com/rfjakob/earlyoom

> Warning signs of a genuine low memory situation that you may want to look into:

> available memory (or "free + buffers/cache") is close to zero

> swap used increases or fluctuates

> dmesg | grep oom-killer shows the OutOfMemory-killer at work

How about this one: the entire system becomes so unresponsive that even a VT switch takes multiple minutes to process, never mind actually logging in and running diagnostics. I don't know exactly why, but I used to get this pretty regularly on a 2 GB laptop if I did the wrong thing in Firefox. The workaround was installing earlyoom [1] (after which the tab would crash instead), although I assume there's some kind of sysfs parameter that I might have tweaked instead.

[1] https://github.com/rfjakob/earlyoom

The kernel OOM killer just isn't designed for desktop at all, it seems like. Back when I didn't have a ridiculous amount of RAM, I used this instead and was happy with it:

https://github.com/rfjakob/earlyoom

The boot thing you mention always seemed a very odd default to me. It doesn't happen any more on my work Ubuntu machine, but maybe I configured something to fix that and forgot about it.

I mean, the OOM killer's heuristics are byzantine to be sure. However, if your program is not likely to be the "true" culprit of memory exhaustion, there are better tools at your disposal than ballast pages--cgroups and tunable oomkillers like earlyoom (https://github.com/rfjakob/earlyoom).

On the other hand, if you are likely to be identified as the culprit, I think the best you can hope for is getting some cleanup/reporting in before you're kill-9'd.

I've found that earlyoom [1] can at least save me from having a complete freeze. There are packages for most distros.

[1] https://github.com/rfjakob/earlyoom

How can you use cgroups v2 for this? I know that BSD resource limits are basically useless here, as they only allow limiting virtual memory, not RSS.
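For what it's worth, a rough sketch of how the cgroup v2 memory controller can cap actual memory use (unlike `RLIMIT_AS`, which only caps virtual address space). Assumes the unified hierarchy is mounted at /sys/fs/cgroup and must be run as root; the group name is made up:

```shell
# Create a group, set limits, and move the current shell into it.
mkdir /sys/fs/cgroup/capped
echo 2G    > /sys/fs/cgroup/capped/memory.max    # hard cap: OOM-kill within the group
echo 1536M > /sys/fs/cgroup/capped/memory.high   # soft cap: reclaim/throttle first
echo $$    > /sys/fs/cgroup/capped/cgroup.procs  # this shell and its children now count
```

The `memory.high`/`memory.max` split is the useful part: crossing `memory.high` slows the offender down via reclaim pressure instead of killing it outright.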

EarlyOOM [1] is a configurable daemon that kills processes early enough to (hopefully) prevent thrashing. I'm using it on my Linux desktops (it has proven to catch my own programs' runaway memory usage before it risks locking up the development machine), but it may also be useful on servers. It logs to syslog but also can be configured to run a program on kill events.

[1] https://github.com/rfjakob/earlyoom, https://launchpad.net/ubuntu/+source/earlyoom, https://packages.debian.org/search?keywords=earlyoom
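The "run a program on kill events" hook mentioned above refers to earlyoom's `-N` option. A hypothetical hook script; the README documents that the hook runs with `EARLYOOM_NAME`, `EARLYOOM_PID`, and `EARLYOOM_UID` set, but the message format and logging choice here are my own:

```shell
#!/bin/sh
# Hypothetical earlyoom -N hook: log which process was killed.
name="${EARLYOOM_NAME:-unknown}"
pid="${EARLYOOM_PID:-0}"
msg="earlyoom killed $name (pid $pid)"
# Send to syslog when logger is available; echo so it also lands in the
# journal of the earlyoom unit.
command -v logger >/dev/null && logger -t earlyoom-hook "$msg"
echo "$msg"
```

Point earlyoom at it with `earlyoom -N /usr/local/bin/earlyoom-hook.sh` (path is an example).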

(Why would a user space OOM killer be necessary if the kernel has better information about the state of the world? I don't know the details, but my interpretation is that because people disliked OOM killing, the kernel devs made the kernel OOM killer trigger so late that it is largely useless. If that's true and thus a social problem, maybe it needs to be solved on that level, too.)

BTW in my experience, Linux 2.2 used to handle out of memory situations much more gracefully than any later kernel version.

>1. Swap thrashing can bring the system into a completely unresponsive state.

This can be prevented using earlyoom (which is packaged in most distros):

https://github.com/rfjakob/earlyoom

It's probably too difficult to fix the underlying design errors, e.g. fork() duplicating the process's entire address space, thus requiring overcommit and copy-on-write, but losing only one process beats losing all of them. earlyoom should be enabled by default.

In practice, everything slows to an unusable crawl, and you hit the reset button because it's the fastest way of regaining control. IMO, earlyoom or similar is essential for general desktop use where memory load is unpredictable. Better to lose one process than lose all of them.

"The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think that it will do. I have yet to be patient enough to wait for it, sitting in front of an unresponsive system."

https://github.com/rfjakob/earlyoom

Not that it's anything more than a band-aid, but I have found earlyoom to be immensely useful in cases like that.

https://github.com/rfjakob/earlyoom

> I still have to figure out how in the world Reddit on Firefox manages to completely lock the system up, with looping audio and frozen cursor. Nothing else causes that fault.

Fellow X220 user here... a solution for this exact problem (where the system runs out of memory and you sit there staring, waiting until it has churned long enough to respond again) is to run earlyoom[0].

It will kill off the firefox process (or whichever is the main memory hog) early, which is also annoying but less so than having to wait minutes until you can use your computer again.

[0] https://github.com/rfjakob/earlyoom

+1 to this as a workaround until the kernel finally addresses the issue. Earlyoom is a user space OOM-killer that kicks in before the system starts the mad paging dance.

https://github.com/rfjakob/earlyoom

Packages are available in Debian Stable (Buster), so they should be available in most child distros by now as well.

It actually DOES happen like this. When the entire working set of actively used apps fits in memory, swap lets the system page out things that are little used. This works perfectly fine.

This is to say that swapping out little used stuff delays the point where you are actually out of memory and performance goes straight to hell.

This means the optimal arrangement for desktop use is some swap and low swappiness.
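The "some swap and low swappiness" arrangement is a one-line sysctl; the value is illustrative (the default is 60, and lower values bias the kernel toward evicting file cache rather than swapping out anonymous pages):

```shell
# /etc/sysctl.d/99-swappiness.conf -- keep some swap, but use it reluctantly.
vm.swappiness = 10
```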

One could imagine that perhaps something like

https://github.com/rfjakob/earlyoom

might be an easier route to better behavior, especially as you can more easily tell it what it ought to kill.

The kernel's behavior could probably be improved, but it inherently lacks both the data required to make a truly optimal choice and a GUI to communicate with users. Going forward, desktop-oriented distros should probably ship an out-of-the-box GUI, built into their graphical environments, that handles this situation before it gets to the point of dysfunction.

There's no reason the OOM killer can't be made more aggressive, and there are user-space implementations of that behavior. I use the "Early OOM Daemon"[0], which is packaged in Debian. I had problems with my system locking up under memory pressure before, but so far earlyoom has always managed to kill processes early enough to prevent this.

[0] https://github.com/rfjakob/earlyoom

Tip: I have the exact same setup, and I use earlyoom (https://github.com/rfjakob/earlyoom) to prevent this. Previously, I was frequently unable to break out of the freeze no matter how long I waited, forcing a hard reboot, but with earlyoom just a single memory-heavy app gets killed instead.

Interesting approach. I'm curious to try it out.

After playing around with vm.overcommit_ratio, different swap sizes, earlyoom[1], and a few other variables, I still haven't found a happy medium between high memory utilization and low risk of swapping to death. vm.overcommit_ratio=0 is safe, but on systems where occasional swapping is tolerable and memory is limited (e.g. my laptop), I'd rather allow some overcommit.

The risk is that if many cold or unallocated pages get touched while the system is under high memory pressure, the system can become totally unresponsive. At the moment I use "Magic SysRq"+f to manually start the oom_killer, when possible. Obviously it's not a great solution. Is there some kernel tunable to keep the system responsive that I'm unaware of? What do you guys do for desktop/laptop systems?

1. https://github.com/rfjakob/earlyoom
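The "Magic SysRq"+f trick mentioned above can also be triggered from a shell, which helps when the keyboard combination is unavailable (e.g. over SSH). Needs root; the bitmask value 1 enables all SysRq functions, and narrower masks exist:

```shell
# Allow SysRq, then invoke the kernel OOM killer by hand.
sysctl -w kernel.sysrq=1
echo f > /proc/sysrq-trigger   # same effect as Alt+SysRq+f: kill a memory hog now
```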

There's a userspace daemon that does that: https://github.com/rfjakob/earlyoom (I am the author)

Swap engineering is definitely way more complicated than the article lets on.

What people often don't realise is that Linux distinguishes between dirty and clean memory: dirty memory has no up-to-date copy on disc, clean memory is already stored somewhere on disc, and program code counts as clean memory that just happens to live somewhere other than swap.

Therefore, under memory pressure (especially if you set swappiness to zero), you will be preferentially swapping out your program code (because it is always clean) in preference to your program data. If you have no swap, then this is what causes the system to grind to a halt when the RAM is full: all your program code gets discarded from RAM, and nothing can run without reading it from disc again.

The recommendation to have a little bit of swap is absolutely fine. However, as the amount of RAM in your system increases, the penalty for running out of RAM increases as well. On larger systems (for example 256GB RAM), I recommend using something like EarlyOOM[1] to kill off tasks before the pathological swapping case occurs. With lots of RAM, the kernel OOM killer kicks in far too late, by which point the system is already unresponsive.

[1] https://github.com/rfjakob/earlyoom

Use earlyoom: https://github.com/rfjakob/earlyoom

By default, it'll start killing processes when free memory drops below 10%, though you can configure the threshold. I had the same problem for years, and then I started using earlyoom and I don't have to deal with it anymore.
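The "free memory below 10%" signal roughly corresponds to `MemAvailable` as a fraction of `MemTotal` in /proc/meminfo. A sketch of that check (my own helper, not earlyoom's actual code), demonstrated on a canned snapshot so the numbers are reproducible:

```shell
# mem_avail_pct: compute MemAvailable as a whole-number percentage of MemTotal.
mem_avail_pct() {
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d", a*100/t}' "$1"
}

# Canned /proc/meminfo excerpt (on a live system, pass /proc/meminfo instead).
cat > /tmp/meminfo.sample <<'EOF'
MemTotal:       16384000 kB
MemFree:          512000 kB
MemAvailable:    1228800 kB
EOF

pct=$(mem_avail_pct /tmp/meminfo.sample)
echo "available: ${pct}%"   # 1228800/16384000 is 7.5%, truncated to 7: below the 10% default
```

When this value dips under the threshold, earlyoom picks a victim (by default the process with the highest oom_score) instead of letting the kernel thrash first.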

> I've had the OOM-killer render the system unresponsive a couple of times

Use earlyoom instead of relying on oom-killer.

https://github.com/rfjakob/earlyoom

To quote from the description:

> The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think that it will do. I have yet to be patient enough to wait for it.

[...]

> This made people wonder if the oom-killer could be configured to step in earlier: superuser.com , unix.stackexchange.com.

> As it turns out, no, it can't. At least using the in-kernel oom killer.

And earlyoom exists to provide a better alternative to the oom-killer in userspace, one that's much more aggressive about maintaining responsiveness.

I haven't used swap in years, and more recently I've accompanied that by using earlyoom [0] to start killing processes when RAM usage rises above 90%.

Both changes have made my computers much more usable. Systems should be designed to fail fast when memory is low instead of slowing down.

[0] https://github.com/rfjakob/earlyoom

No kidding. If not for earlyoom [0], every few hours my machine would grind to a screeching halt with the hard drive thrashing (and yes, I got rid of swap ages ago but it still happens) because the kernel doesn't know what to do with large amounts of RAM being used. Before discovering earlyoom, I would powercycle my machine whenever it happened because a powercycle was faster than waiting for the kernel to finish its tantrum.

[0] https://github.com/rfjakob/earlyoom