I run my personal workstations and laptops without swap, and with earlyoom[2], which results in applications getting killed before the machine reaches an unresponsive state. I can only afford that because I trust my tools (vim, emacs, firefox, but most likely firefox) not to lose my session if they shut down unexpectedly. I turn earlyoom off when I play games where I know memory usage will grow suddenly, but the game won't reach the limits of my machine. You can also whitelist specific applications in earlyoom, if I recall correctly.
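On the whitelisting point: as far as I know, earlyoom documents `--prefer` and `--avoid` options that each take a regular expression matched against process names. The sketch below only illustrates the idea and is not earlyoom's implementation. The `Proc` record, the `pick_victim` helper, and the sample process list are hypothetical, and the hard exclusion of matching names is my simplification.

```python
import re
from dataclasses import dataclass


@dataclass
class Proc:
    """Hypothetical process record; oom_score mirrors /proc/<pid>/oom_score."""
    pid: int
    name: str
    oom_score: int


def pick_victim(procs, avoid_pattern=None):
    """Return the process with the highest oom_score, skipping avoided names.

    Assumption: names matching avoid_pattern are excluded outright.
    Returns None when nothing is left to choose from.
    """
    avoid = re.compile(avoid_pattern) if avoid_pattern else None
    candidates = [p for p in procs if not (avoid and avoid.search(p.name))]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.oom_score)


# Made-up numbers, purely for illustration.
sample = [
    Proc(101, "firefox", 800),
    Proc(102, "game", 900),
    Proc(103, "vim", 50),
]
```

With this sample, `pick_victim(sample)` selects the game, while `pick_victim(sample, avoid_pattern=r"^game$")` falls back to firefox, which is roughly the behaviour I want when I would otherwise switch earlyoom off for a gaming session.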
Some people claim success configuring the kernel to use different I/O schedulers, but I haven't tried that yet.
I don't know why, but locking up has been my usual experience with desktop Linux across many years and distros, and I remember seeing at least one article explaining why. The only real solution is calling the OOM killer early, either with a daemon or SysRq.
> It should not take minutes. Should happen really quickly once thresholds are reached and allocations are attempted. What is probably happening is that the system has not run out of memory just yet but it is very close and is busy thrashing the swap. If this is happening frequently you may need to adjust your settings (vm.overcommit, vm.admin_reserve_kbytes, etc). Or even deploy something like EarlyOOM (https://github.com/rfjakob/earlyoom). Or you might just need more RAM, honestly.
Yeah. Exactly. But as the thread says, why aren't those things set up automatically?
> available memory (or "free + buffers/cache") is close to zero
> swap used increases or fluctuates
> dmesg | grep oom-killer shows the OutOfMemory-killer at work
How about this one: the entire system becomes so unresponsive that even a VT switch takes multiple minutes to process, never mind actually logging in and running diagnostics. I don't know exactly why, but I used to get this pretty regularly on a 2 GB laptop if I did the wrong thing in Firefox. The workaround was installing earlyoom [1] (after which the tab would crash instead), although I assume there's some kind of sysfs parameter that I might have tweaked instead.
https://github.com/rfjakob/earlyoom
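The first two symptoms quoted above (available memory near zero, swap use climbing) can be read from `/proc/meminfo` without any extra tools, provided the machine still responds. This is a minimal sketch: `parse_meminfo`, `pressure_summary`, and the `SAMPLE` text are names and numbers I made up for illustration, while `MemTotal`, `MemAvailable`, `SwapTotal`, and `SwapFree` are the real field names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kiB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def pressure_summary(info):
    """Summarise the two symptoms: available memory (%) and swap in use (kiB)."""
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    return {
        "mem_available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"],
        "swap_used_kib": swap_total - swap_free,
    }


# Made-up snapshot of a 2 GB machine under pressure.
SAMPLE = """\
MemTotal:        2048000 kB
MemFree:           20000 kB
MemAvailable:      40960 kB
SwapTotal:       1024000 kB
SwapFree:         512000 kB
"""

summary = pressure_summary(parse_meminfo(SAMPLE))
```

On a live system you would pass `open("/proc/meminfo").read()` instead of `SAMPLE`. Of course, this only helps while the machine can still start a Python interpreter, which is exactly what fails in the lock-up case.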
The boot thing you mention always seemed a very odd default to me. It doesn't happen to me any more on my work Ubuntu machine, but maybe I configured something to fix that and forgot about it.
On the other hand, if you are likely to be identified as the culprit, I think the best you can hope for is getting some cleanup/reporting in before you're kill-9'd.
EarlyOOM [1] is a configurable daemon that kills processes early enough to (hopefully) prevent thrashing. I'm using it on my Linux desktops (it has proven to catch my own programs' runaway memory usage before it risks locking up the development machine), but it may also be useful on servers. It logs to syslog but also can be configured to run a program on kill events.
[1] https://github.com/rfjakob/earlyoom, https://launchpad.net/ubuntu/+source/earlyoom, https://packages.debian.org/search?keywords=earlyoom
(Why would a user space OOM killer be necessary if the kernel has better information about the state of the world? I don't know the details, but my interpretation is that because people disliked OOM killing, the kernel devs made the kernel OOM killer trigger so late that it is largely useless. If that's true and thus a social problem, maybe it needs to be solved on that level, too.)
BTW in my experience, Linux 2.2 used to handle out of memory situations much more gracefully than any later kernel version.
This can be prevented using earlyoom (which is packaged in most distros):
https://github.com/rfjakob/earlyoom
It's probably too difficult to fix the underlying design errors, e.g. fork() duplicating the process's entire address space, thus requiring overcommit and copy-on-write, but losing only one process beats losing all of them. earlyoom should be enabled by default.
"The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think that it will do. I have yet to be patient enough to wait for it, sitting in front of an unresponsive system."
Fellow X220 user here... a solution for this exact problem, where the system runs out of memory and you sit there waiting until it has churned long enough to respond again, is to run earlyoom[0].
It will kill off the firefox process (or whichever is the main memory hog) early, which is also annoying but less so than having to wait minutes until you can use your computer again.
https://github.com/rfjakob/earlyoom
Packages are available in Debian Stable (Buster), so they should be available in most child distros by now as well.
This is to say that swapping out little used stuff delays the point where you are actually out of memory and performance goes straight to hell.
This means the optimal arrangement for desktop use is some swap and low swappiness.
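To see what swappiness a machine is actually running with, the sysctl can be read straight from `/proc/sys`. A small sketch, where `sysctl_path` and `read_sysctl` are helper names I made up; the mapping from `vm.swappiness` to `/proc/sys/vm/swappiness` is the standard one. What counts as "low" is left to the reader; the kernel default is 60.

```python
from pathlib import Path


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'vm.swappiness' to its file path."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value of a sysctl as a stripped string."""
    return sysctl_path(name, root).read_text().strip()
```

`read_sysctl("vm.swappiness")` returns the current value as a string on a Linux box. Changing it requires root, for example through `sysctl -w` or a file under `/etc/sysctl.d/`.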
One could imagine that perhaps something like
https://github.com/rfjakob/earlyoom
Might be an easier route to better behavior especially as you can more easily tell it what it ought to kill.
The behavior of the kernel could probably be improved, but it inherently lacks both the data required to make a truly optimal choice and a GUI to communicate with users. Going forward, desktop-oriented distros should probably ship some out-of-the-box GUI for handling this situation, built into their graphical environments, that steps in before things reach the point of dysfunction.
After playing around with vm.overcommit_ratio, different swap sizes, earlyoom[1], and a few other variables, I still haven't found a happy medium between high memory utilization and low risk of swapping to death. vm.overcommit_ratio=0 is safe, but on systems where occasional swapping is tolerable and memory is limited (e.g. my laptop), I'd rather allow some overcommit.
The risk is that if many cold or unallocated pages get touched while the system is under high memory pressure, the system can become totally unresponsive. At the moment I use "Magic SysRq"+f to manually start the oom_killer, when possible. Obviously it's not a great solution. Is there some kernel tunable to keep the system responsive that I'm unaware of? What do you guys do for desktop/laptop systems?
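For what it's worth, the SysRq+f action also has a file interface: writing the character `f` to `/proc/sysrq-trigger` as root asks the kernel to run its OOM killer once. Below is a minimal sketch; `trigger_oom_killer` is a name I made up, and the path is a parameter only so the function can be tried against a scratch file. It does not answer the tunable question, it just makes the manual workaround scriptable.

```python
def trigger_oom_killer(trigger_path="/proc/sysrq-trigger"):
    """Write 'f' to the SysRq trigger file.

    On the real /proc/sysrq-trigger this needs root and makes the kernel
    kill a process immediately, so do not call it casually.
    """
    with open(trigger_path, "w") as fh:
        fh.write("f")
```

This could be bound to a hotkey or a small root-owned helper, but like the keyboard combination it still depends on the machine being responsive enough to run it.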
What people often don't realise is that Linux distinguishes between dirty and clean memory, with dirty memory not reflected in swap, and clean memory already stored somewhere on disc, and program code is counted as clean memory that just happens to live somewhere other than swap.
Therefore, under memory pressure (especially if you set swappiness to zero), the kernel will evict your program code (because it is always clean) in preference to your program data. If you have no swap, then this is what causes the system to grind to a halt when the RAM is full: all your program code gets discarded from RAM, and nothing can run without reading it from disc again.
The recommendation to have a little bit of swap is absolutely fine. However, as the amount of RAM in your system increases, the penalty for running out of RAM increases as well. On larger systems (for example 256GB RAM), I recommend using something like EarlyOOM[1] to kill off tasks before the pathological swapping case occurs. Otherwise, you could end up with an unresponsive system. If you have lots of RAM, the kernel OOM killer waits far too long, and by the time it acts the system is already unresponsive.
By default, it'll start killing processes when free memory drops below 10%, though you can configure the threshold. I had the same problem for years, and then I started using earlyoom and I don't have to deal with it anymore.
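The threshold check itself is simple. Here is a rough sketch of the decision, not earlyoom's actual code: `percent` and `should_kill` are names I made up, and the 10% defaults come from the comment above. The extra condition that free swap must also be below its threshold is my reading of the earlyoom README, so treat it as an assumption.

```python
def percent(part, total):
    """Return part/total as a percentage; 0.0 when total is zero (e.g. no swap)."""
    return 100.0 * part / total if total > 0 else 0.0


def should_kill(mem_total, mem_available, swap_total, swap_free,
                mem_min_pct=10.0, swap_min_pct=10.0):
    """Decide whether to kill: both available memory and free swap are low.

    Assumption: a machine without swap counts as having 0% free swap,
    so the decision then depends on available memory alone.
    """
    mem_low = percent(mem_available, mem_total) < mem_min_pct
    swap_low = percent(swap_free, swap_total) < swap_min_pct
    return mem_low and swap_low
```

On a swapless box `should_kill(1000, 50, 0, 0)` is true (5% available), while `should_kill(1000, 50, 1000, 500)` is false because half the swap is still free. All values are in the same unit, for instance the kiB figures from `/proc/meminfo`.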
Use earlyoom instead of relying on oom-killer.
https://github.com/rfjakob/earlyoom
To quote from the description:
> The oom-killer generally has a bad reputation among Linux users. This may be part of the reason Linux invokes it only when it has absolutely no other choice. It will swap out the desktop environment, drop the whole page cache and empty every buffer before it will ultimately kill a process. At least that's what I think that it will do. I have yet to be patient enough to wait for it.
[...]
> This made people wonder if the oom-killer could be configured to step in earlier: superuser.com , unix.stackexchange.com.
> As it turns out, no, it can't. At least using the in-kernel oom killer.
And earlyoom exists to provide a better alternative to oom-killer in userspace that's much more aggressive about maintaining responsiveness.
Both changes have made my computers much more usable. Systems should be designed to fail fast when memory is low instead of slowing down.