Poettering says that PID 1 has special requirements. One of these is reaping "zombie" processes: children that have exited but were never waited for, typically because their original parent abandoned them. This is a real problem for Docker since the application runs as PID 1 and does not handle the zombie processes. For example, containers running the Oracle database can end up with thousands of zombie processes.

Why does Poettering keep claiming this when he's the one who submitted the patch that adds the PR_SET_CHILD_SUBREAPER prctl(2) [0] functionality?

[0] http://man7.org/linux/man-pages/man2/prctl.2.html
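For anyone who hasn't seen it, here is a rough sketch of what that prctl option does, called from Python through ctypes. The function names are made up for illustration, the constant 36 is the value of PR_SET_CHILD_SUBREAPER in <linux/prctl.h>, and it is Linux-only. The process marks itself as a subreaper, deliberately orphans a grandchild, and then waits for it, which only works because the kernel reparented the orphan to us instead of to PID 1.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper():
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(PR_SET_CHILD_SUBREAPER, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def reap_orphaned_grandchild():
    """Orphan a grandchild on purpose, then reap it.

    Returns (grandchild_pid, reaped_pid). Hypothetical helper for illustration.
    """
    become_subreaper()
    r, w = os.pipe()
    child = os.fork()
    if child == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            os._exit(0)  # grandchild: nothing to do, just exit
        # Tell the top-level process which pid to expect, then exit
        # without waiting, which orphans the grandchild.
        os.write(w, str(grandchild).encode())
        os._exit(0)
    os.close(w)
    with os.fdopen(r) as f:
        grandchild = int(f.read())
    os.waitpid(child, 0)
    # This wait only succeeds because we are the subreaper; otherwise the
    # grandchild would belong to PID 1 and we would get ECHILD.
    reaped, _ = os.waitpid(grandchild, 0)
    return grandchild, reaped


if __name__ == "__main__":
    print(reap_orphaned_grandchild())
```

Note that being a subreaper only gets the orphans delivered to you. The process still has to call wait on them, which is exactly the part an arbitrary application running as PID 1 tends not to do.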

I guess he's saying that you can't just take any random binary and run it in a Docker container: if that binary spawns a lot of children but does not wait for them, you'll end up with a lot of zombies.

Docker could run a minimal pid 1 in each container to address this. Though if this were a big issue, I guess it would already have been fixed.

Naturally, a proof of concept of the problem would be great. (Let's say a Dockerfile.)
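Not a Dockerfile, but as a minimal sketch of the underlying mechanism (Linux-only, reads /proc, helper names invented for the example): a parent forks a child, the child exits, and the parent simply never waits. The child then sits in state "Z" until somebody reaps it. In a container the same thing happens when the never-waiting parent is whatever runs as pid 1.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state field comes right after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie():
    """Fork a child that exits at once and deliberately do not wait for it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit straight away
    # Poll until the kernel reports the child as a zombie (or give up).
    deadline = time.monotonic() + 5
    while proc_state(pid) != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
    return pid


if __name__ == "__main__":
    pid = make_zombie()
    print(pid, proc_state(pid))  # stays a zombie until someone waits
    os.waitpid(pid, 0)           # reaping removes the process table entry
```

A zombie holds no memory to speak of, but it does hold a pid and a process table slot, which is why thousands of them eventually become a problem.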

It has been a reasonably big issue. E.g. I kept seeing zombies with Consul for a while, until we realised that every single Consul Docker container on Docker Hub just ran Consul as pid 1 in the container (this was a while ago; no idea if that's still the case). Nobody had realised that Consul health checks could then end up as zombies if you weren't very careful about how you wrote them. Typical example: spawning curl from a shell script, with a timeout on the health check that was shorter than any timeouts on the curl requests.

It's usually fairly simple to fix. For Consul above, I raised it with the Consul guys and they said they'd look at adding waiting on children as a precaution (it's just a couple of lines). People building containers could also introduce a minimal init, or you can write your health checks to guard against it. But it happens all over the place, and people are often unaware of it, so they're not on the lookout, and it may not be immediately obvious.

The reason I raised it as an issue for Consul, even though it wasn't really their fault but an issue with the containers, is that people packaging containers need to be aware of the problem: a given application may spawn children, and it may not wait for them. Even a lot of people aware of the zombie issue end up packaging software that they didn't realise was spawning child processes that could end up as zombies. In this case, noticing the effects took running it in a container without a proper pid 1, using health checks (which not everyone will do), and writing the health checks in a particular way.

Thankfully there are a number of tiny little inits, as what little you actually need to do is very trivial. E.g. there's suckless sinit [1], Tini [2], and here's a tiny little proof of concept Go init [3] I wrote (though frankly, suckless or Tini compiled with musl will give you a much smaller binary).

[1] http://git.suckless.org/sinit

[2] https://github.com/krallin/tini

[3] https://gist.github.com/vidarh/91a110792c86d6c3bb41
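To give an idea of how little is involved, here is a rough sketch of the core loop in Python. This is not how sinit, Tini or the gist above are written, just the general shape under my own assumptions: start the real program as a child, then keep waiting on any child that exits until the main one is done. The run_init name is made up, and signal forwarding, which a real init would want, is left out.

```python
import os
import sys


def run_init(argv):
    """Run argv as the main child, reap every child that exits meanwhile,
    and return the main child's exit status."""
    main = os.fork()
    if main == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    while True:
        try:
            # -1 means "any child", which includes orphans that were
            # reparented to us because we are pid 1.
            pid, raw = os.waitpid(-1, 0)
        except ChildProcessError:
            return 0  # no children left at all
        if pid == main:
            if os.WIFEXITED(raw):
                return os.WEXITSTATUS(raw)
            return 128 + os.WTERMSIG(raw)  # shell convention for signals
        # Any other pid was an orphan; waiting on it is all it needed.


if __name__ == "__main__":
    sys.exit(run_init(sys.argv[1:]))
```

You would use something like this as the container's entrypoint with the real command as its arguments, so the application never runs as pid 1 itself. For actual use, one of the small compiled inits above is the better choice.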