What does HackerNews think of automaxprocs?
Automatically set GOMAXPROCS to match Linux container CPU quota.
# cgroup v1: cpu.cfs_quota_us reads -1 when no CPU limit is set,
# so only use the quota when it is a positive number.
if [[ -e /sys/fs/cgroup/cpu/cpu.cfs_quota_us ]] &&
   [[ -e /sys/fs/cgroup/cpu/cpu.cfs_period_us ]] &&
   [[ "$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)" -gt 0 ]]; then
  GOMAXPROCS=$(perl -e 'use POSIX; printf "%d\n", ceil($ARGV[0] / $ARGV[1])' \
    "$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)" \
    "$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)")
else
  GOMAXPROCS=$(nproc)
fi
export GOMAXPROCS
This follows from how `docker --cpus` works (https://docs.docker.com/config/containers/resource_constrain...), as well as https://stackoverflow.com/a/65554131/207384 for the /sys paths to read from. Or use https://github.com/uber-go/automaxprocs, which is very comprehensive but is a lot of code for what should be a simple task.
The "CPU limits cause weird latency spikes" problem also shows up a lot there, but it's technically a cgroups problem. (Set GOMAXPROCS=16, set the CPU limit to 1, and wonder why your program is asleep for 15/16ths of every cgroups throttling interval. I see that happen to people a lot; the key point is that GOMAXPROCS and the throttling interval are not something they ever manually configured, hence it's surprising how they interact.) I ship https://github.com/uber-go/automaxprocs in all of my open source stuff to avoid bug reports about this particular issue. Fun stuff! :)
DNS also makes a regular appearance, and I agree it's not Kubernetes' fault, but on the other hand, people probably just hard-coded service IPs for service discovery before Kubernetes, so DNS issues are a surprise to them. When they type "google.com" into their browser, it works every time, so why wouldn't "service.namespace.svc.cluster.local" work just as well? (I also love the cloud providers' approach to this rough spot -- GKE has a service that exists to scale up kube-dns if you manually scale it down!)
Anyway, it's all good reading. If you don't read this, you are bound to have these things happen to you. Many of these things will happen to you even if you don't use Kubernetes!
You are right that, by default, the logic that sets GOMAXPROCS is unaware of the limits you've set. That means GOMAXPROCS will be much higher than your CPU limit, and an application that uses all available CPUs will burn through its entire quota early in each cfs_period_us interval and then sleep for the rest of it. This is bad for latency.