Bit of a warning: if you do not set CPU requests, your pods may end up with cpu.shares=2. Kubernetes converts requests.cpu into cpu.shares at roughly 1 CPU = 1024 shares, and with no request you get the kernel minimum of 2, which is almost no scheduling weight when the node is contended.
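If you want to see what your container actually got, here is a minimal sketch, assuming cgroup v1 and the usual mount point (on cgroup v2 the knob is cpu.weight instead, on a different scale):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

func main() {
	// Assumes cgroup v1 mounted at /sys/fs/cgroup/cpu.
	data, err := os.ReadFile("/sys/fs/cgroup/cpu/cpu.shares")
	if err != nil {
		fmt.Println("could not read cpu.shares:", err)
		return
	}
	fmt.Println("cpu.shares =", strings.TrimSpace(string(data)))
}
```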

Java, for example, derives its idea of how many CPUs it has partly from this value, and sizes things like GC and compiler thread pools accordingly, in ways you're not gonna like.

The Go runtime also locks in some unwarranted assumptions at process start time: GOMAXPROCS defaults to the number of CPUs visible on the machine, and it is never revisited if the number of available CPUs changes.
Explicitly setting GOMAXPROCS is probably the cleanest way to limit CPU among the runtimes out there, though. For example, if you set requests = 1, limits = 1, GOMAXPROCS=1, then you will never run into the latency-increasing CFS CPU throttling; you would only be throttled if you used more than 1 CPU, and since you can't (modulo forks, of course), it won't happen. There is https://github.com/uber-go/automaxprocs to set this automatically, if you care.
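If you'd rather derive it than hard-code it, a rough sketch of doing the same thing by hand, assuming cgroup v1 and the usual cfs_quota_us/cfs_period_us files (automaxprocs does this dance for you, with more edge cases handled):

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// readCgroupInt reads a single integer from a cgroup v1 file; the
// /sys/fs/cgroup/cpu path is an assumption about how the container is set up.
func readCgroupInt(name string) (int64, error) {
	data, err := os.ReadFile("/sys/fs/cgroup/cpu/" + name)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
	quota, err1 := readCgroupInt("cpu.cfs_quota_us")
	period, err2 := readCgroupInt("cpu.cfs_period_us")
	if err1 == nil && err2 == nil && quota > 0 && period > 0 {
		// Round down, but never below 1: a limit of 1.5 CPUs becomes GOMAXPROCS=1.
		procs := int(quota / period)
		if procs < 1 {
			procs = 1
		}
		runtime.GOMAXPROCS(procs)
	}
	fmt.Println("GOMAXPROCS =", runtime.GOMAXPROCS(0))
}
```

Or just blank-import go.uber.org/automaxprocs and let it set GOMAXPROCS from the quota at init time.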

You are right that by default, the logic that sets GOMAXPROCS is unaware of the limits you've set. That means GOMAXPROCS will be something much higher than your cpu limit, and an application that uses all available CPUs will use all of its quota early on in the cfs_period_us interval, and then sleep for the rest of it. This is bad for latency.
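To put illustrative numbers on it: with limits = 1 on a 16-core node and the default 100ms CFS period, the container gets 100ms of CPU time per period while GOMAXPROCS defaults to 16. Sixteen busy threads can burn through that quota in roughly 100/16 ≈ 6.25ms of wall clock, after which everything in the container, including whatever goroutine was about to answer a request, sleeps for the remaining ~94ms of the period.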