Not the OP, but we use Google Container Engine (hosted Kubernetes), with Salt for the non-GKE VMs.

This is needed because K8s is not mature enough to host all the things. In particular, stateful sets are still in beta. I'm not sure I would trust K8s to run our production databases even with stateful sets. We've had K8s kill pods for unknown reasons, for example, and the volume handling has historically been a bit flaky. Fine for completely redundant, stateless containers, less fine for stateful ones.

Sure, that makes sense. Stateful workloads, especially databases, would make me nervous as well. I'm curious, though: what are the advantages of GKE over deploying K8s on AWS? Or were you already on GCE?

Our production setup is actually still on Digital Ocean; we're currently testing GCP (GKE + Salt-managed VMs) with a staging cluster to see how it works in practice.

Before GCP, we set up a staging cluster on AWS. It was okay. The biggest pain point is that AWS's VPC does not match Kubernetes' requirement for per-pod IPs, so anyone who installs on AWS ends up running their own overlay network (such as Flannel or Calico). That's a big downside: you're maintaining an extra networking layer on top of a VPC that isn't much fun to deal with in the first place.
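To make the per-pod IP point concrete: every node gets a pod CIDR, and the cluster network has to route those ranges between nodes. GKE's SDN routes them natively; on a typical AWS self-install the overlay does it for you. A rough sketch of how to see the ranges (command from memory):

    # Each node advertises a pod CIDR; pods on that node get IPs from it.
    # GKE routes these ranges natively; a self-managed AWS cluster usually
    # has Flannel or Calico encapsulate/route them instead.
    kubectl get nodes -o custom-columns=NODE:.metadata.name,POD_CIDR:.spec.podCIDR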

You don't really notice how antiquated AWS is until you move to GCP. Everything — networking, UI, CLI tools, etc. — feels more modern, sleeker and less creaky.

One area where GCP is particularly superior is networking. Google gives you a Layer 3 SDN (virtual network) that's more flexible than the rigid subnet-carving you need to do with AWS VPC. The tag-based firewall rules are also a breath of fresh air after AWS's weird, aging "security group" model.
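For anyone who hasn't seen the tag-based model, it looks roughly like this (the names here are made up for illustration):

    # Allow HTTP/HTTPS from anywhere to any instance carrying the "web" tag,
    # on a hypothetical network called "staging".
    gcloud compute firewall-rules create allow-web \
        --network staging \
        --allow tcp:80,tcp:443 \
        --source-ranges 0.0.0.0/0 \
        --target-tags web

    # Moving a VM in or out of the rule's scope is just a matter of tagging:
    gcloud compute instances add-tags some-instance --tags web --zone us-central1-b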

It's not all innovation, of course. Some services are carbon-copy clones of AWS counterparts: Pub/Sub is essentially SQS, Cloud Storage is S3, and so on, with only minor improvements along the way. Cloud Storage, for example, doesn't fix S3's lack of queryability. I've also been distinctly unimpressed with the "StackDriver"-branded suite of services, which do things like logging and metrics. I don't know why anyone in this day and age would bother making something that doesn't compare favourably to Prometheus.

I should add that the security situation on GKE could be better:

* GKE's Docker containers run in privileged mode.

* There's still no role-based access control.

* Containers end up getting access to a lot of privileged APIs because they inherit the same IAM role as the machine (see the sketch after this list).

* You can't disable the automatic K8s service account mount [1].
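To illustrate the IAM-inheritance point: any container that can reach the GCE metadata server can fetch an access token for the node's service account. Roughly (a sketch; the scopes you get back depend on how the node pool was created):

    # From inside any pod: ask the metadata server for the node's token.
    # Nothing beyond network access to the metadata endpoint is required.
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"

    # The scopes attached to that token are also visible:
    curl -s -H "Metadata-Flavor: Google" \
        "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"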

Another scary thing, unrelated to GKE, is that the VMs run a daemon that automatically creates local users from the SSH keys in the project metadata. As a team member, you can SSH into any box with any user name. I'm not sure it's a real security weakness, but I don't like it.

I love the CLI tools ("gcloud" and so on); they're much nicer than awscli and friends, and hopefully much friendlier to junior devs.
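To give a flavour of the difference (commands from memory, not an exhaustive comparison):

    # gcloud output is tabular and human-readable by default, and it handles
    # the SSH key plumbing for you:
    gcloud compute instances list
    gcloud compute ssh my-instance --zone us-central1-b

    # The awscli equivalent dumps a big JSON document that you usually end
    # up filtering with --query or jq:
    aws ec2 describe-instances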

[1] You can disable the mount by mounting an "emptyDir" on top, but that makes K8s think the pod has local data, which causes the autoscaler to refuse to tear down a node. Fortunately there's an option coming to disable the service account.
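For reference, the workaround looks roughly like this (a sketch; the pod name and image are placeholders):

    # Mount an emptyDir over the path where K8s auto-mounts the service
    # account token, so the container never sees the credentials. Side
    # effect: the pod now appears to have local data, which the cluster
    # autoscaler takes as a reason not to tear down the node.
    cat <<'EOF' | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: no-token-demo
    spec:
      containers:
      - name: app
        image: nginx    # placeholder image
        volumeMounts:
        - name: no-token
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      volumes:
      - name: no-token
        emptyDir: {}
    EOF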

Thanks for the detailed response. You mentioned that "GKE's Docker containers run in privileged mode"; do you know why that is?

Not sure. Speculating here, but it's possible that it's because they're still working on building out GKE itself.

GKE is basically a wizard that just runs a pre-built image for the master and node VMs, and comes with some support for upgrades. There are very few settings [1] aside from the machine type. So it's pretty rudimentary. You'd think that GKE would come with a flashy dashboard for pods, deployments, user management, autoscaling and so on, but you're actually stuck with kubectl + running the Dashboard app [2] as a pod, which, while nice enough, is not integrated into the Google Cloud Platform web UI at all. Kubernetes runs fine, but GKE itself feels unfinished.
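For anyone who hasn't tried it, "running the Dashboard app as a pod" amounts to deploying the manifest from the repo in [2] and then tunnelling to it, roughly:

    # After deploying the Dashboard manifest from [2], open a local proxy to
    # the cluster API and browse to the UI through it -- it is served from
    # inside the cluster, not from the GCP console.
    kubectl proxy
    # then visit http://localhost:8001/ui in a browser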

Anyway, a lot of people were asking for privileged mode back in 2015 [3], and it kind of looks like they turned it on by default rather than developing a setting for it.

[1] http://i.imgur.com/6pGRzl9.png

[2] https://github.com/kubernetes/dashboard

[3] https://github.com/kubernetes/kubernetes/issues/12048