This work (and related efforts like Img and Buildah) is a big deal.
Right now docker images and Dockerfiles are joined at the hip to the Docker daemon.
It works great for local development, but for hosted systems that run on containers, it's a dire mess. I have personally slammed head-first into Docker-in-Docker quagmires on Kubernetes and Concourse. Not knowing the particular arcane rites and having neither sufficient eye of newt nor sufficient patience to get it to work, I like everyone else in the universe gave up.
Not an acceptable state of affairs, given the many problems of Dockerfiles in themselves. Dockerfiles force an ugly choice. You can have ease of development or you can have fast, safe production images. But you can't really have both.
Kaniko is another step toward divorcing docker images, as a means of distributing bits, from Dockerfiles, as a means of describing those images, from Docker daemons, as a means of assembling them. All three are different concerns and should no longer be conflated.
Disclosure: I work for Pivotal; we have a lot of stuff that does stuff with containers.
> Not knowing the particular arcane rites and having neither sufficient eye of newt nor sufficient patience to get it to work, I like everyone else in the universe gave up.
One thing I feel like more people need to know: Docker container-images are really not that hard to build "manually", without using Docker. Just because Docker itself builds images by repeatedly invoking `docker run` and then snapshotting the new layers, people think that's what their build tools need to do as well. No! You just need to have the files you want, and know the config you want, and the ability to build a tar file.
Here's a look inside an average one-layer Docker image:
$ mkdir busybox_image; cd busybox_image
$ docker pull busybox:latest
$ docker save busybox:latest | tar x
$ tree
.
├── 8ac48589692a53a9b8c2d1ceaa6b402665aa7fe667ba51ccc03002300856d8c7.json
├── f4752d3dbb207ca444ab74169ca5e21c5a47085c4aba49e367315bd4ca3a91ba
│   ├── VERSION
│   ├── json
│   └── layer.tar
├── manifest.json
└── repositories
1 directory, 6 files
• `repositories` contains the tag refs that will be imported when you `docker load` this archive;
• `manifest.json` contains the declarations needed for the daemon to unpack the layers into its storage backend (just a listing of the layer.tar files, basically);
• the SHA-named config file specifies how to reconstruct a container from this archive, if you dumped it from a container (and I believe it's optional when constructing a "fresh" archive for `docker load`ing);
Each SHA-named layer directory contains:
• a `layer.tar` file, which is what you'd expect, e.g.:
-rwxr-xr-x 0 0 0 1037528 16 May 2017 bin/bash
• a `json` file, specifying (the patch of!) the container config that that layer creates. (If you're composing a docker image from scratch, you just need the one layer, so you don't have to worry about the patching semantics.)

That's pretty much it. Make a directory that looks like that, tar it up, and `docker load` will accept it and turn it into something you can `docker push` to a registry. No need to have the privileges required to run docker containers (i.e. unshare(2)) in your environment. (And `docker load` and `docker push` work fine without a working Docker execution backend, IIRC.)
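If you want to try it, here's a rough, untested sketch of hand-assembling a single-layer archive. The `rootfs/` directory, the `myimage:latest` tag, and the `Entrypoint` are all made up for illustration; the one real constraint to get right is that the config's `diff_ids` entry has to be the sha256 of the uncompressed layer.tar:

$ tar -C rootfs -cf layer.tar .
$ LAYER_SHA=$(sha256sum layer.tar | cut -d' ' -f1)
$ cat > config.json <<EOF
{
  "architecture": "amd64",
  "os": "linux",
  "config": { "Entrypoint": ["/app"] },
  "rootfs": { "type": "layers", "diff_ids": ["sha256:${LAYER_SHA}"] }
}
EOF
$ cat > manifest.json <<EOF
[{ "Config": "config.json", "RepoTags": ["myimage:latest"], "Layers": ["layer.tar"] }]
EOF
$ tar -cf myimage.tar config.json manifest.json layer.tar
$ docker load < myimage.tar

If the daemon accepts it, `docker images` should then show `myimage:latest`, built without ever running a container. (The exact layout `docker load` tolerates may vary a bit across versions, so treat this as a starting point, not gospel.)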
> `docker load` them, `docker push` to a registry. No need to run docker in docker.
This is missing the point.
The point of the tool is to do docker builds + pushes on Kubernetes (or inside other containerized environments) securely.
If you can `docker load/push`, that means you have access to a docker daemon. If that daemon is not docker-in-docker, you have root on the machine since access to the docker.sock is trivially the same as root.
As such, to do `docker load` + `docker push` reasonably securely in a containerized environment, you need either docker-in-docker (which is probably insecure anyway, since the container still needs to be privileged) or a mounted docker.sock (which, per the above, is effectively root on the host).
In addition, sure you can piece together a tarball, but the point of this tool is backwards compatibility with Dockerfiles, not to be able to manually piece things together.
I wasn't trying to argue against the existence of this product; I was, like I said, trying to make a separate point—that people don't realize it's very simple to manually construct Docker images, and that this kind of pipeline may be preferable to a Dockerfile-based one for some CI environments. (And, in such cases, you really didn't need to be waiting around for something like this to exist. You could have reached CI/CD nirvana long ago!)
> If you can `docker load/push`, that means you have access to a docker daemon.
Yes†, but by manually creating a container image, you've decoupled CI from CD: you no longer need to actually have a trustworthy execution sandbox on the machine that does the `docker push`-ing, because that machine never does any `docker run`-ing. It doesn't need, itself, to be docker-in-docker. It can just be a raw VM that has the docker daemon installed (sitting beside your K8s cluster), that receives webhook requests to download these tarballs, and then `docker load`s them and `docker push`es them.
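So the push side can be a few lines in a webhook handler. Nothing here is specific to this workflow beyond the placeholder names (`app-image.tar`, `registry.example.com/team/myimage`, the `build-42` tag):

$ docker load < app-image.tar   # the tarball your CI stage produced
$ docker tag myimage:latest registry.example.com/team/myimage:build-42
$ docker push registry.example.com/team/myimage:build-42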
---
† Though, consider:
• You can talk to a Docker registry without a Docker daemon. The Docker daemon<->Docker registry protocol is just a protocol. You can write another client for it. (Or, you can just carve the registry-client library out of Docker and re-use it as a Go library in your own Go code.) A quick curl sketch follows this list.
• You can parse and execute every line of a Dockerfile just as `docker build` does, without a running Docker daemon, as long as none of those lines is a RUN command. Many application container-images (as opposed to platform container-images) indeed do no RUNing. You've already got a compiled static binary from earlier in your CI pipeline; you just want it "in Docker" now. Or you don't have a build step at all; you're just "composing" a container by e.g. burning some config files and a static website into an Nginx instance. In either of these cases, you might have a Dockerfile with no RUN at all.
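To make the first point concrete, the registry side is plain HTTP. Against an unauthenticated registry (for example a local `registry:2` container; `myimage` is a placeholder), the read half of the V2 API is just:

$ curl http://localhost:5000/v2/_catalog
$ curl http://localhost:5000/v2/myimage/tags/list
$ curl -H 'Accept: application/vnd.docker.distribution.manifest.v2+json' \
    http://localhost:5000/v2/myimage/manifests/latest

Pushing is more involved (a blob-upload session per layer, then a manifest PUT), which is exactly why carving the client library out of Docker, rather than reimplementing it, is attractive.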
Combine the two considerations, and you could design and implement a `docker`-compatible executable that supports `docker build` and `docker push`, without doing anything related to containers!
(The simplest way to do this, of course, would be to just take the docker client binary—which is, handily, already the same binary as the docker daemon binary—and make it so the Docker client spawns its own Docker daemon as a thread on each invocation. Add some logic for filesystem-exclusive locking of the Docker state dir; and remove all the logic for the execution driver. Remove the libcontainer dependency altogether. And remove `RUN` as a valid `docker build` command. There: you've got a "standalone Docker client" you can run unprivileged.)
https://github.com/google/go-containerregistry
https://github.com/google/containerregistry
Our team has built something exactly like you're describing https://github.com/GoogleCloudPlatform/distroless
Dockerfiles without RUN commands are technically more correct: reproducible and much easier to inspect. However, that restriction is quite limiting for the existing corpus of Dockerfiles.
I like to think of kaniko as the (pull) + build + push decoupling of the docker monolith. Other tools, like cri-o, have implemented the complement (pull + run).
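Concretely, the executor takes a Dockerfile, a build context, and a destination registry, and never needs a daemon or a privileged container. A rough invocation (the registry and paths are placeholders; in a real cluster you'd run this as a pod or build step rather than via `docker run`, and you should check the README for current flags):

$ docker run \
    -v "$PWD":/workspace \
    -v "$HOME/.docker/config.json":/kaniko/.docker/config.json:ro \
    gcr.io/kaniko-project/executor:latest \
    --dockerfile=/workspace/Dockerfile \
    --context=dir:///workspace \
    --destination=registry.example.com/team/myimage:latest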
Disclaimer: I work on kaniko and some of these other tools at Google.