With GitHub Container Registry, Podman, Colima, etc., the Dockerfile format is probably the one thing that will stick around for a long time. I don't think Docker itself is going away, and in any case it is such a great format that there is no need to reinvent it. Same for docker-compose.

The Dockerfile format is, in my opinion, crap. It’s extremely useful, and it’s fairly straightforward to kludge together a container build process using it, but:

It’s very hard to get reproducible output. It’s even fairly hard to get output where the inputs are well controlled.

It can’t do a clean crossbuild — you have to run the container to build it. As a side effect, you need all the tooling to install things into the container to be in the container. (Yes, there are workarounds. They’re ugly.)

It leaves trash behind. You need to fight with it to even get /tmp to be temporary.

It has no efficient, usable way to supply large input files. You can bind-mount into a RUN, but getting permissions right when you do is an uphill battle.

It is inherently not possible for Dockerfiles, as a format, to generate reproducible outputs/images. You can run whatever command you want in a Dockerfile. The Docker engine itself has no way of knowing whether that command's behavior is reproducible -- and in turn, has no way to guarantee reproducible images from a Dockerfile.

The format and engine could try a lot harder to make improved reproducibility the default.

As a trivial example, network access for RUN should be opt-in, not opt-out. The fact that the easiest ways to pull data in involve things like RUN wget is a design error.
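
(To be fair, BuildKit does have a per-instruction opt-out these days. A minimal sketch, assuming a reasonably recent docker/dockerfile syntax; the base image and build command are just placeholders:)

    # syntax=docker/dockerfile:1
    FROM debian:bookworm
    # This step is cut off from the network, but you have to remember to ask
    # for that on every single RUN; it is not the default.
    RUN --network=none ./configure && make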

A much better approach would be to have packages that install with as little script involvement as possible. Most Linux images are put together using rpm or deb packages, and, other than pre/post-install scripts (which are usually not particularly necessary), package installation is fundamentally reproducible and does not require running the image. A good image-building system IMO would look more like:

    INSTALLPACKAGES foo bar baz

And dependencies would get solved and packages installed, reproducibly.
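
(For contrast, the common way this gets done today is an imperative RUN that needs network access, runs the package solver inside the image, and then cleans up by hand so the solver's cache does not end up in a layer. Roughly:)

    RUN apt-get update && \
        apt-get install -y --no-install-recommends foo bar baz && \
        rm -rf /var/lib/apt/lists/*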

> The fact that the easiest ways to pull data in involve things like RUN wget is a design error

Why is that? You can perfectly well get a reproducible build even using wget. You wget your file, take its checksum, and compare it to an expected checksum. Boom, reproducible wget.
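
A minimal sketch of what I mean, with the URL and the expected hash as placeholders (sha256sum -c does the comparison so you don't have to hand-roll it):

    RUN wget -O app.tar.gz https://example.com/app.tar.gz && \
        echo "<expected sha256>  app.tar.gz" | sha256sum -c -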

Honestly, I've always found reproducibility harder to enforce when using Linux package managers (at least with apt-get, which messes stuff up with timestamps).

The easy way to download something in a Dockerfile:

    RUN wget URL
Your better way?

    RUN wget URL && \
        if [[ "$(sha256sum )" != "the hash" ]]; then \
            # Wow, I sure hope I spelled this right!  Also, can a comment end with \
            echo "Hmm, sha256 was wrong.  Let's log the actual hash we saw.  Oh wait, forgot to save that.  Run sha256sum again?" 2>&1 \
            echo "Hmm, better not forget to fail!" 2>&1 \
            exit 1 # Better remember that 1 is failure and 0 is success! \
        fi
An actual civilized solution would involve a manifest of external resources, a lockfile, and a little library of instructions that the tooling could use to fetch or build those external resources. Any competent implementation would result in VASTLY better caching behavior than Docker or Buildah can credibly implement today -- wget uses network resources and is usually slow, COPY is oddly slow, and the tooling has no real way to know that the import of a file could be cached even if something earlier in the Dockerfile (like "apt update"!) changed.

Think of it like modern cargo or npm or whatever, but agnostic to the kind of resource being fetched.
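
Concretely, I'm picturing something along these lines: a manifest you edit by hand and a lockfile the tooling maintains. The format and every field name here are made up, purely to illustrate the shape of it:

    # resources.yaml -- hand-edited manifest (hypothetical format)
    resources:
      app-src:
        fetch: https://example.com/app-1.2.3.tar.gz
      base-packages:
        kind: deb
        packages: [foo, bar, baz]

    # resources.lock -- written by the tooling: pinned versions and digests,
    # so fetches are cacheable and repeatable
    app-src:
      sha256: <digest recorded at first fetch>
    base-packages:
      foo: 1.2-3
      bar: 2.0-1
      baz: 0.9-7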

If there were a manifest and lockfile, it really would not be that hard to wire apt or dnf up to them so that a dependency solver would run outside the container, fetch packages, and then install them inside the container. Of course, either COPY would need to become faster or bind mounts would have to start working reliably. Oh well.

> Honestly I've always found reproducibility harder to enforce when using Linux package managers

Timestamps could well cause issues (which would be fixable), but it's not conceptually difficult to download .rpm or .deb files and then install them. rpm -i works just fine. In fact, rpm -i --root arguably works quite a bit better than docker/podman build, and it would be straightforward to sandbox it.
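
For the curious, the shape of that is roughly the following; the paths, package names, and release version are placeholders, and a real tool would drive this from the lockfile:

    # install straight into a target root -- nothing runs inside a container
    mkdir -p /build/rootfs
    rpm -i --root /build/rootfs foo-1.0-1.x86_64.rpm
    # or let dnf solve dependencies into the same root
    dnf --installroot=/build/rootfs --releasever=9 install -y foo bar baz
    # the resulting rootfs is then just a tarball away from being an image layer
    tar -C /build/rootfs -c . | gzip > layer.tar.gz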

We have built something very similar to what you are describing: https://github.com/chainguard-dev/apko
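
Roughly, a build is driven by a small declarative config listing the packages (plus repositories, entrypoint, and so on), and apko resolves and installs them into an image without running anything inside it. A simplified example along the lines of the ones in the repo -- see the project docs for the real schema:

    contents:
      repositories:
        - https://dl-cdn.alpinelinux.org/alpine/edge/main
      packages:
        - alpine-base
    entrypoint:
      command: /bin/sh -l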