I'm in a devops role where we actually reroll the Tensorflow whl in-house (to get a few tweaks, like turning on specific AVX flags), but because the rest of our deployment is apt/debs, we then turn around and wrap that whl in a deb using Spotify's excellent dh-virtualenv:

https://github.com/spotify/dh-virtualenv
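
For anyone trying the same thing, the deb side really is the painless part; a minimal sketch (wheel filename and paths are made up for illustration):

    #!/usr/bin/make -f
    # debian/rules: dh-virtualenv packs the whole virtualenv into the .deb
    %:
            dh $@ --with python-virtualenv

    # requirements.txt: point pip at the locally rebuilt wheel instead of PyPI
    ./wheels/tensorflow-2.11.0-cp310-cp310-linux_x86_64.whl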

There's no Bazel expertise in-house; when we run the build, it seems to miss the cache entirely and then spends 12-13 hours compiling in total, much of which appears to be recompiling a specific version of LLVM.
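
For reference, the whole build boils down to something like this (flags and target names from memory, so treat it as a sketch rather than gospel):

    # after ./configure, build the pip package with the extra AVX copts
    bazel build -c opt --copt=-mavx2 --copt=-mfma \
        //tensorflow/tools/pip_package:build_pip_package
    ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tf_pkg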

Every dependency is either vendored or pinned, including some critical things with no ABI guarantees, like Eigen, which is literally pinned to some random commit; that causes chaos when other binaries try to link against the underlying Tensorflow shared objects:

https://github.com/tensorflow/tensorflow/blob/master/third_p...

And when you go down a layer into CUDA, there are even more support matrices listing the exact combinations of versions that are known to work together:

https://docs.nvidia.com/deeplearning/tensorrt/support-matrix...

Anyway, I'm mostly just venting here. But the whole thing is an absurd nightmare. I have no idea how a normal distro would even begin to approach the task of unvendoring this stuff and shipping a set of reasonable packages for it all.

The Tensorflow maintainers themselves even kind of admit the futility of it all when they propose that the easiest thing to do is just install your app on top of their pre-cooked Tensorflow GPU Docker image:

https://www.tensorflow.org/install
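
And honestly, it is the lowest-friction path; the whole integration collapses to a handful of lines (image tag and app layout are placeholders):

    FROM tensorflow/tensorflow:2.11.0-gpu
    # CUDA, cuDNN and TF itself all come pre-matched in the base image
    COPY . /app
    RUN pip install --no-cache-dir -r /app/requirements.txt
    CMD ["python", "/app/main.py"]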

You should really try out the Spack package manager; it will manage all these constraints between packages, variants, and dependencies for you, and all you have to feed it is a very simple description of what you want to install: https://github.com/spack/spack
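
For example, a single spec describes the whole stack and Spack works out compatible versions of everything else (exact package names, variants, and versions may have drifted; "spack info py-tensorflow" lists the current ones):

    # one spec pins the whole stack, including the CUDA it links against
    spack install py-tensorflow +cuda cuda_arch=80 ^cuda@11.8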