I sincerely hope for a competitive AMD GPU for deep learning. But as long as it's a week-long journey with an unknown ending to try to recompile TensorFlow to support ROCm, everyone I know in AI will firmly stick with NVIDIA and their production-proven drivers and CUDA APIs.

I wish AMD would offer something like NVIDIA's Inception program and gift some accelerators and GPUs to suitable C++ coders (like me), so that there are at least a few tutorials on the internet on how other people managed to successfully use AMD + ROCm for deep learning.

EDIT: And it seems ROCm doesn't even support any of those new RDNA2 accelerators or gaming GPUs: https://github.com/RadeonOpenCompute/ROCm/issues/1344

So this is great hardware, but absolutely useless unless you are big enough to write your own GPU drivers from scratch ~_~

AMD's not nowhere. https://rocmdocs.amd.com/en/latest/Deep_learning/Deep-learni... shows what should be a followable happy path to getting TensorFlow going (the two-year-old TF 1.15, and a 2.2 beta). I'm curious what's prickly or hard about it.

IMO the deep learning folk need to be working more actively towards the future. The CUDA free ride is amazing, and AMD's HIP already does a good job of being CUDA-compatible in a general sense. But "CUDA" also encompasses the massive collection of libraries that Nvidia has written to accelerate a huge number of use cases, and keeping pace with that free ride is hard.
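To make that concrete, here's a minimal sketch of what HIP's CUDA compatibility looks like in practice (a hypothetical vector add, not anything from AMD's docs; hipcc accepts the same kernel & launch syntax as CUDA, with hip* calls mirroring the cuda* runtime):

    #include <hip/hip_runtime.h>

    // Kernel syntax is identical to CUDA's.
    __global__ void vadd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a = nullptr, *b = nullptr, *c = nullptr;
        hipMalloc(&a, n * sizeof(float));   // mirrors cudaMalloc
        hipMalloc(&b, n * sizeof(float));
        hipMalloc(&c, n * sizeof(float));
        // ... fill a & b via hipMemcpy; error checks omitted ...
        vadd<<<(n + 255) / 256, 256>>>(a, b, c, n);  // CUDA-style launch
        hipDeviceSynchronize();
        hipFree(a); hipFree(b); hipFree(c);
        return 0;
    }

The hipify tools can translate existing CUDA sources into this mechanically; it's the cuDNN/cuBLAS-tier library ecosystem that's the hard part to replicate.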

My hope is that eventually we start to invest in Vulkan Compute. Vulkan is way, way harder than CUDA, but it's the only right way forward I can see. Getting TensorFlow & other libraries ported to run atop Vulkan is a herculean feat, but once there's a start, I tend to believe most ML practitioners won't have to think about the particulars, and the deep engineering talent will be able to come in, optimize, and rapidly improve whatever Vulkan engines we start with.

It's a huge task, but it just seems like it's got to happen. I don't see what alternative there is, long term, to starting to get good with Vulkan.

> My hope is eventually we start to invest in Vulkan Compute.

Vulkan is for graphics. Khronos' compute standard that's most similar to CUDA is SYCL. Both compile shaders to SPIR-V, though.

> Vulkan is for graphics.

Incorrect. Quoting the spec:

> Vulkan is an API (Application Programming Interface) for graphics and compute hardware.

Vulkan has compute shaders[1], which are generally usable for non-graphics work. Libraries like VkFFT[2] demonstrate basic signal processing in Vulkan. There are plenty of other non-graphical compute shader examples, and this is part of the design of Vulkan (and also WebGPU). Further, there is a Vulkan ML TSG (Technical Subgroup)[3], which is supposed to be working on ML. Even Nvidia is participating, with extensions like VK_NV_cooperative_matrix, which specifically targets the tensor cores used for ML. For a more complex & current example, there's Google's IREE, which allows inference/TensorFlow Lite execution on a variety of backends, including Vulkan[4], with broad portability across hardware & fairly decent performance, even on mobile chips.

There are people who could probably say this better/more specifically, but I'll give it a try: Vulkan is, above all, a general standard for modelling, dispatching & orchestrating work, usually on a GPU. Right now that usage is predominantly graphics, but that is far from a limit. The ideas of representing GPU resources and dispatching/queueing work are generic, apply fairly reasonably to all GPU systems, and can model any workload done on a GPU.

A good general introduction to Vulkan Compute is this great write up here[5]: https://www.duskborn.com/posts/a-simple-vulkan-compute-examp...
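For a flavor of what that looks like, here's a minimal hypothetical sketch of just the dispatch step (assuming the instance, device, SPIR-V compute pipeline & descriptor sets were already created — which is where most of the work in that write-up lives — and skipping error handling):

    #include <vulkan/vulkan.h>

    // Record & submit a single compute dispatch, then wait for it.
    void runCompute(VkQueue queue, VkCommandBuffer cmd,
                    VkPipeline pipeline, VkPipelineLayout layout,
                    VkDescriptorSet descriptors, uint32_t workgroups) {
        VkCommandBufferBeginInfo begin{};
        begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
        vkBeginCommandBuffer(cmd, &begin);

        vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
        vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE,
                                layout, 0, 1, &descriptors, 0, nullptr);
        vkCmdDispatch(cmd, workgroups, 1, 1);  // workgroups x 1 x 1

        vkEndCommandBuffer(cmd);

        VkSubmitInfo submit{};
        submit.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
        submit.commandBufferCount = 1;
        submit.pCommandBuffers = &cmd;
        vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
        vkQueueWaitIdle(queue);  // crude sync; real code uses fences
    }

The dispatch itself is tiny; the explicitness is all in the setup & resource plumbing around it, and that's exactly where the close-to-the-metal optimization opportunities live.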

> Khronos' compute standard that's most similar to Cuda is SYCL.

SYCL is, imo, the opposite of where we need to go. It carries the same old historical legacy as CUDA: writing really dumb, ignorant code & hoping the tools can make it run well on a GPU. Vulkan, on the other hand, asks us to consider deeply what near-to-the-metal resources we are going to need, and demands that we define, dispatch, & complete the actual processing engines on the GPU that will do the work. It's a much, much harder task, but it invites fantastic levels of close optimization & tuning, and allows for far more advanced pipelining & possibilities. If the future is good, it will abandon lo-fi easy options like SYCL and CUDA and bother to get good at Vulkan, which lets us work intimately with the GPU. This is a relationship worth forging, and no substitutes will cut it.
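For contrast, here's roughly what that single-source SYCL style looks like (a hypothetical vector add again; the runtime picks the device and decides how the lambda maps onto it — the "hope the tools do well" model described above):

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
        const size_t n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
        {
            sycl::queue q;  // runtime picks a default device
            sycl::buffer bufA(a), bufB(b), bufC(c);
            q.submit([&](sycl::handler& h) {
                sycl::accessor A(bufA, h, sycl::read_only);
                sycl::accessor B(bufB, h, sycl::read_only);
                sycl::accessor C(bufC, h, sycl::write_only);
                // One lambda; no pipelines, descriptors, or command buffers.
                h.parallel_for(sycl::range<1>{n},
                               [=](sycl::id<1> i) { C[i] = A[i] + B[i]; });
            });
        }  // buffers go out of scope: results copied back into c
        return 0;
    }

Convenient, certainly, but everything about scheduling, memory placement & pipelining is left to the implementation.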

[1] https://vkguide.dev/docs/gpudriven/compute_shaders/

[2] https://github.com/DTolm/VkFFT

[3] https://www.khronos.org/assets/uploads/developers/presentati...

[4] https://google.github.io/iree/deployment-configurations/gpu-...

[5] https://www.duskborn.com/posts/a-simple-vulkan-compute-examp...