There is only limited empirical evidence of AMD closing the gap that NVIDIA has built up in scientific and ML software. Even considering PyTorch alone, the engineering effort to maintain specialized ROCm solutions alongside the CUDA ones is not trivial (think FlashAttention, or any custom kernel that optimizes your own model). If your GPUs only need to run a simple ML workload nonstop for a few years, maybe there are corner cases where the finances make sense. It is hard for AMD to close the gap across the scientific and industrial software base built on CUDA now. NVIDIA feels like a software company for the hardware it produces; luckily it makes its money from the hardware and thus cannot lock down the software libraries.

(Edited “no” to “limited” empirical evidence after a fellow user mentioned El Capitan.)

ROCm has HIP (1), a compatibility layer for running CUDA code on AMD GPUs. In theory you only have to adjust the #includes and API prefixes (the hipify tools automate this), and everything should just work, but as usual, reality is different.
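
To see what that compatibility looks like one level up the stack (a minimal sketch, assuming a ROCm build of PyTorch): HIP devices are exposed through the existing torch.cuda namespace, so typical "CUDA" PyTorch code runs unchanged, and only torch.version.hip tells you which backend you are actually on.

    import torch

    # On ROCm builds of PyTorch, HIP is surfaced through the torch.cuda
    # namespace, so device-agnostic "CUDA" code runs as-is on AMD GPUs.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    x = torch.randn(1024, 1024, device=device)
    y = x @ x.T  # routed to cuBLAS on NVIDIA, hipBLAS/rocBLAS on AMD

    # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
    backend = "ROCm/HIP" if torch.version.hip else "CUDA"
    print(f"{backend}: {torch.cuda.get_device_name(0)}")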

Newer backends for AI frameworks, such as OpenXLA and OpenAI's Triton, generate GPU-native code directly using MLIR and LLVM; they do not use CUDA apart from some glue code to actually load the compiled kernels onto the GPU and get the data there. Both already support ROCm, but from what I've read the support is not yet as mature as it is on NVIDIA.
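
For a concrete feel of the Triton side, here is a minimal vector-add kernel along the lines of the standard tutorial example (a sketch, not tied to either vendor): the kernel is plain Python that Triton lowers through its own MLIR/LLVM pipeline to PTX on NVIDIA or AMDGPU code objects on ROCm, with no CUDA C anywhere in user code.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide chunk of the vectors.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        out = torch.empty_like(x)
        n = out.numel()
        grid = (triton.cdiv(n, 1024),)
        # The same source compiles for whichever GPU backend is installed.
        add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
        return out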

1: https://github.com/ROCm-Developer-Tools/HIP