What does HackerNews think of HIP?
HIP: C++ Heterogeneous-Compute Interface for Portability
Newer backends for AI frameworks, such as OpenXLA and OpenAI Triton, generate native GPU code directly using MLIR and LLVM. They do not use CUDA apart from some glue code that loads the code onto the GPU and gets the data there. Both already support ROCm, but from what I've read that support is not yet as mature as the NVIDIA support.
Both CUDA and the Metal Shading Language are C++, so is OpenCL since 2.0 (https://www.khronos.org/opencl/), so is AMD ROCm's HIP (https://github.com/ROCm-Developer-Tools/HIP), and so is SYCL (https://www.khronos.org/sycl/). C++ is pretty much the language that runs most on GPUs.
> no vector instructions,
There are a thousand different options for SIMD in C++, from #pragma omp simd to libraries such as std::experimental::simd (https://en.cppreference.com/w/cpp/experimental/simd/simd), Eve (https://github.com/jfalcou/eve), Highway (https://github.com/google/highway), Vc (https://github.com/VcDevel/Vc)...
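As a rough illustration of the simplest option mentioned above, here is a minimal sketch of a loop annotated with #pragma omp simd. The function name saxpy and its signature are made up for this example and do not come from any of the linked libraries. The sketch assumes a compiler with OpenMP SIMD support enabled (for example -fopenmp-simd on GCC or Clang); without it the pragma is normally ignored and the loop runs as ordinary scalar code with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustrative name): computes y[i] = a * x[i] + y[i].
// The pragma only asks the compiler to vectorize the loop; whether vector
// instructions are actually emitted depends on the compiler, flags, and target.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both inputs must have the same length

    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = x.size();

    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

Calling saxpy(2.0f, x, y) updates y in place. The library options listed above (std::experimental::simd, Eve, Highway, Vc) instead expose explicit vector types, which trades this one-line annotation for more control over the generated code.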