Well, if they can make CUDA and Wayland work simultaneously...

(Baseline Nvidia drivers without CUDA already work fine with Wayland).

From "Wayland does not support screen savers" (2023) https://news.ycombinator.com/item?id=37385627 :

> the NVIDIA proprietary Linux module for NVIDIA GPUs hardware video decode doesn't work on Wayland; along with a number of other things: "NVIDIA Accelerated Linux Graphics Driver README and Installation Guide > Appendix L. Wayland Known Issues" https://download.nvidia.com/XFree86/Linux-x86_64/535.54.03/R...

What is NVIDIA's annual developer salary commitment to non-HPC Linux compared to AMD with ROCm and Intel?

(EDIT)

Most NVIDIA driver Linux kernel module re-packaging projects are not on GitHub, which supports a FUNDING.yml for specifying how to donate via GitHub Sponsors.

I don’t know, but ROCm is a joke and unusable. 1200 lines of very complicated C++ for a simple FFT… That’s ridiculous. And nobody seems to be doing anything about it, and then they wonder why they have a market share of around 1% in AI.

It’s not a hardware problem, it is an API design problem.

It has nearly the same interface design as cuFFT, admittedly with less documentation. I’m not sure I understand this complaint versus the competition. https://rocm.docs.amd.com/projects/hipFFT/en/latest/api.html

If you’re complaining about cuFFT’s design, about how BLAS-like interfaces are outdated, or about the lack of a proper heterogeneous array programming language, sure. But it’s not much better in Nvidia land.

No, I mean for writing new code. I know that FFT is already in the libraries, I was using it as an example. About 200 LoC in CUDA, and 1200 lines of code in ROCm. ROCm is boilerplate-ridden and hardly usable for new code.

That's a contrived example, then. Also because there's already an optimized version of FFT in their libraries.

At least with open-source AMD code, it can be fixed with Pull Requests.

FWIU, OpenCL is insufficient, CUDA is the closed-source fanboy favorite that the industry can't move away from, and Intel OneAPI may be the most portable but not the most performant.

Impact-wise, contributing to the ROCm and OneAPI tools to help them be more competitive is in consumers' interest.

The industry can't move away from CUDA because you can easily write anything in CUDA, as opposed to ROCm. I once needed to write a lattice Boltzmann CFD simulation; it is so much easier in CUDA compared to ROCm that I wouldn't even start on the latter unless forced to. Everything takes 5x the amount of code and 5x the amount of time.

ROCm is bad API design, and no amount of gradual tinkering will save it. I wonder why AMD can't design something better. HIP exists, but it is a "lesser CUDA" (CUDA is actually a mediocre API; we can design things much better than that now).

What was the difference in runtime performance, and did you try CuPy?

https://github.com/cupy/cupy :

> CuPy is a NumPy/SciPy-compatible array library for GPU-accelerated computing with Python. CuPy acts as a drop-in replacement to run existing NumPy/SciPy code on NVIDIA CUDA or AMD ROCm platforms.

Projects using CuPy: https://github.com/cupy/cupy/wiki/Projects-using-CuPy
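A minimal sketch of the "drop-in replacement" idea the CuPy README describes: the same array code runs on CPU (NumPy) or GPU (CuPy) by swapping one import. Shown here with NumPy so it runs without a GPU; on a machine with a supported NVIDIA CUDA or AMD ROCm stack, `import cupy as xp` would be the only change.

```python
# Sketch of CuPy's drop-in design. Using NumPy as the CPU backend;
# on a CUDA/ROCm machine, replace this line with: import cupy as xp
import numpy as xp

# Forward FFT of a simple signal, then inverse FFT to recover it.
signal = xp.sin(xp.linspace(0, 2 * xp.pi, 64, endpoint=False))
spectrum = xp.fft.fft(signal)
recovered = xp.fft.ifft(spectrum).real

# The round-trip error should be tiny regardless of backend.
assert xp.allclose(signal, recovered)
```

This is the portability argument in miniature: application code written against the NumPy-style API doesn't need per-vendor rewrites, which sidesteps the CUDA-vs-ROCm boilerplate complaint for array-level workloads.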