What does HackerNews think of VkFFT?
Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
Incorrect. Quoting the spec:
> Vulkan is an API (Application Programming Interface) for graphics and compute hardware.
Vulkan has compute shaders[1], which are generally usable. Libraries like VkFFT[2] demonstrate basic signal processing in Vulkan, and there are plenty of other non-graphical compute shader examples; this is part of the design of Vulkan (and also WebGPU). Further, there is a Vulkan ML TSG (Technical Subgroup)[3], which is supposed to be working on ML. Even Nvidia is participating, with extensions like VK_NV_cooperative_matrix, which specifically target the tensor cores used for ML. As a more complex & current example, Google's IREE allows inference/TensorFlow Lite execution on a variety of drivers, including Vulkan[4], with broad portability across hardware & fairly decent performance, even on mobile chips.
Other people could probably say this better/more specifically, but I'll give it a try: Vulkan is, above all, a general standard for modelling, dispatching & orchestrating work, usually on a GPU. Right now that usage is predominantly graphics, but that is far from a limit. The ideas of representing GPU resources and dispatching/queueing work are generic, apply fairly reasonably to all GPU systems, and can model any workload done on a GPU.
A good general introduction to Vulkan Compute is this great write-up[5]: https://www.duskborn.com/posts/a-simple-vulkan-compute-examp...
> Khronos' compute standard that's most similar to Cuda is SYCL.
SYCL is, imo, the opposite of where we need to go. It carries the same historical legacy as CUDA: writing really dumb, ignorant code & hoping the tools can make it run well on a GPU. Vulkan, on the other hand, asks us to consider deeply what near-to-the-metal resources we are going to need, and demands that we define, dispatch, & drive the actual processing pipelines on the GPU that will do the work. It's a much, much harder task, but it invites fantastic levels of close optimization & tuning, and allows for far more advanced pipelining & possibilities. If the future is good, it will abandon lo-fi easy options like SYCL and CUDA and bother to get good at Vulkan, which lets us work intimately with the GPU. This is a relationship worth forging, and no substitutes will cut it.
[1] https://vkguide.dev/docs/gpudriven/compute_shaders/
[2] https://github.com/DTolm/VkFFT
[3] https://www.khronos.org/assets/uploads/developers/presentati...
[4] https://google.github.io/iree/deployment-configurations/gpu-...
[5] https://www.duskborn.com/posts/a-simple-vulkan-compute-examp...
-Compute- and memory-bound problems: why proper memory management matters, and why it is important to have coalesced global memory accesses and to avoid shared memory bank conflicts.
-VkFFT, Vulkan SPIR-V, and the memory layout optimizations implemented there
-There was a coding session at the end where I showed a simple out-of-place transposition routine in Vulkan. It was launched on an Nvidia GTX 1660 Ti, an Intel UHD 610 and an AMD Radeon RX Vega 10 to illustrate the differences in bandwidth and memory coalescing between the architectures. Full code with comments can be found here: https://github.com/DTolm/VulkanComputeSamples-Transposition
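The linked sample is a Vulkan compute shader, but the access-pattern idea behind it can be sketched on the CPU. Here is a rough Python illustration (the names and sizes are mine, not from the repo, and assume X to mean: the tile plays the role of a workgroup's staging buffer):

```python
ROWS, COLS, TILE = 64, 48, 8  # TILE stands in for the workgroup tile size

def transpose_naive(src, rows, cols):
    """Row-major rows x cols -> cols x rows. Reads walk memory sequentially,
    but writes stride by `rows` -- the pattern that is uncoalesced on a GPU."""
    dst = [0.0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            dst[c * rows + r] = src[r * cols + c]
    return dst

def transpose_tiled(src, rows, cols, tile=TILE):
    """Process TILE x TILE blocks, the CPU analogue of staging through shared
    memory so that both reads and writes stay within a small, cache/coalescing
    friendly window. (On a GPU the staging tile is often padded to tile+1
    columns to dodge shared memory bank conflicts.)"""
    dst = [0.0] * (rows * cols)
    for rt in range(0, rows, tile):
        for ct in range(0, cols, tile):
            for r in range(rt, min(rt + tile, rows)):
                for c in range(ct, min(ct + tile, cols)):
                    dst[c * rows + r] = src[r * cols + c]
    return dst

src = [float(i) for i in range(ROWS * COLS)]
assert transpose_naive(src, ROWS, COLS) == transpose_tiled(src, ROWS, COLS)
```

On a CPU the tiled version wins through cache locality; on a GPU the same blocking, staged through shared memory, is what turns strided global writes into coalesced ones.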
Hope this can be interesting to people willing to try Vulkan for computations!
P.S. I have added double- and half-precision support (with precision verification) to VkFFT, plus an option to perform FFTs using lookup tables. VkFFT now also has a command line interface, and it is possible to build the cuFFT benchmark and launch it right after the VkFFT one.
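For readers unfamiliar with the lookup-table option: the idea is to precompute the twiddle factors once instead of evaluating sin/cos inside every butterfly. A minimal Python sketch of that idea (an illustration only, not VkFFT's actual implementation):

```python
import cmath

def make_twiddles(n):
    """Precompute the twiddle table e^{-2*pi*i*k/n} for k < n/2, so the
    FFT kernel only does multiplies and table reads, no trig calls."""
    return [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]

def fft(x, tw):
    """Iterative radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    x = list(x)
    # Bit-reversal permutation puts inputs in butterfly order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            x[i], x[j] = x[j], x[i]
    length = 2
    while length <= n:
        step = n // length  # stride into the shared twiddle table
        for base in range(0, n, length):
            for k in range(length // 2):
                u = x[base + k]
                v = x[base + k + length // 2] * tw[k * step]
                x[base + k] = u + v
                x[base + k + length // 2] = u - v
        length <<= 1
    return x

tw = make_twiddles(8)
assert all(abs(v - 1) < 1e-12 for v in fft([1] + [0] * 7, tw))  # impulse -> flat spectrum
```

On a GPU the trade-off the lookup table makes is extra memory traffic in exchange for fewer special-function evaluations; which side wins depends on whether the kernel is compute- or memory-bound, which is exactly the tension discussed above.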