What does HackerNews think of Whisper?
High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
And another thing: nVidia forbids the use of GeForce cards in data centers, while AMD allows it. I don't know how exactly they define "data center", whether the clause is enforceable, or whether it has been tested in courts of various jurisdictions. I just don't want to find out the answers to these questions at my employer's legal expense, and I believe they would prefer not to cut corners like that.
I think nVidia only beats AMD due to the ecosystem: for GPGPU that's CUDA (and especially the included first-party libraries like cuBLAS, cuFFT and cuDNN), plus the support in popular frameworks like TensorFlow. However, it's not that hard to ignore the ecosystem and instead write some compute shaders in HLSL. Here's a non-trivial open-source project, unrelated to CAE, where I managed to do just that with decent results: https://github.com/Const-me/Whisper
That software even works on Linux, probably thanks to Valve's work on DXVK 2.0 (a compatibility layer which implements D3D11 on top of Vulkan).
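To make that concrete, here is a minimal sketch of the DirectCompute pattern such a port relies on: compile an HLSL compute shader, create a GPU buffer, bind it, and dispatch thread groups. This is not code from the linked project; the kernel (doubling every float) is a placeholder, and error handling is omitted for brevity.

```
// Minimal DirectCompute sketch: compile an HLSL compute shader, bind a buffer,
// dispatch. Not the Const-me/Whisper code, just the general pattern it builds on.
#include <d3d11.h>
#include <d3dcompiler.h>
#include <cstdio>
#include <cstring>
#include <vector>
#pragma comment(lib, "d3d11.lib")
#pragma comment(lib, "d3dcompiler.lib")

// Placeholder HLSL kernel: double every element of a structured buffer.
static const char* hlsl = R"(
RWStructuredBuffer<float> buf : register(u0);
[numthreads(64, 1, 1)]
void csMain(uint3 id : SV_DispatchThreadID) { buf[id.x] *= 2.0f; }
)";

int main()
{
    // Create a hardware D3D11 device; runs on nVidia, AMD and Intel GPUs alike.
    ID3D11Device* dev = nullptr;
    ID3D11DeviceContext* ctx = nullptr;
    D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, 0,
                      nullptr, 0, D3D11_SDK_VERSION, &dev, nullptr, &ctx);

    // Compile the HLSL source to compute shader model 5.0 bytecode.
    ID3DBlob* blob = nullptr;
    D3DCompile(hlsl, strlen(hlsl), nullptr, nullptr, nullptr,
               "csMain", "cs_5_0", 0, 0, &blob, nullptr);
    ID3D11ComputeShader* cs = nullptr;
    dev->CreateComputeShader(blob->GetBufferPointer(), blob->GetBufferSize(),
                             nullptr, &cs);

    // Create a structured buffer of 256 floats, visible to the shader as a UAV.
    std::vector<float> data(256, 1.0f);
    D3D11_BUFFER_DESC bd = {};
    bd.ByteWidth = UINT(data.size() * sizeof(float));
    bd.Usage = D3D11_USAGE_DEFAULT;
    bd.BindFlags = D3D11_BIND_UNORDERED_ACCESS;
    bd.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    bd.StructureByteStride = sizeof(float);
    D3D11_SUBRESOURCE_DATA init = { data.data() };
    ID3D11Buffer* buf = nullptr;
    dev->CreateBuffer(&bd, &init, &buf);
    ID3D11UnorderedAccessView* uav = nullptr;
    dev->CreateUnorderedAccessView(buf, nullptr, &uav);

    // Bind the shader and the buffer, then launch 256 / 64 = 4 thread groups.
    ctx->CSSetShader(cs, nullptr, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &uav, nullptr);
    ctx->Dispatch(4, 1, 1);

    printf("Dispatched compute shader\n");
    return 0;
}
```

DXVK 2.0 then translates these same D3D11 calls to Vulkan, which is how a D3D11-based build can end up running on Linux as well.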
It took about 5 minutes to process each hour of audio on a 1080 Ti GPU.
There are a few wrappers with a GUI available, like https://github.com/Const-me/Whisper
My girlfriend asked me if I could transcribe some audio files for her with my "programming stuff". I immediately thought of Whisper from OpenAI.
I first used the official CLI tool. With the largest model it took a full 8 hours to transcribe a 30-minute file. I noticed it was running on the CPU, and I tried switching it to the GPU instead with no luck. Running it under WSL was probably not helping.
Then I found this gem, a C++ Windows implementation of Whisper: https://github.com/Const-me/Whisper
I opened the program and fed it the largest model and the file. The transcript was done in 4 minutes instead of 8 hours... Downside? The program has a GUI, lol.
Of course, I could probably get the CLI tool to run on the GPU with some tinkering and by installing some Nvidia packages for Whisper to use. But frankly, I have so little experience with that kind of stuff that installing the Windows implementation was a much easier choice.
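For reference, the tinkering is mostly about installing a CUDA-enabled PyTorch build; the official CLI already accepts a --device flag. A rough sketch of what that could look like (the cu118 wheel index is just an example; the right index URL depends on your CUDA version, so check pytorch.org rather than copying it verbatim):

```
# Install a CUDA-enabled PyTorch build (cu118 index shown only as an example),
# then the official Whisper package, and point it at the GPU.
pip install torch --index-url https://download.pytorch.org/whl/cu118
pip install -U openai-whisper
whisper audio.mp3 --model large --device cuda
```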
The NVIDIA A40 GPUs mentioned on the page you've linked cost $4,000 (Amazon) to $10,000 (Dell), yet deliver performance similar to an Intel Arc A770, which costs $350. That's roughly $4,000 / $350 ≈ 11x, an order of magnitude difference in cost efficiency.
They sometimes open-source their older models: https://github.com/Const-me/Whisper
> As for your specific computer, with 64 GB of RAM and a high-performance GPU like the GeForce 1080 Ti, it should have sufficient resources to run a language model like me for many common tasks.
Based on the models open-sourced by OpenAI, they are using PyTorch and CUDA. This means their stack requires nVidia GPUs. I think the main reason for their high costs is a single sentence in the EULA of the GeForce drivers: https://www.datacenterdynamics.com/en/news/nvidia-updates-ge...
It’s technically possible to port their GPGPU code from CUDA to something else. Here’s a vendor-agnostic DirectCompute re-implementation of their Whisper model: https://github.com/Const-me/Whisper
On servers, DirectCompute is not great because Windows Server licenses are expensive. Still, I did that port alone and spent a couple of weeks doing it.
OpenAI probably has the resources to port their inference to vendor-agnostic Vulkan Compute, running on Linux servers equipped with reasonably priced AMD or Intel GPUs. For instance, an Intel Arc A770 16GB only costs $350, yet delivers performance similar to an nVidia A30, which costs $16,000. The Intel card consumes more electricity, but not by much: 225 W versus 165 W. That's something like a 40x difference in the cost efficiency of running that chat service.
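As a quick back-of-the-envelope check of that claim, under the comment's own assumption that the two cards deliver roughly similar inference throughput (prices and wattages are the ones quoted above, not independently verified):

```
// Back-of-the-envelope check of the cost-efficiency claim above.
// Assumes roughly equal inference throughput per card, as the comment does;
// prices and power figures are the ones quoted in the comment, not verified.
#include <cstdio>

int main()
{
    const double priceA770 = 350.0;    // Intel Arc A770 16GB, quoted price (USD)
    const double priceA30  = 16000.0;  // nVidia A30, quoted price (USD)
    const double wattsA770 = 225.0;    // quoted board power draw (W)
    const double wattsA30  = 165.0;

    // With similar throughput, performance per dollar scales as 1 / price.
    printf("Hardware price ratio: %.1fx\n", priceA30 / priceA770);   // ~45.7x
    printf("Extra power draw: %.0f W (about %.0f%% more)\n",
           wattsA770 - wattsA30,
           100.0 * (wattsA770 - wattsA30) / wattsA30);               // 60 W, ~36% more
    return 0;
}
```

The raw price ratio is about 46x; allowing some margin for the extra power draw over the cards' lifetime, an overall figure in the neighborhood of 40x is plausible.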