Whisper as an API is great, but having to send the whole payload upfront is a bummer. Most use cases I can build for would want streaming support.
Something like establishing a WebRTC connection, streaming audio to OpenAI, and getting back a live transcription until the audio channel closes.
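Even without real streaming support, you can approximate it client-side by framing the audio into short fixed-duration chunks and sending each one as it fills up. A minimal sketch, assuming 16 kHz mono 16-bit PCM; `send_chunk` here is a hypothetical stand-in for whatever call would push a chunk to a transcription endpoint:

```python
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM

def frame_audio(pcm: bytes, chunk_seconds: float = 5.0):
    """Yield fixed-duration chunks of raw PCM; the last chunk may be shorter."""
    chunk_bytes = int(SAMPLE_RATE * BYTES_PER_SAMPLE * chunk_seconds)
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]

def stream_transcribe(pcm: bytes, send_chunk):
    """Push each chunk as it becomes available; collect the partial transcripts."""
    return [send_chunk(chunk) for chunk in frame_audio(pcm)]
```

This is the crude version (transcribing each chunk independently loses cross-chunk context); a real streaming protocol would keep server-side state across chunks.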
I've run Whisper locally via [1] with one of the medium-sized models and it was damn good at transcribing audio from a video of two people having a conversation.
I don't know exactly what the use case is where people would need to run this via the API: the compute demands aren't huge (I used CPU only, on an M1) and the memory requirements are modest.