Whisper is awesome, but managing it in a production environment is not easy. I am waiting for OpenAI (or someone else) to offer an API with a Real Time Factor (RTF) below 1. RTF is the inference time divided by the duration of the audio file, so anything below 1 means transcription runs faster than real time. We could really use that.
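To make the RTF definition concrete, here is a minimal sketch of how I would measure it. The helper names `real_time_factor` and `measure_rtf` are made up for illustration, and `transcribe` stands for whatever callable you use to run the model; you have to supply the audio duration yourself.

```python
import time
from typing import Callable


def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = inference time / duration of the audio file."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[str], object], path: str, audio_seconds: float) -> float:
    """Time one call to `transcribe` (any callable taking a file path) and return its RTF."""
    start = time.perf_counter()
    transcribe(path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)
```

An RTF of 0.5 would mean a 30 second file takes 15 seconds to transcribe. Measure over several files, since the first call usually includes model loading.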

Getting a server running is easy if you use https://github.com/ahmetoner/whisper-asr-webservice as a guide. It then exposes a REST API: you POST the audio file to it and get the transcription in return.
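Posting a file to such a service could look roughly like the sketch below, using only the standard library. I am assuming the service listens on `http://localhost:9000`, exposes an `/asr` endpoint that takes the upload in a form field called `audio_file`, and returns JSON with a `text` key when called with `output=json`. Check the project's README for the exact parameters of the version you deploy. `build_multipart` and `transcribe` are my own helper names.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, base_url: str = "http://localhost:9000") -> str:
    """POST an audio file to the service and return the transcribed text."""
    with open(path, "rb") as f:
        # "audio_file" is the assumed form field name, see the note above.
        body, content_type = build_multipart("audio_file", os.path.basename(path), f.read())
    request = urllib.request.Request(
        f"{base_url}/asr?task=transcribe&output=json",  # assumed endpoint and query
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]  # assumed response shape
```

With the container running, `print(transcribe("meeting.wav"))` would print the transcript. For anything beyond internal use you would also want timeouts, retries and authentication in front of it.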

But I don't know what you consider to be "in production". If it's for internal use, then this is enough.

Here are some comparisons of running it on GPU vs CPU. According to https://github.com/MiscellaneousStuff/openai-whisper-cpu the medium model needs 1.7 seconds to transcribe 30 seconds of audio when run on a GPU.
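Plugging that GPU figure into the RTF definition gives a feel for the headroom. This is only arithmetic on the quoted number; it ignores model loading, upload time and any queueing in front of the GPU.

```python
gpu_seconds = 1.7     # inference time quoted above for the medium model
audio_seconds = 30.0  # length of the audio clip

rtf = gpu_seconds / audio_seconds      # inference time / audio duration
speedup = audio_seconds / gpu_seconds  # how many times faster than real time

print(f"RTF = {rtf:.3f}")                           # RTF = 0.057
print(f"{speedup:.1f}x faster than real time")      # 17.6x faster than real time
```

So on a GPU the medium model is already far below an RTF of 1; the hard part is keeping a GPU server running and fed, not the raw speed.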