It might require too much work for what you are looking for, but the wav2letter library is by a considerable margin the best open-source real-time transcription software I have found.

Out of interest, have you tried NeMo? https://github.com/NVIDIA/NeMo