I wish Whisper offered built-in speaker diarization. That would be a real game changer for the speech-to-text space.

whisperX adds speaker diarization on top of Whisper.

https://github.com/m-bain/whisperX
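
For anyone wondering what "diarization on top of Whisper" means in practice: as far as I understand it, the transcript and the speaker turns come from separate models, and each transcribed segment is then labeled with the speaker whose turn overlaps it the most. Here is a rough sketch of just that merge step. It does not call whisperX itself; the `Segment`, `Turn`, and `assign_speakers` names and the sample timestamps are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    """A transcribed chunk of speech (hypothetical structure)."""
    start: float
    end: float
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A speaker turn as a diarization model might report it (hypothetical)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turns overlap it the most."""
    for seg in segments:
        totals: Dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Segments that no turn covers stay unlabeled.
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


# Made-up sample data: two speakers, plus one segment no turn covers.
segments = [
    Segment(0.0, 2.5, "Hi, thanks for joining."),
    Segment(2.6, 5.0, "Happy to be here."),
    Segment(9.0, 10.0, "Okay."),
]
turns = [
    Turn(0.0, 2.4, "SPEAKER_00"),
    Turn(2.4, 5.2, "SPEAKER_01"),
]

labeled = assign_speakers(segments, turns)
for seg in labeled:
    print(f"[{seg.start:.1f}-{seg.end:.1f}] {seg.speaker}: {seg.text}")
```

Picking the speaker by total overlap keeps the merge simple, but it means a segment containing two speakers gets only one label, so finer-grained (word-level) timestamps help a lot.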