Can someone try to explain this architecture / whisper reimplementation? Thanks.

The original Whisper implementation from OpenAI uses the PyTorch deep learning framework. faster-whisper, on the other hand, is built on CTranslate2 [1], a custom inference engine for Transformer models. So it runs the same model, just through a different backend that is specifically optimized for inference workloads (e.g. via weight quantization and layer fusion).

[1] https://github.com/OpenNMT/CTranslate2
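
To make the "same model, different backend" point concrete, here is a minimal sketch of how a transcription looks through each library. It assumes the `openai-whisper` and `faster-whisper` packages are installed; the model size "base" and the file name "audio.wav" are placeholder choices.

```python
def transcribe_pytorch(path: str) -> str:
    """Reference implementation: openai-whisper running on PyTorch."""
    import whisper  # pip install openai-whisper
    model = whisper.load_model("base")
    result = model.transcribe(path)
    return result["text"]


def transcribe_ctranslate2(path: str) -> str:
    """Same Whisper weights, executed by the CTranslate2 engine."""
    from faster_whisper import WhisperModel  # pip install faster-whisper
    # int8 quantization is one of the inference-time optimizations
    # CTranslate2 provides; it is optional.
    model = WhisperModel("base", device="cpu", compute_type="int8")
    segments, _info = model.transcribe(path)
    return "".join(segment.text for segment in segments)


if __name__ == "__main__":
    # Either function should produce essentially the same transcript;
    # the faster-whisper version is typically much faster on CPU.
    print(transcribe_ctranslate2("audio.wav"))
```

Note that faster-whisper returns a generator of segments rather than one result dict, so the text has to be joined from the segments.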