Does anyone have any benchmarks for this?

Since Kaldi is a toolkit, it can be used to build nearly any ASR architecture, so there isn't a single "Kaldi" number to benchmark. See [0] for a comprehensive comparison of the word error rates (WER) that various architectures achieve on standard speech datasets.

[0]: https://github.com/syhw/wer_are_we
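For anyone comparing those numbers: WER is the word-level edit distance between the reference transcript and the recognizer's output, divided by the number of reference words, i.e. (S + D + I) / N for substitutions, deletions, insertions and reference length. Below is a minimal Python sketch of that definition. The function name `word_error_rate` is just illustrative. It splits on whitespace and does no text normalization (casing, punctuation), which real scoring setups usually apply first, so treat it as an explanation of the metric rather than a drop-in scorer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, computed as word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match if cost == 0
            )
        prev = curr

    return prev[len(hyp)] / len(ref)


# One substitution (sat -> sit) plus one deletion ("the") over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 100%, which is one reason it is always reported alongside the dataset it was measured on.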