>PyTorch/Nvidia GPUs easily overtaking TensorFlow/Google TPUs.

TF lost to PyTorch, and this is Google’s fault - TF APIs are both insane and badly documented.

But nothing comes close to the performance of Google’s TPU exaflop mega-clusters. Nvidia is not even in the same ballpark.

The Nvidia A100 DGX SuperPOD is equivalent to an exaflop TPU pod, no?

Perhaps Nvidia is close now. It’s a bit hard to say without specific hardware info.

Google’s TPU pods were already available 5-6 years ago, and the current versions are probably even faster. They have very fast optical interconnects in a torus or hyper-torus configuration that allow synchronous weight updates across 1k+ TPUs. That leads to dramatically lower training times and less gradient noise, which in turn leads to better-performing models - i.e. you can’t even train a model to the same level on traditional GPUs.
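
FWIW, “synchronous weight updates across 1k+ TPUs” is essentially data-parallel training with an all-reduce over the interconnect on every step. A minimal JAX sketch of the idea, assuming a hypothetical toy linear model and made-up shapes (this is obviously not Google’s actual pod code):

```python
# Sketch of synchronous data-parallel training in JAX: each device computes
# gradients on its own batch shard, jax.lax.pmean all-reduces them, and every
# replica applies the identical weight update each step.
import functools

import jax
import jax.numpy as jnp


def loss_fn(params, x, y):
    # Hypothetical linear model standing in for a real network.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)


@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    # Synchronous all-reduce: average gradients across all devices.
    grads = jax.lax.pmean(grads, axis_name="devices")
    params = jax.tree_util.tree_map(lambda p, g: p - 0.01 * g, params, grads)
    return params, loss


n = jax.local_device_count()
params = {"w": jnp.zeros((8, 1)), "b": jnp.zeros((1,))}
# Replicate the parameters and shard the batch across devices.
replicated = jax.tree_util.tree_map(
    lambda p: jnp.broadcast_to(p, (n,) + p.shape), params
)
x = jnp.ones((n, 32, 8))  # one shard of 32 examples per device
y = jnp.ones((n, 32, 1))
replicated, loss = train_step(replicated, x, y)
print(loss)  # one loss value per device
```

On a real pod that pmean runs over the torus interconnect, which is the claim here: the all-reduce stays fast enough that the step time barely grows even at thousands of chips.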

Once TPU pods started to get deployed, models that had trained for 3 weeks on 30 GPUs were trained in 30 minutes on a 1k-TPU cluster.

All this reiterates the main point of the article - Google had a tremendous lead and wasted it due to a lack of vision and product execution.

GPU cluster scaling has come a long way. Just check out the scaling plot here: https://github.com/NVIDIA/Megatron-LM