It would be cool if there were a platform to crowdsource compute resources to train stuff like this, so that regular people (without seven-figure budgets) can have access to these models, which are becoming increasingly out of reach for the general public.
Here is a recent paper (disclaimer: I am the first author) named "Learning@home" which proposes something along these lines. Basically, we develop a system that lets you train a network with thousands of "experts" distributed across hundreds of consumer-grade PCs (or more). You don't have to fit 700GB of parameters on a single machine, and there is significantly less network delay than in synchronous model-parallel training. The only thing you sacrifice is the guarantee that every batch will be processed by all of its required experts.
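To make that trade-off concrete, here's a toy sketch in plain Python. This is *not* the hivemind API; the names (`call_remote_expert`, `forward_through_experts`) are made up, and real remote calls are faked with random sleeps. The point is just the failure model: you fan a batch out to the experts you selected, keep whatever comes back before a deadline, and drop slow or offline peers instead of stalling training on them.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, wait


def call_remote_expert(expert_id: int, batch: str) -> str:
    """Hypothetical stand-in for an expert hosted on a volunteer's PC."""
    time.sleep(random.uniform(0.01, 0.5))  # simulate variable network latency
    return f"expert_{expert_id}({batch})"


def forward_through_experts(batch: str, expert_ids, timeout: float = 0.2):
    """Send the batch to every selected expert; keep whoever answers in time.

    Slow or offline peers are simply dropped -- that's the sacrificed
    guarantee: not every batch reaches every required expert.
    """
    pool = ThreadPoolExecutor(max_workers=len(expert_ids))
    futures = [pool.submit(call_remote_expert, eid, batch) for eid in expert_ids]
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # don't block on stragglers
    return [f.result() for f in done]


if __name__ == "__main__":
    outputs = forward_through_experts("batch#42", expert_ids=range(8))
    print(f"{len(outputs)}/8 experts responded before the deadline")
```

As long as the expert selection is redundant enough, losing a few responses per batch behaves like dropout rather than like a crash.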
You can read it on arXiv (https://arxiv.org/abs/2002.04013v1) or browse the code here: https://github.com/learning-at-home/hivemind. It's not ready for widespread use yet, but the core functionality is stable and you can see which features we are working on now.