It would be cool if there were a platform to crowdsource compute resources for training stuff like this, so that regular people (without 7-figure budgets) could have access to these models, which are becoming increasingly out of reach for the general public.

Here's a recent paper (disclaimer: I'm the first author) called "Learning@home" that proposes something along these lines. Basically, we develop a system that lets you train a network with thousands of "experts" distributed across hundreds (or more) of consumer-grade PCs. You don't have to fit 700GB of parameters on a single machine, and network delay hurts significantly less than it does for synchronous model-parallel training. The only thing you sacrifice is the guarantee that every batch will be processed by all of the experts it was routed to.
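To make that trade-off concrete, here's a toy sketch in plain PyTorch (this is NOT the actual hivemind API; `RemoteExpert`, the timeout simulation, and all names here are made up for illustration): a gating layer picks the top-k experts for each input, each remote call can fail, and the forward pass simply proceeds without the missing responses. That dropped (input, expert) pair is exactly the guarantee we give up.

```python
# Toy sketch of decentralized mixture-of-experts routing with failure tolerance.
# Hypothetical names only -- not the hivemind API.
import random
import torch
import torch.nn as nn

class RemoteExpert:
    """Stand-in for an expert hosted on a volunteer's PC, reachable over RPC."""
    def __init__(self, uid: str, dim: int):
        self.uid = uid
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        if random.random() < 0.2:  # simulate an unreachable or slow peer
            raise TimeoutError(f"expert {self.uid} did not respond")
        return self.ffn(x)

class DecentralizedMoE(nn.Module):
    """Gating network picks top-k experts; unresponsive ones are simply skipped."""
    def __init__(self, experts, dim: int, k: int = 4):
        super().__init__()
        self.experts, self.k = experts, k
        self.gate = nn.Linear(dim, len(experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.gate(x).softmax(dim=-1)     # [batch, n_experts]
        topk = scores.topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for b in range(x.shape[0]):
            for i in range(self.k):
                e = topk.indices[b, i].item()
                try:
                    # In the real system these calls run asynchronously over the network.
                    out[b] += topk.values[b, i] * self.experts[e](x[b : b + 1])[0]
                except TimeoutError:
                    pass  # the sacrificed guarantee: this (input, expert) pair is dropped
        return out

dim = 16
experts = [RemoteExpert(f"expert.{i}", dim) for i in range(32)]
moe = DecentralizedMoE(experts, dim)
print(moe(torch.randn(8, dim)).shape)  # torch.Size([8, 16])
```

In the actual system the experts live on other machines and are discovered through a distributed hash table, and the calls happen asynchronously instead of in a loop, but the failure-handling logic is the same: training just keeps going with whatever responses arrive in time.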

You can read it on arXiv (https://arxiv.org/abs/2002.04013v1) or browse the code here: https://github.com/learning-at-home/hivemind. It's not ready for widespread use yet, but the core functionality is stable, and you can see which features we're working on now.