I've often wondered why a service doesn't exist that lets you rent out your graphics card for the large-scale data processing needed to train models. Like mining bitcoin, except you're doing something actually useful and getting paid actual money for it. Example:

- Company Alpha needs $40,000,000 worth of cloud computing for their training model.
- Company Beta provides them said cloud computing for $30,000,000 from its pool of connected graphics cards.
- Individuals can connect their computers to the Company Beta network and receive compensation for doing so. In total, $20,000,000 is distributed.

Company Alpha gets its cloud computing done for cheap, Company Beta pockets the $10,000,000 difference for running the network, and the individuals make money with their graphics cards, except this time it's actual United States Dollars. What am I missing here that would make this kind of business unfeasible?
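To make the incentive split explicit, here's the arithmetic from the example above as a tiny Python sketch (the dollar figures are just the made-up ones from the example, not real quotes):

```python
# The arithmetic behind the proposed split, using the example figures above.
market_price  = 40_000_000   # what Company Alpha would pay a traditional cloud provider
alpha_pays    = 30_000_000   # what Company Alpha pays Company Beta instead
to_gpu_owners = 20_000_000   # what gets distributed to individuals on the network

alpha_savings = market_price - alpha_pays    # 10,000,000 saved by Alpha
beta_margin   = alpha_pays - to_gpu_owners   # 10,000,000 kept by Beta
print(alpha_savings, beta_margin, to_gpu_owners)
```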

Surprised no one has mentioned this, but the latency means the model would have to be trained in tiny fragments on each device, which is still an open area of research. As it stands now, basically all of a model needs to be loaded into memory.
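To give a feel for the problem, here's a toy sketch of splitting a model's layers across two workers. The PyTorch calls are standard, but the two-stage split is purely illustrative: every step, the activations have to cross the network before the next stage can do anything, and over residential internet that hand-off is milliseconds instead of microseconds, repeated millions of times per run (plus the return trip for gradients).

```python
import torch
import torch.nn as nn

# One model, cut into two stages that could live on different machines.
stage_1 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU())
stage_2 = nn.Sequential(nn.Linear(1024, 10))

x = torch.randn(32, 512)

# "Worker 1" computes the first half...
activations = stage_1(x)

# ...then this activation tensor (32 x 1024 floats here, vastly larger for a
# real LLM) has to cross the network to "worker 2" before anything else happens.
logits = stage_2(activations)
print(logits.shape)
```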

There's a whole field here, with people actively exploring this problem. Roughly speaking, solving it would enable Federated Learning at scale, and whoever figures it out will far eclipse OpenAI (if it's ever solved).
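For a flavor of what Federated Learning looks like in the simplest case, here's a toy federated-averaging loop (my own simplification of FedAvg-style training, not anyone's production code): each client trains locally on data the server never sees, and only the weights get averaged.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.1, steps=20):
    """Train a linear model on one client's private data."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Global model and three "clients", each holding data the server never sees.
d = 5
w_global = np.zeros(d)
clients = [(rng.normal(size=(50, d)), rng.normal(size=50)) for _ in range(3)]

for round_ in range(10):
    # Each client starts from the current global weights and trains locally...
    local_weights = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    # ...and the server only ever sees and averages the resulting weights.
    w_global = np.mean(local_weights, axis=0)

print(w_global)
```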

Where can I read more about this field of research?

https://arxiv.org/ has a ton of papers on it.

Do you have some good search terms to get started down the rabbit hole?

Probably the biggest recent result: https://arxiv.org/abs/2209.04836 (author thread: https://twitter.com/SamuelAinsworth/status/15697194946455265...)

See also: https://github.com/learning-at-home/hivemind

and more to OP's incentive structure: https://docs.bittensor.com/

The latter two intend to beat latency with Mixture-of-Experts (MoE) models. If the results of the former hold, they show that with a simple algorithmic transformation you can merge two independently trained models in weight space and get performance functionally equivalent to a model trained monolithically.
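As a rough illustration of the weight-space merging idea (my own simplified take on the paper's "weight matching" approach, not the authors' code): permute one model's hidden units so they line up with the other's, then average the aligned weights.

```python
# Toy sketch of permutation-aligned weight averaging for a 2-layer MLP,
# in the spirit of https://arxiv.org/abs/2209.04836. The alignment
# heuristic below is a simplification for illustration only.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def random_mlp(d_in=8, d_hidden=16, d_out=4):
    """Weights of a toy MLP: y = W2 @ relu(W1 @ x + b1) + b2."""
    return {
        "W1": rng.normal(size=(d_hidden, d_in)),
        "b1": rng.normal(size=d_hidden),
        "W2": rng.normal(size=(d_out, d_hidden)),
        "b2": rng.normal(size=d_out),
    }

def align_hidden_units(a, b):
    """Permute B's hidden units to line up with A's, so the two models can
    be averaged in weight space. Similarity here is just the dot product of
    the weights touching each hidden unit."""
    sim = a["W1"] @ b["W1"].T + a["W2"].T @ b["W2"]   # (d_hidden, d_hidden)
    _, perm = linear_sum_assignment(-sim)             # maximize total similarity
    return {
        "W1": b["W1"][perm],
        "b1": b["b1"][perm],
        "W2": b["W2"][:, perm],
        "b2": b["b2"],
    }

def average(a, b):
    return {k: (a[k] + b[k]) / 2 for k in a}

model_a, model_b = random_mlp(), random_mlp()
merged = average(model_a, align_hidden_units(model_a, model_b))
print({k: v.shape for k, v in merged.items()})
```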