What does HackerNews think of petals?

🌸 Run 100B+ language models at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Language: Python

There's Petals[0], but the problem seems to be that the entire training data needs to be loaded into VRAM and can't be split up across devices.

[0] https://github.com/bigscience-workshop/petals

A good opportunity to try the free and fully open-source BigScience Petals chat: https://chat.petals.dev/ ... Try out Stable Beluga 2 70B there.

I am currently running my 3090 GPU on there to help out; you can check the state of the swarm at https://health.petals.dev/

If you have a spare GPU, consider contributing: https://github.com/bigscience-workshop/petals . I am not associated with them.
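
For anyone wondering what "contributing" actually involves: it amounts to installing the petals package and starting its server module on the machine with the spare GPU, which then serves a slice of the model's layers to the swarm. A hedged sketch follows (the module path and model name are my reading of the README and may have changed; it's written via subprocess to keep these snippets in Python, but in practice you'd just run the CLI directly):

```python
# What contributing a GPU boils down to: running the Petals server so it can
# host a slice of the model's transformer blocks for the swarm.
# In practice this is just:  python -m petals.cli.run_server <model>
# The model name below is an assumption; use whatever the swarm currently needs.
import subprocess
import sys

MODEL = "petals-team/StableBeluga2"

subprocess.run(
    [sys.executable, "-m", "petals.cli.run_server", MODEL],
    check=True,  # raises if the server exits with an error
)
```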

If you have a lot of money (but not H100/A100 money), get 4090s, as they're currently the best bang for your buck on the CUDA side (according to George Hotz). If broke, get multiple second-hand 3090s. https://timdettmers.com/2023/01/30/which-gpu-for-deep-learni.... If unwilling to spend any money at all and just want to play around with Llama 70B, look into Petals: https://github.com/bigscience-workshop/petals
Could this work well with distributed solutions like petals?

https://github.com/bigscience-workshop/petals

I don't understand how petals can work though. I thought LLMs were typically quite monolithic.

I read about Petals (1) some time ago here on HN. There are surely others too, but I don't remember the names.

1. https://github.com/bigscience-workshop/petals

Yes, there is Petals/BLOOM https://github.com/bigscience-workshop/petals but it's not so great. Maybe it will improve, or a better one will come along.
My understanding is that it can work for model inference but not for model training.

https://github.com/bigscience-workshop/petals is a project that does this kind of thing for running inference - I tried it out in Google Colab and it seemed to work pretty well.
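
For anyone curious what that Colab experiment looks like on the client side, it's only a few lines. This is a rough sketch based on the Petals README; the exact class and model names have shifted between versions, so treat them as assumptions:

```python
# Rough sketch of client-side inference over the public Petals swarm.
# Class/model names follow the README at the time of writing; verify before use.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # any model the swarm is currently serving
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Embeddings and the LM head run locally; the transformer blocks are executed
# by remote peers, each of which holds only a slice of the model.
inputs = tokenizer("Distributed inference feels like", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0]))
```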

Model training is much harder though, because it requires a HUGE amount of high-bandwidth data exchange between the machines doing the training - way more than is feasible to send over anything other than a local network connection.
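
To put a rough number on the bandwidth point (my own back-of-the-envelope math, assuming naive data-parallel training with fp16 gradients and no compression):

```python
# Back-of-the-envelope cost of synchronizing gradients for a BLOOM-sized model.
# Assumptions: 176B parameters, fp16 gradients, no compression, naive all-reduce.
params = 176e9
bytes_per_gradient = 2  # fp16
sync_bytes = params * bytes_per_gradient
print(f"~{sync_bytes / 1e9:.0f} GB of gradients per optimizer step")  # ~352 GB

home_uplink = 100e6 / 8  # a 100 Mbit/s home uplink, in bytes per second
print(f"~{sync_bytes / home_uplink / 3600:.0f} hours just to ship one step's gradients")
```

Which is why schemes for training over the internet lean on model/pipeline parallelism and aggressive compression rather than naive gradient exchange.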

Right, so sort of like https://github.com/bigscience-workshop/petals but for the training phase. I suppose different training runs could be proposed via an RFC type of procedure. Then it’s not only the open-source model maintainers who put in the effort; supporters of the project can also “donate” their hardware resources.
The BigScience team (a working group of researchers who trained the BLOOM-176B LLM last year) released Petals [0][1], which allows distributed inference and fine-tuning of BLOOM, with the option to pick a custom model + private swarm. SWARM [2][3] is a WIP from Yandex and UW that shares some of the same codebase, but is for distributed training.

[0] https://petals.ml/
[1] https://github.com/bigscience-workshop/petals
[2] https://github.com/yandex-research/swarm
[3] https://twitter.com/m_ryabinin/status/1625175933492641814
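
On the "custom model + private swarm" option: as far as I understand it, the client and servers are simply pointed at your own DHT bootstrap peers instead of the public ones. A hedged sketch (the initial_peers argument and the multiaddr format are my recollection of the docs, so double-check them):

```python
# Hypothetical private-swarm client: same API as the public swarm, but the DHT
# is bootstrapped from your own peers. The multiaddr below is a placeholder.
from petals import AutoDistributedModelForCausalLM

MY_INITIAL_PEERS = [
    "/ip4/10.0.0.2/tcp/31337/p2p/QmExamplePeerID",  # placeholder, not a real peer
]

model = AutoDistributedModelForCausalLM.from_pretrained(
    "bigscience/bloom",              # whichever custom model your servers are hosting
    initial_peers=MY_INITIAL_PEERS,  # connect to the private swarm instead of the public one
)
```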

Hey look, a decentralized chatbot running BLOOMZ-176B (an open source LLM about the size of GPT-3)

http://chat.petals.ml

I'm contributing to the project by running a node in my garage with a single RTX 3060 Ti in it, and you can too: https://github.com/bigscience-workshop/petals

It's early days, but the tech is super promising.

Actually, there is the Petals [0] project, which does that (I think it's more for inference and fine-tuning, not full training) and might be the best current approach for a "self-hosted LLM", though it defeats the point of privacy/anonymity, since everyone in the distributed swarm has access to your data.

[0] https://github.com/bigscience-workshop/petals

Would it be possible to integrate RLHF with BLOOM Petals [0]? Petals lets you run the huge BLOOM model (175B params) with a distributed swarm of machines. IIRC it already supports fine-tuning BLOOM, so... maybe?

[0] https://github.com/bigscience-workshop/petals
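
For context on the fine-tuning part: as far as I know it's parameter-efficient tuning, where only small, locally-held trainable prompts get gradient updates while the frozen remote blocks just run forward/backward passes. A rough sketch is below; the tuning_mode/pre_seq_len kwargs and the model name come from the BLOOM-era README and may have changed, and full RLHF would still need a reward model and a policy-optimization loop on top of this:

```python
# Rough sketch of parameter-efficient fine-tuning (prompt tuning) over Petals.
# Only a small set of local prompt embeddings is trained; remote blocks stay frozen.
# kwargs below follow the BLOOM-era README and should be verified against the repo.
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "bigscience/bloom-petals"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(
    model_name, tuning_mode="ptune", pre_seq_len=16)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)  # only the prompts require grad

batch = tokenizer("Example fine-tuning text", return_tensors="pt")
outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
outputs.loss.backward()  # the backward pass is routed through the remote blocks
optimizer.step()
optimizer.zero_grad()
```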

https://github.com/bigscience-workshop/petals

Since my other account is shadow-banned for some unexplained reason, I just wanted to mention the Petals project here. It's an attempt to distribute the load of running these large models, BitTorrent-style. Good luck!