I'm really excited about this project and think it could be genuinely disruptive. It is organized by LAION, the same folks who curated the dataset used to train Stable Diffusion.
My understanding of the plan is to take an existing large language model, pretrained with self-supervised learning on a very large corpus, and fine-tune it using reinforcement learning from human feedback (RLHF), the same method used for ChatGPT. Once the dataset they are creating is available, though, better methods may be developed rapidly, since it will democratize the ability to do basic research in this space. I'm curious how much more limited the systems they are planning to build will be compared to ChatGPT, since they plan to use models with far fewer parameters so they can be deployed on much more modest hardware.
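For anyone curious what that looks like in code, here is a minimal sketch of a single RLHF (PPO) update using the Hugging Face trl library. This is just an illustration under my own assumptions, not the Open Assistant training code: the exact trl API varies by version, and the constant reward below stands in for a real reward model trained on human preference data.

```python
# Sketch of one RLHF (PPO) update step with Hugging Face trl (older PPO API).
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# One prompt -> sampled response -> scalar reward -> PPO step.
query = tokenizer.encode("What is LAION?", return_tensors="pt")[0]
full = ppo_trainer.generate(query, max_new_tokens=32,
                            pad_token_id=tokenizer.eos_token_id)[0]
response = full[len(query):]        # drop the prompt tokens, keep the continuation
reward = torch.tensor(1.0)          # placeholder for a reward model's score
stats = ppo_trainer.step([query], [response], [reward])
```

In the real pipeline the reward comes from a separate model trained on human preference rankings, which is exactly the data the Open Assistant crowdsourcing effort is meant to collect.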
As an AI researcher in academia, I find it frustrating to be blocked from doing a lot of research in this space due to computational constraints and a lack of the required data. I'm teaching a class this semester on self-supervised and generative AI methods, and it will be fun to let students play around with this in the future.
Here is a video about the Open Assistant effort: https://www.youtube.com/watch?v=64Izfm24FKA
> I find it frustrating to be blocked from doing a lot of research in this space due to computational
Do we need a SETI@home-like project to distribute the training computation across many volunteers so we can all benefit from the trained model?
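To make the idea concrete, here is a toy sketch of the textbook data-parallel mechanism with torch.distributed: every participant computes gradients on its own local batch, and the group averages them before each optimizer step. This is only an illustration of the core idea; a real SETI@home-style effort would additionally need fault tolerance for peers joining and leaving, NAT traversal, and defenses against slow or malicious contributors, none of which appear here.

```python
# Toy sketch of distributing training across many machines: each participant
# computes gradients on its own data, then gradients are averaged before every
# optimizer step. Launch one copy per machine, e.g. with torchrun, so that the
# MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables are set.
import torch
import torch.distributed as dist
from torch import nn


def run_worker():
    dist.init_process_group(backend="gloo", init_method="env://")
    world_size = dist.get_world_size()

    model = nn.Linear(16, 1)  # stand-in for a (much) larger model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for step in range(100):
        x = torch.randn(32, 16)  # each volunteer's local batch
        y = torch.randn(32, 1)
        loss = nn.functional.mse_loss(model(x), y)

        optimizer.zero_grad()
        loss.backward()

        # Average gradients across all participants before stepping,
        # so every copy of the model stays in sync.
        for p in model.parameters():
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size

        optimizer.step()


if __name__ == "__main__":
    run_worker()
```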