> I was served with papers yesterday from the Silverman folks. No idea what they could possibly want, since books3 is open source. https://twitter.com/theshawwn/status/1704559992135717238?s=4...

I think we’re witnessing the death of open source AI. The logical outcome of this is that only large companies will be able to acquire and use the training data necessary to compete with ChatGPT.

So anyone who thinks otherwise will have to answer the question: how are we going to make any datasets?

It’s tempting to think we can pull together datasets from non-copyrighted work alone. But there isn’t enough of it, and the resulting model would have no knowledge of most books.

Take a look at how Databricks enlisted its own employees to create the dolly-15k dataset ( https://www.databricks.com/blog/2023/04/12/dolly-first-open-... ).
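
For reference, each of the ~15k records in that dataset is just a human-written instruction-following example. The field names below match the published dataset; the values are invented:

    # One illustrative record in the dolly-15k shape (values invented;
    # field names match the published dataset).
    record = {
        "instruction": "Summarize this support ticket in one sentence.",
        "context": "Customer reports the export button fails on Safari.",
        "response": "A customer cannot export data when using Safari.",
        "category": "summarization",
    }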

For training AGI (artificial general intelligence), maybe only a select few mega-companies will be able to assemble the massive datasets required.

But there are so many other use cases that OSS projects can enable. Individuals and smaller companies have unique data that can be used to augment existing open source models, and many use cases are domain-specific, with no need for general intelligence.
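
As a concrete (if rough) sketch of what that augmentation can look like: here's the general shape of a LoRA fine-tune with Hugging Face transformers + peft. The model name, target modules, and hyperparameters are illustrative, not a recipe.

    # Rough sketch of adapting an open model to in-house data with LoRA,
    # using Hugging Face transformers + peft. Names/values illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-2-7b-hf"  # assumes you have access to the weights
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Train small adapter matrices instead of all 7B weights, so a single
    # consumer GPU is enough for many domain-specific datasets.
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()
    # ...then run an ordinary training loop (e.g. transformers.Trainer)
    # over your tokenized in-house examples.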

Palantir just gave a talk at AIPCon ( https://youtu.be/o2b0DwNg6Ko ) where they recommended using many LLMs, open and closed (the example had Llama 2 70B, GPT-4, PaLM coding, Claude, plus fine-tuned models), all feeding into their synthesizer.
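
The pattern itself is simple to sketch. Here's a toy version of that fan-out-and-synthesize wiring; call_model is a hypothetical stand-in for whatever client each provider needs, and none of this is Palantir's actual API:

    # Toy sketch of the "many models feeding a synthesizer" pattern.
    from concurrent.futures import ThreadPoolExecutor

    def call_model(name: str, prompt: str) -> str:
        # Replace with a real API call per provider (OpenAI, Anthropic,
        # a local Llama endpoint, etc.). Stubbed here so the sketch runs.
        return f"[{name}] draft answer to: {prompt}"

    def ensemble(prompt: str, models: list[str]) -> str:
        # Fan the prompt out to every model in parallel...
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda m: call_model(m, prompt), models))
        # ...then hand all the drafts to one "synthesizer" model to merge.
        joined = "\n\n".join(drafts)
        return call_model("synthesizer", f"Combine these answers:\n{joined}")

    print(ensemble("Summarize our Q3 supply risks.",
                   ["llama2:70b", "gpt-4", "claude"]))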

While I want open source to win, especially as a maintainer of Ollama (if you haven't seen it yet, it's one of the easiest ways to run LLMs locally: https://github.com/jmorganca/ollama ), I think the work in this space so far has been positive-sum, open or closed.
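
To make the "run LLMs locally" part concrete: once the Ollama server is running and you've pulled a model (e.g. `ollama pull llama2`), a local completion is one HTTP call. A minimal sketch:

    # Minimal local completion against Ollama's REST API, which listens
    # on localhost:11434 once the server is running.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama2",
            "prompt": "Why is the sky blue?",
            "stream": False,  # one JSON object back instead of a stream
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])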