> I was served with papers yesterday from the Silverman folks. No idea what they could possibly want, since books3 is open source. https://twitter.com/theshawwn/status/1704559992135717238?s=4...

I think we’re witnessing the death of open source AI. The logical outcome of this is that only large companies will be able to acquire and use the training data necessary to compete with ChatGPT.

So anyone who thinks otherwise will have to answer the question: how are we going to make any datasets?

It’s tempting to think we can pull together datasets from non-copyrighted work alone. But there isn’t enough of it, and the resulting model would have no knowledge of most books.

Take a look at how Databricks enlisted its own employees to create the dolly-15k dataset ( https://www.databricks.com/blog/2023/04/12/dolly-first-open-... ).
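
For reference, each of the ~15k records in that dataset is just a human-written instruction-following example. The field names below match the published dataset; the values are invented:

    # One illustrative record in the dolly-15k shape (values invented;
    # field names match the published dataset).
    record = {
        "instruction": "Summarize this support ticket in one sentence.",
        "context": "Customer reports the export button fails on Safari.",
        "response": "A customer cannot export data when using Safari.",
        "category": "summarization",
    }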

For training AGI (artificial general intelligence), maybe only a select few mega-companies will be able to assemble the massive datasets required.

But there are so many other use cases that OSS projects can enable. Individuals and smaller companies have unique data that can be used to augment existing open source models, and many use cases are domain-specific, with no need for general intelligence.
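
As a concrete (if rough) sketch of what that augmentation can look like: here's the general shape of a LoRA fine-tune with Hugging Face transformers + peft. The model name, target modules, and hyperparameters are illustrative, not a recipe.

    # Rough sketch of adapting an open model to in-house data with LoRA,
    # using Hugging Face transformers + peft. Names/values illustrative.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "meta-llama/Llama-2-7b-hf"  # assumes you have access to the weights
    tokenizer = AutoTokenizer.from_pretrained(base)
    model = AutoModelForCausalLM.from_pretrained(base)

    # Train small adapter matrices instead of all 7B weights, so a single
    # consumer GPU is enough for many domain-specific datasets.
    lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()
    # ...then run an ordinary training loop (e.g. transformers.Trainer)
    # over your tokenized in-house examples.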

Palantir just gave a talk at AIPCon ( https://youtu.be/o2b0DwNg6Ko ) where they recommended using many LLMs, open and closed (the example had Llama 2 70B, GPT-4, PaLM coding, Claude, plus fine-tuned models), all feeding into their synthesizer.
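
The pattern itself is simple to sketch. Here's a toy version of that fan-out-and-synthesize wiring; call_model is a hypothetical stand-in for whatever client each provider needs, and none of this is Palantir's actual API:

    # Toy sketch of the "many models feeding a synthesizer" pattern.
    from concurrent.futures import ThreadPoolExecutor

    def call_model(name: str, prompt: str) -> str:
        # Replace with a real API call per provider (OpenAI, Anthropic,
        # a local Llama endpoint, etc.). Stubbed here so the sketch runs.
        return f"[{name}] draft answer to: {prompt}"

    def ensemble(prompt: str, models: list[str]) -> str:
        # Fan the prompt out to every model in parallel...
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda m: call_model(m, prompt), models))
        # ...then hand all the drafts to one "synthesizer" model to merge.
        joined = "\n\n".join(drafts)
        return call_model("synthesizer", f"Combine these answers:\n{joined}")

    print(ensemble("Summarize our Q3 supply risks.",
                   ["llama2:70b", "gpt-4", "claude"]))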

While I want open source to win, especially as a maintainer of Ollama (if you haven't seen it yet, it's one of the easiest ways to run LLMs locally: https://github.com/jmorganca/ollama ), I think the work in this space so far has been positive-sum, open or closed.
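
To make the "run LLMs locally" part concrete: once the Ollama server is running and you've pulled a model (e.g. `ollama pull llama2`), a local completion is one HTTP call. A minimal sketch:

    # Minimal local completion against Ollama's REST API, which listens
    # on localhost:11434 once the server is running.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({
            "model": "llama2",
            "prompt": "Why is the sky blue?",
            "stream": False,  # one JSON object back instead of a stream
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])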