To be fair, the open source model has been working for the last few decades. The concern with LLMs was that open source (and academia) couldn't do what the big companies are doing because they couldn't get access to enough computing resources. The article is arguing (and I guess open source ML groups are showing) that you don't need those computing resources to pave the way. It's still an open question whether OpenAI or the other big companies can find a moat in AI via some model, dataset, computing resources, whatever. But then you could ask that question about any field.

But none of the "open source" AI models are open source in the classic sense. They are free, but they aren't the source code; they are closer to a freely distributable compiled binary where the compiler and the original input haven't been released. A true open source AI model would need to specify the training data and the code to go from that training data to the model. Certainly it would be very expensive for someone else to take this information, build the model again, and verify that the same result is obtained, and maybe we don't really need that. But if we don't have it, then I think we need some term other than "open source" to describe these things. You can get it, you can share it, but you don't know what's in it.

RWKV does: https://github.com/BlinkDL/RWKV-LM It uses "the Pile": https://pile.eleuther.ai/ And I've seen a few more in recent weeks.