What does HackerNews think of open_llama?

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

From their project page https://github.com/openlm-research/open_llama :

> For current version of OpenLLaMA models, our tokenizer is trained to merge multiple empty spaces into one before tokenization, similar to T5 tokenizer. Because of this, our tokenizer will not work with code generation tasks (e.g. HumanEval) since code involves many empty spaces. We are planning to open source long context models trained on more code data. Stay tuned.

Sounds like they're working on it.
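To make the quoted tokenizer caveat concrete, here is a minimal round-trip sketch, assuming the tokenizer is published on Hugging Face under openlm-research/open_llama_7b and loads with transformers' LlamaTokenizer:

```python
# Sketch only: shows why a tokenizer that merges consecutive spaces before
# tokenization breaks code round-tripping. The checkpoint name is an assumption.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_7b")

code = "def f(x):\n    return x  +  1"   # indentation and double spaces are significant in code
ids = tokenizer(code)["input_ids"]
roundtrip = tokenizer.decode(ids, skip_special_tokens=True)

print(repr(code))       # original source, whitespace intact
print(repr(roundtrip))  # with a whitespace-merging tokenizer, the runs of spaces come back collapsed
```

If the decoded string no longer matches the original whitespace, HumanEval-style code generation suffers, which is exactly the limitation the quote above describes.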

There is Open Llama 7B, which is Apache 2.0 licensed; please consider checking it out: https://github.com/openlm-research/open_llama
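For anyone who wants to try it, this is roughly the loading pattern the repo's README describes, hedged as a sketch (the checkpoint name, dtype, and device settings are assumptions):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openlm-research/open_llama_7b"  # assumed Hugging Face checkpoint name

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; drop for plain CPU loading
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```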
OpenLLaMA will be released soon, and it's 100% compatible with the original LLaMA.

https://github.com/openlm-research/open_llama

There are already efforts to recreate the LLaMA weights under open source licenses (ETA: days/nowish).

https://github.com/openlm-research/open_llama

Maybe we don't need to worry; OpenLLaMA is in training right now. It will be a commercially usable version of LLaMA.

> Update 05/22/2023

> We are happy to release our 700B token checkpoint for the OpenLLaMA 7B model and 600B token checkpoint for the 3B model. We’ve also updated the evaluation results. We expect the full 1T token training run to finish at the end of this week.

https://github.com/openlm-research/open_llama

So we could develop on LLaMA for now and switch to OpenLLaMA later.

Namespace collisions are inevitable, especially with how fast-moving the LLM space is right now. Just wanted to point out that besides this "Open-Llama" project (which looks really interesting and is well documented in the GitHub repo), there is also another group training "OpenLLaMA": https://github.com/openlm-research/open_llama. That one looks like an effort by two Berkeley PhD students (https://www.haoliu.site/ and http://young-geng.xyz/) to reproduce LLaMA using the 1.2T-token Together RedPajama dataset; they've released up to a 300B-token checkpoint so far.

Feedback for /u/bayes-song - it'd be great to have more info on the model card on HF. Right now it's unclear what the parameter count is, how many total tokens you're planning to train on, and how many you've trained on so far. An Evaluation section (maybe using lm-evaluation-harness) would be good as well.
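For reference, a hedged sketch of the kind of evaluation suggested above, using EleutherAI's lm-evaluation-harness Python API; the model-type string and keyword arguments vary between harness versions, and the checkpoint name is an assumption:

```python
# Assumption-laden sketch: "hf-causal" and the simple_evaluate kwargs differ
# across lm-evaluation-harness releases; adjust to the installed version.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=openlm-research/open_llama_7b",
    tasks=["arc_easy", "hellaswag", "piqa"],
    num_fewshot=0,
)
print(results["results"])  # per-task accuracy and related metrics
```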