What does HackerNews think of open_llama?

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

From their project page https://github.com/openlm-research/open_llama :

> For current version of OpenLLaMA models, our tokenizer is trained to merge multiple empty spaces into one before tokenization, similar to T5 tokenizer. Because of this, our tokenizer will not work with code generation tasks (e.g. HumanEval) since code involves many empty spaces. We are planning to open source long context models trained on more code data. Stay tuned.

Sounds like they're working on it.
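To make the quoted tokenizer caveat concrete, here is a minimal round-trip sketch, assuming the tokenizer is published on Hugging Face under openlm-research/open_llama_7b and loads with transformers' LlamaTokenizer:

```python
# Sketch only: shows why a tokenizer that merges consecutive spaces before
# tokenization breaks code round-tripping. The checkpoint name is an assumption.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("openlm-research/open_llama_7b")

code = "def f(x):\n    return x  +  1"   # indentation and double spaces are significant in code
ids = tokenizer(code)["input_ids"]
roundtrip = tokenizer.decode(ids, skip_special_tokens=True)

print(repr(code))       # original source, whitespace intact
print(repr(roundtrip))  # with a whitespace-merging tokenizer, the runs of spaces come back collapsed
```

If the decoded string no longer matches the original whitespace, HumanEval-style code generation suffers, which is exactly the limitation the quote above describes.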

There is Open Llama 7B, which is Apache 2.0 licensed; please consider checking it out: https://github.com/openlm-research/open_llama
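For anyone who wants to try it, this is roughly the loading pattern the repo's README describes, hedged as a sketch (the checkpoint name, dtype, and device settings are assumptions):

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "openlm-research/open_llama_7b"  # assumed Hugging Face checkpoint name

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate; drop for plain CPU loading
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```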
OpenLLaMA will be released soon, and it's 100% compatible with the original LLaMA.

https://github.com/openlm-research/open_llama

There are already efforts to recreate the LLaMA weights under open source licenses (ETA: days/nowish).

https://github.com/openlm-research/open_llama

Maybe we don't need to worry; OpenLLaMA is in training right now. It will be a commercially usable version of LLaMA.

> Update 05/22/2023

> We are happy to release our 700B token checkpoint for the OpenLLaMA 7B model and 600B token checkpoint for the 3B model. We’ve also updated the evaluation results. We expect the full 1T token training run to finish at the end of this week.

https://github.com/openlm-research/open_llama

So we could develop on LLaMA for now and switch to OpenLLaMA later.

Namespace collisions are inevitable, especially with how fast-moving the LLM space is right now. Just wanted to point out that besides this "Open-Llama" project (which looks really interesting and is well documented in the GitHub repo), there is also another group training "OpenLLaMA": https://github.com/openlm-research/open_llama. That one looks like an effort by two Berkeley PhD students (https://www.haoliu.site/ and http://young-geng.xyz/) to reproduce LLaMA using the 1.2T-token Together RedPajama dataset; they've released up to a 300B-token checkpoint so far.

Feedback for /u/bayes-song - it'd be great to have more info on the model card on HF. Right now it's unclear what the parameter count is, how many total tokens you're planning to train on, and how many you've trained on so far. An Evaluation section (maybe using lm-evaluation-harness) would be good as well.
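For reference, a hedged sketch of the kind of evaluation suggested above, using EleutherAI's lm-evaluation-harness Python API; the model-type string and keyword arguments vary between harness versions, and the checkpoint name is an assumption:

```python
# Assumption-laden sketch: "hf-causal" and the simple_evaluate kwargs differ
# across lm-evaluation-harness releases; adjust to the installed version.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=openlm-research/open_llama_7b",
    tasks=["arc_easy", "hellaswag", "piqa"],
    num_fewshot=0,
)
print(results["results"])  # per-task accuracy and related metrics
```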