I've been following the RedPajama project closely and I must say, it's quite an impressive undertaking. The fact that it's all open source, and a collaboration between various institutions at that, is nothing short of amazing. This shows the power of the open-source community in action, with a bunch of smart people coming together to build something truly remarkable.

The 3B model, being super fast and accessible, is a game changer for a lot of us who may not have the latest hardware. I mean, running on an RTX 2070 that was released 5 years ago? That's pretty cool.

As for the 7B model, it's great to see that it's already outperforming Pythia 7B. The bigger dataset definitely seems to be making a difference here. I'm eager to see how far this project goes, and what kind of improvements we can expect in the coming weeks with the new RedPajama dataset they're working on.

One thing I found interesting is the mention of differences between the LLaMA 7B and their replication. I'd love to learn more about those differences, as it could shed light on what's working well and what could be improved further.

Sorry, excuse my ignorance, but why is having access to a 3B model a game changer?

I played with a pirated 7B model a while back. My computer runs a 1080 Ti, so it used to be good but now it's pretty old. The model ran at a reasonable number of tokens/sec, but the quality was just trash compared to what I'd grown used to with ChatGPT. It was a novelty I interacted with for just a single evening.

I truly don't understand the use case for a 3B model with our current technologies.

What are you going to use it for?

You can ultra fine-tune those models ... look at Vicuna 13B: if you know how to prompt it well, you can get it to work as """"well"""" as ChatGPT, running on local hardware. I just got Vicuna 13B on Gradio[1] to act as a Japanese kanji personal trainer, and I only used a simple prompt: "I want you to act as a Japanese Kanji quiz machine. Each time I ask you for the next question, you are to provide one random Japanese kanji from the JLPT N5 kanji list and ask for its meaning. You will generate four options, one correct, three wrong. The options will be labeled from A to D. I will reply to you with one letter, corresponding to one of these labels. You will evaluate each of my answers based on your last question and tell me if I chose the right option. If I chose the right label, you will congratulate me. Otherwise you will tell me the right answer. Then you will ask me the next question. Avoid simple kanji, let's go."

[1] https://chat.lmsys.org/

How can someone get into using these models? How does ‘tuning’ work? How might I go about using these models for doing things like say summarizing news articles or video transcriptions? When someone tunes a model for a task, what exactly are they doing and how does this ‘change’ the model?

(I'm not an expert)

> How can someone get into using these models

You can use Gradio (online), or download the weights at https://huggingface.co/lmsys/vicuna-13b-delta-v1.1/tree/main (a plain git clone won't fetch files that big, so download them manually), then load the model in PyTorch and try inference (text generation). But you'll need either a lot of RAM (16 GB, 32 GB+) or VRAM (a GPU with enough memory).
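
For a concrete idea, here's a rough sketch of what loading and prompting a model locally can look like with Hugging Face transformers. It's not the exact Vicuna recipe (the lmsys files above are delta weights, so they'd first need to be merged with the LLaMA base); the "./vicuna-13b" path and the prompt are just placeholders:

    # Minimal local inference sketch with Hugging Face transformers.
    # Assumption: merged model weights live in ./vicuna-13b (placeholder path).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_path = "./vicuna-13b"

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,  # half precision to roughly halve memory use
        device_map="auto",          # spread across GPU/CPU as memory allows (needs accelerate)
    )

    prompt = "Explain what a language model is in one sentence."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=100)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))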

> How might I go about using these models for doing things like say summarizing news articles or video transcriptions

Again, you might try it online, or set up a Python/Bash/PowerShell script to load the model for you so you can use it. If you can pay, I would recommend RunPod for the shared GPUs.
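
Summarizing is really just prompting: feed the article text in and ask for a summary. A rough, self-contained sketch (the model path, file name, and prompt are placeholders, no tuning involved):

    # Sketch: summarize a local text file by plain prompting a locally saved model.
    # Assumptions: merged weights in ./vicuna-13b, article saved as article.txt.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("./vicuna-13b")
    model = AutoModelForCausalLM.from_pretrained(
        "./vicuna-13b", torch_dtype=torch.float16, device_map="auto"
    )

    article = open("article.txt").read()
    prompt = (
        "Summarize the following article in three bullet points:\n\n"
        f"{article}\n\nSummary:"
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)

    # Keep only the newly generated tokens, not the echoed prompt.
    generated = outputs[0][inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(generated, skip_special_tokens=True).strip())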

> When someone tunes a model for a task, what exactly are they doing and how does this ‘change’ the model?

From my view ... not much. "Fine-tuning" means training (tuning) on a specific dataset (fine, as in fine-grained). As far as I understand (I'm not sure), they just run more epochs on the model with the new data you've provided until they reach a good loss (i.e., the model works); that's why quality data is important.
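
Mechanically, a bare-bones version of that loop with the Hugging Face Trainer might look roughly like this. Purely illustrative: the data file, model path, and hyperparameters are made up, and in practice a 13B model won't fine-tune on a consumer GPU without extra tricks (smaller models or parameter-efficient methods):

    # Sketch of "more epochs on your own data": standard causal-LM fine-tuning.
    # Assumptions: base weights in ./vicuna-13b, your data in train.jsonl
    # as {"text": ...} records (both placeholders).
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    base = "./vicuna-13b"
    tokenizer = AutoTokenizer.from_pretrained(base)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    raw = load_dataset("json", data_files="train.jsonl")["train"]
    train_ds = raw.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
        batched=True,
        remove_columns=raw.column_names,
    )

    args = TrainingArguments(
        output_dir="./finetuned",
        num_train_epochs=3,              # "more epochs on the new data"
        per_device_train_batch_size=1,
        learning_rate=2e-5,
        fp16=True,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # keep going until the loss on your data looks good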

You might try https://github.com/oobabooga/text-generation-webui; they have a pretty easy setup. Again, you'll need a lot of RAM and a good CPU for CPU inference, or a GPU.
