Your commands assume the model is a .bin file (so I guess there must be a way to convert the PyTorch .pth model to the .bin file). How can I do this, and what is the difference between the two formats?
The facebook repo provides commands for using the models, but they don't work on my Windows machine: "NOTE: Redirects are currently not supported in Windows or MacOs. [W ..\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to ...."
The facebook repo does not describe which OS you are supposed to use, so I assumed it would work on Windows too. But if it did, why would anyone need ggerganov's llama.cpp code? I am new to all of this and easily confused, so any help is appreciated.
Direct link to request access form: https://ai.meta.com/resources/models-and-libraries/llama-dow...
Direct link to request access on Hugging Face (use the same email): https://huggingface.co/meta-llama/Llama-2-70b-chat-hf
Direct link to repo: https://github.com/facebookresearch/llama
Once you get a download link by email, make sure to copy it without spaces; one option is to open it in a new tab and download from there. If you are using fish or another fancy shell, switch to bash or sh before running download.sh from the repo.
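If the copied link keeps failing, it's worth checking for the stray-whitespace problem before blaming the script. A quick way to sanitize it (this helper is my own, not something from the repo):

```python
# hypothetical helper: strip whitespace/newlines that a copied
# download link may pick up from an email client
def clean_url(pasted: str) -> str:
    return "".join(pasted.split())

# paste the cleaned result when download.sh prompts for the URL
print(clean_url("https://example.com/llama-\n  weights?token=abc"))
```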
I am not sure exactly how much space is needed, but it is likely north of 500GB given that there are two 70B models (a prompt gives you the option to download just the smaller ones).
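For a rough sanity check on that figure: the released checkpoints are fp16, so a back-of-envelope estimate is 2 bytes per parameter (this ignores tokenizer files and any extra variants you opt into):

```python
# fp16 checkpoints store 2 bytes per parameter
def fp16_gb(params_billion: float) -> float:
    return params_billion * 1e9 * 2 / 1e9  # bytes -> GB

# each 70B checkpoint alone is on the order of 140 GB,
# so two of them plus the smaller models adds up fast
print(fp16_gb(70))                                  # 140.0
print(fp16_gb(7) + fp16_gb(13) + 2 * fp16_gb(70))   # 320.0
```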
Edit: The_Bloke on HF already has them in GGML format available for download.
https://huggingface.co/TheBloke/Llama-2-7B-GGML
https://huggingface.co/TheBloke/Llama-2-13B-GGML
https://github.com/facebookresearch/llama links to
https://forms.gle/jk851eBVbX1m5TAv5, which contains the LLaMA license agreement below the form.
The training code is Apache 2.0 licensed, so it can be copied and modified freely, including for commercial purposes. https://github.com/facebookresearch/llama
The Python + GPU one can be found in the official facebook repo: https://github.com/facebookresearch/llama (Presumably GP thought this was already known to everyone, so they pasted the other link.)
https://github.com/facebookresearch/llama
I already got the 7B model to generate text using my GPU! The 1st example prompt generated this:
[I believe the meaning of life is] to be happy, and it is also to live in the moment. I think that is the most important thing. I'm not really a party girl. I'm not a girl's girl. I have a really small group of close girlfriends and that's all I need. I believe in equal rights for everyone. I'm not a rebel. I don't really rebel against anything. I'm a very traditional girl, very loyal. I'm a mum's girl and I'm a dad's girl. People have a right to know what's going on. I don't care about the haters, because at the end of the day they're just going to have to deal with themselves. I've been getting more and more into fashion since I was about 16. I know I'm a little different, but so what? I think that's good. I don't think you should be like everyone else. It's my birthday, and I'll cry if I want to. I've always been a huge fan of fashion, and I've always liked to dress up
it's sharded across all 4 GPUs (as per the readme here: https://github.com/facebookresearch/llama). I'd wait a few weeks to a month for people to settle on a solution for running the model; right now people are just throwing PyTorch code at the wall and seeing what sticks.
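To illustrate what "sharded" means here, a toy sketch of tensor parallelism in plain Python (this shows the general idea only, not the repo's actual fairscale model-parallel code): each rank holds a column slice of a weight matrix, computes its partial output independently, and a gather step reassembles the full result.

```python
# toy tensor parallelism: split a 4x4 weight matrix column-wise across 2 "ranks"
W = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
x = [1, 1, 1, 1]

def matvec_cols(x, W, cols):
    # a rank computes only the output columns it was assigned
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in cols]

rank0 = matvec_cols(x, W, [0, 1])   # columns held by rank 0
rank1 = matvec_cols(x, W, [2, 3])   # columns held by rank 1
full = rank0 + rank1                # the "all-gather" recovers the full x @ W
print(full)                         # [28, 32, 36, 40]
```

With real checkpoints, each shard file holds one rank's slice of every weight, which is why the number of consolidated.*.pth files matches the GPU count.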
Github [2]
[1] https://research.facebook.com/file/1574548786327032/LLaMA--O...
The closest you are going to get to the source is here: https://github.com/facebookresearch/llama
It is still unclear whether you will even get access to the entire model as open source. Even if you did, you couldn't use it for your commercial product anyway.