What does HackerNews think of dalai?
The simplest way to run LLaMA on your local machine
I guess it comes down to the requirement for a very high-end GPU (or multiple GPUs), which makes running it locally impractical for most people versus just running it in Colab or something.
Though there are some efforts:
It works for 7B/13B/30B/65B LLaMA and Alpaca (a fine-tuned LLaMA, which definitely works better). The smaller models, at least, should run on pretty much any computer.
These could be useful:
https://github.com/Crataco/ai-guide/blob/main/guide/models.m... -> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...
https://github.com/cocktailpeanut/dalai
The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I guess the same would apply to quantized Vicuna 13B, but I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage ).
GPT4All's LoRA also works; it has given me perhaps the most compelling results yet on my local computer. I have to try quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version will take many hours, so I'm a bit hesitant.
PS: converting 13B LLaMA took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM.
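
For a rough sense of what fits where, a back-of-the-envelope sketch (4-bit weights are about half a byte per parameter; the real ggml files and runtime usage are somewhat larger because of scaling factors and the context cache):

    # Rough RAM needed just for the 4-bit quantized LLaMA weights.
    # Approximate parameter counts; actual q4_0 files run a bit larger.
    BYTES_PER_4BIT_PARAM = 0.5

    for name, params in [("7B", 6.7e9), ("13B", 13.0e9), ("30B", 32.5e9), ("65B", 65.2e9)]:
        gib = params * BYTES_PER_4BIT_PARAM / 2**30
        print(f"LLaMA {name}: ~{gib:.1f} GiB of weights")

That roughly matches my experience of 13B being workable on a 16GB machine, while 65B clearly isn't.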
Feel free to answer back if you're trying any of these things this week (later I might lose track).
My plan was to use https://github.com/cocktailpeanut/dalai with the Alpaca model, then somehow use LlamaIndex to feed in my dataset (a Slack export). But it's not too clear how to train the Alpaca model.
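
Roughly what I had in mind for the LlamaIndex half, as an untested sketch (the API has been changing quickly, so GPTSimpleVectorIndex and the reader names may differ in your version, and the export path is just a placeholder):

    # Untested sketch: index a Slack export (pre-converted to plain-text files)
    # with LlamaIndex, then query it. By default LlamaIndex calls OpenAI for
    # embeddings/completions, so pointing it at a local Alpaca would still need
    # a custom LLM wrapper on top of this.
    from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

    # One .txt file per channel/thread, dumped from the Slack export.
    documents = SimpleDirectoryReader("./slack_export_txt").load_data()

    index = GPTSimpleVectorIndex.from_documents(documents)
    print(index.query("What did we decide about the deploy process?"))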
For reference, Alpaca was fine-tuned on prompts in this format:

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Response:
or
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction: {instruction}
### Input: {input}
### Response:
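
If you're stuffing your own data into the prompt rather than fine-tuning, here's a minimal sketch of filling in those two templates (the helper name is made up, and the exact newlines may matter, so check the Alpaca repo for the canonical layout):

    # Build an Alpaca-style prompt from the two templates above.
    # build_alpaca_prompt is just an illustrative helper, not from any library.
    INSTRUCTION_ONLY = (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction: {instruction}\n\n"
        "### Response:"
    )

    INSTRUCTION_WITH_INPUT = (
        "Below is an instruction that describes a task, paired with an input "
        "that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction: {instruction}\n\n"
        "### Input: {input}\n\n"
        "### Response:"
    )

    def build_alpaca_prompt(instruction, context=None):
        if context:
            return INSTRUCTION_WITH_INPUT.format(instruction=instruction, input=context)
        return INSTRUCTION_ONLY.format(instruction=instruction)

    print(build_alpaca_prompt("Summarize this Slack thread.",
                              "alice: staging is down again\nbob: restarted it, back up now"))

You then feed the resulting string to the model and let it complete the text after "### Response:".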
I ran Alpaca 7B Q4 almost instantly because they provided curl commands to download it. Super simple. But it seems most projects aren't doing that because it's prone to drawing Facebook's gaze. So... what's recommended?
I happened to find this[2], but I think that's the non-quantized raw models? Not sure yet.
[1]: Won't bother with 65B; it can't fit in memory, I believe.
[2]: https://github.com/shawwn/llama-dl/blob/main/llama.sh
Edit: I forgot about https://github.com/cocktailpeanut/dalai - I suspect this is best in breed at the moment? Though a Docker container would be nice to wrangle all the dependencies.