What does HackerNews think of dalai?

The simplest way to run LLaMA on your local machine


I agree, I've definitely seen way more information about running image synthesis models like Stable Diffusion locally than about running LLMs. It's counterintuitive to me that Stable Diffusion takes less RAM than an LLM, especially considering it still needs the word vectors. Goes to show I know nothing.

I guess it comes down to LLMs requiring a very high-end GPU (or several), which makes running them locally impractical for most people versus just running them in Colab or something.

Though there are some efforts:

https://github.com/cocktailpeanut/dalai

If you're just looking to play with something locally for the first time, this is the simplest project I've found, and it has a simple web UI: https://github.com/cocktailpeanut/dalai

It works with the 7B/13B/30B/65B LLaMA models and with Alpaca (fine-tuned LLaMA, which definitely works better). The smaller models, at least, should run on pretty much any computer.
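For a sense of how simple it is: per the dalai README at the time, install and launch were a couple of commands (the web UI is then served locally):

    # install a model; swap 7B for 13B/30B/65B, or use `alpaca install 7B`
    npx dalai llama install 7B
    # start the web UI (served on http://localhost:3000 by default)
    npx dalai serve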

I had it running before with Dalai (https://github.com/cocktailpeanut/dalai) but have since moved to the browser-based WebGPU method (https://mlc.ai/web-llm/), which uses Vicuna 7B and is quite good.

"consumer hardware" is a bit vague as a limitation, which I guess is partly why people aren't tracking very closely what runs on what

these could be useful:

https://nixified.ai

https://github.com/Crataco/ai-guide/blob/main/guide/models.m... -> https://old.reddit.com/user/Crataco/comments/zuowi9/opensour...

https://github.com/cocktailpeanut/dalai

the 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I guess the same would apply to quantized Vicuna 13B, but I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage)

GPT4All's LoRA model also works; it's given me perhaps the most compelling results yet on my local machine. I still have to try quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version takes many hours, so I'm a bit hesitant.

PS: converting the 13B LLaMA model took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM
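for anyone wanting to reproduce that conversion, here's a rough sketch based on the llama.cpp README linked above; script names and shard layout have changed between versions, so treat the paths as examples:

    # convert the raw PyTorch 13B checkpoint to ggml f16
    # (this is the slow, RAM/swap-hungry step)
    python3 convert-pth-to-ggml.py models/13B/ 1
    # 4-bit quantize each resulting shard (13B came as two shards; "2" selects q4_0)
    ./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin   2
    ./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
    # run the quantized model on CPU
    ./main -m ./models/13B/ggml-model-q4_0.bin -n 128 -p "Building a website can be done in 10 simple steps:"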

feel free to reply if you're trying any of these things this week (later I might lose track)

I want to use LlamaIndex. My input would be a Slack export, but I don't want any data to go to OpenAI; I want it all to happen locally or within my own EC2 instance. I have seen https://github.com/jerryjliu/llama_index/blob/046183303da416... but it calls Hugging Face.

My plan was to use https://github.com/cocktailpeanut/dalai with the Alpaca model and then somehow use LlamaIndex to feed in my dataset (a Slack export). But it's not too clear how to train the Alpaca model.

Just hacked together a library to use LLaMA and Alpaca through Dalai (https://github.com/cocktailpeanut/dalai)

Can you give an example of how you prompted the model? Your issue is probably related to that, but I would need an example to be sure. I've found the 7B Alpaca model [1] to work surprisingly well! Here's how you're supposed to prompt it:

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {instruction}

### Response:

or

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction: {instruction}

### Input: {input}

### Response:

[1] https://github.com/cocktailpeanut/dalai
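If you're driving the model from the command line rather than through Dalai's UI, a one-shot run with the first template might look like the following; dalai uses llama.cpp under the hood, and the model path here is just an assumption about where your weights live:

    # hypothetical one-shot prompt via llama.cpp's `main`; adjust -m to your weights
    ./main -m ./models/alpaca-7B/ggml-model-q4_0.bin -n 256 \
      -p "$(printf 'Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction: %s\n\n### Response:\n' 'Summarize what dalai does in one sentence.')"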

What's the best way to download and get set up with this stuff at the moment? I.e., let's say I want to run the currently available variations of LLaMA (7B, 13B, and 30B [1]): is there a current summary of how to acquire them, possibly quantize them, etc.? Would I download a quantized version or do it myself?

I ran Alpaca 7B Q4 almost instantly because they provided curl commands to download it. Super simple. But it seems most projects aren't doing that because it's prone to attracting Facebook's gaze. So, what's recommended?

I happened to find this [2], but I think those are the non-quantized raw models? Not sure yet.

[1]: Won't bother with 65B; it can't fit in memory, I believe.

[2]: https://github.com/shawwn/llama-dl/blob/main/llama.sh

edit: I forgot about https://github.com/cocktailpeanut/dalai. I suspect this is best in breed at the moment? Though a Docker container would be nice to wrangle all the dependencies.
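For what it's worth, a minimal sketch of the llama-dl route from [2], assuming you then convert and quantize with llama.cpp as sketched further up the page; paths here are rough:

    # fetch and run the community download script from [2];
    # it pulls the raw (unquantized) checkpoints, and the full set runs to
    # a couple hundred GB, so check the script for limiting which sizes it grabs
    git clone https://github.com/shawwn/llama-dl
    cd llama-dl
    bash llama.sh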