You may need some intermediate knowledge of linear algebra and this thing called "data science" nowadays, which is pretty much knowing how to mangle data and visualize it.
Try creating a small model on your own. It doesn't have to be super fancy; just make sure it does something you want it to do. And then ... you can probably go on your own from there.
What about https://github.com/ggerganov/llama.cpp ?
It compiles and runs easily on Linux.
If somebody hasn't tried running LLMs yet, here are some lines that do the job in Google Colab or locally. The !s are for Colab, remove them for local terminal. The script downloads the ca. 8GB model, but Llama.cpp can run offline afterwards.
! git clone https://github.com/ggerganov/llama.cpp.git
! wget "https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/resolve/main/codellama-7b.Q8_0.gguf" -P llama.cpp/models
! cd llama.cpp && make
! ./llama.cpp/main -m ./llama.cpp/models/codellama-7b.Q8_0.gguf --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1.1 -t 8
And... If you'd like a more hands-on approach, here is a manual approach to get llama running locally:
- https://github.com/ggerganov/llama.cpp
- follow instructions to build it (note the `METAL` flag)
- https://huggingface.co/models?sort=trending&search=gguf
- pick any `gguf` model that tickles your fancy, download instructions will be there
and a little script like this will get it running swimmingly:
./main -m ./models/.gguf --color --keep -1 -n -1 -ngl 32 --repeat_penalty 1.1 -i -ins
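For anyone who'd rather not retype the model path each time, here is a minimal Python sketch that assembles the same command for whichever `.gguf` file sits in a models directory. The helper name `build_main_command` and its default paths are made up for illustration. The flags are copied verbatim from the command above, and the sketch only builds the argument list; it doesn't launch anything.

```python
from pathlib import Path

# Flags copied verbatim from the command in the comment above.
FLAGS = ["--color", "--keep", "-1", "-n", "-1", "-ngl", "32",
         "--repeat_penalty", "1.1", "-i", "-ins"]


def build_main_command(models_dir="./models", binary="./main"):
    """Return an argv list for the first .gguf file found in models_dir.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Sort so the choice is deterministic when several models are present.
    candidates = sorted(Path(models_dir).glob("*.gguf"))
    if not candidates:
        raise FileNotFoundError(f"no .gguf files in {models_dir}")
    return [binary, "-m", str(candidates[0]), *FLAGS]


if __name__ == "__main__":
    import shlex
    try:
        # Print the command so it can be pasted into a terminal.
        print(shlex.join(build_main_command()))
    except FileNotFoundError as err:
        print(err)
```

If you want to launch it straight from Python, the returned list can be handed to `subprocess.run`.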
Enjoy the next hours of digging through flags and the wonderful pit of time ahead of you.
NOTE: I'm new at this stuff, feedback welcome.
[0] https://huggingface.co/TheBloke/WizardCoder-Python-34B-V1.0-...
- I wouldn't use anything higher than a 7B model if you want decent speed.
- Quantize to 4-bit to save RAM and run inference faster.
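To put rough numbers on the RAM savings: weight storage scales with parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic only. The function name is made up, and it ignores the per-block scale factors that quantized files carry, as well as the context cache and runtime overhead, so real usage will be somewhat higher.

```python
def approx_weight_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Compare a nominal 7B-parameter model at three precisions.
    for bits in (16, 8, 4):
        print(f"7B model at {bits}-bit: ~{approx_weight_gb(7e9, bits):.1f} GB")
```

By this estimate a 7B model goes from roughly 14 GB at 16-bit to roughly 3.5 GB at 4-bit, which is the difference between not fitting and fitting on a lot of laptops.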
Speed will be around 15 tokens per second on CPU (tolerable), and 5-10x faster with a GPU.
Step 1: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/blob/ma...
Step 2: https://github.com/ggerganov/llama.cpp
Step 3: you're welcome
Hoping they add support for llama 2 soon!
You're comparing a single, well-managed project that has put effort into user onboarding against all projects in a different language and proclaiming that an entire language/ecosystem is crap.
The only real takeaway is that many projects, independent of language, put way too little effort into onboarding users.
Which in turn has the following as the first link: https://arxiv.org/abs/2302.13971
Is it really quicker to ask here than just browse content for a bit, skimming some text or even using Google for one minute?
git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build
python3 -m pip install -r requirements.txt
cd models && git clone https://huggingface.co/openlm-research/open_llama_7b_preview_200bt/ && cd -
python3 convert-pth-to-ggml.py models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights 1
./build/bin/quantize models/open_llama_7b_preview_200bt/open_llama_7b_preview_200bt_transformers_weights/ggml-model-f16.bin models/open_llama_7b_preview_200bt_q5_0.ggml q5_0
./build/bin/main -m models/open_llama_7b_preview_200bt_q5_0.ggml --ignore-eos -n 1280 -p "Building a website can be done in 10 simple steps:" --mlock
Still a couple years out but moving way faster than I would have expected.