Great job OP. Yesterday, after managing to run a local instance of alpaca.cpp and reading more about what Alpaca is and how it was fine-tuned from LLaMA, I began to wonder what it would take to fine-tune with my own set of data.

With no real knowledge of LLMs, having only recently started to understand terms like 'model', 'inference', 'instruction set', and 'fine-tuning', what else do you think is required to make a tool like yours?

This is for educational purposes; I'd love to take a stab at creating something like this and write an inference engine, like the dev behind the LLaMA inference in Rust.

> I am not familiar with the HuggingFace libraries at all; why were they important in your implementation?

> Gradio, I believe, is the UI layer that lets you plug in different LLM models; I'm familiar with text-generation-webui on GitHub, which uses Gradio (I sketch what I mean below).

> LoRA, I think, further fine-tunes a model, just like LLaMA was fine-tuned on an instruction set to produce the Alpaca model.
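To check my Gradio understanding, here is about the smallest app I can picture; the reply function is just a stub where a real model's generate call would go:

    import gradio as gr

    def reply(prompt):
        # stub: a real app would run the model here instead of echoing
        return "echo: " + prompt

    # one text box in, one text box out, served as a local web UI
    gr.Interface(fn=reply, inputs="text", outputs="text").launch()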

> With no real knowledge of LLMs, having only recently started to understand terms like 'model', 'inference', 'instruction set', and 'fine-tuning', what else do you think is required to make a tool like yours?

This was me a few weeks ago. I got interested in all this when FlexGen (https://github.com/FMInference/FlexGen) was announced, which made it possible to run inference with the OPT models on consumer hardware. I'm an avid user of Stable Diffusion, and I wanted to see if I could have an SD equivalent of ChatGPT.

Not understanding the details of hyperparameters or terminology, I basically asked ChatGPT to explain to me what these things are:

   Explain to someone who is a software engineer with limited knowledge of ML terms or linear algebra, what is "feed forward" and "self-attention" in the context of ML and large language models. Provide examples when possible.
I did the same with all the other terms I didn't understand, like "Adam optimizer", "gradient", etc. I relied on it very heavily and cross-referenced the answers.
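When self-attention finally clicked for me, it was because I wrote it out as plain matrix math. A toy NumPy sketch, just for illustration (the shapes and random weights are made up; no masking, no multiple heads):

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        # x: (seq_len, d_model) token embeddings
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        # scaled dot-product: how strongly each token attends to every other
        scores = q @ k.T / np.sqrt(k.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
        return weights @ v  # each output row is a weighted mix of the value rows

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))  # 4 tokens, d_model = 8
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)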

Looking at other people's code and just tinkering with things on my own really helped.

Through the FlexGen Discord I discovered https://github.com/oobabooga/text-generation-webui, where I spent days just playing around with models. This got me into the HuggingFace ecosystem -- their transformers library is an easy way to get started. I joined a few other Discords, like LLaMA Unofficial, RWKV, EleutherAI, Together, Hivemind and Petals.
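If you want a feel for it, running a model through transformers takes only a few lines; a minimal sketch (gpt2 here is just a small stand-in for whatever model you want to try):

    from transformers import pipeline

    # downloads the weights on first run; gpt2 is small enough for CPU
    generator = pipeline("text-generation", model="gpt2")
    out = generator("The key idea behind self-attention is", max_new_tokens=40)
    print(out[0]["generated_text"])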

I bookmarked a bunch of resources, but they're all over the place. Here are some; the LoRA sketch after the list builds on the huggingface/peft repo:

- https://github.com/zphang/minimal-llama/#peft-fine-tuning-wi...

- https://github.com/togethercomputer/OpenChatKit

- https://www.cstroik.com/index.php/2023/02/18/finetuning-an-a...

- https://github.com/huggingface/peft

- https://github.com/kingoflolz/mesh-transformer-jax/blob/mast...

- https://github.com/oobabooga/text-generation-webui

- https://github.com/hizkifw/WebChatRWKVstic

- https://github.com/ggerganov/whisper.cpp

- https://github.com/qwopqwop200/GPTQ-for-LLaMa

- https://github.com/oobabooga/text-generation-webui/issues/14...

- https://github.com/bigscience-workshop/petals

- https://github.com/alpa-projects/alpa
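And on your LoRA question: my understanding is that peft wraps a base model so that only small low-rank adapter matrices get trained, which is what lets fine-tuning fit on consumer hardware. A rough sketch of the setup only, not a full training script (gpt2 and the hyperparameters are placeholder choices):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model
    config = LoraConfig(
        r=8,                        # rank of the adapter matrices
        lora_alpha=16,              # scaling factor for the adapters
        target_modules=["c_attn"],  # gpt2's fused attention projection
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapters are trainable

From there training proceeds with the usual transformers loop; the minimal-llama link above covers the PEFT fine-tuning side for LLaMA specifically.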