I might be missing something. The article asks me to run a bash script on Windows.

I assume this would still need to run natively to access GPU resources etc., so can someone illuminate what a Windows user is actually expected to do to get this running?

I'm currently paying $15 a month in ChatGPT queries for a personal translation/summarizer project. I run Whisper (const.me's GPU fork) locally and would love to eventually get the LLM part local too! The system generates 30k queries a month but isn't very latency-sensitive, so lower token rates might work as well.

The bash script downloads llama.cpp, a project that lets you run LLaMA-based language models on your CPU. It then downloads the 13-billion-parameter GGML version of LLaMA 2. GGML is the format llama.cpp works with, and it uses the CPU for inference. There are ways to run models on your GPU instead, but whether that's worth it depends on your setup.
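To answer the Windows question directly: you don't strictly need the bash script. llama.cpp builds natively on Windows, and the project also publishes prebuilt Windows binaries on its releases page if you'd rather not compile anything. A rough sketch of the manual route, runnable from PowerShell or Git Bash, assuming a recent llama.cpp checkout and a quantized GGML file from one of the usual Hugging Face repos (the exact model file name below is illustrative, not guaranteed):

```
# Build llama.cpp natively (requires git, CMake, and a C++ compiler):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Download a quantized GGML model separately, e.g. something like
# llama-2-13b.ggmlv3.q4_0.bin (check the model repo for current names).

# Run an interactive prompt on the CPU
# (on MSVC builds the binary may land under build/bin/Release/main.exe):
./build/bin/main -m ./llama-2-13b.ggmlv3.q4_0.bin -p "Hello" -n 128
```

The bash script in the article is essentially automating these same steps, so nothing about it is Linux-specific beyond the scripting itself.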

I would highly recommend looking into the text-generation-webui project (https://github.com/oobabooga/text-generation-webui). It has a one-click installer and comprehensive guides for getting models running locally, including where to find them. The project also has an --api command-line flag that lets you use it much like you'd use a web-based service today; see the sketch below.
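For the translation/summarizer use case, that API mode can act as a near drop-in replacement for remote calls. A minimal sketch, assuming the webui was launched with the --api flag and is serving its blocking endpoint on the default port 5000 (the API has changed over time, so check the project's docs for the current endpoint and parameters):

```
# Hypothetical request against text-generation-webui's legacy blocking API.
# Assumes the server was started with something like: python server.py --api
curl -s http://localhost:5000/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "Summarize: The quick brown fox jumps over the lazy dog.",
        "max_new_tokens": 100,
        "temperature": 0.7
      }'
```

At 30k queries a month that's roughly one request every 90 seconds on average, which CPU inference on a 13B model can plausibly keep up with if the queries are spread out and you can tolerate seconds of latency per response.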