What does HackerNews think of ggllm.cpp?

Falcon LLM ggml framework with CPU and GPU support

Language: C

Upstream llama.cpp doesn't support Falcon right now, but there's a fork that does (https://github.com/cmp-nct/ggllm.cpp/).
Pretty much anything with 32GB (?) of total RAM+VRAM can run it:

https://github.com/cmp-nct/ggllm.cpp

But it's going to be slow without even a small Nvidia GPU (a 2060?). CPUs are really slow at prompt ingestion, and that can't be hidden with streaming: streaming only masks per-token generation latency, while the entire prompt has to be processed before the first token appears.
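For a sense of where that ~32GB figure comes from, here's a back-of-the-envelope estimate in C. The 7B/40B parameter counts are Falcon's published sizes; the bits-per-weight values for the quant types and the ~10% overhead allowance for KV cache and scratch buffers are my own rough assumptions, not numbers from ggllm.cpp itself.

#include <stdio.h>

/* Rough memory estimate: params * bits/8, plus ~10% (assumed) for
 * KV cache and scratch buffers. */
static double gib_needed(double n_params_billions, double bits_per_weight) {
    double bytes = n_params_billions * 1e9 * (bits_per_weight / 8.0);
    return bytes * 1.10 / (1024.0 * 1024.0 * 1024.0);
}

int main(void) {
    const char  *names[] = {"f16", "q8_0", "~q5_k", "~q4_k"};
    const double bits[]  = {16.0, 8.0, 5.5, 4.5}; /* approx bits/weight */
    for (int i = 0; i < 4; i++) {
        printf("falcon-7b  %-6s ~%5.1f GiB\n", names[i], gib_needed(7.0, bits[i]));
        printf("falcon-40b %-6s ~%5.1f GiB\n", names[i], gib_needed(40.0, bits[i]));
    }
    return 0;
}

At roughly 5.5 bits/weight the 40B model lands around 28 GiB before context, which is why ~32GB of combined RAM+VRAM is about the floor.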

The ggllm.cpp fork seems to be the leading Falcon implementation for now [1]

It comes with its own variant of the GGML file format ("ggcv1"), but there are quants available on HF [2]
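If you're wondering what a GGML block-quant format actually looks like, here's an illustrative C sketch modeled on upstream q4_0-style blocks (one scale plus 32 packed 4-bit weights, ~4.5 bits/weight with the f16 scale upstream uses; f32 here for simplicity). This is the general shape only; the actual ggcv1 field layout may differ.

#include <stdio.h>
#include <stdint.h>
#include <math.h>

#define QK 32 /* weights per block */

typedef struct {
    float   scale;      /* per-block scale (upstream uses f16 here) */
    uint8_t qs[QK / 2]; /* 32 x 4-bit quants, two per byte */
} block_q4;

/* Quantize QK floats to 4-bit codes in 0..15 around the block's max. */
static void quantize_block(const float *x, block_q4 *out) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) {
        float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    out->scale = amax / 7.0f;
    float inv = out->scale > 0.0f ? 1.0f / out->scale : 0.0f;
    for (int i = 0; i < QK; i += 2) {
        int q0 = (int)roundf(x[i]     * inv) + 8; /* shift -7..7 to 1..15 */
        int q1 = (int)roundf(x[i + 1] * inv) + 8;
        if (q0 < 0) q0 = 0; if (q0 > 15) q0 = 15;
        if (q1 < 0) q1 = 0; if (q1 > 15) q1 = 15;
        out->qs[i / 2] = (uint8_t)(q0 | (q1 << 4));
    }
}

int main(void) {
    float x[QK];
    for (int i = 0; i < QK; i++) x[i] = sinf((float)i); /* dummy weights */
    block_q4 b;
    quantize_block(x, &b);
    printf("sizeof(block_q4)=%zu scale=%f qs[0]=0x%02x\n",
           sizeof(b), b.scale, b.qs[0]);
    return 0;
}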

Although if you have a GPU, I'd go with the newly released AWQ quantization instead [3]; the performance is better.
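For what it's worth, the reason AWQ quantizes well: it scales up weight channels that see large activations before rounding (so the "important" weights lose less relative precision) and folds the inverse scale into the activations, leaving the product unchanged. Here's a toy C sketch of that one idea; the sqrt scaling rule is my own stand-in, since real AWQ searches for the scales per layer.

#include <stdio.h>
#include <math.h>

/* Scale each input channel's weight by s[c] before quantization and
 * divide the activation by s[c] at runtime: (x/s[c]) * (w*s[c]) == x*w,
 * but channels with big activations now survive 4-bit rounding better. */
static void awq_scale_channels(float *w, const float *act_mag,
                               float *s, int n_ch) {
    for (int c = 0; c < n_ch; c++) {
        float m = act_mag[c] > 1e-8f ? act_mag[c] : 1e-8f;
        s[c] = sqrtf(m);  /* stand-in rule; real AWQ searches this */
        w[c] *= s[c];     /* quantize w[c] to int4 after this step */
    }
}

int main(void) {
    float w[4]   = {0.12f, -0.50f, 0.03f, 0.75f}; /* toy weight row */
    float mag[4] = {2.0f, 0.1f, 0.05f, 4.0f};     /* mean |activation| */
    float s[4];
    awq_scale_channels(w, mag, s, 4);
    for (int c = 0; c < 4; c++)
        printf("ch %d: s=%.3f, scaled w=%.3f\n", c, s[c], w[c]);
    return 0;
}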

(I may or may not have a mild local LLM addiction - and video cards cost more than drugs)

[1] https://github.com/cmp-nct/ggllm.cpp

[2] https://huggingface.co/TheBloke/falcon-7b-instruct-GGML

[3] https://huggingface.co/abhinavkulkarni/tiiuae-falcon-7b-inst...

Experimental Falcon inference via ggml (so on CPU): https://github.com/cmp-nct/ggllm.cpp

It has problems, but it does work.