What kind of hardware do I need to run this reasonably well? E.g. say I want 10 tokens/s, what specs am I looking at?
Pretty much anything with 32GB (?) total RAM+VRAM:
https://github.com/cmp-nct/ggllm.cpp
But it's going to be slow without at least a small Nvidia GPU (a 2060?). CPUs are really slow at prompt ingestion, and that can't be hidden with streaming, since no output can appear until the whole prompt has been processed.
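To put rough numbers on the prompt ingestion point, here's a back-of-the-envelope sketch. The speeds are made-up placeholders, not benchmarks of ggllm.cpp or any particular hardware, and `response_latency` is just a helper name I picked for illustration:

```python
def response_latency(prompt_tokens, output_tokens, prompt_tps, gen_tps):
    """Return (seconds until the first token appears, total seconds).

    prompt_tps: prompt ingestion speed in tokens/s (assumed value)
    gen_tps:    generation speed in tokens/s (assumed value)
    """
    # The whole prompt must be processed before any output can stream.
    time_to_first_token = prompt_tokens / prompt_tps
    total = time_to_first_token + output_tokens / gen_tps
    return time_to_first_token, total


# Hypothetical speeds: same 10 tokens/s generation, very different ingestion.
cpu_only = response_latency(1000, 200, prompt_tps=20, gen_tps=10)
small_gpu = response_latency(1000, 200, prompt_tps=500, gen_tps=10)

print("CPU only:  first token after %.0fs, done after %.0fs" % cpu_only)
print("Small GPU: first token after %.0fs, done after %.0fs" % small_gpu)
```

With those placeholder numbers, both setups generate at the same 10 tokens/s, but the CPU-only one sits silent for most of a minute before the first token shows up. Streaming only spreads out the generation part of the wait.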