What kind of hardware do I need to run this reasonably well? Say I want 10 tokens/s, what specs am I looking at?

Pretty much anything with 32GB (?) total RAM+VRAM:

https://github.com/cmp-nct/ggllm.cpp
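For a rough sanity check on the memory figure, here is a back-of-the-envelope sketch. It is not taken from ggllm.cpp. The 40B parameter count, the 4 bits per weight, and the 4 GB overhead allowance (context cache, buffers, OS) are all assumptions for illustration; plug in the numbers for whatever model and quantization you actually use.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(
    n_params_billion: float,
    bits_per_weight: float,
    total_memory_gb: float,
    overhead_gb: float = 4.0,  # assumed allowance, not a measured value
) -> bool:
    """True if weights plus the assumed overhead fit in combined RAM+VRAM."""
    needed = estimate_weight_memory_gb(n_params_billion, bits_per_weight) + overhead_gb
    return needed <= total_memory_gb


if __name__ == "__main__":
    # Hypothetical 40B-parameter model at 4 bits per weight on a 32 GB machine.
    print(estimate_weight_memory_gb(40, 4))
    print(fits_in_memory(40, 4, total_memory_gb=32))
```

Real quantized files carry some extra per-block metadata, so treat the result as a lower bound rather than an exact file size.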

But it's going to be slow without even a small Nvidia GPU (a 2060?). CPUs are really slow at prompt ingestion, and that can't be hidden with streaming: nothing can be streamed until the whole prompt has been processed.
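To put rough numbers on the prompt-ingestion point, a minimal sketch: the wait before the first token appears is roughly prompt length divided by prompt-processing speed, and streaming only helps with the generation part after that. The speeds below are made-up placeholders, not benchmarks of any hardware.

```python
def time_to_first_token_s(prompt_tokens: int, prompt_tokens_per_s: float) -> float:
    """Seconds spent ingesting the prompt before any output can be streamed."""
    return prompt_tokens / prompt_tokens_per_s


def total_response_time_s(
    prompt_tokens: int,
    output_tokens: int,
    prompt_tokens_per_s: float,
    gen_tokens_per_s: float,
) -> float:
    """Prompt ingestion time plus token-by-token generation time."""
    ingest = time_to_first_token_s(prompt_tokens, prompt_tokens_per_s)
    generate = output_tokens / gen_tokens_per_s
    return ingest + generate


if __name__ == "__main__":
    # Placeholder speeds: 20 tokens/s prompt ingestion, 10 tokens/s generation.
    print(time_to_first_token_s(1000, 20.0))
    print(total_response_time_s(1000, 200, 20.0, 10.0))
```

So even if generation hits the 10 tokens/s target, a long prompt can still mean a long silent wait up front if ingestion is slow.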