From skimming, it looks like this approach requires CUDA and is thus Nvidia-only.

Anyone have a recommended guide for AMD / Intel GPUs? I gather the 4-bit quantization is the special sauce for CUDA, but I'd guess there'd be something comparable for non-CUDA?

The 4-bit quantization is just there to reduce the amount of VRAM needed to run the model. You can run it 100% on the CPU if you don't have CUDA. I'm not aware of any AMD equivalent yet.
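
To make the memory savings concrete, here's a minimal sketch of the idea (the absmax scheme and names are my own illustration, not what the project actually uses): pack fp32 weights into 4-bit integers, two per byte.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    // Hypothetical sketch: absmax quantization of fp32 weights to 4-bit ints,
    // two values packed per byte. Real schemes (GPTQ etc.) are fancier, but the
    // memory arithmetic is the same: 4 bits per weight instead of 16 or 32.
    std::vector<uint8_t> quantize4bit(const std::vector<float>& w, float& scale) {
        float absmax = 0.0f;
        for (float v : w) absmax = std::max(absmax, std::fabs(v));
        scale = absmax / 7.0f;  // map [-absmax, absmax] onto integer range [-7, 7]
        std::vector<uint8_t> packed((w.size() + 1) / 2, 0);
        for (size_t i = 0; i < w.size(); ++i) {
            int q = std::clamp(static_cast<int>(std::lround(w[i] / scale)) + 8, 0, 15);
            packed[i / 2] |= (i % 2 == 0) ? q : (q << 4);  // low nibble first, then high
        }
        return packed;
    }

    float dequantize4bit(const std::vector<uint8_t>& packed, size_t i, float scale) {
        int q = (i % 2 == 0) ? (packed[i / 2] & 0xF) : (packed[i / 2] >> 4);
        return (q - 8) * scale;
    }

    int main() {
        std::vector<float> w = {0.12f, -0.50f, 0.33f, 0.07f};
        float scale = 0.0f;
        auto packed = quantize4bit(w, scale);  // 4 weights fit in 2 bytes
        for (size_t i = 0; i < w.size(); ++i)
            std::printf("%+.3f -> %+.3f\n", w[i], dequantize4bit(packed, i, scale));
    }

That's basically the whole trick: a 7B-parameter model's weights go from roughly 14 GB in fp16 to about 3.5 GB at 4 bits (plus scales), which is what brings it into consumer-GPU or plain-RAM territory.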

Looks like there are several projects that implement the CUDA interface for various other compute systems, e.g.:

https://github.com/ROCm-Developer-Tools/HIPIFY/blob/master/R...

https://github.com/hughperkins/coriander

I have zero experience with these, though.
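
For what it's worth, my understanding is that HIPIFY is mostly a mechanical source-to-source translation of the CUDA runtime API into AMD's HIP equivalents. A toy example (mine, not taken from either project) of what translated code ends up looking like:

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Toy hipified program: the HIP runtime mirrors CUDA nearly call-for-call,
    // so the translation is largely renaming (cudaMalloc -> hipMalloc, etc.).
    __global__ void scale(float* x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1024;
        float host[n];
        for (int i = 0; i < n; ++i) host[i] = 1.0f;

        float* dev = nullptr;
        hipMalloc((void**)&dev, n * sizeof(float));                      // was cudaMalloc
        hipMemcpy(dev, host, n * sizeof(float), hipMemcpyHostToDevice);  // was cudaMemcpy
        hipLaunchKernelGGL(scale, dim3(n / 256), dim3(256), 0, 0,        // was scale<<<n/256, 256>>>(...)
                           dev, 2.0f, n);
        hipMemcpy(host, dev, n * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dev);                                                    // was cudaFree

        std::printf("host[0] = %.1f\n", host[0]);  // expect 2.0
        return 0;
    }

Coriander takes a different route (compiling CUDA to run on OpenCL 1.2 devices, as I understand it), but the end goal is similar.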