Anyone get performance numbers for other 30 series cards? 3060 12gb?

I’m curious how it compares to his apple silicon numbers

Not a 30 series, but on my 4090 I'm getting 32.0 tokens/s on a 13b q4_0 model (uses about 10GiB of VRAM) w/ full context (`-ngl 99 -n 2048 --ignore-eos` to force all layers on GPU memory and to do a full 2048 context). As a point of reference, currently exllama [1] runs a 4-bit GPTQ of the same 13b model at 83.5 tokens/s.

[1] https://github.com/turboderp/exllama