I have a mediocre GPU but a fast CPU (with a lot of RAM). Would I see improvements there?

I guess I should give it a try.

On an Intel MacBook Pro (2020), running CPU-only, the original implementation[1], which uses PyTorch, utilized only one core. A TensorFlow implementation[2] with oneDNN support utilized most of the cores and ran at ~11 sec/iteration. Another implementation, based on OpenVINO[3], ran at ~6.0 sec/iteration.

[1] https://github.com/CompVis/stable-diffusion/

[2] https://github.com/divamgupta/stable-diffusion-tensorflow/

[3] https://github.com/bes-dev/stable_diffusion.openvino/
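If you want to compare sec/iteration numbers like these on your own CPU, a minimal timing sketch could look like the one below. It assumes you can wrap one sampling iteration of whichever implementation you're testing in a zero-argument callable. `seconds_per_iteration` and `dummy_step` are hypothetical names for illustration, not functions from any of the linked repos.

```python
import time
from typing import Callable


def seconds_per_iteration(step: Callable[[], None], iterations: int = 5, warmup: int = 1) -> float:
    """Return the mean wall-clock seconds per call of `step`.

    `step` is a hypothetical stand-in for one sampling iteration of the
    implementation being measured. Warm-up calls run first and are not timed,
    so one-off setup costs don't skew the average.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    # Untimed warm-up calls.
    for _ in range(warmup):
        step()
    # Time only the measured iterations.
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    # Dummy CPU-bound workload standing in for a real denoising step.
    def dummy_step() -> None:
        sum(i * i for i in range(200_000))

    print(f"~{seconds_per_iteration(dummy_step):.3f} sec/iteration")
```

Wall-clock time (`time.perf_counter`) is used rather than CPU time because a multi-threaded backend spreads work across cores, and wall-clock time is what you actually wait for.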