I have a mediocre GPU but a fast CPU (with a lot of RAM). Would I see improvements there?
I guess I should give it a try.
On a 2020 Intel MacBook Pro, CPU-only, the original implementation [1], which uses PyTorch, utilized only one core. A TensorFlow implementation [2] with oneDNN support, which utilized most of the cores, ran at ~11 sec/iteration. Another, OpenVINO-based implementation [3] ran at ~6.0 sec/iteration.
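To put those per-iteration numbers in perspective, here is a rough back-of-the-envelope sketch. The step count of 50 is purely my assumption for illustration (not something I measured), and it ignores model loading and any other fixed overhead:

```python
# Rough per-image time estimate from the sec/iteration figures above.
# STEPS is an assumed value for illustration only; real runs may use
# a different number of sampling steps.
STEPS = 50

def total_seconds(sec_per_iteration: float, steps: int) -> float:
    """Total sampling time, ignoring model load and other fixed overhead."""
    return sec_per_iteration * steps

tf_total = total_seconds(11.0, STEPS)   # TensorFlow + oneDNN figure
ov_total = total_seconds(6.0, STEPS)    # OpenVINO figure
speedup = tf_total / ov_total           # ratio does not depend on STEPS

print(f"TensorFlow/oneDNN: ~{tf_total / 60:.1f} min per image")
print(f"OpenVINO:          ~{ov_total / 60:.1f} min per image")
print(f"OpenVINO speedup:  ~{speedup:.2f}x")
```

The ratio is the same whatever step count you pick, so the OpenVINO build comes out a bit under 2x faster on that machine.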
[1] https://github.com/CompVis/stable-diffusion/
[2] https://github.com/divamgupta/stable-diffusion-tensorflow/