The 3060 12GB is the best thing you can buy right now. It's cheap, has a ton of memory--which seems to be the limiting factor for image generation--and you can fit four of them into even the cheapest motherboards.
The 3060 Ti 8GB, the 3090 24GB, and the 4000 series all have performance benefits, but for the money the 3060 12GB is off the charts right now.
With this card you can also run OpenAI's Whisper with the large model (the multilingual one!), which requires about 10GB of VRAM.
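To make the memory argument concrete, here is a rough sketch that checks which of the cards mentioned above have enough VRAM for a given requirement. The VRAM figures are the ones quoted in this comment; the `cards_that_fit` helper is made up for illustration, and the check ignores any overhead beyond the stated requirement.

```python
# VRAM per card in GB, using the figures quoted in the comment above.
CARD_VRAM_GB = {"3060": 12, "3060 Ti": 8, "3090": 24}


def cards_that_fit(required_gb, cards=CARD_VRAM_GB):
    """Return the names of cards whose VRAM is at least required_gb.

    Hypothetical helper: a plain capacity comparison, nothing more.
    """
    return sorted(name for name, vram in cards.items() if vram >= required_gb)


# Whisper's large model at roughly 10GB, as stated above.
print(cards_that_fit(10))
```

With a 10GB requirement the 8GB 3060 Ti drops out, which is the whole point: the cheaper card with more memory can run the model and the faster one cannot.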
My implementation of Whisper uses slightly over 4GB VRAM running their large multilingual model: https://github.com/Const-me/Whisper