If you're creating the normal maps yourself, you might benefit from generating the image from them with ControlNet: https://github.com/lllyasviel/ControlNet#controlnet-with-nor...

ControlNet (CN) is a total game changer for generative image models: it solves so many things that were problematic before (proper depth, pose, sensible text and much more). Together with LoRA[0] and other improvements from the SD community, it really turns SD into a super capable toolchain.

[0] https://github.com/cloneofsimo/lora