Drawing graphs is probably one of the worst ways to evaluate these models. They seem to be trained to generate either photorealistic or stylized images.
I had personally never seen this side of the models, so I wanted to share this finding.
I agree that they were trained more on artistic images, but I was still surprised by how badly they generalized to a more theoretical(?) context.
It's not about it being theoretical; it's more that the language model is still far simpler than our own, and it struggles with anything but the most basic relations between nouns. The "horse riding an astronaut" post is a good example of this.[0]
See also “Shirt without stripes”
https://github.com/elsamuko/Shirt-without-Stripes
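If anyone wants to try prompts like these against an open model themselves, here's a minimal sketch using the Hugging Face diffusers library. The checkpoint name and the CUDA device are assumptions; swap in whatever text-to-image pipeline you have on hand.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public text-to-image checkpoint (assumption: SD v1.5; any checkpoint works).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Prompts whose meaning hinges on the relation between the nouns, not the nouns themselves.
    prompts = ["a horse riding an astronaut", "a shirt without stripes"]
    for prompt in prompts:
        image = pipe(prompt).images[0]  # first (and only) generated image
        image.save(prompt.replace(" ", "_") + ".png")

In my experience you'll usually get an astronaut riding a horse and a striped shirt, which is exactly the failure being discussed: the model latches onto the nouns and falls back on the most statistically common arrangement of them.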