I find this surprising, given the simplistic (and probably naive) view that images are 2D signals while music is only 1D.
In music the "style" is, in some sense, the content. For example, jazz has a very different "style" from classical at many levels: key and tempo choice, mode choice, melodic intervals, motifs, how much a motif is repeated and how it varies, harmonization and chord choice, and global structure (e.g. AABA form). It isn't easy to separate which of these pieces make something "jazz" and which don't, i.e. which factors of variation actually matter.
The equivalent in images would be replacing objects as well as texture, to form a new image that is reminiscent of the original but also novel at multiple scales. Think of The Simpsons' version of "The Last Supper" as the goal of a style transfer [2].
It is also hard because, as consumers, we are used to hearing high-quality versions of this kind of "style transfer" for some styles all the time, and we even have a name for it: "muzak".
[0] https://raw.githubusercontent.com/awentzonline/image-analogi...
[1] https://github.com/chuanli11/CNNMRF
[2] http://s267.photobucket.com/user/wiro_bucket/media/last%20su...