What does HackerNews think of CNNMRF?

Code for the paper "Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis"

Language: Lua

Maybe 'Image Quilting for Texture Synthesis and Transfer', Efros and Freeman [0]?

There are some neural / patch blends from 2016 that I always thought were interesting (CNN-MRF) [1], and I think there's a renaissance in those approaches recently (combined with other generators / prompts etc.). You can also argue ViT is "patch based" in a major sense... I am still a big believer in patch + combinations + warping (non-parametric synthesis) generally; there's some cool older work from Apple on that in speech land [2].

I'd go as far as arguing that BPE / wordpiece / sentencepiece / tokenizers in general are key to modern approaches (as word vocab selections were in the earlier days of NMT), because they find 'good enough' patches (tokens) for a higher-level model to stitch together while still leaving some creativity / generalization available... but in publications we often focus on the model details rather than the importance of the tokenizer (and the tokenizer's distribution).
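The "tokenizers find 'good enough' patches" point can be made concrete with the core BPE loop: repeatedly fuse the most frequent adjacent symbol pair into a new token. This is a minimal sketch of that idea in plain Python, not any particular library's implementation; the function name and the toy word list are my own.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges: repeatedly fuse the most frequent adjacent symbol pair."""
    # Start with each word split into single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, vocab = bpe_merges(["low", "lower", "lowest", "low"], 3)
```

After three merges on this toy corpus, "low" collapses to a single token while rarer suffixes stay decomposed, which is exactly the "good enough patch" behavior: frequent chunks become reusable units, and the model stitches the rest.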

[0] http://people.eecs.berkeley.edu/~efros/research/quilting.htm...

[1] https://github.com/chuanli11/CNNMRF

[2] https://machinelearning.apple.com/research/siri-voices

"Style transfer" also rarely works for object-level transfer - it is more pattern based (high-frequency content is often the "style" that is enhanced and transferred). Really nice transfers in practice sometimes require the object-level content in the images to be similar, cf. [0][1]. And all of this is coupled with really heavy human curation (people don't normally show their bad outputs)!

In music the "style" is the content in some sense. For example, jazz has a very different "style" than classical at many levels (key/tempo/mode choice, melodic intervals, motifs, how much a motif repeats and how it varies, harmonization and chord choice, global structure like AABA form), and it isn't easy to separate which pieces make it "jazz" and which don't (which factors of variation matter).

The equivalent in images would be replacing objects as well as texture, to form a new image that is reminiscent of the original but also novel at multiple scales - think The Simpsons' "Last Supper" as the goal of a style transfer [2].

It is also hard because, as consumers, we are used to hearing high-quality versions of these kinds of "style transfer" for some styles all the time - and we even have a name for it... "muzak".

[0] https://raw.githubusercontent.com/awentzonline/image-analogi...

[1] https://github.com/chuanli11/CNNMRF

[2] http://s267.photobucket.com/user/wiro_bucket/media/last%20su...

Not directly related, but here are some results from a similar algorithm [1] combining the StarCraft map Python with various Google Maps screenshots: http://i.imgur.com/EgFpqRA.jpg

[1] CNNMRF, Neural Style plus Markov Random Fields: https://github.com/chuanli11/CNNMRF

The MRF loss is patch based, and adapted from CNNMRF [1]. Since the precursors to PatchMatch (as mentioned in the PatchMatch paper) were MRF + belief propagation based, I am pretty sure it could be done with some tweaking.
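To show what "patch based" means here, this is a minimal NumPy sketch of the MRF-loss idea: for each patch of the synthesized feature map, find the best-matching style patch (CNNMRF matches by normalized cross-correlation) and penalize the squared distance to it. This is not the repo's Lua/Torch code; the function names, shapes, and stride choices are my own assumptions.

```python
import numpy as np

def extract_patches(feat, k=3, stride=1):
    """Slide a k x k window over a (C, H, W) feature map; return (N, C*k*k) rows."""
    C, H, W = feat.shape
    patches = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            patches.append(feat[:, i:i + k, j:j + k].ravel())
    return np.stack(patches)

def mrf_loss(synth_feat, style_feat, k=3):
    """Sum of squared distances from each synthesized patch to its
    nearest style patch, with matching done by normalized cross-correlation."""
    P = extract_patches(synth_feat, k)
    Q = extract_patches(style_feat, k)
    # Normalize rows so the dot product is a cosine / cross-correlation score...
    Pn = P / (np.linalg.norm(P, axis=1, keepdims=True) + 1e-8)
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + 1e-8)
    nn = np.argmax(Pn @ Qn.T, axis=1)  # best-matching style patch per synth patch
    # ...but the energy itself is the plain squared distance to that match.
    return np.sum((P - Q[nn]) ** 2)
```

In the actual method this runs on CNN feature maps and the loss is backpropagated to the synthesized image; PatchMatch-style propagation could replace the brute-force `argmax` for the nearest-neighbor search, which is the "some tweaking" part.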

These analogies seem quite similar to the "user constraints" PatchMatch allows to be set, though an explicit "be straight" constraint might be much more difficult to optimize.

[1] https://github.com/chuanli11/CNNMRF