What does HackerNews think of waifu2x?

Image Super-Resolution for Anime-Style Art

Language: Lua

Waifu2x - I've used the library to upscale both old photos and videos with enough success to be pleased with the results.

https://github.com/nagadomi/waifu2x

Not sure about open-source, but a number of friends of mine who work as freelance professional creatives (e.g. ad hoc video advertisements for smaller companies) swear by Topaz Labs’ Video Enhance AI [0] for this sort of work. Anecdotally, judging by the results they get with it, it does seem to work astoundingly well, especially for an application that runs fine on hobbyist/enthusiast-grade hardware (albeit render times will likely still run to multiple hours for longer videos; personal experience on a 3090 / 5950X / 128GB home work machine was roughly 5-8 frames rendered/sec when upscaling 720p footage to 1440p, i.e. roughly 12 seconds of render time for every second of footage. I imagine the render time would scale with the demands of the selected AI model / output resolution).
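As a rough sanity check on those numbers, a tiny sketch of the arithmetic; the ~60 fps source frame rate and the 90-minute clip length are assumptions of mine, and 5 frames/sec is the low end of the observed range:

    # Back-of-the-envelope check of the render-time figures above.
    source_fps = 60        # assumed frame rate of the 720p source footage
    render_fps = 5         # observed rendered frames per second (low end of 5-8)
    clip_minutes = 90      # hypothetical longer video

    seconds_per_footage_second = source_fps / render_fps      # ~12 s render per 1 s of footage
    total_hours = clip_minutes * 60 * seconds_per_footage_second / 3600

    print(f"{seconds_per_footage_second:.0f} s of render time per second of footage")
    print(f"~{total_hours:.0f} hours to upscale a {clip_minutes}-minute clip")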

I am not aware of any single FOSS project in creative work that performs as well as Topaz Labs’ product across all input videos (note to reader: if I am wrong, correct me; I only have second-hand anecdotes from friends who do this kind of work professionally, plus precious little first-hand experience from upscaling old family memories to experiment with the software). As far as I am aware, this is because different upscaling models are trained on, and thus effective for, a specific type of content. An upscaling model trained on interpolated 480p videos with compression artifacts will not produce the same results as one trained on, say, anime videos/manga, e.g. the models used by Waifu2X [1]. Hence, with Topaz Labs’ application, you select the model that was trained on footage that best matches the footage you wish to upscale.

All that being said, I do know that some (if not all) of the upscaling models Topaz uses are FOSS. Much of their application is just syntactic sugar on top of the models it uses, making them easier for non-SWEs to use. I’m not sure whether the models Topaz distributes have had any additional training done in house; logically, I would assume so, otherwise their product wouldn’t perform as well as it does.

[0] https://www.topazlabs.com/video-enhance-ai

[1] https://github.com/nagadomi/waifu2x

For procedurally generated anime there is Waifu2x (https://github.com/nagadomi/waifu2x). After procedural generation it is recommended to run denoising on the resulting image to improve quality.
It tries to predict patterns from blurry shots, so it can introduce artifacts that were never there: some drawings I upscaled turned distant woods into houses. You can see why that might be bad if you care about historical accuracy.
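For what it's worth, a minimal sketch of driving the waifu2x command line from Python for that post-generation denoise step; the th waifu2x.lua flags are based on my reading of the project README and may differ between versions, and the file names are placeholders:

    # Hedged sketch: invoke waifu2x's denoise mode from a checkout of
    # https://github.com/nagadomi/waifu2x. Flags are from my reading of the
    # README and may vary by version; file names are placeholders.
    import subprocess

    subprocess.run(
        [
            "th", "waifu2x.lua",
            "-m", "noise",           # denoise only; 'noise_scale' would also 2x upscale
            "-noise_level", "1",     # 0-3: higher removes more artifacts (and more detail)
            "-i", "generated_frame.png",
            "-o", "generated_frame_denoised.png",
        ],
        check=True,
    )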

See e.g., https://www.topazlabs.com and https://github.com/nagadomi/waifu2x

This is a lot like "waifu2x".[1] That's super-resolution for anime images.

[1] https://github.com/nagadomi/waifu2x

A few times I have looked at the possibility of switching from JPEG to another format for photo web sites, and every time I've come to the conclusion that you can't really win.

There are three benefits that one could get from reducing the file size:

1. Reduced storage cost

2. Reduced bandwidth cost

3. Better user experience

In my models, storage cost matters a lot. You can't come out ahead here, however, if you still have to keep JPEG copies of all the images.

Benefits in terms of 2 are real.

Benefits in terms of 3 are hard to realize. Part of it is that adding any more parts to the system will cause problems for somebody somewhere. For instance, you can decompress a new image format with a Javascript polyfill, but is download+decompress really going to be faster for all users?

Another problem is that much of the source material is already overcompressed JPEG, so simply recompressing it in another format doesn't lead to a real improvement in experience. When I've done my own trials, and when I've looked closely at other people's trials, I don't see a revolutionary improvement.

A scenario that I am interested in now is making desktop backgrounds from (often overcompressed) photos I find on the web. In these cases, JPEG artifacts look like hell when images are blown up, particularly when images have the sharp-cornered bokeh that you get when people take pictures with the kit lens. In that case I can accept a slow and expensive process to blow the image up and make a PNG, something like

https://www.mathworks.com/help/images/jpeg-image-deblocking-...

or

https://github.com/nagadomi/waifu2x

The other approach I imagine is some kind of maximum-entropy method that minimizes the blocking artifacts.
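As one illustration of the kind of slow, expensive cleanup step I mean, a minimal sketch using total-variation denoising from scikit-image (a smoothness prior rather than a true maximum-entropy formulation; the weight value and file names are arbitrary choices of mine):

    # Sketch: suppress JPEG blocking with total-variation denoising, then save
    # a PNG suitable for a desktop background. Requires scikit-image >= 0.19
    # for the channel_axis argument; the weight is an arbitrary starting point.
    import numpy as np
    from PIL import Image
    from skimage.restoration import denoise_tv_chambolle

    img = np.asarray(Image.open("overcompressed.jpg"), dtype=np.float64) / 255.0
    # Small weight: just enough regularization to flatten 8x8 block edges
    # without washing out real texture.
    smoothed = denoise_tv_chambolle(img, weight=0.05, channel_axis=-1)
    Image.fromarray((smoothed * 255).clip(0, 255).astype(np.uint8)).save("deblocked.png")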

Looking at that project on GitHub [1], I can't help but think that the algorithm needed to upsample anime images (like in this project) might be different from the one needed to upsample photos. Anime has the benefit of having a lot of sections of solid color.

Edit: Actually, looking at the source further, it appears they also have models for photos as part of that project, in addition to the model for anime.

[1] https://github.com/nagadomi/waifu2x
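If it helps anyone, switching between the anime and photo models appears to just be a matter of pointing the CLI at a different model directory; a hedged sketch, since the -model_dir flag and the models/photo path are from my reading of the repo and may not match every version:

    # Hedged sketch: 2x upscale a photo with waifu2x's photo model instead of
    # the default anime-style model. Flag names and the model path are from my
    # reading of the repo and may differ between versions.
    import subprocess

    subprocess.run(
        ["th", "waifu2x.lua",
         "-model_dir", "models/photo",   # default models are trained on anime-style art
         "-m", "scale",
         "-i", "photo.jpg", "-o", "photo_2x.png"],
        check=True,
    )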

There's the waifu2x project [1] which also uses neural networks for super resolution. There's also the MPDN extensions project [2] which has various kinds of image scaling methods. It depends what kind of algorithms you want to play with really.

[1]: https://github.com/nagadomi/waifu2x

[2]: https://github.com/zachsaw/MPDN_Extensions

It's really cool to see a writeup of this for real-life images. Similar work at Flipboard: http://engineering.flipboard.com/2015/05/scaling-convnets/, for anime scaling: https://github.com/nagadomi/waifu2x, and for sprite scaling (can't find the reference I'm thinking of).

This tech has been around for several years, and some variation of it was presented in connection with the Boston Marathon investigation. https://arstechnica.com/information-technology/2013/05/hallu...

(Not clear if this was used as part of the investigation, or if it could be used for future investigations)

Existing pixel-art-specific upscaling methods are going to be better than this, for pixel art.

But if they wanted, they could probably try to train their neural network specifically for pixel art, similar to https://github.com/nagadomi/waifu2x
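As a rough illustration of what training it for pixel art specifically could look like, a minimal PyTorch sketch of an SRCNN-style network (the general family waifu2x belongs to) trained on synthetic low/high-resolution pairs; the network shape, the 2x factor, and the stand-in random batch are all illustrative assumptions, not waifu2x's actual training code:

    # Minimal, illustrative SRCNN-style training loop (not waifu2x's pipeline).
    # Replace the random stand-in batch with real pixel-art crops to make it useful.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinySRCNN(nn.Module):
        """Three-layer CNN that refines a bicubically upscaled image."""
        def __init__(self):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(inplace=True),
                nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(inplace=True),
                nn.Conv2d(32, 3, 5, padding=2),
            )

        def forward(self, x):
            return self.body(x)

    def train_step(model, optimizer, hi_res):
        """One step: downscale 2x, re-upscale bicubically, learn to restore the detail."""
        lo_res = F.interpolate(hi_res, scale_factor=0.5, mode="bicubic", align_corners=False)
        upscaled = F.interpolate(lo_res, scale_factor=2.0, mode="bicubic", align_corners=False)
        loss = F.l1_loss(model(upscaled), hi_res)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    model = TinySRCNN()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    batch = torch.rand(8, 3, 64, 64)   # stand-in for a dataloader of 64x64 pixel-art crops
    for step in range(10):
        print(f"step {step}: L1 loss {train_step(model, optimizer, batch):.4f}")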

Thanks for the detailed reply. I'd assumed your source files were digital photos of artwork (e.g. 20 megapixel). I want to print some photos of Japanese woodblock artwork, and there's a CNN implementation which apparently works well for 2x, 4x, etc. upscaling: https://github.com/nagadomi/waifu2x
Suuuper late to the discussion, but there is a project for denoising and upscaling anime-style images that uses CNNs.

https://github.com/nagadomi/waifu2x

https://community.sony.co.uk/t5/blog-news-from-sony/inside-4...

They won't tell you it's NNs, but it is. Sony distributes a lot of movies; they use their database of movies to train the upscaling models (which are obviously NNs, see e.g. https://github.com/nagadomi/waifu2x ) and then put the chip in the TV.

It's almost equivalent to storing Pride and Prejudice and Zombies in your TV in 4K, and then reproducing it whenever the TV matches it against what's currently playing.