Additional comparisons to Tiled DDPM, ControlNet Tile, Loopback Scaler and DeepFloyd. #2
Comments
Yes, I also want a visual comparison. If your method is competitive (for example, if you can upscale to 4K images like the ControlNet Tile model), I will be happy to migrate your method to Automatic1111. By the way, I'm also studying at NTU. We may have an opportunity to cooperate!
Hi, thanks for your interest in our work! We appreciate your valuable advice and will go through these demos later. Next, we will compare with these baselines one by one.
Comparison with Tiled DDPM: We observe that Tiled DDPM tends to struggle with both fidelity and quality in real-world cases.
We further show an example on AIGC SR, though StableSR is not designed for AIGC and never saw this type of data during training. We test directly on the image provided by Tiled DDPM; the generated image is in 4K resolution:
Thanks for your effort in testing. It seems that your model is compatible with my tiled diffusion method (that is, only tiling, with no advanced algorithm involved). Would you mind if I migrated your model to Automatic1111? Or, if you want to start the project on your own, I may be able to help.
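For readers unfamiliar with the idea, "only tiling" can be sketched roughly as follows: cover the image with overlapping tiles, run a per-tile function on each, and average the results where tiles overlap. This is a generic illustration, not the code of the multidiffusion-upscaler extension or of StableSR; the names `tile_starts` and `process_tiled`, the default tile size and overlap, and the assumption that `fn` returns a patch of the same shape as its input are all hypothetical choices made for the example.

```python
import numpy as np

def tile_starts(size: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles that cover [0, size); neighbours share >= overlap pixels."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= size:
        return [0]  # a single (clipped) tile covers the whole axis
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # final tile sits flush with the edge
    return starts

def process_tiled(img: np.ndarray, fn, tile: int = 64, overlap: int = 16) -> np.ndarray:
    """Apply fn to overlapping tiles of an (H, W) or (H, W, C) array and average overlaps.

    Assumption: fn returns an array with the same shape as the patch it receives.
    """
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=np.float64)
    # Per-pixel count of how many tiles touched each location (broadcasts over channels).
    weight = np.zeros((h, w) + (1,) * (img.ndim - 2), dtype=np.float64)
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = img[y:y + tile, x:x + tile]
            acc[y:y + tile, x:x + tile] += fn(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return acc / weight
```

With an identity `fn`, the blended output reproduces the input, which is a quick sanity check that every pixel is covered and weighted consistently.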
Hi~ Thanks for your interest. Honestly, the main purpose of this paper is just to attempt to make contributions to the research community, even if the contributions may be tiny.
StableSR is so far the best identity-preserving scaling method out there. That means that if you downscale the result back to its original resolution, each pixel should average back to its original value, and the method shouldn't make up features larger than the original pixels, while the new details should still look plausible and not like a mere filter. Comparison between StableSR minus base image and Tiled DDPM minus base image, using the high-res image provided on @pkuliyi2015's GitHub page:
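The "averages back to its original value" criterion above can be checked numerically. The following is a minimal sketch, assuming the upscale factor is an integer, that images are NumPy arrays of shape (H, W) or (H, W, C), and that a plain box (block-mean) filter is an acceptable stand-in for "downscaling"; the function names `box_downscale` and `identity_error` are hypothetical and not part of StableSR or any of the compared tools.

```python
import numpy as np

def box_downscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Average each factor x factor block of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    # Split each spatial axis into (blocks, within-block) and average within blocks.
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    return blocks.mean(axis=(1, 3))

def identity_error(low_res: np.ndarray, upscaled: np.ndarray, factor: int) -> float:
    """Mean absolute difference between low_res and the re-downscaled upscaled image."""
    restored = box_downscale(upscaled.astype(np.float64), factor)
    return float(np.abs(restored - low_res.astype(np.float64)).mean())
```

A lower `identity_error` means the upscaled image stays closer to the input once averaged back down. It says nothing about whether the added detail looks plausible, so it only covers the fidelity half of the criterion.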
As for the comparison with ControlNet Tile: it seems to still be under active development and is not fully integrated into A1111. The Gradio demo they provide does not currently support upscaling in tiles. Unfortunately, I am not familiar with Gradio and failed to build it in A1111 after trying for two days, so I am skipping this comparison.
Comparison with Loopback Scaler: Similarly, we observe that Loopback Scaler has inferior performance in this real-world case.
Comparison with DeepFloyd: Obviously, it is still mainly a fidelity issue, and the quality of some detailed textures is also not as good as with StableSR.
Conclusion: As the comparisons above show, StableSR differs significantly from these diffusion-based upscalers in achieving higher fidelity, which is also the main challenge of applying a diffusion prior to SR, as discussed in our paper. Specifically, the above upscalers still focus on 'creation', and they mainly handle AIGC images, whose degradation differs from that of real-world images captured by cameras. Hence, they mainly care about generation quality, which means that generating new content in the upscaled results is allowed. We believe this is not the end but the beginning of exploring the powerful ability of diffusion models for image restoration.
Hello, thanks for the work! We see many classic SR methods in the paper. The comparison to Real-ESRGAN+ looks promising!
However, it seems that the paper wants to claim that “our method using both synthetic and real world benchmarks demonstrates its superiority over current state-of-the-art approaches”. Just wondering whether we could have some comparisons to real baselines and more common methods that people actually use?
For example:
Tiled diffusion’s DDIM inversion:
https://github.com/pkuliyi2015/multidiffusion-upscaler-for-automatic1111
ControlNet Tile’s update from yesterday (it looks like they are going to use this SR-like model to compete with Midjourney V5/5.1 in image details):
https://github.com/lllyasviel/ControlNet-v1-1-nightly#ControlNet-11-Tile
Loopback Scaler:
https://civitai.com/models/23188/loopback-scaler
DeepFloyd’s 256 stage model (IF-III-L):
https://github.com/deep-floyd/IF
Some of these methods are likely to use prompts, yet it seems that getting a prompt from a small image is trivial for BLIP, and all ControlNets have a ‘guess mode’ that can use an empty string as the prompt. Loopback Scaler and Tiled Diffusion seem to suggest always using the same string as the prompt whatever the image is, so they do not actually require prompts.
Most of these methods can be easily used by installing the latest version of Automatic1111.