by Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky – CVPR 2018
Abstract
- Deep CNN – able to learn image priors from a large number of example images
- Contribution: Structure of a generator network is sufficient to capture a great deal of image statistics prior to any learning.
- A handcrafted prior in the form of a randomly initialized NN gives excellent results in standard inverse problems such as denoising, super-resolution, and inpainting
- The same prior can also be used to invert deep neural representations (the natural pre-image) and for flash/no-flash reconstruction
- This approach highlights the inductive bias captured by standard generator network architectures
Introduction
- Image reconstruction problems – denoising, single-image super-resolution – are typically approached with learned models: GANs, VAEs, or direct pixel-wise error minimization
- Learning from a large dataset alone is insufficient: prior work showed that the same image-classification network that generalizes well when trained on real data can also overfit when presented with random labels.
- So, generalization requires the network structure to resonate with the structure of the data
- This paper shows that plenty of image statistics are captured by the structure of the network alone, without any learning.
- Untrained ConvNets:
- fit a generator network to a single degraded image.
- The network weights are the parametrization of the restored image.
- Weights are randomly initialized and fitted to maximize their likelihood given a specific degraded image and a task-dependent observation model.
- Reconstruction is cast as a conditional image generation problem (denoising, inpainting, super-resolution).
Method
- Image generation – learn a generator network x = f_θ(z) that maps a random code vector z to an image x, where x ∈ R^(3×H×W), z ∈ R^(C′×H′×W′), and θ are the network parameters
- U-Net-type “hourglass” architecture with skip connections; z and x have the same spatial size (a minimal sketch follows)
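- A minimal sketch of such a generator in PyTorch, for illustration only (layer widths, depth, and the bilinear upsampling are assumptions; the paper's actual hourglass architectures are deeper and tuned per task):

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    # conv + batch norm + LeakyReLU, a common building block for such generators
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2, inplace=True),
    )

class HourglassGenerator(nn.Module):
    """Toy U-Net-style f_theta: the code z and the output x share spatial size."""
    def __init__(self, code_channels=32, out_channels=3, width=64):
        super().__init__()
        self.down1 = conv_block(code_channels, width, stride=2)       # H   -> H/2
        self.down2 = conv_block(width, width, stride=2)               # H/2 -> H/4
        self.up2 = conv_block(width, width)
        self.up1 = conv_block(width + width, width)                   # takes the skip from down1
        self.out = nn.Conv2d(width + code_channels, out_channels, kernel_size=1)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, z):            # z: (1, code_channels, H, W), H and W divisible by 4
        d1 = self.down1(z)
        d2 = self.down2(d1)
        u2 = self.upsample(self.up2(d2))                              # back to H/2
        u1 = self.upsample(self.up1(torch.cat([u2, d1], dim=1)))      # skip connection, back to H
        return torch.sigmoid(self.out(torch.cat([u1, z], dim=1)))     # image in [0, 1]
```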
- x – clean image
- x0 – corrupted (observed) image
- x* – restored image: x* = argmin_x E(x; x0) + R(x), where R(x) is a regularizer
- Denoising: E(x; x0) = ||x − x0||² – needs early stopping
- Inpainting: E(x; x0) = ||(x − x0) ⊙ m||², where m is the binary mask of known pixels
- Super-resolution: E(x; x0) = ||d(x) − x0||², where d(·) is a downsampling operator
- Feature inversion: E(x; x0) = ||Φ(x) − Φ(x0)||², for a fixed feature extractor Φ (these data terms are sketched in code right after this list)
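- A hedged sketch of these data terms as PyTorch losses (x and x0 are (1, 3, H, W) tensors; the downsampler d and the feature extractor phi are passed in as callables and are placeholders, not the paper's exact operators):

```python
import torch

def denoising_energy(x, x0):
    # E(x; x0) = ||x - x0||^2
    return torch.sum((x - x0) ** 2)

def inpainting_energy(x, x0, m):
    # E(x; x0) = ||(x - x0) * m||^2, with m a binary mask (1 = known pixel)
    return torch.sum(((x - x0) * m) ** 2)

def superres_energy(x, x0, d):
    # E(x; x0) = ||d(x) - x0||^2, where d(.) maps x to the low-resolution grid
    return torch.sum((d(x) - x0) ** 2)

def feature_inversion_energy(x, x0, phi):
    # E(x; x0) = ||phi(x) - phi(x0)||^2 for a fixed feature extractor phi
    return torch.sum((phi(x) - phi(x0)) ** 2)
```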
- The choice of regularizer, which usually captures a generic prior on natural images, is more difficult and is the subject of much research. This paper replaces the explicit regularizer with the implicit prior captured by the neural network parametrization, as follows:
- θ* = argmin_θ E(f_θ(z); x0), and x* = f_θ*(z)
- The minimizer θ* is obtained with a gradient-based optimizer such as SGD, starting from a random initialization of the parameters θ
- Explicit prior: min_x ||d(x) − x0||², s.t. x is a natural image, a face, etc.
- Deep Image Prior: min_x ||d(x) − x0||², s.t. x is the output of a CNN, i.e., x = f_θ(z), with the optimization carried out over θ
- MAP: x* = argmax_x P(x | x0)
- P(x | x0) = P(x0 | x) P(x) / P(x0) ∝ P(x0 | x) P(x)
- P(x0 | x) – likelihood
- P(x) – prior
- In degradation, x0 = x + ε, ε ~ N(0, σ²), so P(x0 | x) = N(x0; x, σ²)
- In restoration: x* = argmax_x P(x | x0)
= argmax_x P(x0 | x) P(x)
= argmax_x P(x0 | x), since the prior is a constant (we have no preference)
= argmax_x N(x0; x, σ²) = x0 – we will not restore anything
- That means the MAP estimate is the same as the ML estimate if the prior is uniform (i.e., no prior); a toy check of this follows.
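- A toy check of this point (sizes, learning rate, and iteration count are arbitrary): optimizing the likelihood term alone, directly in image space, just reproduces the corrupted observation:

```python
import torch

# With a flat prior, maximizing the Gaussian likelihood means minimizing ||x - x0||^2,
# whose optimum is the corrupted image x0 itself -- nothing gets restored.
x_clean = torch.rand(3, 8, 8)
x0 = x_clean + 0.1 * torch.randn(3, 8, 8)      # degraded observation

x = torch.zeros_like(x0, requires_grad=True)   # optimize directly in image space
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = torch.sum((x - x0) ** 2)            # negative log-likelihood up to constants
    loss.backward()
    opt.step()

print(torch.allclose(x, x0, atol=1e-4))        # True: x converges to x0, not to x_clean
```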
- Parametrization:
- Regular: argmin_x E(x; x0) + R(x) – search in image space
- Parametrized: argmin_θ E(g(θ); x0) + R(g(θ)) – search in parameter space
- If g is surjective (for each x there exists a θ with g(θ) = x), then the two problems are equivalent
- In practice, even for a surjective g, the solution found by the optimizer will be different
- Let’s treat g as a hyperparameter and tune it.
- g itself acts as a prior, and it may be sufficient to optimize only the data term: argmin_θ E(g(θ); x0)
- g(θ) = f_θ(z) – a convolutional network with parameters θ and a fixed random input z
- Deep Image Prior – Step by Step
- 1. Initialize z (fill it with uniform noise U(−1, 1))
- 2. Solve argmin_θ E(f_θ(z); x0) with a gradient-based method: θ_{k+1} = θ_k − α · ∂E(f_θ(z); x0)/∂θ, where α is the learning rate
- 3. Get the solution x* = f_θ*(z); a compact PyTorch sketch of these steps follows
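- A compact sketch of these three steps, here for denoising (Adam, the learning rate, and the iteration count are assumptions, and the observation is a random placeholder; it reuses the HourglassGenerator sketched above):

```python
import torch

# Placeholder for the corrupted image x0: a (1, 3, H, W) tensor in [0, 1],
# with H and W divisible by 4 for the toy generator above.
img_noisy = torch.rand(1, 3, 256, 256)

# 1. Initialize a random code z with the same spatial size as the output.
z = 2 * torch.rand(1, 32, 256, 256) - 1          # U(-1, 1), kept fixed

# 2. Fit the weights theta so that f_theta(z) matches the corrupted image.
net = HourglassGenerator(code_channels=32, out_channels=3)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for step in range(3000):                         # stopping early acts as the regularizer
    opt.zero_grad()
    loss = torch.sum((net(z) - img_noisy) ** 2)  # E(f_theta(z); x0) for denoising
    loss.backward()
    opt.step()

# 3. Read out the restored image at the chosen stopping point.
with torch.no_grad():
    x_star = net(z)
```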
Applications
- Denoising and generic reconstruction
- x0 = x + ε, where ε follows a particular noise distribution.
- But in blind denoising the noise model is unknown.
- Super-resolution
- Given the LR image x0 ∈ R^(3×H×W) and an upsampling factor t, generate the corresponding HR image x ∈ R^(3×tH×tW)
- E(x; x0) = ||d(x) − x0||², where d(·) is a downsampling operator that resizes an image by a factor of t (one possible implementation is sketched below)
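- One possible d(·), sketched with bilinear resizing via torch.nn.functional.interpolate; the exact resampling kernel used in the paper's experiments is not assumed here:

```python
import torch
import torch.nn.functional as F

def downsample(x, t):
    # d(.): resize a (1, 3, tH, tW) image down to (1, 3, H, W); any differentiable
    # resampling works, bilinear is just one simple choice.
    return F.interpolate(x, scale_factor=1.0 / t, mode='bilinear', align_corners=False)

def sr_energy(x_hr, x0_lr, t):
    # E(x; x0) = ||d(x) - x0||^2
    return torch.sum((downsample(x_hr, t) - x0_lr) ** 2)
```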
- Inpainting
- Natural pre-image
- Flash-no flash reconstruction
Related work
- Highly related to self-similarity-based and dictionary-based priors.
- Even a single-layer convolutional sparse coding model has been proposed for reconstruction.
Link: https://arxiv.org/abs/1711.10925