by Dmitry Ulyanov, Andrea Vedaldi, Victor Lempitsky – CVPR 2018
Abstract
- Deep CNN – able to learn image priors from a large number of example images
- Contribution: Structure of a generator network is sufficient to capture a great deal of image statistics prior to any learning.
- A handcrafted prior in the form of a randomly initialized NN gives excellent results in standard inverse problems such as denoising, super-resolution, and inpainting
- The same prior can also be used to invert deep neural representations (the natural pre-image) and for flash/no-flash reconstruction
- This approach highlights the inductive bias captured by standard generator network architectures
Introduction
- Image reconstruction problems – denoising, single-image super-resolution – are typically approached with learned models: GANs, VAEs, or direct pixel-wise error minimization
- Learning from a large dataset alone is insufficient: prior work showed that the same image-classification network that generalizes well when trained on real data can also overfit when presented with random labels.
- So, generalization requires the network structure to resonate with the structure of the data
- This paper shows that plenty of image statistics are captured by the structure of the network alone, without any learning.
- Untrained ConvNets:
- fit a generator network to a single degraded image.
- The network weights are the parametrization of the restored image.
- Weights are randomly initialized and fitted to maximize their likelihood given a specific degraded image and a task-dependent observation model.
- Reconstruction is cast as a conditional image generation problem (denoising, inpainting, super-resolution).
Method
- Image generation – learn a generator network x = f_θ(z) that maps a random code vector z to an image x, where x ∈ R^(3×H×W), z ∈ R^(C′×H′×W′), and θ are the network parameters
- U-Net-type “hourglass” architecture with skip connections; z and x have the same spatial size (a minimal sketch follows)
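- A minimal sketch of such a generator in PyTorch, for illustration only (layer widths, depth, and the bilinear upsampling are assumptions; the paper's actual hourglass architectures are deeper and tuned per task):

```python
import torch
import torch.nn as nn

def conv_block(cin, cout, stride=1):
    # conv + batch norm + LeakyReLU, a common building block for such generators
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(cout),
        nn.LeakyReLU(0.2, inplace=True),
    )

class HourglassGenerator(nn.Module):
    """Toy U-Net-style f_theta: the code z and the output x share spatial size."""
    def __init__(self, code_channels=32, out_channels=3, width=64):
        super().__init__()
        self.down1 = conv_block(code_channels, width, stride=2)       # H   -> H/2
        self.down2 = conv_block(width, width, stride=2)               # H/2 -> H/4
        self.up2 = conv_block(width, width)
        self.up1 = conv_block(width + width, width)                   # takes the skip from down1
        self.out = nn.Conv2d(width + code_channels, out_channels, kernel_size=1)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

    def forward(self, z):            # z: (1, code_channels, H, W), H and W divisible by 4
        d1 = self.down1(z)
        d2 = self.down2(d1)
        u2 = self.upsample(self.up2(d2))                              # back to H/2
        u1 = self.upsample(self.up1(torch.cat([u2, d1], dim=1)))      # skip connection, back to H
        return torch.sigmoid(self.out(torch.cat([u1, z], dim=1)))     # image in [0, 1]
```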
- x – clean image
- x0 – corrupted (observed) image
- x* – restored image: x* = argmin_x E(x; x0) + R(x), where R(x) is a regularizer
- Denoising: E(x; x0) = ||x − x0||² – needs early stopping
- Inpainting: E(x; x0) = ||(x − x0) ⊙ m||², where m is the binary mask of known pixels
- Super-resolution: E(x; x0) = ||d(x) − x0||², where d(·) is a downsampling operator
- Feature inversion: E(x; x0) = ||Φ(x) − Φ(x0)||², for a fixed feature extractor Φ (these data terms are sketched in code right after this list)
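- A hedged sketch of these data terms as PyTorch losses (x and x0 are (1, 3, H, W) tensors; the downsampler d and the feature extractor phi are passed in as callables and are placeholders, not the paper's exact operators):

```python
import torch

def denoising_energy(x, x0):
    # E(x; x0) = ||x - x0||^2
    return torch.sum((x - x0) ** 2)

def inpainting_energy(x, x0, m):
    # E(x; x0) = ||(x - x0) * m||^2, with m a binary mask (1 = known pixel)
    return torch.sum(((x - x0) * m) ** 2)

def superres_energy(x, x0, d):
    # E(x; x0) = ||d(x) - x0||^2, where d(.) maps x to the low-resolution grid
    return torch.sum((d(x) - x0) ** 2)

def feature_inversion_energy(x, x0, phi):
    # E(x; x0) = ||phi(x) - phi(x0)||^2 for a fixed feature extractor phi
    return torch.sum((phi(x) - phi(x0)) ** 2)
```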
- The choice of regularizer, which usually captures a generic prior on natural images, is more difficult and is the subject of much research. This paper replaces the explicit regularizer with the implicit prior captured by the neural network parametrization, as follows:
- θ* = argmin_θ E(f_θ(z); x0), and x* = f_θ*(z)
- The minimizer θ* is obtained with a gradient-based optimizer such as SGD, starting from a random initialization of the parameters θ
- Explicit prior: min_x ||d(x) − x0||², s.t. x is a natural image, a face, etc.
- Deep Image Prior: min_x ||d(x) − x0||², s.t. x is the output of a CNN, i.e., x = f_θ(z), with the optimization carried out over θ
- MAP: x* = argmax_x P(x | x0)
- P(x | x0) = P(x0 | x) P(x) / P(x0) ∝ P(x0 | x) P(x)
- P(x0 | x) – likelihood
- P(x) – prior
- In degradation, x0 = x + ε, ε ~ N(0, σ²), so P(x0 | x) = N(x0; x, σ²)
- In restoration: x* = argmax_x P(x | x0)
= argmax_x P(x0 | x) P(x)
= argmax_x P(x0 | x), since the prior is a constant (we have no preference)
= argmax_x N(x0; x, σ²) = x0 – we will not restore anything
- That means the MAP estimate is the same as the ML estimate if the prior is uniform (i.e., no prior); a toy check of this follows.
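- A toy check of this point (sizes, learning rate, and iteration count are arbitrary): optimizing the likelihood term alone, directly in image space, just reproduces the corrupted observation:

```python
import torch

# With a flat prior, maximizing the Gaussian likelihood means minimizing ||x - x0||^2,
# whose optimum is the corrupted image x0 itself -- nothing gets restored.
x_clean = torch.rand(3, 8, 8)
x0 = x_clean + 0.1 * torch.randn(3, 8, 8)      # degraded observation

x = torch.zeros_like(x0, requires_grad=True)   # optimize directly in image space
opt = torch.optim.SGD([x], lr=0.1)
for _ in range(200):
    opt.zero_grad()
    loss = torch.sum((x - x0) ** 2)            # negative log-likelihood up to constants
    loss.backward()
    opt.step()

print(torch.allclose(x, x0, atol=1e-4))        # True: x converges to x0, not to x_clean
```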
- Parametrization:
- Regular: argmin_x E(x; x0) + R(x) – search in image space
- Parametrized: argmin_θ E(g(θ); x0) + R(g(θ)) – search in parameter space
- If g is surjective (for each x there exists a θ with g(θ) = x), then the two problems are equivalent
- In practice, even for a surjective g, the solution found by the optimizer will be different
- Let’s treat g as a hyperparameter and tune it.
- g itself acts as a prior, and it may be sufficient to optimize only the data term: argmin_θ E(g(θ); x0)
- g(θ) = f_θ(z) – a convolutional network with parameters θ and a fixed random input z
- Deep Image Prior – Step by Step
- 1. Initialize z (fill it with uniform noise U(−1, 1))
- 2. Solve argmin_θ E(f_θ(z); x0) with a gradient-based method: θ_{k+1} = θ_k − α · ∂E(f_θ(z); x0)/∂θ, where α is the learning rate
- 3. Get the solution x* = f_θ*(z); a compact PyTorch sketch of these steps follows
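- A compact sketch of these three steps, here for denoising (Adam, the learning rate, and the iteration count are assumptions, and the observation is a random placeholder; it reuses the HourglassGenerator sketched above):

```python
import torch

# Placeholder for the corrupted image x0: a (1, 3, H, W) tensor in [0, 1],
# with H and W divisible by 4 for the toy generator above.
img_noisy = torch.rand(1, 3, 256, 256)

# 1. Initialize a random code z with the same spatial size as the output.
z = 2 * torch.rand(1, 32, 256, 256) - 1          # U(-1, 1), kept fixed

# 2. Fit the weights theta so that f_theta(z) matches the corrupted image.
net = HourglassGenerator(code_channels=32, out_channels=3)
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for step in range(3000):                         # stopping early acts as the regularizer
    opt.zero_grad()
    loss = torch.sum((net(z) - img_noisy) ** 2)  # E(f_theta(z); x0) for denoising
    loss.backward()
    opt.step()

# 3. Read out the restored image at the chosen stopping point.
with torch.no_grad():
    x_star = net(z)
```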
Applications
- Denoising and generic reconstruction
- x0 = x + ε, where ε follows a particular noise distribution.
- But in blind denoising the noise model is unknown.
- Super-resolution
- Given the LR image x0 ∈ R^(3×H×W) and an upsampling factor t, generate the corresponding HR image x ∈ R^(3×tH×tW)
- E(x; x0) = ||d(x) − x0||², where d(·) is a downsampling operator that resizes an image by a factor of t (one possible implementation is sketched below)
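- One possible d(·), sketched with bilinear resizing via torch.nn.functional.interpolate; the exact resampling kernel used in the paper's experiments is not assumed here:

```python
import torch
import torch.nn.functional as F

def downsample(x, t):
    # d(.): resize a (1, 3, tH, tW) image down to (1, 3, H, W); any differentiable
    # resampling works, bilinear is just one simple choice.
    return F.interpolate(x, scale_factor=1.0 / t, mode='bilinear', align_corners=False)

def sr_energy(x_hr, x0_lr, t):
    # E(x; x0) = ||d(x) - x0||^2
    return torch.sum((downsample(x_hr, t) - x0_lr) ** 2)
```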
- Inpainting
- Natural pre-image
- Flash-no flash reconstruction
Related work
- Highly related to self-similarity-based and dictionary-based priors.
- Even a single-layer convolutional sparse coding model has been proposed for reconstruction.
Link: https://arxiv.org/abs/1711.10925