Kolya Malkin, Caleb Robinson, Le Hou, Rachel Soobitsky, Jacob Czawlytko, Dimitris Samaras, Joel Saltz, Lucas Joppa, Nebojsa Jojic [Microsoft Research, Yale University, Georgia Institute of Technology, Stony Brook University, Chesapeake Conservancy] – ICLR 2019
Abstract
- DL method – converts low-resolution labels to high-resolution labels, given the joint distribution between low- and high-resolution labels
- Novel loss function – minimizes the distance between the distributions computed from a set of model outputs and the corresponding distributions implied by the low-resolution labels over the same set of outputs
- Class matching is not required; the method also applies to high-resolution semantic segmentation where HR-labelled data is not available
- Outperforms models trained only on HR labels
Introduction
- Semantic segmentation – labeling each pixel of an input image X = {x_ij} with one of L classes Y = {y_ij}, y ∈ {1, . . . , L} (application classes)
- Weakly supervised segmentation – only partial observation of the target ground-truth labels, e.g. a summary of class labels instead of pixel-level labels
- Low-resolution classes – Z = {z_k}, z ∈ {1, . . . , N} (accessory classes) – each defined for a set of pixels in the input image
- Joint distribution P(Y, Z)
- A training image X is divided into K sets B_k, each with an accessory class label z_k, and the model is trained to produce HR labels y_ij
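As a minimal sketch of this setup (all sizes, channel counts, and class counts below are hypothetical, chosen only for illustration), the blocks B_k and their accessory labels z_k can be represented like this:

```python
import numpy as np

# Illustrative setup (sizes and class counts are made up): a 64x64, 4-channel
# training image X divided into K = 16 non-overlapping 16x16 blocks B_k,
# each block carrying one low-resolution accessory label z_k in {1, ..., N}.
H, W, C, block, N = 64, 64, 4, 16, 4
rng = np.random.default_rng(0)

X = rng.random((H, W, C))                                  # input image
Z = rng.integers(1, N + 1, size=(H // block, W // block))  # z_k per block

# Enumerate each pixel set B_k as a (row slice, column slice) pair.
blocks = [(slice(a * block, (a + 1) * block), slice(b * block, (b + 1) * block))
          for a in range(H // block) for b in range(W // block)]
```

Indexing X with one entry of `blocks` then yields the pixels of that B_k.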
- Contribution: a general solution to weakly supervised image segmentation, demonstrated on land cover mapping and lymphocyte segmentation from pathology imagery
- Disadvantages of state-of-the-art methods:
- Dai et al., Papandreou et al. – require bounding boxes around object instances
- Krähenbühl & Koltun, Hong et al. – match class density functions to weak labels
- Lempitsky & Zisserman – localization and enumeration of small foreground objects with known sizes
- Chen et al. – expensive inference steps (CRF or iterative evaluation), impractical on large datasets
- Proposed method: the segmentation network outputs probabilistic estimates of the (HR) application labels and summarizes them over the sets B_k, yielding an estimated distribution of HR application labels for each set; these are compared with the LR labels using standard distribution-distance metrics
- 1st contribution: a label SR network that infers the distribution of HR labels suggested by the given LR labels, based on visual cues in the input images
- 2nd contribution: the method can exploit additional training data that has only weak labels
Converting a Semantic Segmentation Network into a Label Super-Resolution Network
- φ – learned network parameters. The semantic segmentation distribution factorizes as p(Y|X; φ) = ∏_{i,j} p(y_ij|X; φ), where each p(y_ij|X; φ) is a distribution over the possible labels y ∈ {1, . . . , L}
- The network is trained on pairs of observed training images and label images (X^t, Y^t) to maximize: φ* = argmax_φ Σ_t log p(Y^t|X^t; φ) = argmax_φ Σ_t Σ_{i,j} log p(y^t_ij|X^t; φ)
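This fully supervised objective is ordinary pixelwise negative log-likelihood (cross-entropy). A minimal numpy sketch with random data and illustrative shapes:

```python
import numpy as np

# Pixelwise negative log-likelihood: -mean over pixels of log p(y_ij | X; phi).
# probs: network softmax output, shape (H, W, L); labels: HR ground truth, (H, W).
def pixelwise_nll(probs, labels):
    # Pick out the probability assigned to the true class at each pixel.
    picked = np.take_along_axis(probs, labels[..., None], axis=-1)[..., 0]
    return -np.mean(np.log(picked))

rng = np.random.default_rng(0)
logits = rng.random((8, 8, 3))                              # fake network outputs
probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)  # softmax
labels = rng.integers(0, 3, size=(8, 8))                    # fake HR labels
loss = pixelwise_nll(probs, labels)
```

Minimizing this loss over φ is equivalent to the argmax objective above.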
- Assumption:
- No pixel-level supervision (Y^t); instead, LR labels z_k ∈ {1, . . . , N} are given on sets (blocks) B_k of input pixels
- A statistical joint distribution over the counts c_l of pixels of each HR label l ∈ {1, . . . , L}, conditioned on the LR label z: p_coarse(c_1, c_2, . . . , c_L|z)
- Semantic segmentation network
- Using coarse labels as statistical descriptors:
- Coarse labels can provide weak supervision by dividing blocks of pixels into categories that are statistically different from each other; this requires representing the distribution of HR pixel counts in these blocks, p_coarse(c|z)
- Label counting:
- p_coarse(c|z) – the connection between coarse and fine labels
- p_net(c_{l,k} = c|X) – modeled as a Gaussian distribution
- p(Y|X) – model that outputs distributions over HR labels given input X
- The label count approximately follows a Gaussian distribution (it averages many random variables)
- Must summarize the model output over LR block Bk
- Label counting layer computes a statistical representation
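A hedged sketch of such a counting layer, under a pixel-independence assumption: the mean label frequency of a block is the average of the per-pixel class probabilities, and its variance sums per-pixel Bernoulli variances (block size and class count below are illustrative, not from the paper):

```python
import numpy as np

# "Label counting layer" sketch: from per-pixel class probabilities over one
# block B_k, compute mean and variance of the label frequency eta_{l,k},
# which the method then treats as Gaussian.
def count_statistics(probs_block):
    # probs_block: (n_pixels, L) softmax outputs for the pixels in one block
    n = probs_block.shape[0]
    mu = probs_block.mean(axis=0)                           # E[eta_l]
    # Var of a sum of independent Bernoullis, rescaled to a frequency:
    var = (probs_block * (1.0 - probs_block)).sum(axis=0) / n**2
    return mu, var

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(4), size=256)   # one 16x16 block, L = 4 classes
mu, var = count_statistics(p)
```

These (mu, var) per class are the network-side statistics fed into the matching loss.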
- Statistics matching loss:
- Computes the mismatch between the two distributions, D(p_net, p_coarse), which is then used as the optimization criterion for segmentation
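The exact distance D used in the paper is not reproduced here; as one hedged instantiation, since both sides are modeled as Gaussians, the closed-form KL divergence between two univariate Gaussians can play the role of D for each class:

```python
import numpy as np

# One possible statistics-matching distance (an illustrative choice, not
# necessarily the paper's exact D): KL divergence between the network's
# Gaussian over label frequencies N(mu_p, var_p) and the Gaussian implied
# by the coarse-label statistics N(mu_q, var_q) for LR class z.
def gaussian_kl(mu_p, var_p, mu_q, var_q):
    return 0.5 * (np.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)

# Example: network predicts frequency 0.6 +/- sqrt(0.01); coarse statistics
# say 0.7 +/- sqrt(0.02). The mismatch is positive and differentiable in mu_p.
loss = gaussian_kl(0.6, 0.01, 0.7, 0.02)
```

Summing such terms over classes and blocks gives a loss that is zero only when the predicted statistics match the coarse ones exactly.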
Applications and Experiments
- Land Cover Super-Resolution:
- Land cover classified data difficult and expensive to acquire at high resolution.
- This work implemented automated land-cover change detection using its model
- Dataset and training: 3 goals.
- (1) show how models trained only on low-resolution data with label super-resolution compare to models trained with high-resolution data
- (2) show how models trained with label super-resolution perform on heterogeneous land-cover data (urban areas)
- (3) measure the effect of utilizing both low- and high-resolution labels
- 3 datasets. –
- 4-channel HR (1 m) aerial images from the US Department of Agriculture
- HR (1 m) land-cover data covering the Chesapeake Bay watershed
- LR (30 m) land-cover data from NLCD
- The data is divided into 4 geographic regions: 1 training region with HR labels and 3 test regions
- train and test 4 groups of models.
- HR model – has access only to HR labels
- SR model – trained with the proposed label super-resolution loss; has access only to LR labels from the region in which it is tested
- Baseline weakly supervised model – has access only to LR labels
- HR + SR model – has access to both HR and LR labels
- Baseline models:
- HR base model – U-Net core trained to minimize pixelwise cross-entropy loss on HR labels (U-Net outperformed SegNet, ResNet, and full-resolution ResNet)
- Soft naive – uses the NLCD mean class frequency as the target label distribution for every pixel
- Hard naive – uses a one-hot vector of the most frequent label
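The two naive targets can be sketched in a few lines (the class frequencies below are invented purely for illustration):

```python
import numpy as np

# Suppose (hypothetical numbers) the mean HR-class frequencies for one NLCD
# class z are: water 0.7, tree 0.2, field 0.1, built 0.0.
freq = np.array([0.7, 0.2, 0.1, 0.0])

# "Soft naive": every pixel in a block with LR label z is trained against
# this full frequency distribution.
soft_target = freq

# "Hard naive": every pixel is trained against a one-hot vector of the
# single most frequent HR class.
hard_target = np.eye(len(freq))[freq.argmax()]
```

Both baselines ignore visual cues within the block, which is what the statistics-matching loss is designed to exploit.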
- EM approach –
- M-step: train the SR model only
- E-step: perform inference of HR labels on the training set, then apply superpixel denoising; assign labels in each block according to this smoothed prediction
- Repeat the EM iterations
- This paper uses superpixel denoising instead of dense-CRF for computational efficiency
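A high-level, runnable skeleton of this EM loop (the model, inference, and superpixel steps are stubs; all names and shapes are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_model(images, labels):            # M-step stub: fit segmentation net
    return {"trained_on": len(images)}

def predict_hr_labels(model, image):        # E-step stub: per-pixel argmax labels
    return rng.integers(0, 4, size=image.shape[:2])

def superpixel_denoise(labels):             # stub: cheaper alternative to dense-CRF
    return labels                           # real version gives each superpixel its majority label

images = [np.zeros((32, 32, 4))]
# Initial pseudo-labels (e.g. from a naive block assignment; random here).
pseudo_labels = [rng.integers(0, 4, size=(32, 32)) for _ in images]

for _ in range(3):                          # repeat EM iterations
    model = train_model(images, pseudo_labels)                # M-step
    preds = [predict_hr_labels(model, x) for x in images]     # E-step: inference...
    pseudo_labels = [superpixel_denoise(p) for p in preds]    # ...then denoising
```

The loop alternates between fitting the network and refreshing smoothed pseudo-labels, which is why its inference cost per iteration matters at dataset scale.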
Conclusions
- An SR network capable of deriving HR labels from low-resolution labels, under the assumption that the joint distribution between LR and HR classes is known
Link: https://openreview.net/pdf?id=rkxwShA9Ym