by Rihuan Ke, Aurélie Bugeau, Nicolas Papadakis, Peter Schuetz, Carola-Bibiane Schönlieb [University of Cambridge]
Abstract
- The paper proposes a DCNN for multi-class segmentation trainable on coarse data labels combined with a very small number of images with pixel-wise annotations. They call this labelling strategy 'lazy' labels
- Image segmentation is decomposed into 3 connected tasks: detection, separation, segmentation
- Gives accurate segmentation results even when exact boundary labels are missing for the majority of the annotated data
Introduction
- Multi-class and multi-instance segmentation approach, split into 3 tasks: detection, separation, and segmentation
- Task 1: instance detection. Detects and classifies each object and roughly determines its region through an under-segmentation mask; instance counting comes as a by-product. Trained on weakly annotated images in which a rough region inside each object is marked
- Task 2: separation of instances that have no clear boundary dividing them; also trained on weak annotations
- Task 3: pixel-wise classification of instances; requires strong annotations that are accurate up to the boundaries, of which only a very small set is used
- A single DNN, jointly optimized, based on U-Net: the same contracting path serves all 3 tasks, with a multi-task block for the expansive path
- A weighted loss function over the samples, since a mix of weak and strong annotations is present
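The per-sample weighting can be sketched as follows: each image contributes a weighted sum of the three task losses, and a weight of zero switches off a task for which the image carries no annotation. This is a minimal numpy sketch; the function names and weight values are illustrative, not the paper's.

```python
import numpy as np

def pixel_ce(probs, labels):
    """Mean pixel-wise cross-entropy; probs has shape (num_pixels, classes)."""
    eps = 1e-12
    return -np.mean(np.log(probs[np.arange(labels.size), labels] + eps))

def weighted_multitask_loss(task_probs, task_labels, task_weights):
    """Weighted sum of the detection, separation, and segmentation losses.
    A weakly annotated image would get weight 0 for the segmentation task,
    so only its weak labels contribute to the gradient."""
    return sum(w * pixel_ce(p, y)
               for p, y, w in zip(task_probs, task_labels, task_weights))
```

Setting the third weight to zero reproduces the "weak-only" case: the segmentation head is simply not penalized for that image.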
Related Work
- Image segmentation – k-means, snakes (curve-evolution-based method), GrabCut (graph-cut-based method)
- DCNN segmentation – Fully Convolutional Networks (FCN); atrous convolution to handle spatial information, combined with a fully connected conditional random field (CRF)
- Fully connected CRF for post-processing
- Common weak annotations – image-level labels, bounding boxes, scribbles and points.
- Segmentation masks can be improved recursively
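As a reminder of what the classical baselines above do, here is a minimal k-means segmentation on raw pixel intensities (numpy only; initializing centers evenly across the intensity range is a simplification of the usual random initialization):

```python
import numpy as np

def kmeans_segment(image, k=2, iters=20):
    """Segment a grayscale image by clustering pixel intensities with
    k-means; returns a label map of cluster indices."""
    pixels = image.reshape(-1, 1).astype(float)
    # Initialize centers evenly across the intensity range (deterministic).
    centers = np.linspace(pixels.min(), pixels.max(), k)[:, None]
    for _ in range(iters):
        # Assign each pixel to the nearest center.
        labels = np.argmin(np.abs(pixels - centers.T), axis=1)
        # Recompute each center as the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels.reshape(image.shape)
```

Unlike the learned methods in the rest of the summary, this uses no labels at all, which is exactly why it cannot distinguish object classes with similar intensities.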
Multi-task learning framework
- Procedure: Lazy Labels(input set of images I)
- Select inner regions for each object in Task 1 (detection)
- Indicate scribbles on the images of Task 2 (separation)
- Generate a few pixel-wise labels for Task 3 from Task 1 using interactive segmentation tools (GrabCut)
- Procedure: Multi-task U-Net training (T_k, s_i^(k), λ_k, r)
- s_i^(k): labels and λ_k: loss-function weights for tasks k = 1, 2, 3; Adam parameters r; minibatch size m
- Set the 1st and 2nd moment vectors m, v to zero
- Initialize the multi-task U-Net parameters
- Obtain a mini-batch
- Compute gradient
- Optimize and return
- In the multi-task learning setting, one aim is to approximate the conditional probability distribution of the labels given the input image
- The model is parameterized by the network weights, which are determined such that the model outputs match the desired probability distributions
- The set of samples for segmentation (Task 3) is small, so rather than optimizing each task individually, a joint probability over the tasks is considered
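The training procedure above (moment vectors set to zero, minibatch gradient, optimize) boils down to repeated Adam updates. A minimal sketch of one such update in numpy; the default learning rate matches the 2×10^-4 reported in the experiments, the remaining hyperparameters are Adam's standard values:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=2e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and squared gradient (v), with bias correction for step t >= 1."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected 1st moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected 2nd moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In the paper's loop, `grad` would be the gradient of the weighted multi-task loss over a minibatch; here the update is shown on a scalar parameter for clarity.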
Network architecture
- U-Net structure for multiple tasks, with only one contracting path (encoder)
- On the expansive branch, a multi-task block at each resolution supports the different learning purposes
- In each multi-task block, the detection and segmentation tasks share a common path (same weights), but an additional residual sub-block is inserted for the segmentation task; it provides extra network parameters to learn information not captured by the detection task
- The network is trained by minimizing the weighted cross-entropy loss
![](https://www.kidml.com/wp-content/uploads/2021/11/image-1.png)
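A toy numpy sketch of the sharing pattern described above: one shared encoder, a common path for detection and segmentation, and a residual sub-block used only by the segmentation head. Layer shapes and the separation head's structure are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Shared "contracting path": one weight matrix used by all three tasks.
W_enc = rng.standard_normal((8, 4))
# Common expansive-path weights for detection and segmentation ...
W_common = rng.standard_normal((4, 4))
# ... plus a residual sub-block that only the segmentation task uses.
W_res = rng.standard_normal((4, 4)) * 0.1
# Separation keeps its own head here (a simplification of the paper's blocks).
W_sep = rng.standard_normal((4, 4))

def forward(x):
    h = relu(x @ W_enc)             # shared encoder features
    det = h @ W_common              # task 1: detection
    seg = det + relu(det @ W_res)   # task 3: detection path + residual sub-block
    sep = h @ W_sep                 # task 2: separation
    return det, sep, seg
```

The key point the sketch illustrates: gradients from all three task losses flow into `W_enc`, while `W_res` is updated only by the segmentation loss.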
Lazy labels generation
- Data – scanning electron microscopy (SEM) images of ice cream
- Scribble-based labelling to obtain detection regions of air bubbles and ice crystals for Task 1
- Training set: 20 images (2 manually labelled for Task 3, 15 for Task 1, 20 for Task 2). Validation set: 6 images annotated for all three tasks
Experiments
- With only a small set of labelled images, data augmentation is used to prevent over-fitting
- Images are randomly rescaled, rotated, and cropped; random flipping is also applied during training
- Adam optimizer with a learning rate of 2×10^-4 and a batch size of 16
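The augmentation pipeline can be sketched as below. Rotation is restricted to 90-degree multiples here (arbitrary-angle rotation and rescaling would need interpolation), and the half-size crop is an illustrative choice; the important detail is that image and label mask are transformed jointly.

```python
import numpy as np

def augment(image, mask, rng):
    """Random flip, 90-degree rotation, and crop applied jointly to an
    image and its label mask so they stay aligned."""
    if rng.random() < 0.5:                    # random horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    k = rng.integers(4)                       # random rotation by k * 90 degrees
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    h, w = image.shape[:2]
    ch, cw = h // 2, w // 2                   # crop to half size (illustrative)
    top = rng.integers(h - ch + 1)
    left = rng.integers(w - cw + 1)
    return (image[top:top + ch, left:left + cw],
            mask[top:top + ch, left:left + cw])
```

Because every spatial transform is applied to both arrays with the same parameters, each pixel in the cropped image keeps its original label.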
Conclusion
- It is difficult to determine beforehand exactly how much labelled data is necessary