Meta-Transfer Learning for Few-Shot Learning

Qianru Sun, Yaoyao Liu, Tat-Seng Chua, Bernt Schiele [NUS, Max Planck] [CVPR 2019]

Abstract

  • Meta-learning has been proposed to address the challenges of the few-shot learning setting.
  • Key idea: leverage a large number of similar few-shot tasks to learn how to adapt a base-learner to a new task for which only a few labeled samples are available.
  • Deep neural networks (DNNs) tend to overfit when trained on few samples, so meta-learning methods typically use shallow neural networks (SNNs), which limits their effectiveness.
  • Contribution: Meta-Transfer Learning (MTL), which learns to adapt a DNN for few-shot learning tasks.
  • Meta: refers to training on multiple tasks.
  • Transfer: achieved by learning scaling and shifting functions of DNN weights for each task.
  • Hard-task (HT) meta-batch: an effective learning curriculum for MTL.
  • Benchmarks: miniImageNet and Fewshot-CIFAR100 (5-class 1-shot and 5-class 5-shot).

Introduction

  • Few-shot learning aims to learn new concepts from only a few labeled examples, but it remains hard: e.g., on CIFAR-100 the reported accuracy for 1-shot learning is only 40.1%.
  • Few-shot learning methods fall into two categories:
    • Data augmentation: a data generator (e.g., conditioned on Gaussian noise) synthesizes extra samples; this underperforms in the 1-shot setting.
    • Task-based meta-learning: meta-learning aims to accumulate experience from multiple tasks, while base-learning focuses on modeling the data distribution of a single task.
  • Model-Agnostic Meta-Learning (MAML): learns to search for the optimal initialization state to quickly adapt a base-learner to a new task. Limitations: it requires a large number of similar tasks for meta-training, which is costly, and its base-learner is a shallow NN to avoid overfitting, so it cannot use a DNN.
  • MTL: a novel learning method that converges faster and has a lower risk of overfitting.
  • Transfer: weight transfer via two lightweight neuron operations, scaling and shifting (αX + β).
  • 2nd contribution: an effective meta-training curriculum. Curriculum learning and hard negative mining both yield faster convergence and stronger performance; inspired by them, the authors designed the hard task (HT) meta-batch strategy, which re-samples harder tasks online according to past failure tasks, i.e., those with the lowest validation accuracy.
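To make the scaling-and-shifting idea concrete, here is a minimal numpy sketch of SS applied to one frozen 3×3 conv layer, evaluated on a single input patch. It assumes the scale Φ_S1 (α) holds one value per (output, input) channel pair and is broadcast over the kernel, and the shift Φ_S2 (β) holds one value per output channel and is added to the bias; both start at ones and zeros so the layer initially behaves like the pre-trained one. The helpers ss_params and ss_conv_patch, the 64-channel layer size, and the random weights standing in for pre-trained ones are all illustrative assumptions, not the authors' code.

```python
import numpy as np

def ss_params(out_ch, in_ch):
    """Hypothetical helper: create SS parameters that start as an identity transform.

    phi_s1 (the scale) starts at ones and phi_s2 (the shift) at zeros, so the
    frozen layer initially behaves exactly like the pre-trained one.
    """
    phi_s1 = np.ones((out_ch, in_ch, 1, 1))  # one scale per (out, in) channel pair
    phi_s2 = np.zeros(out_ch)                # one shift per output channel
    return phi_s1, phi_s2

def ss_conv_patch(patch, W, b, phi_s1, phi_s2):
    """Hypothetical helper: apply a scaled-and-shifted conv layer to one k x k patch.

    patch: (in_ch, k, k); W: (out_ch, in_ch, k, k) and b: (out_ch,) stay frozen.
    Returns the (out_ch,) response (W * phi_s1) . patch + (b + phi_s2).
    """
    W_ss = W * phi_s1   # the scale is broadcast over the k x k kernel
    b_ss = b + phi_s2
    return np.einsum("oihw,ihw->o", W_ss, patch) + b_ss

rng = np.random.default_rng(0)
out_ch, in_ch, k = 64, 64, 3
W = rng.standard_normal((out_ch, in_ch, k, k))  # stands in for frozen pre-trained weights
b = rng.standard_normal(out_ch)
patch = rng.standard_normal((in_ch, k, k))

phi_s1, phi_s2 = ss_params(out_ch, in_ch)
plain = np.einsum("oihw,ihw->o", W, patch) + b
# At initialization SS is the identity: same output as the pre-trained layer.
assert np.allclose(ss_conv_patch(patch, W, b, phi_s1, phi_s2), plain)

# Learnable parameters for this layer: full fine-tuning vs. SS.
ft_params = W.size + b.size            # 64*64*3*3 + 64 = 36928
ss_count = phi_s1.size + phi_s2.size   # 64*64 + 64 = 4160
print(ft_params, ss_count)
```

For this 64-to-64-channel layer the sketch prints 36928 and 4160, so under these assumptions SS trains roughly 9× fewer parameters than fine-tuning the layer, while the pre-trained weights W themselves never change.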

Related Work 

  • Few-shot learning
    • Metric learning methods: learn a similarity space in which learning is efficient.
    • Memory network methods: learn to store experience when learning seen tasks and generalize it to unseen tasks.
    • Gradient descent based methods: a specific meta-learner learns to adapt a base-learner across different tasks (e.g., MAML); MTL belongs to this category.
  • Transfer learning
    • Fine-tuning
    • Taking pre-trained networks as a backbone and adding high-level functions (e.g., for object detection and recognition, and image segmentation)
  • Curriculum learning & hard sample mining
    • Curriculum learning: instead of observing samples in random order, organize them in a meaningful way -> faster convergence, more effective learning, better generalization.
    • Hard sample mining: in object detection, image proposals that overlap with the ground truth are treated as hard negative samples. Training on more confusing data makes the model more robust and improves performance.

Preliminary

  • Meta-learning has two phases, meta-train and meta-test. In a classification setting, each task T (an episode) is sampled from a task distribution p(T) and has a training split T(tr) and a test split T(te).
    • Meta-train: learns from a number of episodes {T}.
    • Meta-test: evaluates on new, unseen episodes.
  • Meta-training phase: learns from multiple episodes, with a two-stage optimization in each episode:
    • Stage 1, base-learning: the cross-entropy loss on T(tr) optimizes the parameters of the base-learner.
    • Stage 2, a feed-forward test on the episode's test data points T(te): the test loss optimizes the parameters of the meta-learner.
  • Meta-test phase: evaluates how quickly the model adapts to unseen tasks.
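The episode structure can be illustrated with a minimal numpy sketch of one meta-training episode: it samples an N-way K-shot task with a training split T(tr) and a test split T(te) from toy Gaussian "features", then runs Stage 1 by taking plain gradient-descent steps on the cross-entropy loss of a linear softmax base-learner. The synthetic data, the helpers sample_episode and train_base_learner, and the hyperparameters (15 test samples per class, learning rate, step count) are illustrative assumptions rather than the paper's settings; Stage 2 is only indicated by the returned T(te) accuracy.

```python
import numpy as np

def sample_episode(features_by_class, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode T = (T_tr, T_te) from a labelled pool.

    features_by_class maps a class id to an array of shape (num_samples, dim);
    episode labels are re-indexed to 0..n_way-1.
    """
    keys = list(features_by_class)
    chosen = rng.choice(len(keys), size=n_way, replace=False)
    x_tr, y_tr, x_te, y_te = [], [], [], []
    for label, i in enumerate(chosen):
        pool = features_by_class[keys[i]]
        idx = rng.permutation(len(pool))[: k_shot + n_query]
        x_tr.append(pool[idx[:k_shot]])
        x_te.append(pool[idx[k_shot:]])
        y_tr += [label] * k_shot
        y_te += [label] * n_query
    return (np.concatenate(x_tr), np.array(y_tr)), (np.concatenate(x_te), np.array(y_te))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_base_learner(x, y, n_way, steps=200, lr=0.05):
    """Stage 1: fit a linear softmax base-learner on T_tr with plain gradient descent.

    Only the classifier (W, b) is updated; x plays the role of features from a
    frozen feature extractor.
    """
    W = np.zeros((x.shape[1], n_way))
    b = np.zeros(n_way)
    onehot = np.eye(n_way)[y]
    for _ in range(steps):
        p = softmax(x @ W + b)
        grad = (p - onehot) / len(x)   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def accuracy(W, b, x, y):
    return float(np.mean(np.argmax(x @ W + b, axis=1) == y))

rng = np.random.default_rng(0)
dim = 16
# Toy stand-in for extracted features: 20 classes, each a Gaussian blob of 30 samples.
pool = {c: rng.normal(rng.normal(0.0, 2.0, dim), 1.0, size=(30, dim)) for c in range(20)}
(x_tr, y_tr), (x_te, y_te) = sample_episode(pool, n_way=5, k_shot=5, n_query=15, rng=rng)
W, b = train_base_learner(x_tr, y_tr, n_way=5)
# In meta-training, the loss/accuracy on T_te is what Stage 2 would use to update the meta-learner.
print("T_te accuracy:", accuracy(W, b, x_te, y_te))
```

Starting from zero weights, the first gradient step moves each class's weights toward that class's mean feature, so on well-separated toy data even this simple base-learner classifies T(te) far above the 20% chance level.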

Methodology: 3 phases

  • DNN training on large-scale data
    • E.g., train on miniImageNet (64 classes, 600 shots per class), then fix the low-level layers as a feature extractor.
    • First, randomly initialize a feature extractor (the conv layers in ResNets) and a classifier (the last FC layer of ResNets), then optimize both by gradient descent (GD).
    • The feature extractor is then frozen, and the learned classifier is discarded, because the few-shot tasks have 5 classes instead of 64.
  • Meta-transfer learning (MTL)
    • Learns scaling and shifting (SS) parameters for the feature extractor neurons, enabling fast adaptation to few-shot tasks.
    • The SS parameters are optimized through HT meta-batch training.
    • The loss on T(tr) optimizes the base-learner (classifier) by GD without updating the conv layers; this classifier also differs from the one in the previous phase (5 classes instead of 64).
  • Hard task (HT) meta-batch
    • Intentionally picks up the failure cases in each task and recomposes their data into harder tasks for adverse re-training: “grow up through hardness”.
    • Pipeline: base-learner optimized by the loss on T(tr) -> SS parameters optimized once by the loss on T(te) -> recognition accuracy on T(te) computed for each of the M classes -> the lowest accuracy Acc_m determines the most difficult class m.
    • Choosing the hard class m relies on ranking, not on a threshold.
    • Given the chosen classes {m}, hard tasks T_hard are re-sampled in one of two ways:
      • Directly: use the samples of class m in the current task.
      • Indirectly: use the label of class m to sample new samples of that class.
  • Algorithm: Algorithm 1 covers the large-scale DNN training, meta-transfer learning, and HT meta-batch re-sampling; the failure classes are returned by Algorithm 2, which describes the learning process on a single task.
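The HT selection and re-sampling steps can be sketched in plain Python as below: each task in a meta-batch contributes its lowest-accuracy class (chosen by ranking, not a threshold) to {m}, and new hard tasks are then composed from those class labels, i.e., the "indirect" option, where fresh samples of these classes would be drawn next. The class names and accuracies are made up, failure_class and sample_hard_tasks are hypothetical helpers, and padding a hard task with other random classes when {m} has fewer than N classes is an assumption of this sketch; the actual optimization step (Algorithm 2) is not shown.

```python
import random

def failure_class(class_acc):
    """Return the class with the lowest T_te accuracy in one task (ranking, no threshold)."""
    return min(class_acc, key=class_acc.get)

def sample_hard_tasks(failed, all_classes, num_tasks, n_way, rng):
    """Compose hard tasks from the failure classes {m} of the current meta-batch.

    Each hard task takes up to n_way classes from {m}; if {m} has fewer than
    n_way distinct classes, the rest are filled with other random classes
    (an assumption of this sketch). Only class labels are chosen here; fresh
    samples of these classes would be drawn afterwards.
    """
    failed = sorted(set(failed))
    others = [c for c in all_classes if c not in failed]
    tasks = []
    for _ in range(num_tasks):
        hard = rng.sample(failed, min(n_way, len(failed)))
        tasks.append(hard + rng.sample(others, n_way - len(hard)))
    return tasks

rng = random.Random(0)
# Hypothetical per-class T_te accuracies for a meta-batch of three 5-way tasks.
meta_batch_acc = [
    {"dog": 0.9, "cat": 0.4, "fox": 0.8, "owl": 0.7, "elk": 0.95},
    {"ant": 0.6, "bee": 0.3, "cat": 0.5, "yak": 0.9, "emu": 0.85},
    {"fox": 0.2, "owl": 0.9, "ant": 0.7, "dog": 0.8, "gnu": 0.75},
]
failed = [failure_class(acc) for acc in meta_batch_acc]   # ['cat', 'bee', 'fox']
all_classes = sorted({c for acc in meta_batch_acc for c in acc})
hard_tasks = sample_hard_tasks(failed, all_classes, num_tasks=2, n_way=5, rng=rng)
print(failed, hard_tasks)
```

Every hard task here contains all three failure classes, so the model is re-trained on exactly the classes it recognized worst in the previous meta-batch.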

Experiments

  • Datasets and Implementation details
    • miniImageNet: for few-shot learning evaluation.
    • Fewshot-CIFAR100(FC100)
  • Network architecture
    • Feature extractor: 2 options
      • 4CONV: 4 layers, each with 3×3 convolutions and 32 filters, followed by BN -> ReLU -> 2×2 max-pooling.
      • ResNet12: 4 residual blocks, each with 3 conv layers using 3×3 kernels; each residual block ends with a 2×2 max-pooling layer. The number of filters starts at 64 and doubles in every subsequent block.
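The shapes implied by these two backbones can be worked out with a few lines of arithmetic. This sketch assumes 84×84 miniImageNet inputs and 32×32 FC100 inputs (CIFAR-100 image size), that every 3×3 convolution uses padding 1 and so preserves the spatial size, and that each 2×2 max-pooling halves it with floor rounding; pooled_size and resnet12_widths are hypothetical helper names, and any flattening or global pooling after the last block is not modeled.

```python
def pooled_size(size, num_pools):
    """Spatial size after num_pools 2x2 max-pooling layers (stride 2, floor rounding).

    Assumes every 3x3 convolution uses padding 1 and so keeps the spatial size.
    """
    for _ in range(num_pools):
        size //= 2
    return size

def resnet12_widths(base=64, num_blocks=4):
    """Filters per residual block: start at `base` and double in every next block."""
    return [base * 2 ** i for i in range(num_blocks)]

# Assumed input resolutions: 84x84 for miniImageNet, 32x32 for FC100 (CIFAR-100 images).
for name, side in [("miniImageNet", 84), ("FC100", 32)]:
    out = pooled_size(side, num_pools=4)   # both backbones apply four 2x2 poolings
    print(f"{name}: 4CONV output 32x{out}x{out} = {32 * out * out} features; "
          f"ResNet12 output {resnet12_widths()[-1]}x{out}x{out}")
```

Under these assumptions an 84×84 input shrinks 84 -> 42 -> 21 -> 10 -> 5, so 4CONV yields a 32×5×5 map (800 values when flattened), and the ResNet12 widths are 64, 128, 256, 512.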

Conclusion

  • MTL trained with the HT meta-batch achieves top performance on the few-shot learning benchmarks evaluated (miniImageNet and FC100).

Link: https://arxiv.org/pdf/1812.02391v3.pdf

Kid ML

Kid ML is a contributor at KidML.

© 2026 KidML. All rights reserved.