A Step-by-step Guide to Few-Shot Learning

What is Few-Shot Learning, how does it work and what are its most prominent applications? Read our beginner's guide to Few-Shot Learning and start training reliable AI models with V7 today.
June 13, 2022

Humans can recognize new object classes from very few instances. However, most machine learning techniques require thousands of examples to achieve similar performance. 

In the past decade, computer vision researchers have primarily focused on solving generic tasks using millions of images. As a result, model performance has become strongly tied to the quantity of available data.

To mitigate this data scarcity issue, researchers have developed ‘Few Shot Learning’, which focuses on training models with less data without compromising their performance.

This guide will help you understand everything you need to know about Few-Shot Learning in a couple of minutes.

Here’s what we’ll cover:

  1. What is Few Shot Learning
  2. How does Few-Shot Learning work
  3. Few Shot Image Classification algorithms
  4. Few-Shot Learning applications
  5. Few Shot Learning research papers

And hey—

If you are searching for the tools to annotate your data and train your ML models, we've got you covered!

Head over to our Open ML Datasets repository, pick a dataset, upload it to V7, and start annotating data to train your neural networks in one place. Have a look at these resources to get started:

  1. V7 Image and Video Annotation
  2. V7 Model Training
  3. V7 Dataset Management
  4. V7 Automated Annotation
  5. V7 Data Labeling Services

Let's begin!

What is Few-Shot Learning?

Few-Shot Learning is an example of meta-learning, where a learner is trained on several related tasks, during the meta-training phase, so that it can generalize well to unseen (but related) tasks with just a few examples, during the meta-testing phase.

Few-shot training stands in contrast to traditional methods of training machine learning models, where a large amount of training data is typically used. Few-shot learning is used primarily in Computer Vision.

In practice, few-shot learning is useful when training examples are hard to find (e.g., cases of a rare disease) or the cost of data annotation is high.

The importance of Few-Shot Learning

  • Learn for anomalies: Machines can learn rare cases by using few-shot learning. For example, when classifying images of rare diseases like COVID-19, a computer vision model trained with few-shot learning techniques can classify an image of a chest x-ray accurately after being exposed to a small number of x-ray images.
  • Reduce data costs: Few-shot learning requires less data to train a model, which reduces the high costs of data collection and annotation. A smaller training set also lowers the computational cost of training.
  • Learn like a human: After seeing a few examples, humans can spot the difference between handwritten characters. However, computers need large amounts of data to understand what they “see” and spot the difference. Few-shot learning is a test bed where computers are expected to learn from a few examples like humans.
💡 Pro tip: Looking for quality data? Check out 65+ Best Free Datasets for Machine Learning and 21+ Healthcare Datasets for Computer Vision.

Few-shot learning uses the N-way-K-shot classification approach: the model must discriminate between N classes given K labeled examples of each class.

Using conventional methods will not work as modern classification algorithms depend on far more parameters than training examples and will generalize poorly.

If the data is insufficient to constrain the problem, then one possible solution is to learn from the experience of other similar problems. To this end, most approaches characterize few-shot learning as a meta-learning problem.

Meta-Learning framework for FSL
💡 Pro tip: Read these guides on data cleaning and data preprocessing.

N-Shot Learning (NSL)

A shot is nothing more than a single example available for training, so we have N examples for training in N-shot learning. 

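In N-shot learning, we have N labeled images for each of K classes, i.e., N ∗ K examples in total, which form the support set S. We also have a query set Q to classify, where each example belongs to one of the K classes. (Note that N and K swap roles here compared with the N-way-K-shot notation used elsewhere in this guide, where N is the number of classes and K is the number of examples per class.)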

N-shot learning has mainly three sub-fields: 

  1. Zero-shot learning
  2. One-shot learning
  3. Few-shot learning

Zero Shot Learning (ZSL)

Zero Shot Learning aims to classify samples from unseen classes without any training examples of those classes. Given a general idea about the attributes of an object (its appearance, properties, and functionality), classifying such data shouldn’t be a problem.
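To make the attribute idea concrete, here is a minimal sketch of attribute-based zero-shot classification. It assumes that some upstream model has already predicted an attribute vector for the input image; the attribute names, the class descriptions, and the `zero_shot_classify` helper are all hypothetical and only serve as an illustration.

```python
import numpy as np

# Hypothetical attribute order: [has_stripes, has_hooves, is_black_and_white].
# Each class is described by attributes only; no training images are needed.
class_attributes = {
    "horse": np.array([0.0, 1.0, 0.0]),
    "tiger": np.array([1.0, 0.0, 0.0]),
    "zebra": np.array([1.0, 1.0, 1.0]),
}

def zero_shot_classify(predicted_attributes, class_attributes):
    """Return the class whose attribute description is most similar (cosine)."""
    names = list(class_attributes)
    descriptions = np.stack([class_attributes[name] for name in names])
    similarities = descriptions @ predicted_attributes / (
        np.linalg.norm(descriptions, axis=1) * np.linalg.norm(predicted_attributes)
    )
    return names[int(np.argmax(similarities))]

# Assumed output of an attribute predictor for an image of an unseen class.
predicted = np.array([0.9, 0.8, 0.7])
label = zero_shot_classify(predicted, class_attributes)
```

In this toy setup the model never sees a zebra image; the class is recognized only because its description matches the predicted attributes best.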

One Shot Learning (OSL)

In the One Shot Learning problem, we have a single sample of each class.

Few Shot Learning (FSL)

Few-Shot has two to five samples per class, making it just a more flexible version of OSL.

How does Few-Shot Learning work

Now, let's discuss how Few-Shot Learning works in more detail.

Few-Shot learning approaches

We use one set of classification problems to help solve other, previously unseen sets.

Here, each task mimics the few-shot scenario, so for N-way-K-shot classification, each task includes N classes with K examples. 

Meta Learning

In the classical learning framework, we learn how to classify from training data examples and evaluate the results using test data. In the meta-learning framework, we learn how to learn to classify using a set of training tasks and evaluate using a set of test tasks.

💡 Pro tip: Learn more about Train, Test, Validation Split.

The K examples of each of the N classes form the support set for the task and are used for learning how to solve it.

In addition, there are further examples of the same classes, known as the query set, which are used to evaluate performance on this task. Tasks can be completely non-overlapping; we may never see the classes from one task in any of the others.

Meta Learning algorithm

In the classic paradigm, when we have a specific task, an algorithm is learning if its task performance improves with experience. 

In the Meta Learning paradigm, we have a set of tasks. An algorithm is learning to learn if its performance on each task improves with experience and with the number of tasks. Such an algorithm is called a Meta Learning algorithm.

Let’s assume we have a test task TEST. We will train our Meta Learning algorithm on a batch of training tasks TRAIN. Training experience gained from attempting to solve TRAIN tasks will be used to solve the TEST task.

Training an FSL task has a set sequence of steps. Imagine we have a classification problem, as we mentioned before. To start, we need to choose a base dataset. Choosing a quality base dataset is crucial.

In the N-way-K-Shot classification problem, we have a large base dataset that we’ll use as a Meta Learning training set (TRAIN).  

The Meta Training process will have a finite number of episodes. We form an episode like this:

  • We sample N classes and K support-set images per class from TRAIN, along with Q query images. This way, we form a classification task similar to our TEST task.
  • At the end of each episode, the model's parameters are updated to maximize accuracy on the Q query images. This is where our model learns the ability to solve an unseen classification problem.
  • The overall efficiency of the model is measured by its accuracy on the TEST classification task.
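The episode construction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `sample_episode` is a hypothetical helper, the toy label array stands in for the TRAIN set, and the query images are sampled per class (`n_query` for each of the N classes), which is one common convention.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample indices for one N-way-K-shot episode from a labeled dataset."""
    # Pick N distinct classes for this episode.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_idx, query_idx = [], []
    for c in classes:
        # Shuffle the examples of class c, then split them into support and query.
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_idx.extend(idx[:k_shot])
        query_idx.extend(idx[k_shot:k_shot + n_query])
    return np.array(support_idx), np.array(query_idx), classes

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(20), 30)  # toy TRAIN set: 20 classes, 30 examples each
support_idx, query_idx, classes = sample_episode(
    labels, n_way=5, k_shot=1, n_query=15, rng=rng
)
```

Calling `sample_episode` repeatedly yields a stream of small classification tasks; the support and query indices never overlap, so the query set acts as a held-out test set inside each episode.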
Meta-Learning Training
Approaches to meta-learning

Approaches to meta-learning are diverse, and there is no single best approach. However, there are three distinct ways, each of which exploits a different type of prior knowledge:

Prior knowledge about similarity: ML models try to learn embeddings in training tasks that tend to separate different classes even when they are unseen.

Prior knowledge about learning: ML models use prior knowledge to constrain the learning algorithm to choose parameters that generalize well from a few examples.

Prior knowledge of data: ML models exploit prior knowledge about the structure and variability of the data, which enables constructing viable models from a few examples.

💡 Pro tip: Need to annotate data for your model training? Check out 13 Best Image Annotation Tools.

Data-level approach (DLA)

It’s based on the concept that if you don’t have enough data to build a reliable model and avoid overfitting and underfitting, you should add more data.

Many FSL problems are solved by using additional information from a large base dataset. The key feature of the base dataset is that it doesn’t have classes that we have in our support set for the Few-Shot task. For example, if we want to classify a specific bird species, the base dataset can have images of many other birds.

We can also produce more data ourselves. To reach this goal, we can use data augmentation or even generative adversarial networks (GANs).
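As a rough illustration of the data augmentation route, the sketch below generates several variants of one image with a random horizontal flip and a random crop. The `augment` helper and the padding size are assumptions chosen for the example, and the random array stands in for a real image; production pipelines usually rely on a dedicated augmentation library.

```python
import numpy as np

def augment(image, rng, pad=2):
    """Random horizontal flip followed by a random crop back to the original size."""
    if rng.random() < 0.5:
        image = image[:, ::-1]  # flip left-right
    h, w = image.shape[:2]
    # Pad height and width only; leave any channel axis untouched.
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad_width, mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

rng = np.random.default_rng(0)
image = rng.random((28, 28))  # stand-in for a single support-set image
augmented = np.stack([augment(image, rng) for _ in range(8)])
```

Each augmented copy keeps the original label, so a one-shot support set can be expanded into several slightly different training examples.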

Parameter-level approach (PLA)

From the parameter-level point of view, it’s relatively easy to overfit on Few-Shot Learning samples, because models usually have a high-dimensional parameter space with enough capacity to fit every feature of a small dataset.

To overcome this problem, we should limit the parameter space and use regularization and proper loss functions. The model will then generalize better from the limited number of training samples.
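A small sketch of what constraining the parameter space can look like: a linear model with more parameters than training examples, fitted with a weak and a strong L2 penalty (weight decay). The data, the `fit_ridge` helper, and the penalty values are assumptions made for the illustration, not a recommended configuration.

```python
import numpy as np

def fit_ridge(X, y, weight_decay):
    """Least squares with an L2 penalty, solved in closed form."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + weight_decay * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
w_true = np.zeros(50)
w_true[:3] = [1.0, -2.0, 0.5]

# Few-shot regime: 10 training examples, 50 parameters.
X_train = rng.normal(size=(10, 50))
y_train = X_train @ w_true + 0.1 * rng.normal(size=10)

w_weak = fit_ridge(X_train, y_train, weight_decay=1e-6)   # almost unconstrained
w_strong = fit_ridge(X_train, y_train, weight_decay=1.0)  # constrained
```

The stronger penalty shrinks the weight vector (compare `np.linalg.norm(w_weak)` with `np.linalg.norm(w_strong)`), which is one way of restricting the part of the parameter space the model can reach. Whether this improves test error depends on the data and on how the penalty is tuned.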

💡 Pro tip: Learn more about Neural Network Architectures and Activation Functions.

On the other hand, we can enhance model performance by guiding it through the vast parameter space. Using a standard optimization algorithm might not give reliable results because of the small amount of training data.

That is why on the parameter level, we train our model to find the best route in the parameter space to give optimal prediction results. 

Few-Shot Image Classification algorithms

Next, let us briefly describe the most prominent Few-Shot Image Classification algorithms.

Model-Agnostic Meta-Learning (MAML)

MAML was inspired by the idea behind the question of how much data is needed to learn about something. Can we teach algorithms to learn how to learn?

Meta-learning algorithms can be designed to address the following tasks:

  1. Dynamic selection of inductive bias
  2. Building meta-rules for multi-task learning
  3. Learning to learn with hyperparameter optimization

Before explaining how to train MAML (meta-training), let’s define what we expect at meta-test time. Suppose we have found a good initialization parameter θ from which we can perform efficient adaptation in one or a few gradient steps.

💡 Pro tip: Read Domain Adaptation in Computer Vision: Everything You Need to Know.

Given a new task, the new parameter θ’, obtained by gradient descent, should perform well on the new task. The figure below illustrates how MAML should work at meta-test time. We are looking for a pretrained parameter that can reach near-optimal parameters for every task in one (or a few) gradient step(s).

MAML Learning Algorithm

The meta-training algorithm is divided into two parts:

  • First, for a given set of tasks, we adapt θ to each task with one (or several) gradient step(s) on that task's objective, which yields the task-specific parameters θ’. In the reinforcement learning setting, this objective is the policy gradient objective computed over multiple trajectories sampled using θ. This is called the inner loop.
  • Second, for the same tasks, we evaluate the adapted parameters θ’ (in reinforcement learning, on multiple trajectories sampled using θ’) and backpropagate the gradient of the objective to θ. This is called the outer loop.
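The two loops can be sketched on a toy supervised problem. The example below is an assumption-laden illustration, not the original MAML implementation: it uses a linear regression model with a mean squared error loss instead of a policy gradient objective, a hypothetical family of tasks (lines with task-specific slope and intercept), and an analytic Hessian so that the outer gradient can be written out in plain NumPy.

```python
import numpy as np

def loss_and_grad(theta, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. theta."""
    residual = X @ theta - y
    return np.mean(residual ** 2), 2.0 * X.T @ residual / len(y)

def sample_task(rng, n_support=5, n_query=10):
    """Hypothetical task family: lines y = a*x + b with task-specific a and b."""
    a, b = rng.uniform(1.0, 3.0), rng.uniform(-1.0, 1.0)
    x = rng.uniform(-1.0, 1.0, size=n_support + n_query)
    X = np.stack([x, np.ones_like(x)], axis=1)
    y = a * x + b
    return (X[:n_support], y[:n_support]), (X[n_support:], y[n_support:])

def maml_step(theta, tasks, inner_lr, outer_lr):
    """One meta-update of theta over a batch of tasks."""
    meta_grad = np.zeros_like(theta)
    meta_loss = 0.0
    for (Xs, ys), (Xq, yq) in tasks:
        # Inner loop: one gradient step on the support set gives theta'.
        _, grad_support = loss_and_grad(theta, Xs, ys)
        theta_prime = theta - inner_lr * grad_support
        # Outer loop: query loss at theta', differentiated with respect to theta.
        loss_query, grad_query = loss_and_grad(theta_prime, Xq, yq)
        hessian_support = 2.0 * Xs.T @ Xs / len(ys)
        jacobian = np.eye(len(theta)) - inner_lr * hessian_support  # d theta' / d theta
        meta_grad += jacobian.T @ grad_query
        meta_loss += loss_query
    theta = theta - outer_lr * meta_grad / len(tasks)
    return theta, meta_loss / len(tasks)

rng = np.random.default_rng(0)
theta = np.zeros(2)
history = []
for _ in range(200):
    tasks = [sample_task(rng) for _ in range(8)]
    theta, meta_loss = maml_step(theta, tasks, inner_lr=0.1, outer_lr=0.1)
    history.append(meta_loss)
```

The `jacobian` term is what makes this a second-order method: the outer gradient flows through the inner gradient step. For a linear model this Jacobian is available in closed form; for deep networks it has to be obtained through automatic differentiation or approximated.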

MAML currently doesn’t work as well as metric learning algorithms on popular few-shot image classification benchmarks. It is quite hard to train because there are two levels of training, so the hyper-parameters search is much more complex. 

Plus, the meta-backpropagation implies the computation of gradients of gradients (second-order derivatives), so you have to use approximations to be able to train it on standard GPUs. For these reasons, you would probably rather use metric learning algorithms for your computer vision projects at home or at work.

💡  Pro tip: Want to train your own AI? You can do it using V7. Go ahead and train image classification, instance segmentation, and object detection models on V7.

Prototypical Networks

Prototypical networks are based on the concept that there exists an embedding in which the points of each class cluster around a single prototype representation. They aim to learn per-class prototypes by averaging samples in the feature space.

Prototypical networks compute each class’s M-dimensional representation or prototype through an embedding function with learnable parameters. Also, each prototype is the mean vector of the embedded support points belonging to its class.
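The prototype computation can be sketched as follows. The example assumes that the embedding function has already been applied, so the arrays below hold embedded support and query points; the helper names and the toy 2-dimensional embeddings are hypothetical. Queries are classified with a softmax over negative squared Euclidean distances to the prototypes.

```python
import numpy as np

def compute_prototypes(support_emb, support_labels, n_classes):
    """Each prototype is the mean of the embedded support points of its class."""
    return np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in range(n_classes)]
    )

def classify(query_emb, prototypes):
    """Softmax over negative squared Euclidean distances to each prototype."""
    diff = query_emb[:, None, :] - prototypes[None, :, :]
    logits = -(diff ** 2).sum(axis=-1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

# Toy embeddings (M = 2): two classes with two support points each.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]])
support_labels = np.array([0, 0, 1, 1])
query_emb = np.array([[0.1, 0.1], [0.9, 1.0]])

prototypes = compute_prototypes(support_emb, support_labels, n_classes=2)
probs = classify(query_emb, prototypes)
predictions = probs.argmax(axis=1)
```

In a trained model, the embeddings would come from a neural network whose parameters are optimized so that query points land close to the prototype of their own class.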

Prototypical networks are more efficient than the recent meta-learning algorithms, making them an appealing approach to few-shot and zero-shot learning. 

Prototype Networks in Zero-Shot and Few-Shot scenarios

Matching Networks

Matching Networks was the first approach to both train and test on n-shot, k-way tasks. The appeal is straightforward: training and evaluating on the same kind of task lets us optimize for the target task in an end-to-end fashion. The Matching Networks paper introduces the idea of a fully differentiable nearest-neighbors algorithm.

Matching Networks based on deep neural networks combine embedding and classification to form an end-to-end differentiable nearest neighbors classifier.

💡 Pro tip: Check out also A Comprehensive Guide to Convolutional Neural Networks.

Matching Networks first embed a high-dimensional sample into a low-dimensional space and then perform a generalized form of nearest-neighbor classification.

Matching Networks

The embedding function they use for their few-shot classification problems is a CNN. Because the CNN is differentiable, the attention mechanism and the Matching Network as a whole are fully differentiable, so it is straightforward to fit the whole model end-to-end with typical methods such as stochastic gradient descent.
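A minimal sketch of the differentiable nearest-neighbor step is shown below. It assumes precomputed embeddings instead of a CNN and leaves out any context-dependent embedding of the support set; `matching_predict` and the toy vectors are hypothetical. Each query attends to all support examples through a softmax over cosine similarities, and the prediction is the attention-weighted sum of the support labels.

```python
import numpy as np

def matching_predict(query_emb, support_emb, support_labels, n_classes):
    """Attention-weighted nearest-neighbor prediction over the support set."""
    # Cosine similarity between every query and every support embedding.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    similarity = q @ s.T
    # Softmax over the support examples gives the attention weights.
    similarity = similarity - similarity.max(axis=1, keepdims=True)
    attention = np.exp(similarity)
    attention = attention / attention.sum(axis=1, keepdims=True)
    # Weighted sum of one-hot support labels gives class probabilities.
    one_hot = np.eye(n_classes)[support_labels]
    return attention @ one_hot

support_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
support_labels = np.array([0, 1, 0])
query_emb = np.array([[0.9, 0.1], [0.1, 0.9]])

probs = matching_predict(query_emb, support_emb, support_labels, n_classes=2)
predictions = probs.argmax(axis=1)
```

Every operation here (normalization, softmax, matrix product) is differentiable, which is why the embedding network and the classifier can be trained jointly with gradient descent.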

Relation Network

In a Relation Network (RN), the distance function is not defined in advance but learned by the algorithm. RN has a dedicated relation module that does this. If you want to learn more, check out the paper.

Relation Networks

The overall structure is as follows. The relation module is put on the top of the embedding module, which is the part that computes embeddings and class prototypes from input images.

The relation module is fed the concatenation of the query image's embedding with each class prototype, and it outputs a relation score for each pair. Applying a Softmax to the relation scores, we get a prediction.
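The data flow through the relation module can be sketched like this. Everything here is a simplified assumption: the embeddings and prototypes are random stand-ins for the output of the embedding module, and the relation module is a tiny two-layer perceptron with random, untrained weights, so the scores only illustrate shapes and wiring, not meaningful predictions.

```python
import numpy as np

def relation_scores(query_emb, prototypes, W1, b1, W2, b2):
    """Score every (query, prototype) pair with a small two-layer network."""
    n_query, n_classes = len(query_emb), len(prototypes)
    # Concatenate each query embedding with each class prototype.
    pairs = np.concatenate(
        [
            np.repeat(query_emb[:, None, :], n_classes, axis=1),
            np.repeat(prototypes[None, :, :], n_query, axis=0),
        ],
        axis=-1,
    )  # shape: (n_query, n_classes, 2 * emb_dim)
    hidden = np.maximum(pairs @ W1 + b1, 0.0)  # ReLU
    return (hidden @ W2 + b2).squeeze(-1)      # shape: (n_query, n_classes)

def softmax(x):
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
emb_dim, hidden_dim, n_classes, n_query = 4, 8, 5, 3
query_emb = rng.normal(size=(n_query, emb_dim))    # stand-in query embeddings
prototypes = rng.normal(size=(n_classes, emb_dim))  # stand-in class prototypes
W1, b1 = rng.normal(size=(2 * emb_dim, hidden_dim)), np.zeros(hidden_dim)
W2, b2 = rng.normal(size=(hidden_dim, 1)), np.zeros(1)

scores = relation_scores(query_emb, prototypes, W1, b1, W2, b2)
probs = softmax(scores)
```

During training, the weights of both the embedding module and the relation module would be learned jointly, so the network itself decides what "similar" means for a query and a prototype.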

Few-Shot Learning Object Detection

Few-shot object detection aims to generalize on novel objects using limited supervision and annotated samples. 

  • Let (S1, … Sn) be a set of support classes and Q be a query image with multiple instances and backgrounds. 
  • Given (S1, … Sn) and Q, the model aims to detect and localize all objects in Q that belong to the support classes.

Most FSOD applications divide classes into two non-overlapping parts: base and novel classes during the training.

The training dataset includes base classes to train the baseline model. Then, the model is finetuned, where a combined dataset of base and novel classes is used. The last stage includes testing on a dataset composed of only novel classes.

Two popular few shot object detection tasks are used for benchmarking: MS-COCO on 10-shot and MS-COCO on 30-shot. Let’s look at the top 3 models for each of these tasks:

Few Shot Object Detection Algorithms

Depending on the task, these three algorithms outperform others. However, there is a massive gap in accuracy between classic object detection tasks and few-shot object detection. 

💡 Pro Tip: Read YOLO: Real-Time Object Detection Explained.

DeFRCN: Decoupled Faster R-CNN for Few-Shot Object Detection

The Faster R-CNN is modified for few-shot object detection. Faster R-CNN consists of 3 blocks: 

  • The shared convolutional backbone for extracting generalized features, 
  • Region Proposal Network (RPN) for generating class-agnostic proposals, and 
  • a task-specific RCNN head for performing class-relevant classification and localization.

The architecture of Decoupled Faster R-CNN (DeFRCN) for few-shot object detection. Compared to the standard Faster R-CNN, two Gradient Decoupled Layers (sky-blue) and an offline Prototypical Calibration Block (red) are inserted into the framework for decoupling for multi-stage and multi-task, respectively. 

DeFRCN Architecture

Dual-Awareness Attention for Few-Shot Object Detection

Existing FSOD systems follow few-shot classification (FSC) approaches but neglect the problem of spatial misalignment and the risk of information entanglement, which results in low performance.

The paper proposes a novel Dual-Awareness Attention (DAnA), which captures the pairwise spatial relationship across the support and query images. 

The generated query-position-aware (QPA) support features are robust to spatial misalignment and capable of guiding the detection network precisely. The DAnA component adapts to various object detection networks and enhances FSOD performance by paying attention to specific semantics conditioned on the query. 

Experimental results demonstrate that DAnA significantly boosts (+6.9 AP relatively) few-shot object detection performance on the COCO benchmark. When equipped with DAnA, conventional object detection models such as Faster R-CNN and RetinaNet, which are not designed explicitly for few-shot learning, reach state-of-the-art performance in FSOD tasks.

Few Shot Object Detection using dAnA

Few-Shot Learning applications

Few Shot Learning has applications in a wide array of AI tasks.

Natural Language Processing (NLP)

Few-shot learning enables natural language processing (NLP) applications including:

  1. Sentence completion
  2. User intent classification for dialog systems
  3. Text classification
  4. Sentiment analysis

Computer Vision

Few-shot learning is used mainly in machine vision to deal with problems such as:

  1. Character recognition
  2. Object-related applications
  3. Object tracking
  4. Object recognition 
  5. Part labeling
  6. Image retrieval
  7. Shape view reconstruction for 3D objects
  8. Image classification 
  9. Video applications
  10. Video classification
  11. Event detection
💡 Pro Tip: Have a look at Optical Character Recognition: What is It and How Does it Work [Guide]

Audio Processing

Data that contains information regarding voices/sounds can be analyzed by acoustic signal processing, and few-shot learning can enable the deployment of the following tasks:

  1. Voice cloning from a few audio samples of the user (e.g. voices in GPS/navigation apps, Alexa, Siri, etc.)
  2. Voice conversion across different languages
  3. Voice conversion from one user to another

Few-Shot Learning research papers

Below is a curated list of some of the most cited and acknowledged research work in the Few Shot Learning domain.


Key takeaways

  • Few-Shot Learning (FSL) aims to bridge the gap between AI and human learning. By incorporating prior knowledge, it can learn new tasks from only a few examples with supervised information.
  • FSL acts as a test-bed for AI making the learning of rare cases possible or helping to relieve the burden of collecting large-scale supervised data in industrial applications. 
  • Few-shot learning in machine learning is the go-to solution whenever a minimal amount of training data is available. The technique helps overcome data scarcity challenges and reduce costs.
  • The core issue of FSL is the unreliable empirical risk minimizer that makes FSL hard to learn. 
  • FSL is still a very nascent field that requires a lot of research and development before it can make its way into mainstream applications.

💡 Read next:

A Step-by-Step Guide to Text Annotation [+Free OCR Tool]

The Complete Guide to CVAT - Pros & Cons

5 Alternatives to Scale AI

9 Essential Features for a Bounding Box Annotation Tool

9 Reinforcement Learning Real-Life Applications

Mean Average Precision (mAP) Explained: Everything You Need to Know

The Beginner's Guide to Deep Reinforcement Learning

Deval is a senior software engineer at Eagle Eye Networks and a computer vision enthusiast. He writes about complex topics related to machine learning and deep learning.
