
The Essential Guide to Data Augmentation in Deep Learning


May 6, 2022

What is data augmentation, how does it work, and what are its most prominent use cases? Learn everything you need to know about data augmentation techniques for computer vision and start training your AI models on V7 today.

Deval Shah


The accuracy of deep learning models largely depends on the quality, quantity, and contextual meaning of training data. However, data scarcity is one of the most common challenges in building deep learning models. In production use cases, collecting such data can be costly and time-consuming.

Companies use data augmentation, a low-cost and effective method, to reduce their dependency on collecting and preparing training examples and to build high-precision AI models more quickly.

Here’s what we’ll cover:

  • What is Data Augmentation

  • How does Data Augmentation work

  • Data Augmentation techniques in Computer Vision

  • Data Augmentation use cases

And in case you are looking for a tool to annotate data and train your computer vision models, V7 has you covered. We won't go into details as to why V7 has been voted the top training data platform on the market, but you can go ahead and check out:

  1. V7 Image and Data Annotation

  2. V7 Model Training

  3. V7 Machine Learning Datasets

  4. V7 Auto-Annotation

Here's a sneak peek!

A video labeling annotation tool where drone footage of a port inspection is being annotated


Now, let’s dive in.

What is data augmentation?

Data augmentation is a process of artificially increasing the amount of data by generating new data points from existing data. This includes adding minor alterations to data or using machine learning models to generate new data points in the latent space of original data to amplify the dataset. 

A question may arise about the difference between augmented data and synthetic data.

  • Synthetic data: Data generated artificially without using real-world images. Synthetic data are often produced by Generative Adversarial Networks (GANs).

  • Augmented data: Data derived from original images through minor transformations (such as flipping, translation, rotation, or the addition of noise) in order to increase the diversity of the training set.

Pro tip: Check out The Train, Validation, and Test Sets: How to Split Your Machine Learning Data

Today, there are a lot of privacy concerns revolving around data collection and usage. Hence, many researchers and companies are using synthetic data generation techniques to build datasets. However, synthetic data has limitations, such as a possible lack of resemblance to the original data, so augmented data is generally preferred over synthetic data.

Pro tip: To learn more about synthetic data, check out our guide: What is Synthetic Data in Machine Learning and How to Generate It

The Importance of Data Augmentation

Here are some of the reasons why data augmentation techniques have been gaining popularity in the last few years.

  • Improves the performance of ML models: augmented data improves the performance and results of deep learning models by adding new and diverse instances to training datasets.

  • Reduces operational costs: data collection and data labeling can be time-consuming and expensive for deep learning models. Companies can cut operational expenses by transforming existing datasets with data augmentation techniques.

Pro tip: Ready to train your models? Here's the list of 65+ Best Free Datasets for Machine Learning and 20+ Open Source Computer Vision Datasets.

Limitations of Data Augmentation

Of course, this method also comes with its own challenges, including:

  • Cost of quality assurance of the augmented datasets.

  • Research and development effort needed to build synthetic data for advanced applications.

  • Verification of image augmentation techniques like GANs is challenging.

  • Finding an optimal augmentation strategy for the data is non-trivial.

  • The inherent bias of original data persists in augmented data.

Now, let's dive into the practicalities of how Data Augmentation actually works.

How does Data Augmentation work?

If I asked you to label the two images below, you would quickly say that the one on the left is a horse and the one on the right is a zebra. We know that the black and white stripes, short tails, flat backs, and long ears are the features that differentiate a zebra from a horse.

Comparison between a horse and zebra

Comparison of similar looking but different animals

When we build a deep learning model to perform this classification task, the model requires a lot of training data for both horses and zebras in order to differentiate between the two images.

Pro tip: Looking for the perfect data annotation tool? Have a look at 13 Best Image Annotation Tools.

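A convolutional neural network (CNN) that can accurately classify objects even when they appear in different orientations is said to have the property of invariance. A CNN can be invariant to translation, viewpoint, size, or illumination, provided it has seen such variations during training.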

This is the fundamental concept of data augmentation.

In real-world use cases, we might have a dataset of photos captured under a specific set of conditions. Our target application, on the other hand, may exist in a number of variations, such as varied orientations, locations, scales, brightness, and so on. We can accommodate such cases by training deep neural networks with synthetically manipulated data.

Deep learning models like CNNs have a large number of parameters that help in learning these complex differentiating features by iteratively “looking” through a lot of examples. Hence, the performance of deep learning models depends on the type and size of the input dataset. 

Pro tip: Read The Essential Guide to Neural Network Architectures.

State-of-the-art computer vision models such as ResNet (around 60M parameters for its deepest standard variant) and Inception-V3 (around 24M) have a huge number of parameters to learn complex features. Natural Language Processing (NLP) models such as BERT (around 340M for the large variant) have even more parameters.

In order to build a deep learning model, we will have to gather a lot of data.

Unfortunately, for many applications, we don't have access to large amounts of data. Data augmentation is a method to deal with the issue of limited data. In data augmentation, we opt to use a few techniques that artificially increase the amount of data from the existing data and address this problem. 

Data augmentation process

Source: The Stanford AI Lab Blog

A generic data augmentation workflow in computer vision tasks has the following steps:

1. Input data is fed to the data augmentation pipeline

2. The data augmentation pipeline is defined by sequential steps of different augmentations

  • TF1: Rotation

  • TF2: Grayscale to RGB

  • TF3: Blur

  • TFN: Flip

3. The image is fed through the pipeline, and each step is applied with a given probability.

4. After the images are processed, a human expert verifies a random sample of the augmented results and passes feedback to the system.

5. After human verification, the augmented data is ready to use by the AI training process.
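The workflow above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the pipeline from the figure: images are assumed to be lists of pixel rows, the class name `AugmentationPipeline` and the three toy transforms are made up for this example, and the grayscale-to-RGB step is left out for brevity. In practice you would more likely use an image library for the transforms.

```python
import random

def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def horizontal_flip(img):
    """Mirror every row left to right."""
    return [row[::-1] for row in img]

def blur_rows(img):
    """Rough blur: average each pixel with its left and right neighbours."""
    out = []
    for row in img:
        padded = [row[0]] + list(row) + [row[-1]]  # repeat edge pixels
        out.append([(padded[i] + padded[i + 1] + padded[i + 2]) / 3
                    for i in range(len(row))])
    return out

class AugmentationPipeline:
    """Hypothetical helper: applies each step in order with its own probability."""

    def __init__(self, steps, seed=None):
        self.steps = steps              # list of (transform, probability) pairs
        self.rng = random.Random(seed)  # seeded so runs can be reproduced

    def __call__(self, img):
        applied = []
        for transform, probability in self.steps:
            if self.rng.random() < probability:
                img = transform(img)
                applied.append(transform.__name__)
        return img, applied

# Usage: TF1 = rotation, TF2 = blur, TF3 = flip, each with its own probability.
pipeline = AugmentationPipeline(
    [(rotate90, 0.5), (blur_rows, 0.3), (horizontal_flip, 0.5)], seed=0
)
image = [[0, 50, 100], [150, 200, 250]]
augmented, applied = pipeline(image)
print(applied)  # names of the steps that were actually applied on this run
```

Returning the list of applied steps alongside the image is a design choice that makes the human verification step easier, because a reviewer can see which transforms produced a given sample.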

Pro tip: Check out A Simple Guide to Data Preprocessing in Machine Learning.

Data augmentation is less popular in the NLP domain than in the computer vision domain. Automating the process of augmenting text data is difficult because of the complexity of natural language. Common methods for data augmentation in NLP include:

  • Easy Data Augmentation (EDA) operations: synonym replacement, word insertion, word swap, and word deletion

  • Back translation: re-translating text from the target language back to its original language

  • Contextualized word embeddings
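As a rough illustration of the EDA operations listed above, here is a minimal sketch of word swap, word deletion, and synonym replacement. The function names and the tiny `SYNONYMS` table are assumptions made for this example. A real setup would draw synonyms from a proper lexical resource.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {"quick": ["fast"], "happy": ["glad"]}

def random_swap(words, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Drop each word with probability p, but always keep at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]

def synonym_replacement(words, synonyms, n, rng):
    """Replace up to n words that have an entry in the synonym table."""
    words = list(words)
    candidates = [i for i, w in enumerate(words) if w in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(synonyms[words[i]])
    return words

rng = random.Random(42)
sentence = "the quick dog is happy".split()
print(" ".join(synonym_replacement(sentence, SYNONYMS, 2, rng)))
print(" ".join(random_swap(sentence, 1, rng)))
print(" ".join(random_deletion(sentence, 0.2, rng)))
```

Note that operations like deletion can change the meaning of a sentence, which is part of why text augmentation is harder to automate than image augmentation.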

Pro tip: Interested in learning more about text data? Read A Step-by-Step Guide to Text Annotation [+Free OCR Tool].

Data Augmentation for Model Patching

Model patching enables automating the process of model maintenance and improvement when a deployed model exhibits flaws. 

Model patching is an emerging area that could alleviate a major problem in safety-critical systems, including healthcare (e.g. improving models to produce MRI scans free of artifacts) and autonomous driving (e.g. improving perception models that may perform poorly on irregular objects or road conditions).

Model patching fixes the subgroup performance gap between images of malignant lesions with and without colored bandages.


Pro tip: You can check out this Simple Guide to Image Segmentation to learn more.

Data Augmentation techniques in Computer Vision

Finally, let's take a look at some of the most popular data augmentation methods.

1. Position Augmentation

  1. Center Crop: Crops the given image at the center. The crop size is a parameter given by the user.

  2. Random Crop: Crops the given image at a random location.

  3. Random Vertical Flip: Flips the given image vertically with a given probability.

  4. Random Horizontal Flip: Flips the given image horizontally with a given probability.

  5. Random Rotation: Rotates the image by a randomly chosen angle.

  6. Resize: Resizes the input image to a given size.

  7. Random Affine: Applies a random affine transformation to the image while keeping the center invariant.
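To make some of these position augmentations concrete, here is a minimal sketch on images stored as lists of pixel rows. The function names are illustrative and not taken from any particular library. Random rotation and random affine are omitted because they need pixel interpolation, and the resize shown uses simple nearest-neighbour sampling as an assumption.

```python
import random

def center_crop(img, size):
    """Cut a size x size window from the middle of the image."""
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def random_crop(img, size, rng):
    """Cut a size x size window at a random location inside the image."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)   # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def random_horizontal_flip(img, p, rng):
    """Mirror the image left to right with probability p."""
    return [row[::-1] for row in img] if rng.random() < p else img

def random_vertical_flip(img, p, rng):
    """Mirror the image top to bottom with probability p."""
    return img[::-1] if rng.random() < p else img

def resize_nearest(img, new_h, new_w):
    """Resize by picking the nearest source pixel for every target pixel."""
    h, w = len(img), len(img[0])
    return [[img[r * h // new_h][c * w // new_w] for c in range(new_w)]
            for r in range(new_h)]

rng = random.Random(7)
image = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 test image
print(center_crop(image, 2))
print(random_crop(image, 2, rng))
print(random_horizontal_flip(image, 0.5, rng))
print(resize_nearest(image, 2, 2))
```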

2. Color Augmentation

  1. Brightness: Changing the brightness makes the resulting image darker or lighter than the original.

  2. Contrast: Contrast is the degree of separation between the darkest and brightest areas of an image. It can be increased or decreased.

  3. Saturation: Saturation is the intensity of the colors in an image. Reducing it moves the image toward grayscale.
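The three color augmentations can be sketched as simple per-pixel operations. This is a simplified illustration rather than a reference implementation: pixels are assumed to be RGB tuples with values between 0 and 1, the function names are made up for this example, and the contrast adjustment scales each channel around the mean intensity of the image, which is only one of several possible definitions.

```python
import colorsys

def _clip(value):
    """Keep a channel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, value))

def adjust_brightness(img, factor):
    """factor < 1 darkens the image, factor > 1 lightens it."""
    return [[tuple(_clip(c * factor) for c in px) for px in row] for row in img]

def adjust_contrast(img, factor):
    """Push channel values away from (factor > 1) or toward (factor < 1) the mean."""
    pixels = [px for row in img for px in row]
    mean = sum(sum(px) / 3 for px in pixels) / len(pixels)
    return [[tuple(_clip(mean + factor * (c - mean)) for c in px) for px in row]
            for row in img]

def adjust_saturation(img, factor):
    """Scale the saturation channel in HSV space; factor 0 removes all color."""
    out = []
    for row in img:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            new_row.append(colorsys.hsv_to_rgb(h, _clip(s * factor), v))
        out.append(new_row)
    return out

image = [[(1.0, 0.0, 0.0), (0.2, 0.4, 0.6)]]  # one red and one blue-ish pixel
print(adjust_brightness(image, 0.5))
print(adjust_contrast(image, 1.5))
print(adjust_saturation(image, 0.0))
```

In training, the factor would typically be sampled at random from a small range around 1 for every image, so that the model sees slightly different lighting each time.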

Color augmentations on an image of a tiger

Advanced models for data augmentation

Here's a shortlist of advanced models for data augmentation that gained popularity in the last few years.

Adversarial training/Adversarial machine learning

Adversarial attacks are imperceptible pixel-level changes to images that can completely change a model's prediction. To handle this issue, adversarial training transforms images until the deep learning model is deceived and fails to analyze the data correctly.

These transformed or augmented images are used in the training examples to make the model robust toward adversarial attacks. 

Augmented image of a panda generated by adding little noise


In the image above, we can see that adding a small amount of noise to an image can confuse the AI classifier, which then classifies a panda as a gibbon. Hence, it is important to add such altered images to the training dataset to defend against adversarial attacks.
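One well-known way to craft such a perturbation is the fast gradient sign method (FGSM). The article does not name a specific method, so treat the choice of FGSM here as an assumption. The sketch below applies it to a toy logistic-regression "classifier" with hand-picked weights instead of a real image model, just to show the mechanics: compute the gradient of the loss with respect to the input, then nudge every input value by a small step in the direction of that gradient's sign.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Probability of the positive class for a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def _sign(value):
    return (value > 0) - (value < 0)

def fgsm_perturb(weights, bias, x, y_true, epsilon):
    """Move each input value by epsilon in the direction that increases the loss."""
    p = predict(weights, bias, x)
    # For binary cross-entropy with a sigmoid output, d(loss)/d(x_i) = (p - y) * w_i.
    grad = [(p - y_true) * w for w in weights]
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]

# Hypothetical model and input, chosen by hand for illustration.
WEIGHTS, BIAS, EPSILON = [2.0, -3.0, 1.0], 0.0, 0.2
x, y = [0.5, 0.1, 0.2], 1

x_adv = fgsm_perturb(WEIGHTS, BIAS, x, y, EPSILON)
print(predict(WEIGHTS, BIAS, x))      # above 0.5: classified as positive
print(predict(WEIGHTS, BIAS, x_adv))  # below 0.5: the small change flips the class

# Adversarial training keeps the true label and adds the perturbed sample.
augmented_training_set = [(x, y), (x_adv, y)]
```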

Generative adversarial networks (GANs)

GANs (Generative adversarial networks) are widely used to generate synthetic images in a target domain.

The synthetic images generated by GANs are used as augmented inputs to the model. However, this means training the generator and the discriminator in addition to the classifier (depending on the use case). The downside of using GANs is that they require significant compute resources and effort.

In the figure below, you can see CT scan images generated by a CycleGAN, which is a variation of a GAN. GAN-generated CT scan images like these are used in the medical field to enlarge datasets. Once the dataset is created, it can be used for classification or any other task.

CT Scan high-resolution images generated by CycleGAN


Neural style transfer

Neural Style Transfer-based augmentation is a very interesting deep learning application.

Here, a series of convolutional layers is trained to deconstruct images so that content and style can be separated.

After separation, the content from an image is composed with the style of another image to create an augmented style image. Thus, the content remains the same but the style is changed. This increases the robustness of the model as the model is working independently of the style of the image.

The below image shows an example of a style of sunflower applied to a photo of a person.

Style transfer: the style of a sunflower applied to the photo

Pro tip: Check out our Neural Style Transfer: Everything You Need to Know guide to learn more.

Data Augmentation use cases

As mentioned before, data augmentation has become one of the most popular methods for artificially increasing the amount of data needed to train robust AI models. It's especially important for domains where acquiring quality data can be a challenge. Here are a few industries that are leveraging data augmentation for data creation.

Healthcare

In medical imaging applications, curating large datasets is often not a viable option because acquiring a large number of annotated samples from experts is time-consuming and expensive. A network trained with augmentation needs to be robust and accurate across the expected variations of the same X-ray images.

Pro tip: Have a look at 21+ Best Healthcare Datasets for Computer Vision and see how you can use V7 for Healthcare.

The augmentation step is domain-dependent. It is not an arbitrary step that can be applied to all research fields in the same way.

In the figure below, although augmentations can increase the size of the dataset, certain augmentations are not recommended for the given task. For instance, random rotation and reflection about the x-axis are not appropriate for X-ray images. Hence, the right data augmentation technique differs from task to task.

Geometric augmentations on X-ray images of the heart
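One simple way to encode this kind of domain knowledge is a per-task blocklist of transforms. The sketch below is hypothetical: the task names, the `BLOCKED` table, and the toy transforms are assumptions for illustration, with a 90-degree rotation standing in for random rotation. Which augmentations are safe for a real medical dataset should be decided together with a domain expert.

```python
import random

def rotate90(img):
    """Rotate a 2D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def vertical_flip(img):
    """Reflect the image about the x-axis (top becomes bottom)."""
    return img[::-1]

def shift_right(img):
    """Shift every row one pixel to the right, repeating the left edge pixel."""
    return [[row[0]] + row[:-1] for row in img]

TRANSFORMS = {
    "rotate90": rotate90,
    "vertical_flip": vertical_flip,
    "shift_right": shift_right,
}

# Hypothetical blocklist: transforms considered inappropriate for each task.
BLOCKED = {
    "natural_images": set(),
    "xray": {"rotate90", "vertical_flip"},
}

def allowed_transforms(task):
    """Return the transforms that are not blocked for the given task."""
    return [fn for name, fn in TRANSFORMS.items() if name not in BLOCKED[task]]

def augment(img, task, rng):
    """Apply one randomly chosen transform that is allowed for the task."""
    return rng.choice(allowed_transforms(task))(img)

rng = random.Random(0)
scan = [[1, 2, 3], [4, 5, 6]]
print([fn.__name__ for fn in allowed_transforms("xray")])
print(augment(scan, "xray", rng))
```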

Self-driving cars

Another use case where data augmentation comes in handy pertains to autonomous vehicles.

For example, the CARLA simulator has been built for flexibility and realism in rendering and physics simulation. It has been developed from scratch to support the development, training, and validation of autonomous driving systems. Built on top of Unreal Engine 4, it provides an end-to-end simulation environment for testing autonomous driving systems under controlled conditions.

Simulation environments, which are often used together with reinforcement learning, can help in training and testing AI systems where data scarcity is an issue. The possibilities for data augmentation are vast, as the simulation environment can be modeled as required to reproduce real-world scenarios.

Autonomous driving simulation


Pro tip: Read 9 Revolutionary AI Applications In Transportation.

Data Augmentation: Key takeaways

Here's a short recap of everything we've learned:

  • Data augmentation is a process of artificially increasing the amount of data by generating new data points from existing data.

  • Basic data augmentation techniques fall into two groups: position augmentation and color augmentation.

  • Advanced models for data augmentation include adversarial machine learning, GANs, and neural style transfer.

  • Data augmentation is used in situations where collecting large amounts of data is difficult. Healthcare and autonomous vehicles are two of the most prominent industries leveraging this method.



Deval Shah


Deval is a senior software engineer at Eagle Eye Networks and a computer vision enthusiast. He writes about complex topics related to machine learning and deep learning.
