The Essential Guide to Data Augmentation in Deep Learning
May 6, 2022
What is data augmentation, how does it work, and what are its most prominent use cases? Learn everything you need to know about data augmentation techniques for computer vision and start training your AI models on V7 today.
Deval Shah
The accuracy of deep learning models largely depends on the quality, quantity, and contextual meaning of training data. However, data scarcity is one of the most common challenges in building deep learning models. In production use cases, collecting such data can be costly and time-consuming.
Companies use a low-cost and effective method, data augmentation, to reduce their dependency on collecting and preparing training examples and to build high-precision AI models more quickly.
Here’s what we’ll cover:
What is Data Augmentation
How does Data Augmentation work
Data Augmentation techniques in Computer Vision
Data Augmentation use cases
Now, let’s dive in.
What is data augmentation?
Data augmentation is a process of artificially increasing the amount of data by generating new data points from existing data. This includes adding minor alterations to data or using machine learning models to generate new data points in the latent space of original data to amplify the dataset.
A question may arise about the difference between augmented data and synthetic data.
Synthetic data: Data generated artificially without using real-world images. Synthetic data is often produced by Generative Adversarial Networks (GANs).
Augmented data: Data derived from original images through minor transformations (such as flipping, translation, rotation, or the addition of noise) in order to increase the diversity of the training set.
Pro tip: Check out The Train, Validation, and Test Sets: How to Split Your Machine Learning Data
Today, there are a lot of privacy concerns around data collection and usage. Hence, many researchers and companies use synthetic data generation techniques to build datasets. However, synthetic data has limitations, such as a possible lack of resemblance to the original data, so augmented data is generally preferred over synthetic data.
Pro tip: To learn more about synthetic data, check out our guide: What is Synthetic Data in Machine Learning and How to Generate It
The Importance of Data Augmentation
Here are some of the reasons why data augmentation techniques have been gaining popularity in the last few years.
Improves the performance of ML models (more diverse datasets).
Data augmentation methods are widely used in practically every cutting-edge deep learning application such as object detection, image classification, image recognition, natural language understanding, semantic segmentation, and more.
Augmented data improves the performance and results of deep learning models by adding new and diverse instances to training datasets.
Reduces operation costs related to data collection
Data collection and data labeling can be time-consuming and expensive processes for deep learning models. Companies can cut operational expenses by transforming datasets using data augmentation techniques.
Pro tip: Ready to train your models? Here's the list of 65+ Best Free Datasets for Machine Learning and 20+ Open Source Computer Vision Datasets.
Limitations of Data Augmentation
Of course, this method also comes with its own challenges, including:
Cost of quality assurance of the augmented datasets.
Research and Development to build synthetic data with advanced applications.
Verification of image augmentation techniques like GANs is challenging.
Finding an optimal augmentation strategy for the data is non-trivial.
The inherent bias of original data persists in augmented data.
Now, let's dive into the practicalities of how Data Augmentation actually works.
How does Data Augmentation work?
If I asked you to label the two images below, you would quickly say that the one on the left is a horse and the one on the right is a zebra. We know that black and white stripes, short tails, flat backs, and long ears are the features that differentiate a zebra from a horse.
Comparison of similar looking but different animals
When we build a deep learning model for this classification task, the model needs a lot of training data for both horses and zebras in order to tell the two apart.
Pro tip: Looking for the perfect data annotation tool? Have a look at 13 Best Image Annotation Tools.
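A convolutional neural network (CNN) that classifies objects robustly even when they appear in different positions, viewpoints, sizes, or lighting conditions is said to be invariant to translation, viewpoint, size, or illumination. A CNN does not get most of these invariances for free; it learns them from training examples that show such variations.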
This is the fundamental concept of data augmentation.
In real-world use cases, we might have a dataset of photos captured under a specific set of conditions. Our target application, on the other hand, may exist in a number of variations, such as varied orientations, locations, scales, brightness, and so on. We can accommodate such cases by training deep neural networks with synthetically manipulated data.
Deep learning models like CNNs have a large number of parameters that help in learning these complex differentiating features by iteratively “looking” through a lot of examples. Hence, the performance of deep learning models depends on the type and size of the input dataset.
Pro tip: Read The Essential Guide to Neural Network Architectures.
State-of-the-art computer vision models such as ResNet (about 60M parameters in its 152-layer variant) and Inception-V3 (about 24M) have a huge number of parameters to learn complex features. Natural Language Processing (NLP) models such as BERT (340M in its large variant) have even more.
In order to build a deep learning model, we will have to gather a lot of data.
Unfortunately, for many applications, we don't have access to large amounts of data. Data augmentation is a method to deal with the issue of limited data. In data augmentation, we opt to use a few techniques that artificially increase the amount of data from the existing data and address this problem.
Source: The Stanford AI Lab Blog
A generic data augmentation workflow in computer vision tasks has the following steps:
1. Input data is fed to the data augmentation pipeline
2. The data augmentation pipeline is defined by sequential steps of different augmentations
TF1: Rotation
TF2: Grayscale to RGB
TF3: Blur
TFN: Flip
3. The image is fed through the pipeline and processed through each step with a probability.
4. After the images are processed, a human expert verifies a random sample of the augmented results and passes feedback to the system.
5. After human verification, the augmented data is ready to use by the AI training process.
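The workflow above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline of any particular tool: the function names, the probabilities, and the 90-degree rotation (standing in for arbitrary-angle rotation) are all assumptions made for the example.

```python
import numpy as np

def rotate90(img):
    """TF1: rotate by 90 degrees (a stand-in for arbitrary-angle rotation)."""
    return np.rot90(img)

def gray_to_rgb(img):
    """TF2: stack a 2-D grayscale image into three identical channels."""
    return np.stack([img] * 3, axis=-1) if img.ndim == 2 else img

def box_blur(img):
    """TF3: 3x3 mean blur built from shifted views of an edge-padded copy."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    h, w = img.shape[:2]
    acc = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return (acc / 9.0).astype(img.dtype)

def hflip(img):
    """TFN: mirror the image left to right."""
    return img[:, ::-1]

def run_pipeline(img, steps, rng):
    """Apply each (transform, probability) pair in order."""
    for transform, p in steps:
        if rng.random() < p:
            img = transform(img)
    return img

rng = np.random.default_rng(0)
# Hypothetical probabilities; in practice they are tuned per dataset.
steps = [(rotate90, 0.5), (gray_to_rgb, 1.0), (box_blur, 0.3), (hflip, 0.5)]
image = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Each call can yield a different variant of the same input image.
augmented = [run_pipeline(image, steps, rng) for _ in range(4)]
```

Because every step fires with its own probability, repeated passes over the same image produce different variants, which is what lets a small dataset stand in for a larger one. The human verification step would then sample from `augmented` before the data reaches training.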
Pro tip: Check out A Simple Guide to Data Preprocessing in Machine Learning.
Data augmentation is less popular in the NLP domain compared to the computer vision domain. Automating the process of augmenting text data is difficult, due to the complexity of a natural language. Common methods for data augmentation in NLP include:
Easy Data Augmentation (EDA) operations: synonym replacement, word insertion, word swap, and word deletion
Back translation: re-translating text from the target language back to its original language
Contextualized word embeddings
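As a rough illustration of the four EDA operations listed above, here is a standard-library sketch. The tiny `SYNONYMS` table is a hypothetical stand-in for a real thesaurus, and the function names and the deletion probability are assumptions made for the example.

```python
import random

# Hypothetical toy thesaurus; a real setup would use a proper lexical resource.
SYNONYMS = {"quick": ["fast", "speedy"], "happy": ["glad", "cheerful"]}

def synonym_replacement(words, rng):
    """Replace every word that has an entry in SYNONYMS with a random synonym."""
    return [rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words]

def random_insertion(words, rng):
    """Insert a synonym of one known word at a random position."""
    candidates = [w for w in words if w in SYNONYMS]
    if not candidates:
        return list(words)
    new_word = rng.choice(SYNONYMS[rng.choice(candidates)])
    out = list(words)
    out.insert(rng.randrange(len(out) + 1), new_word)
    return out

def random_swap(words, rng):
    """Swap the words at two randomly chosen positions."""
    out = list(words)
    if len(out) >= 2:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, rng, p=0.2):
    """Drop each word with probability p, always keeping at least one word."""
    kept = [w for w in words if rng.random() >= p]
    return kept or [rng.choice(words)]

rng = random.Random(0)
sentence = "the quick dog looks happy today".split()
variants = {
    "synonym": synonym_replacement(sentence, rng),
    "insert": random_insertion(sentence, rng),
    "swap": random_swap(sentence, rng),
    "delete": random_deletion(sentence, rng),
}
```

Unlike an image flip, any of these edits can change the meaning of a sentence, which is one reason automated text augmentation is harder to trust without review.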
Pro tip: Interested in learning more about text data? Read A Step-by-Step Guide to Text Annotation [+Free OCR Tool].
Data Augmentation for Model Patching
Model patching automates the process of maintaining and improving a deployed model when it exhibits flaws.
Model patching is an emerging area that could alleviate a major problem in safety-critical systems, including healthcare (e.g. improving models to produce MRI scans free of artifacts) and autonomous driving (e.g. improving perception models that may perform poorly on irregular objects or road conditions).
Pro tip: You can check out this Simple Guide to Image Segmentation to learn more.
Data Augmentation techniques in Computer Vision
Finally, let's take a look at some of the most popular data augmentation methods.
1. Position Augmentation
Center Crop: Crops the given image at the center. The crop size is a parameter set by the user.
Random Crop: Crops the given image at a random location.
Random Vertical Flip: Vertically flips the given image with a given probability.
Random Horizontal Flip: Horizontally flips the given image with a given probability.
Random Rotation: Rotates the image by a randomly chosen angle.
Resize: Resizes the input image to a given size.
Random Affine: Applies a random affine transformation to the image while keeping the center invariant.
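Most of the position augmentations above reduce to array slicing and indexing. The following NumPy sketch shows one possible implementation; the function names are assumptions, rotation is limited to multiples of 90 degrees, resizing uses nearest-neighbor sampling, and Random Affine is left out because it needs interpolation.

```python
import numpy as np

def center_crop(img, size):
    """Crop a size x size window around the image center."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def random_crop(img, size, rng):
    """Crop a size x size window at a random location."""
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def random_hflip(img, rng, p=0.5):
    """Mirror left to right with probability p."""
    return img[:, ::-1] if rng.random() < p else img

def random_vflip(img, rng, p=0.5):
    """Mirror top to bottom with probability p."""
    return img[::-1] if rng.random() < p else img

def random_rot90(img, rng):
    """Rotate by a random multiple of 90 degrees."""
    return np.rot90(img, k=int(rng.integers(0, 4)))

def resize_nearest(img, out_h, out_w):
    """Resize with nearest-neighbor sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols]

rng = np.random.default_rng(0)
img = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)
cropped = center_crop(img, 4)
patch = random_crop(img, 4, rng)
flipped = random_hflip(img, rng, p=1.0)
resized = resize_nearest(img, 3, 4)
```

In practice, a library such as torchvision or Albumentations would usually handle these operations, including interpolated rotation and affine transforms.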
2. Color Augmentation
Brightness: One way to augment is to change the brightness of the image. The resultant image becomes darker or lighter compared to the original one.
Contrast: Contrast is the degree of separation between the darkest and brightest areas of an image. Increasing or decreasing it is another way to augment the image.
Saturation: Saturation is the intensity of the colors in an image. Lowering it moves the image toward grayscale, while raising it makes the colors more vivid.
Color augmentations on image of a tiger
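One common way to express these three adjustments is as a blend between the image and a reference image: black for brightness, the mean gray level for contrast, and the grayscale version for saturation. The sketch below assumes float RGB images with values in [0, 1]; the function names and jitter ranges are illustrative choices, not a specific library's API.

```python
import numpy as np

LUMA = np.array([0.299, 0.587, 0.114])  # Rec. 601 luma weights for RGB

def _blend(img, other, factor):
    """factor=1 returns img, factor=0 returns other; the result is clipped to [0, 1]."""
    return np.clip(other + factor * (img - other), 0.0, 1.0)

def adjust_brightness(img, factor):
    """Blend with black: factor < 1 darkens, factor > 1 lightens."""
    return _blend(img, np.zeros_like(img), factor)

def adjust_contrast(img, factor):
    """Blend with the mean gray level of the image."""
    mean_gray = (img @ LUMA).mean()
    return _blend(img, np.full_like(img, mean_gray), factor)

def adjust_saturation(img, factor):
    """Blend with the per-pixel grayscale version of the image."""
    gray = (img @ LUMA)[..., None]
    return _blend(img, gray, factor)

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))

# Random color jitter: draw one factor per property (ranges are assumptions).
jittered = adjust_brightness(img, rng.uniform(0.6, 1.4))
jittered = adjust_contrast(jittered, rng.uniform(0.6, 1.4))
jittered = adjust_saturation(jittered, rng.uniform(0.6, 1.4))
```

A factor of 1 leaves the image unchanged in every case, so the width of the sampling range around 1 controls how aggressive the augmentation is.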
Advanced models for data augmentation
Here's a shortlist of advanced models for data augmentation that gained popularity in the last few years.
Adversarial training/Adversarial machine learning
Adversarial attacks are imperceptible, pixel-level changes to images that can completely change a model's prediction. To handle this issue, adversarial training perturbs images until the deep learning model is deceived and fails to classify them correctly.
These transformed or augmented images are used in the training examples to make the model robust toward adversarial attacks.
Augmented image of a panda generated by adding little noise
In the image above, we can see that adding a small amount of noise to an image can confuse the AI classifier, which then classifies a panda as a gibbon. Hence, it is important to add such altered images to the training dataset to defend against adversarial attacks.
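The article does not name a specific attack, so the sketch below swaps in the fast gradient sign method (FGSM), one well-known way to build such perturbations. It uses a toy logistic-regression classifier with random, hypothetical weights in place of a deep network; the variable names and the value of `eps` are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression classifier on one sample."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, eps):
    """One FGSM step: shift every pixel by eps in the direction that raises the loss."""
    grad_x = (sigmoid(w @ x + b) - y) * w  # gradient of the loss w.r.t. the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
w = rng.normal(size=64)               # hypothetical, already-trained weights
b = 0.0
x = rng.uniform(0.2, 0.8, size=64)    # a flattened 8x8 "image"
y = 1.0 if w @ x + b > 0 else 0.0     # use the model's own prediction as the label

x_adv = fgsm(w, b, x, y, eps=0.1)

# Adversarial training would add (x_adv, y) to the training set.
clean_loss = loss(w, b, x, y)
adv_loss = loss(w, b, x_adv, y)
```

No pixel moves by more than `eps`, yet the loss on `x_adv` is higher than on `x`. Training on such pairs with the original label is what pushes the model toward robustness.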
Generative adversarial networks (GANs)
GANs (Generative adversarial networks) are widely used to generate synthetic images in a target domain.
The synthetic images generated by GANs are used as augmented inputs to the model. However, this means training the generator and the discriminator as well as the classifier (depending on the use case). The downside of using GANs is that they demand considerable compute resources and effort.
In the figure below, you can see CT scan images generated by a CycleGAN, which is a variation of a GAN. This is how GAN-generated CT scan images are used in the medical field to enlarge a dataset. Once the dataset is created, it can be used for classification or any other task.
CT Scan high-resolution images generated by CycleGAN
Neural style transfer
Neural Style Transfer-based augmentation is a very interesting deep learning application.
Here, a series of convolutional layers is trained to decompose images so that content and style can be separated.
After separation, the content of one image is combined with the style of another to create an augmented, stylized image. Thus, the content remains the same but the style changes. This increases the robustness of the model, because the model learns to work independently of the style of the image.
The image below shows an example of the style of a sunflower applied to a photo of a person.
Style transfer: the style of a sunflower applied to the photo
Pro tip: Check out our Neural Style Transfer: Everything You Need to Know guide to learn more.
Data Augmentation use cases
As mentioned before, data augmentation has become one of the most popular methods for artificially increasing the amount of data needed to train robust AI models. It's especially important for domains where acquiring quality data can be a challenge. Here are a few industries that are leveraging data augmentation for data creation.
Healthcare
In medical imaging applications, curating large datasets is often not viable because acquiring a large number of expert-annotated samples is time-consuming and expensive. A network trained with augmentation needs to stay robust and accurate under the expected variations of the same X-ray images.
Pro tip: Have a look at 21+ Best Healthcare Datasets for Computer Vision and see how you can use V7 for Healthcare.
The augmentation step is domain-dependent; it is not an arbitrary step that can be applied to all research fields in the same way.
In the figure below, although augmentations can scale up the size of the dataset, certain augmentations are not recommended for the given task. For instance, random rotation and reflection on the x-axis are not appropriate for X-ray images. Hence, the right data augmentation technique differs from task to task.
Geometric augmentations on X-ray images of the heart
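One way to encode this domain dependence is a per-task allow-list of transforms. The sketch below is purely illustrative: the policy names, the transform set, and the choice to also leave horizontal flips out of the X-ray policy are assumptions, not recommendations from a clinical source.

```python
import numpy as np

# Each transform takes an image with values in [0, 1] and a random generator.
TRANSFORMS = {
    "hflip": lambda img, rng: img[:, ::-1],
    "vflip": lambda img, rng: img[::-1],
    "rot90": lambda img, rng: np.rot90(img, k=int(rng.integers(1, 4))),
    "brightness": lambda img, rng: np.clip(img * rng.uniform(0.8, 1.2), 0.0, 1.0),
}

# Hypothetical per-domain policies listing the transforms allowed for each task.
POLICIES = {
    "natural_images": ["hflip", "vflip", "rot90", "brightness"],
    # X-rays have a fixed orientation: rotation and x-axis reflection are excluded.
    # Leaving out hflip as well is a conservative assumption of this sketch.
    "chest_xray": ["brightness"],
}

def augment(img, domain, rng, p=0.5):
    """Apply each transform allowed for the domain with probability p."""
    for name in POLICIES[domain]:
        if rng.random() < p:
            img = TRANSFORMS[name](img, rng)
    return img

rng = np.random.default_rng(0)
xray = rng.uniform(0.0, 0.5, size=(16, 16))  # placeholder data, not a real scan
out = augment(xray, "chest_xray", rng)
```

Keeping the policy in data rather than in code makes it easy for a domain expert to review which transforms a task permits.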
Self-driving cars
Another use case where data augmentation comes in handy pertains to autonomous vehicles.
For example, CARLA is an open-source simulator built for flexibility and realism in rendering and physics simulation. It has been developed from scratch to support the development, training, and validation of autonomous driving systems. Built on top of Unreal Engine 4, it provides an end-to-end simulation environment for testing autonomous driving systems under controlled conditions.
Simulation environments such as those used for reinforcement learning can help in training and testing AI systems where data scarcity is an issue. The possibilities for data augmentation are nearly endless, as the simulation environment can be modeled as required to reproduce real-world scenarios.
Autonomous driving simulation
Pro tip: Read 9 Revolutionary AI Applications In Transportation.
Data Augmentation: Key takeaways
Here's a short recap of everything we've learned:
Data augmentation is a process of artificially increasing the amount of data by generating new data points from existing data.
Basic data augmentation techniques come down to position augmentation and color augmentation.
Advanced models for data augmentation include adversarial machine learning, GANs, and neural style transfer.
Data augmentation is used in situations where collecting large amounts of data is difficult. Healthcare and autonomous vehicles are two of the most prominent industries leveraging this method.