A Gentle Introduction to Deep Learning—the ELI5 Way
Oct 27, 2021
What is Deep Learning, how does it work, and what are its most common applications? Here's the most comprehensive guide to Deep Learning for beginners.
Nilesh Barla
Here's an interesting fact—
Each month, there are 186,000 Google searches for the keyword "deep learning."
It's a boiling hot area of research, and the word is out—Deep Learning is a promising technology that can radically transform the world we live in.
No wonder it's been gaining traction and attracting the attention of researchers, AI-first businesses, and media alike.
The chances are that you've landed on this page looking for an explanation of what Deep Learning is all about and why you should care.
The good news is—we've got the answers you are looking for. And we are happy to explain them in plain English.
Here’s what we’ll cover:
What is Deep Learning?
Deep Learning vs. Machine Learning
How does Deep Learning work?
How to create and train Deep Learning models
Deep Learning limitations
4 Deep Learning applications
Best Deep Learning resources
And if you want to skip the written guide, make sure to check out this detailed video introduction to Deep Learning.
Now, let's break things down!
What is Deep Learning?
Deep Learning is a subset of Machine Learning that uses mathematical functions to map the input to the output. These functions can extract non-redundant information or patterns from the data, which enables them to form a relationship between the input and the output.
This is known as learning, and the process of learning is called training.
In traditional computer programming, input and a set of rules are combined to produce the desired output. In machine learning and deep learning, the input and the desired output are used to learn the rules.
These rules—when combined with new input—yield desired results.
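To make the contrast concrete, here is a minimal sketch in plain Python. It assumes a toy task (converting Celsius to Fahrenheit) and a hypothetical helper named fit_line that estimates the rule from examples with ordinary least squares. It is an illustration of the idea, not how deep learning models are actually trained.

```python
# Traditional programming: a human writes the rule by hand.
def celsius_to_fahrenheit_rule(celsius):
    return 1.8 * celsius + 32.0


# Machine learning: the rule (slope and intercept) is estimated from examples.
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w * x + b (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


# Example inputs and their known outputs (made-up measurements).
inputs = [-10.0, 0.0, 10.0, 25.0, 40.0]
outputs = [14.0, 32.0, 50.0, 77.0, 104.0]

w, b = fit_line(inputs, outputs)


def learned_rule(celsius):
    # The learned rule applied to new input.
    return w * celsius + b


print(w, b)                 # close to 1.8 and 32.0
print(learned_rule(100.0))  # close to 212.0
```

The learned rule was never told the conversion formula. It recovered it from the input and output pairs alone.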
Modern deep learning models use artificial neural networks or simply neural networks to extract information.
These neural networks are made up of simple mathematical functions that are stacked on top of one another and arranged in layers, giving them a sense of depth, hence the term Deep Learning.
Deep learning can also be thought of as an approach to Artificial Intelligence, a smart combination of hardware and software to solve tasks requiring human intelligence.
Deep Learning was first theorized in the 1980s, but it has only become useful recently because:
It requires large amounts of labeled data
It requires significant computational power (high performing GPUs)
Next, we'll define the key elements that make up the Deep Learning algorithms.
If you are looking for a free image annotation tool, check out The Complete Guide to CVAT—Pros & Cons.
Neural Networks
The neural network is the heart of deep learning models, and it was initially designed to mimic the working of the neurons in the human brain.
The neural perspective on deep learning is generally motivated by two main ideas:
The human brain proves that intelligent behavior is possible, so it should be possible to build an intelligent system by reverse engineering it.
Building a mathematical model of the brain is one way to understand how it works and which principles underlie its intelligence, which could shed light on fundamental scientific questions.
In essence, neural networks enable us to learn the structure of the data or information and help us to understand it by performing tasks such as clustering, classification, regression, or sample generation.
Deep Learning vs. Machine Learning
Why is Deep Learning more powerful than traditional Machine Learning?
Deep Learning can essentially do everything that machine learning does, but not the other way around.
For instance, machine learning is useful when the dataset is small and well-curated, which means that the data is carefully preprocessed.
Data preprocessing requires human intervention. It also means that when the dataset is large and complex, traditional machine learning algorithms often fail to extract enough information and underfit.
Looking for quality training data? Check out 65+ Best Free Datasets for Machine Learning.
Traditional machine learning is sometimes called shallow learning because its models have few or no stacked layers. It tends to be very effective for smaller datasets.
Deep learning, on the other hand, is extremely powerful when the dataset is large.
It can learn complex patterns from the data and draw accurate conclusions with little manual guidance. In fact, deep learning can even process unstructured data, i.e., data that is not arranged in a predefined format, such as text corpora or social media activity.
Furthermore, it can also generate new data samples and find anomalies that machine learning algorithms and human eyes can miss.
On the downside, deep learning is computationally expensive compared to machine learning, which also means that it requires a lot of time to process.
Deep Learning and Machine Learning are both capable of different types of learning: Supervised Learning (labeled data), Unsupervised Learning (unlabeled data), and Reinforcement Learning. But their usefulness is usually determined by the size and complexity of the data.
Learn more about Supervised vs. Unsupervised Learning.
To summarize:
Machine learning requires data preprocessing, which involves human intervention.
The neural networks in deep learning are capable of extracting features on their own; hence far less human intervention is required.
Deep Learning can process unstructured data.
Deep Learning is usually based on representation learning, i.e., finding and extracting vital information or patterns that represent the entire dataset.
Deep learning is computationally expensive and time-consuming.
Check out 20+ Open Source Computer Vision Datasets to find quality data.
How does Deep Learning work?
Now, let's dive in to learn how Deep Learning works.
Deep Neural Networks have multiple layers of interconnected artificial neurons, or nodes, that are stacked together. Each node applies a simple mathematical function, usually a weighted sum of its inputs followed by a non-linear activation function, that extracts and maps information.
There are three types of layers in a deep neural network: the input layer, the hidden layers, and the output layer.
The data is fed into the input layer.
Each node in the input layer ingests the data and passes it on to the next layer, i.e., the hidden layers. The hidden layers extract increasingly abstract features from the input and transform it using their weights, biases, and activation functions.
These layers are called hidden layers because their outputs are not directly observed: only the input and the final output are visible. Their parameters (weights and biases) are unknown at the start, so they are typically initialized with random values and then learned during training.
Read 12 Types of Neural Network Activation Functions: How to Choose?
The output yielded from the hidden layers is then passed on to the final layer called the output layer, where depending upon the task, it classifies, predicts, or generates samples.
This process is called forward propagation.
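Forward propagation can be sketched in a few lines of plain Python. In this sketch each node computes a weighted sum of its inputs plus a bias and then applies an activation function. The layer sizes, the weights, and the function names dense_layer and forward are all made up for illustration; a real network would learn its weights during training.

```python
import math


def relu(z):
    # Common activation for hidden layers: keeps positive values, zeroes the rest.
    return max(0.0, z)


def sigmoid(z):
    # Squashes any number into the range (0, 1), useful for a yes/no output.
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases, activation):
    """One layer: every node computes activation(weighted sum of inputs + bias)."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


# Hypothetical parameters: 2 inputs -> 3 hidden nodes -> 1 output node.
hidden_weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_biases = [0.0, 0.1, -0.1]
output_weights = [[0.7, -0.5, 0.2]]
output_biases = [0.05]


def forward(x):
    # Input layer -> hidden layer -> output layer.
    hidden = dense_layer(x, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)


prediction = forward([1.0, 2.0])
print(prediction)  # a single value between 0 and 1
```

Adding more hidden layers only means calling dense_layer more times before the output layer, which is where the "depth" comes from.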
In another process called backpropagation, the network first measures the error with a loss function that compares the predicted output with the expected output. It then works backward through the layers to calculate how much each weight and bias contributed to that error.
An optimization algorithm, like gradient descent, then reduces the error by adjusting the weights and biases accordingly.
Together, forward propagation and backpropagation allow a neural network to reduce the error and achieve high accuracy in a particular task. With each iteration, the network typically becomes gradually more accurate.
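The training loop can be sketched for a deliberately tiny network: one input, one hidden node with a tanh activation, and one linear output node. The dataset, the starting parameters, and the learning rate are invented for illustration, and the gradients are written out by hand using the chain rule. Real frameworks compute them automatically.

```python
import math


def forward(params, x):
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)  # hidden layer
    y_hat = w2 * h + b2         # output layer
    return h, y_hat


def loss(params, data):
    # Mean squared error between predictions and expected outputs.
    return sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data)


def gradients(params, data):
    """Backpropagation: apply the chain rule from the output back to the input."""
    w1, b1, w2, b2 = params
    grad = [0.0, 0.0, 0.0, 0.0]  # d(loss)/d(w1, b1, w2, b2)
    for x, y in data:
        h, y_hat = forward(params, x)
        d_y_hat = 2.0 * (y_hat - y) / len(data)  # error signal at the output
        grad[2] += d_y_hat * h                   # output weight
        grad[3] += d_y_hat                       # output bias
        d_z = d_y_hat * w2 * (1.0 - h * h)       # pass the error back through tanh
        grad[0] += d_z * x                       # hidden weight
        grad[1] += d_z                           # hidden bias
    return grad


def train(params, data, learning_rate=0.1, steps=500):
    # Gradient descent: repeatedly nudge each parameter against its gradient.
    for _ in range(steps):
        grad = gradients(params, data)
        params = [p - learning_rate * g for p, g in zip(params, grad)]
    return params


# Made-up dataset and starting parameters.
data = [(-1.0, -0.8), (-0.5, -0.5), (0.0, 0.0), (0.5, 0.5), (1.0, 0.8)]
initial_params = [0.5, 0.0, 0.5, 0.0]

trained_params = train(initial_params, data)
print(loss(initial_params, data), loss(trained_params, data))  # the second value is smaller
```

Each pass through the loop is one iteration of forward propagation (inside gradients) followed by a backward pass and a parameter update.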
Types of neural networks
There are several types of neural networks.
CNN
The Convolutional Neural Networks or CNNs are primarily used for tasks related to computer vision or image processing.
CNNs are extremely good at modeling spatial data such as 2D or 3D images and videos. They can extract features and patterns within an image, enabling tasks such as image classification or object detection.
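The core operation of a CNN, sliding a small filter over an image, can be sketched in plain Python. The 4x4 image and the hand-picked edge filter below are assumptions for illustration. In a real CNN the filter values are learned during training rather than chosen by hand.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# A tiny grayscale "image": dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# A hand-picked filter that responds to dark-to-bright vertical edges.
vertical_edge_kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge_kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the middle column, which is exactly where the image switches from dark to bright.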
RNN
Recurrent Neural Networks, or RNNs, are primarily used to model sequential data, such as text, audio, or any type of data that represents a sequence or time. They are often used in tasks related to natural language processing (NLP).
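What makes an RNN suited to sequences is that it carries a hidden state from one step to the next. The sketch below uses a single hidden value and made-up weights, and the function name rnn_forward is hypothetical. It only illustrates the recurrence, not a trained model.

```python
import math


def rnn_forward(sequence, w_input, w_hidden, bias):
    """Process a sequence one element at a time, carrying a hidden state along."""
    hidden = 0.0
    states = []
    for x in sequence:
        # The new state depends on the current input AND the previous state.
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
        states.append(hidden)
    return states


# The same values in a different order (made-up weights).
states_early = rnn_forward([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.5, bias=0.0)
states_late = rnn_forward([0.0, 0.0, 1.0], w_input=1.0, w_hidden=0.5, bias=0.0)

print(states_early[-1])  # the early signal has partly faded by the last step
print(states_late[-1])   # the late signal is still strong
```

Because the final state differs when the same values arrive in a different order, the network is sensitive to sequence, which a plain feed-forward layer applied to each element separately is not.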
GAN
Generative adversarial networks, or GANs, are frameworks used for tasks related to unsupervised learning. This type of network learns the structure and patterns of the data so that it can generate new examples similar to those in the original dataset.
Transformers
Transformers are a newer class of deep learning models used mostly for tasks that involve modeling sequential data, such as those in NLP. They are generally more powerful than RNNs and are replacing them in many tasks.
Recently, transformers have also been applied to computer vision tasks, where they are proving to be competitive with, and sometimes more effective than, traditional CNNs.
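The central building block of a transformer is attention, which lets every element of a sequence look at every other element. Here is a minimal sketch of scaled dot-product attention in plain Python. It assumes three made-up 2-dimensional token vectors and, for simplicity, uses them directly as queries, keys, and values. Real transformers first pass the tokens through learned projections.

```python
import math


def softmax(scores):
    # Turn raw scores into positive weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])  # each row sums to 1
```

Unlike an RNN, nothing here is processed step by step: every token attends to all tokens at once.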
How to Create and Train Deep Learning Models
In this section, we'll discuss two distinct strategies for training deep learning models.
Train from scratch
To train a deep network from scratch, you need access to a large dataset, which you can often find online. Once you have collected the data, you need to design a deep neural network that will extract and learn the features of the dataset.
Designing a deep neural network can be a tedious task.
To get started, you can use V7.
Here's a quick tutorial:
1. Sign up for the 14-day free trial
V7 offers three models that you can explore and train: Image Classification, Object Detection, and Instance Segmentation.
V7 also comes with a public, in-built Text Scanner (OCR) model that you can use for document processing. It also provides an AI document automation platform for extracting data from unstructured PDFs and other sources.
Learn more about Optical Character Recognition.
2. To get started, go to the main dashboard of V7 and click on the ‘Neural Networks’ tab on the left.
3. Once you are in, click on the +NEW MODEL button in the top right-hand corner. This will take you to the menu page, where you will find the three models:
Instance Segmentation
Object Detection
Classification
Let us briefly walk you through the training of the instance segmentation model.
4. Select the Model card and click ‘Continue’, which will take you to the next page to select your dataset for training.
5. Once you have selected the dataset, click ‘Continue’. Next, you will see the breakdown of the number of images that will be used for training, validation, and testing.
6. Click ‘Start Training’, which you will find at the bottom right of the dashboard.
7. Once the training is completed, V7 will notify you via email that your model has finished training and is ready to use.
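The training, validation, and testing breakdown mentioned in step 5 is a general practice that you can reproduce by hand. The sketch below is not V7's implementation: the 80/10/10 ratios, the file names, and the function split_dataset are assumptions chosen for illustration.

```python
import random


def split_dataset(items, train_fraction=0.8, val_fraction=0.1, seed=0):
    """Shuffle the items and cut them into train, validation, and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_fraction)
    n_val = int(len(shuffled) * val_fraction)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains
    return train, val, test


# Hypothetical file names standing in for an annotated image dataset.
image_names = [f"image_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(image_names)

print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

The model learns from the training set, the validation set guides choices made during training, and the test set is held back to estimate performance on unseen data.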
Transfer Learning
Transfer learning is an approach where you use an existing pre-trained model and fine-tune it with your desired dataset. This is the most common approach.
Networks such as AlexNet, GoogLeNet, VGG16, and VGG19 are some of the most common pre-trained networks.
Transfer learning has advantages over training a model from scratch because:
a) You don’t need to design an entire architecture from scratch.
b) The training time is shorter.
c) You can train with less data.
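The idea behind transfer learning can be sketched with a toy example in plain Python. Here a fixed feature extractor stands in for the pre-trained layers of a network such as VGG16, and only a small new output layer (the "head") is trained on a tiny dataset. The weights, the dataset, and the function names are all invented for illustration. In practice you would load a real pre-trained model with a deep learning framework.

```python
import math

# Stand-in for pre-trained layers. These values are made up; in practice they
# would come from a model trained on a large dataset.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]


def extract_features(x):
    """Frozen layers: the weights are used but never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_WEIGHTS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(data, learning_rate=0.5, epochs=200):
    """Train only the new output layer on top of the frozen features."""
    head_weights = [0.0, 0.0]
    head_bias = 0.0
    for _ in range(epochs):
        for x, label in data:
            features = extract_features(x)
            z = sum(w * f for w, f in zip(head_weights, features)) + head_bias
            error = sigmoid(z) - label  # gradient of the log loss w.r.t. z
            head_weights = [w - learning_rate * error * f
                            for w, f in zip(head_weights, features)]
            head_bias -= learning_rate * error
    return head_weights, head_bias


# A tiny labeled dataset for the new task (made up).
small_dataset = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
head_weights, head_bias = fine_tune_head(small_dataset)


def predict(x):
    features = extract_features(x)
    return sigmoid(sum(w * f for w, f in zip(head_weights, features)) + head_bias)


for x, label in small_dataset:
    print(label, round(predict(x), 3))
```

Only three numbers are trained here (two head weights and one bias), which hints at why transfer learning needs less data and less training time than training every layer from scratch.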
Check out 15+ Top Computer Vision Project Ideas for Beginners to start building your own models.
Deep Learning Limitations
We hope that this does not come as a surprise, but it's worth mentioning that Deep Learning, indeed, has several limitations. We've listed a few of them below.
Data availability
Deep learning models require a lot of data to learn the representation, structure, distribution, and pattern of the data.
If there isn't enough varied data available, then the model will not learn well and will lack generalization (it won't perform well on unseen data).
The model can only generalize well if it is trained on large amounts of data.
The complexity of the model
Designing a deep learning model is often a trial and error process.
A model that is too simple is likely to underfit, i.e., it cannot extract enough information from the training set, while a very complex model is likely to overfit, i.e., it cannot generalize well to the test dataset.
Deep learning models will perform well when their complexity is appropriate to the complexity of the data.
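Underfitting and overfitting are not specific to neural networks, so they can be illustrated with three very simple stand-in models. The sketch below uses made-up data that follows a straight line plus alternating noise: a constant model is too simple, a lookup table that memorizes the training set is too flexible, and a straight line matches the complexity of the data. The model names and data are assumptions for illustration only.

```python
def mse(model, data):
    # Mean squared error of a model on a dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def fit_constant(data):
    """Too simple: always predicts the average training output."""
    mean_y = sum(y for _, y in data) / len(data)
    return lambda x: mean_y


def fit_straight_line(data):
    """Matches the data: least-squares line y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(data):
    """Too flexible: stores every training example, noise included."""
    table = dict(data)
    return lambda x: table[x]


# Made-up data: y = 2x plus alternating noise. The test set has fresh noise.
train_data = [(x, 2 * x + (1 if x % 2 == 0 else -1)) for x in range(8)]
test_data = [(x, 2 * x + (-1 if x % 2 == 0 else 1)) for x in range(8)]

for name, fit in [("constant", fit_constant),
                  ("line", fit_straight_line),
                  ("memorizer", fit_memorizer)]:
    model = fit(train_data)
    print(name, round(mse(model, train_data), 2), round(mse(model, test_data), 2))
```

The memorizer reaches zero error on the training data yet does worse than the line on the test data, while the constant model does poorly on both. This is the same trade-off described above.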
Lacks global generalization
A simple neural network can have thousands to tens of thousands of parameters.
The idea of global generalization is that all the parameters in the model should cohesively update themselves to reduce the generalization error or test error as much as possible. However, because of the complexity of the model, it is very difficult to achieve zero generalization error on the test set.
Hence, the deep learning model will always lack global generalization which can at times yield wrong results.
Incapable of Multitasking
Deep neural networks are incapable of multitasking.
These models can only perform targeted tasks, i.e., process data on which they are trained. For instance, a model trained on classifying cats and dogs will not classify men and women.
Furthermore, applications that require reasoning or general intelligence are completely beyond what the current generation’s deep learning techniques can do, even with large sets of data.
Hardware dependence
As mentioned before, deep learning models are computationally expensive.
These models are so complex that a normal CPU often cannot handle the computation in a reasonable time. Instead, multicore high-performing graphics processing units (GPUs) and tensor processing units (TPUs) are typically required to train these models effectively in a shorter time.
Although these processors save time, they are expensive and use large amounts of energy.
4 Deep Learning Applications
Now, let's have a closer look at the most important Deep Learning applications.
Deep Learning finds applications in:
Speech recognition: Familiar software such as Apple’s Siri, Amazon’s Alexa, and Microsoft’s Cortana is based on deep neural networks.
Pattern recognition: Pattern recognition is very useful in medical and life sciences. These algorithms can help radiologists to find tumor cells in the CT scans or even help them to understand the mechanism behind protein folding. Furthermore, other areas such as finance can also use pattern recognition systems to detect fraudulent transactions.
NLP: Natural language processing, or NLP, is one of the hot topics in deep learning these days. Modern architectures like transformers have revolutionized machine translation and language modeling. One such model is GPT-3 by OpenAI, which performs strongly across a wide range of NLP tasks.
Recommender systems: Recommender systems are on almost every social media platform these days from Instagram to YouTube and Netflix. These companies use a recommendation system to recommend shows, videos, posts, and stories based on users' activities.
Real-life Deep Learning use cases
Finally, here are some of the real-life use cases of deep learning.
Healthcare
Medical image analysis: Medical images such as CT scans, MRI, and X-rays can sometimes be difficult to interpret; this mostly happens when anomalies such as tumors blend into the background.
Deep learning algorithms can help to find anomalies that are invisible to the naked eye. The Hierarchical Probabilistic U-Net by Google’s DeepMind is one example of an algorithm that is capable of finding tumor cells in medical images. Such algorithms have proven to be a great tool for radiologists and doctors.
Surgical robotics: There are times when a patient in critical condition cannot find a surgeon; in such dire and life-threatening situations, surgical robots can come to the rescue. Such robots can repeat the exact motions of a trained surgeon with superhuman precision.
Go to Medical Image Annotation with V7 to learn more.
Transportation
Self-driving cars: Self-driving cars are one of the trending topics in the world right now. Companies such as Tesla and Waymo are pushing this trend by developing technology for safe driving.
All these companies use deep learning as their core algorithm; these models can consume a lot of data and enable these cars to navigate through roads while making correct decisions through analyzing the roads and vehicles around them. These cars are so advanced that they can even predict accidents.
Smart cities: Smart cities can manage their resources efficiently and manage traffic, public services, and disaster response. The way it works is that the input from different sensors from all over the city can be used to collect data and a deep learning system trained on that data can be used to predict different outputs based upon the scenario.
Agriculture
Robot picking: Deep learning can be used to enable robots that can classify and pick crops. These robots can save time and increase the production rate as well.
Crop and soil monitoring: A deep learning model trained on crop and soil condition data can be used to build a system that effectively monitors crops and soil and helps estimate yield.
Livestock monitoring: Animals can move from one place to another, making them difficult to monitor. That’s where image annotation for computer vision comes in. Image annotation with deep learning can enable farmers to track the location, predict the livestock's food needs, and monitor the rest cycle to ensure that they are in good health.
Plant disease and pest detection: Another useful application of deep learning in agriculture is distinguishing diseased plants from healthy ones. This type of system can help farmers treat plants properly before they die. Furthermore, deep learning can also be used to detect pest infestations.
Best Deep Learning Resources
Hungry for more? ;-)
Check out our TOP 3 Deep Learning resources to learn more:
Deep Learning: Key Takeaways
We've learned today that Deep Learning is a very versatile tool.
Inspired by the biological brain, deep learning has proven its usefulness in almost all areas of science and engineering. Here's a quick recap of everything we've discussed:
A deep learning model is made up of an interconnected multilayer neural network.
The basic part of the neural network is called a node, which is a simple mathematical function: typically a weighted sum of its inputs followed by an activation function.
The deep learning model maps the input and the output to find a correlation between them. This correlation can be then used to cluster, predict, classify, and even generate new samples of data.
One needs to train a deep learning model to make it learn and produce accurate results.
The training process consists of two sub-processes called forward propagation and backward propagation. The former produces a prediction using the current parameters, while the latter adjusts those parameters with respect to the error the prediction produces.
Nilesh Barla is the founder of PerceptronAI, which aims to provide solutions in medical and material science through deep learning algorithms. He studied metallurgical and materials engineering at the National Institute of Technology Trichy, India, and enjoys researching new trends and algorithms in deep learning.