Annotation means any extra information that is attached to the data.
In the machine learning domain, it refers to assigning predefined categories and tags/labels to documents and images. These data–label pairs can then be used to train classification models with supervised learning, which makes it easier to find hidden patterns in the data.
An Artificial Neural Network is an algorithm that processes and analyzes data in a fashion similar to the human brain.
It is made up of connected nodes called neurons stacked in different layers that perform complex computations to solve problems just like humans would do.
Data augmentation in data analysis refers to approaches for increasing the amount of data by adding slightly changed copies of existing data or creating new synthetic data from an existing dataset.
Data augmentation in case of images includes techniques like cropping, rotation, and horizontal flipping.
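For images, these three operations can be sketched with plain NumPy (the 4×4 array and the crop size are illustrative):

```python
import numpy as np

# Toy 4x4 grayscale "image" (values stand in for pixel intensities).
img = np.arange(16, dtype=np.float32).reshape(4, 4)

def horizontal_flip(image):
    # Mirror the image left-to-right.
    return image[:, ::-1]

def random_crop(image, size, rng):
    # Cut out a random size x size patch.
    h, w = image.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return image[top:top + size, left:left + size]

def rotate90(image):
    # Rotate the image 90 degrees counter-clockwise.
    return np.rot90(image)

rng = np.random.default_rng(0)
flipped = horizontal_flip(img)
cropped = random_crop(img, 2, rng)
rotated = rotate90(img)
```

Each transform yields a plausible new training image while the label (e.g. "cat") stays the same.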
In deep learning, backpropagation is one of the two sub-processes of training: it adjusts the parameters used in forward propagation with respect to the error the network produces.
Backpropagation stands for “backward propagation of errors”. It refers to the algorithm used for training feedforward neural networks by repeatedly adjusting the network’s weights to minimize the difference between the actual output vector of the net and the desired output vector.
Backpropagation aims to minimize the cost function by adjusting the network’s weights and biases. The cost function gradients determine the level of adjustment concerning parameters like activation function, weights, bias, etc.
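A minimal sketch of the idea with a single weight and a mean-squared-error cost, assuming a linear "network" y_hat = w * x (the data and learning rate are illustrative):

```python
import numpy as np

# Forward pass: y_hat = w * x.  Backward pass: dC/dw = 2 * mean((y_hat - y) * x).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x            # target outputs the network should learn
w = 0.0                # initial weight
lr = 0.05              # learning rate

for _ in range(200):
    y_hat = w * x                        # forward propagation
    grad = 2 * np.mean((y_hat - y) * x)  # gradient of the cost w.r.t. w
    w -= lr * grad                       # adjust the weight against the gradient
```

After a few hundred updates the weight converges to the value (here 2.0) that minimizes the cost; real backpropagation repeats this gradient computation layer by layer via the chain rule.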
COCO (Common Objects in Context) is an image dataset covering 80 classes of everyday objects (cars, people, sports balls, bicycles, dogs, cats, horses, etc.). The dataset was gathered to solve common object detection problems.
Computer Vision is a field of Artificial Intelligence that focuses on developing techniques that help computers see and understand the content of digital images.
It finds application in object detection, facial recognition, robotics, etc.
Convolutional Neural Networks, also known as ConvNets, are a type of feed-forward neural network used in tasks like image analysis, natural language processing, and other complex classification problems.
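The core operation of a ConvNet layer can be sketched as a "valid" 2D convolution in NumPy (a hand-written loop for clarity; note that deep-learning libraries actually compute cross-correlation, as here, and the image and kernel below are illustrative):

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution (no padding, stride 1): slide the kernel over the
    # image and take an elementwise product-and-sum at each position.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 kernel that responds to vertical edges, applied to a 5x5 image
# whose right half is bright and left half dark.
image = np.zeros((5, 5))
image[:, 2:] = 1.0
kernel = np.array([[1.0, 0.0, -1.0]] * 3)
feature_map = conv2d(image, kernel)
```

The output "feature map" is strongly negative exactly where the dark-to-bright edge sits; a ConvNet learns many such kernels instead of hand-designing them.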
A dataset is a collection of meaningful data from which the machine learns.
A dataset may contain raw information in the form of images, tabular data, signals, videos, etc., that helps derive inferences. An example is a tabular dataset where each column defines an attribute/characteristic of the data and each row is a tuple/record.
Deep Learning is a sub-field of machine learning that works in a manner inspired by the neurons of the brain.
The learning method based on an artificial neural network imitates the working of the human brain in processing data and deriving meaningful information to make decisions.
In deep learning, feedforward propagation is one of the two sub-processes of training: it passes the input through the network's parameters to produce a prediction.
Feedforward Propagation occurs when the input data is fed in the forward direction through the network. Each hidden layer receives the input data, processes it (using an Activation Function), and passes it onto the next layer.
In the feedforward propagation, the Activation Function is a mathematical “gate” in between the input feeding the current neuron and its output going to the next layer.
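A minimal forward pass with a sigmoid activation "gate", assuming a tiny two-layer network whose weights are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes any real number into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0])                  # input vector
W1 = np.array([[0.1, 0.4], [-0.2, 0.3]])   # hidden-layer weights
b1 = np.array([0.0, 0.1])                  # hidden-layer biases
W2 = np.array([[0.7, -0.5]])               # output-layer weights
b2 = np.array([0.2])                       # output-layer bias

h = sigmoid(W1 @ x + b1)   # hidden layer: weighted sum, then the activation gate
y = sigmoid(W2 @ h + b2)   # output layer repeats the same pattern
```

Each layer is just a weighted sum passed through the gate; stacking layers is what gives the network its expressive power.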
Image preprocessing comprises the steps we take to convert a raw image into an enhanced form that is ready for training and inference.
The images collected for a computer vision task may differ in size, contrast, and orientation. Image preprocessing involves all the deterministic steps we take to bring the images into a consistent format.
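Two such deterministic steps can be sketched in NumPy: nearest-neighbour downsampling so every image ends up the same size, and normalisation of pixel intensities to [0, 1] (the 8×8 input is a stand-in for a raw camera image):

```python
import numpy as np

def preprocess(image, scale=2):
    # 1) Nearest-neighbour downsampling: keep every `scale`-th pixel.
    small = image[::scale, ::scale]
    # 2) Normalisation: map uint8 intensities [0, 255] to floats in [0.0, 1.0].
    return small.astype(np.float32) / 255.0

raw = np.full((8, 8), 255, dtype=np.uint8)   # stand-in for a raw grayscale image
ready = preprocess(raw)
```

Because both steps are deterministic, the exact same pipeline can be applied at training and at inference time.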
Image recognition is the ability of the machine to identify objects, places, people in an image.
A common application of image recognition is optical character recognition (OCR), where image recognition techniques are used to identify the characters in an image and convert them to text.
Image segmentation is a subfield of Artificial Intelligence and Computer Vision typically used to locate objects and boundaries in images.
It is based on partitioning an image into regions of meaningful information like finding people in the images or locating boundaries of lungs in medical images.
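A minimal classical (non-learning) example of such partitioning is intensity thresholding, sketched in NumPy (the array and the threshold of 100 are illustrative):

```python
import numpy as np

# A tiny grayscale image: the left half is dark background, the right half
# is a bright object.
image = np.array([
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [ 9, 10, 198, 215],
])

# Partition every pixel into one of two regions by intensity.
mask = image > 100                 # True where a pixel belongs to the bright object
foreground_pixels = int(mask.sum())
```

Modern segmentation models replace the fixed threshold with a learned per-pixel classifier, but the output has the same form: a mask assigning each pixel to a region.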
Instance segmentation models classify pixels into categories on the basis of “instances” rather than classes.
An instance segmentation algorithm has no notion of the class a segmented region belongs to, but it can separate overlapping or very similar object regions on the basis of their boundaries.
If an image of a crowd is fed to an instance segmentation model, the model would (ideally) be able to segregate each person in the crowd as well as the surrounding objects, but would not be able to predict what each region/object is an instance of.
Machine Learning is a branch of Artificial Intelligence that allows computers to imitate humans in decision-making without being explicitly programmed.
It uses algorithms and statistical techniques to learn from data and draw patterns and hidden insights from them without human intervention.
Natural Language Processing refers to the branch of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from languages, just like humans do.
NLP is widespread in domains like chatbot services, speech recognition, and machine translation, and is used in day-to-day activities like search, autocorrect features, and many more.
A neuron is the basic computational unit of a neural network.
It is a node through which input data from the previous layer flows; mathematical operations are then applied to convert that input into output signals.
Object detection is a technology that includes computer vision and image processing used to detect objects of a certain class in images or videos.
Data collected through computer vision is fed into deep learning models to draw conclusions, as in self-driving cars, facial recognition systems, etc.
Object tracking is a deep learning process where the algorithm tracks the movement of an object. In other words, it is the task of estimating or predicting the positions and other relevant information of moving objects in a video.
Object tracking usually involves the process of object detection. Here’s a quick overview of the steps:
1. Object detection: the algorithm classifies and detects the object by creating a bounding box around it.
2. Assigning a unique identifier (ID) to each detected object.
3. Tracking the detected object as it moves through frames while storing the relevant information.
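Steps 2 and 3 above can be sketched as a minimal IoU-based tracker, assuming boxes are (x1, y1, x2, y2) tuples and an illustrative matching threshold of 0.3:

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def track(prev_tracks, detections, next_id, threshold=0.3):
    # Match each new detection to the previous track with the highest IoU;
    # detections that match nothing get a fresh ID.
    tracks = {}
    for det in detections:
        best_id, best_iou = None, threshold
        for tid, box in prev_tracks.items():
            score = iou(box, det)
            if score > best_iou and tid not in tracks:
                best_id, best_iou = tid, score
        if best_id is None:
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = det
    return tracks, next_id

frame1 = {0: (0, 0, 10, 10)}                      # one tracked object, ID 0
frame2_dets = [(1, 1, 11, 11), (50, 50, 60, 60)]  # it moved; a new object appeared
tracks, next_id = track(frame1, frame2_dets, next_id=1)
```

The moved box keeps ID 0 because it overlaps its old position, while the distant box receives the new ID 1; production trackers refine this idea with motion models and appearance features.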
Overfitting refers to a model that fits the training data too well.
It is a common pitfall in deep learning algorithms in which a model tries to fit the training data entirely and ends up memorizing the data patterns and the noise and random fluctuations.
These models fail to generalize and perform well in the case of unseen data scenarios, defeating the model's purpose.
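The effect can be demonstrated with polynomial regression in NumPy: a model with enough capacity memorises the training points, noise included, driving training error toward zero, while a simpler model cannot (the data, seed, and degrees are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, size=x.size)  # noisy samples

# A degree-9 polynomial can pass through all 10 training points exactly,
# memorising the noise: near-zero training error, poor generalisation.
overfit = np.polyfit(x, y, deg=9)
train_err_overfit = np.max(np.abs(np.polyval(overfit, x) - y))

# A degree-3 polynomial lacks the capacity to memorise the noise, so its
# training error stays visibly larger.
simple = np.polyfit(x, y, deg=3)
train_err_simple = np.max(np.abs(np.polyval(simple, x) - y))
```

The near-zero training error of the degree-9 fit is exactly the warning sign: performance on the training set no longer says anything about performance on unseen data.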
Panoptic segmentation can be expressed as the combination of semantic segmentation and instance segmentation where each instance of an object in the image is segregated and the object’s identity is predicted.
Panoptic segmentation algorithms find large-scale applicability in popular tasks like self-driving cars where a huge amount of information about the immediate surroundings must be captured with the help of a stream of images.
Polygons are a type of image annotation, particularly effective thanks to their ability to create a pixel-level mask around the desired object.
The tool lets you draw a line made of individual points around the object in the image to create a polygon mask.
Reinforcement Learning is a type of machine learning algorithm that learns to solve a multi-level problem by trial and error.
The machine is trained on real-life scenarios to make a sequence of decisions. It receives either rewards or penalties for the actions it performs. Its goal is to maximize the total reward.
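A minimal sketch of this trial-and-error loop, assuming a toy five-state corridor environment and tabular Q-learning with random exploration (all hyperparameters are illustrative):

```python
import numpy as np

# Toy environment: states 0..4 on a line; action 0 moves left, action 1 moves
# right. Reaching state 4 pays a reward of +1 and ends the episode.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4
alpha, gamma = 0.5, 0.9                 # learning rate and discount factor
Q = np.zeros((N_STATES, N_ACTIONS))     # action-value table the agent learns
rng = np.random.default_rng(0)

for _ in range(500):                    # episodes of trial and error
    s = 0
    while s != GOAL:
        a = int(rng.integers(N_ACTIONS))        # explore with random actions
        s2 = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s2 == GOAL else 0.0          # reward only at the goal
        # Q-learning update: nudge Q toward reward + discounted best future value.
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2

# Greedy policy extracted from Q: the learned action in each non-goal state.
policy = [int(np.argmax(Q[s])) for s in range(GOAL)]
```

Because Q-learning is off-policy, the agent can explore with random actions yet still converge to the reward-maximizing policy (always move right, here).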
Deep Reinforcement Learning refers to architectures in which multiple layers of artificial neural networks are used to replicate the working of a human brain.
Training Data is a set of data that is fed to any machine learning algorithm for it to learn and derive patterns and use this knowledge for further predictions.
It forms the major part of the complete set of data available to the model.
Underfitting occurs when we have high bias, i.e., we oversimplify the problem; as a result, the model does not perform well even on the training data.
Unsupervised Learning is a type of machine learning in which the algorithms are provided with data that does not contain any labels or explicit instructions on what to do with it.
The goal is for the learning algorithm to find structure in the input data on its own.
To put it simply—Unsupervised Learning is a kind of self-learning where the algorithm can find previously hidden patterns in the unlabeled datasets and give the required output without any interference.
Identifying these hidden patterns helps in clustering, association, and detection of anomalies and errors in data.
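Clustering, the classic unsupervised task, can be sketched as a minimal k-means in NumPy (two synthetic blobs; note that no labels are ever provided, the grouping emerges from the data alone):

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    # Minimal k-means: alternately assign points to the nearest centroid and
    # move each centroid to the mean of its assigned points.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)      # nearest-centroid assignment
        for j in range(k):
            if np.any(labels == j):            # skip empty clusters
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated blobs of unlabeled 2D points.
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 5.2], [4.9, 5.1]])
labels, centroids = kmeans(pts, k=2)
```

The algorithm recovers the two groups without ever being told they exist, which is precisely the "self-learning" described above.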