Training deep learning models for solving computer vision tasks requires feeding the algorithms with meaningful data they can learn from.
Therefore, your first job is to collect and annotate training data that includes examples of the objects you want to train on—and you need a tool to help with that.
This, as you’ve probably guessed, brings us to CVAT.
CVAT is one of the most popular free image and video annotation tools that you can use to label your data.
It’s used by computer vision amateurs and professional data annotation teams alike, and our tutorial will explore the ins and outs of the data annotation process using this tool.
That’s not all!
In the last section, we’ll also show you how to train a computer vision model on V7 using your dataset labeled with CVAT. Trust me, it’s much easier and faster than you think—stick with us to see for yourself.
Here’s what we’ll cover:
As you’ll learn in a few minutes, CVAT is relatively easy to use and quite flexible, but it does have plenty of limitations, too. Luckily—
We now have many more advanced data annotation tools that address those limitations and allow you to annotate your data even 10x faster (no joke!)
And while we’re not here to advertise ourselves and brag about V7’s 5-star reviews and extensive functionalities, we can’t help but let you know that CVAT is not the only option out there.
In fact, here’s a list of the 13 Best Image Annotation Tools you might want to check.
And in case you do end up trying V7, here are some useful links:
Now, let’s talk about all-things-CVAT (finally, right?!)
CVAT (Computer Vision Annotation Tool) is a popular open-source image & video annotation tool developed by Intel. You can either use it online (with some limitations) or install it on your local machine, which we’ll explain in a moment.
CVAT is used for labeling data for solving computer vision tasks such as:
It supports multiple annotation formats, such as YOLO, Pascal VOC, and MS COCO, to name a few. If you want to dig deeper, you can check CVAT’s source code on GitHub. It’s distributed under the MIT license.
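To make the difference between these formats more concrete, here’s a minimal sketch of how the same bounding box is typically written in each of them. It assumes the usual conventions: Pascal VOC stores pixel corners (xmin, ymin, xmax, ymax), MS COCO stores the top-left corner plus width and height in pixels, and YOLO stores the box center and size normalized by the image dimensions. The function names are our own and are not part of CVAT.

```python
def voc_to_coco(box):
    """Pascal VOC (xmin, ymin, xmax, ymax) -> COCO (x, y, width, height), in pixels."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


def voc_to_yolo(box, img_w, img_h):
    """Pascal VOC pixel corners -> YOLO (cx, cy, w, h), normalized to the 0..1 range."""
    xmin, ymin, xmax, ymax = box
    cx = (xmin + xmax) / 2 / img_w   # box center, as a fraction of image width
    cy = (ymin + ymax) / 2 / img_h   # box center, as a fraction of image height
    w = (xmax - xmin) / img_w        # box width, as a fraction of image width
    h = (ymax - ymin) / img_h        # box height, as a fraction of image height
    return (cx, cy, w, h)


# Hypothetical example: one box on a 400x500 (width x height) image.
voc_box = (100, 50, 300, 250)
print(voc_to_coco(voc_box))            # (100, 50, 200, 200)
print(voc_to_yolo(voc_box, 400, 500))  # (0.5, 0.3, 0.5, 0.4)
```

In practice you rarely need to convert by hand, since CVAT can export in each of these formats directly, but knowing the layout helps when you debug a training pipeline.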
Don’t forget to also visit CVAT’s Documentation page in case something isn’t clear.
And now it’s finally time to roll up our sleeves and get some hands-on experience using CVAT for data annotation.
If you don’t have a dataset to work with, by the way—worry not!
You can find and download hundreds of quality datasets from our Open Datasets repository completely free of charge.
Ready to begin?
There are two ways you can label your data on CVAT—either on the CVAT website (online) or by configuring it on your local machine.
We’ll discuss both options now, starting with the simpler one—using CVAT’s web-based platform.
To start labeling, create an account on cvat.org. Once you’ve done that, log in, and you’ll land on this page.
To start annotating your data, you need to create a new labeling task.
Add the name of your dataset, the labels you want to use, and the attributes (if needed).
Next, upload your raw data either from your computer or the cloud. You can drag and drop your files or use CVAT’s command line interface (CLI) if that’s your preferred method.
You can also tweak some of the settings. Advanced configuration allows you to:
After your task is created, you can find it under the “Tasks” tab, and by then, you’re pretty much ready to start annotating. CVAT will also automatically calculate some parameters (e.g., image quality score) and estimate the time needed to finish your task.
For the sake of this tutorial, we’ll annotate our images using bounding boxes in order to create a dataset for training object detectors.
Open the task, choose a “rectangle” symbol from the menu on the left, pick your label and a drawing method, and draw your first box around the object you want to annotate.
Voila! You’ve just completed your first annotation. The class name and the annotation type will be visible on the right side of the interface for you to see.
Apart from bounding boxes, you can also annotate images using polygons, polylines, keypoints, cuboids, and tags.
Let’s not forget that CVAT also comes equipped with features such as interpolation (between keyframes for bounding boxes and polygons) and automatic annotation (more about it later).
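If you’re wondering what interpolation between keyframes means in practice, here’s a rough sketch. It assumes simple linear interpolation of box coordinates between two manually annotated frames. CVAT’s actual implementation may differ, and the function name and box layout (xmin, ymin, xmax, ymax) are our own choices for illustration.

```python
def interpolate_box(box_a, frame_a, box_b, frame_b, frame):
    """Linearly interpolate a box for `frame`, given two keyframe boxes.

    Boxes are (xmin, ymin, xmax, ymax) tuples. This is an illustrative
    assumption, not CVAT's own code.
    """
    if frame_a >= frame_b:
        raise ValueError("frame_a must come before frame_b")
    if not frame_a <= frame <= frame_b:
        raise ValueError("frame must lie between the two keyframes")
    t = (frame - frame_a) / (frame_b - frame_a)  # 0.0 at frame_a, 1.0 at frame_b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))


# Keyframes drawn by hand on frames 0 and 10; frame 5 is filled in automatically.
print(interpolate_box((0, 0, 10, 10), 0, (10, 20, 30, 50), 10, 5))
# (5.0, 10.0, 20.0, 30.0)
```

The takeaway is that you only draw boxes on a handful of keyframes, and the frames in between are computed for you.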
Here are a couple of best practices to keep in mind when labeling with CVAT:
Once you’ve annotated all of your data, it’s time to export it!
Head over to the “Tasks” tab, choose the task you’ve completed, and click “Dump Annotations.” CVAT allows you to export your annotations in multiple formats, including COCO, Pascal VOC, YOLO, LabelMe, and more.
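Once you have an export, a quick sanity check never hurts. The sketch below assumes you exported in COCO format, which is a JSON file with `images`, `categories`, and `annotations` lists, and counts how many boxes each class received. The sample data and the file name in the comment are hypothetical.

```python
import json
from collections import Counter


def count_boxes_per_category(coco):
    """Return a Counter mapping category name -> number of annotations."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


# For a real export you would load the file first (the path is an assumption):
#     with open("instances_default.json") as f:
#         coco = json.load(f)
# Here we use a tiny in-memory sample so the snippet runs on its own.
sample = {
    "images": [{"id": 1, "file_name": "frame_000001.jpg", "width": 400, "height": 500}],
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 50, 200, 200]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [10, 10, 50, 40]},
        {"id": 3, "image_id": 1, "category_id": 2, "bbox": [300, 200, 60, 120]},
    ],
}

counts = count_boxes_per_category(sample)
print(dict(counts))  # {'car': 2, 'person': 1}
```

If a class you labeled shows up with zero boxes, or a class you never used appears, it’s worth reopening the task before you start training.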
Remember that CVAT’s online platform allows you to add up to 10 tasks per user and upload only 500 MB of data.
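If you’re not sure whether your dataset fits under that cap, you can check the folder size before uploading. This is a small sketch using only the Python standard library. It treats the 500 MB limit as 500 × 1024 × 1024 bytes, which is our assumption about how the limit is counted, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: the 500 MB limit is interpreted as 500 * 1024 * 1024 bytes.
LIMIT_BYTES = 500 * 1024 * 1024


def folder_size_bytes(folder):
    """Total size of all regular files under `folder`, including subfolders."""
    return sum(p.stat().st_size for p in Path(folder).rglob("*") if p.is_file())


def fits_upload_limit(folder, limit_bytes=LIMIT_BYTES):
    """True if the folder's total size does not exceed the given limit."""
    return folder_size_bytes(folder) <= limit_bytes
```

For example, `fits_upload_limit("my_dataset")` returns `False` when the folder is too big, in which case you can either split the data across several tasks or switch to a local installation.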
So, labeling on the CVAT web-based platform—pretty easy, right?
If you need to annotate large amounts of data without these limits, you’ll have to set up CVAT locally. Worry not, though. Again, it’s very easy!
Below is a snippet of the CVAT installation instructions for Windows 10, which you can find in CVAT’s documentation.
Before following this installation guide, make sure to:
Make sure to check the instructions specific to your operating system.
Apart from manual labeling tools, CVAT is also optimized for semi-automatic annotation that can help you speed up the process significantly—even up to 4x.
Have a look at this video to get a better understanding of how it all works.
You can choose from Interactors, Detectors, and Trackers.
You can use Interactors to create polygons semi-automatically. Available DL models from this category can be used to label any object. Depending on your use case, you’ll have to use regular, positive, or negative points to create a polygon.
Here are some models you can use:
You can use Detectors to annotate one frame automatically. Supported models, such as YOLO v3, only work for the specific labels they support.
You can use Trackers to annotate your objects with bounding boxes. Similar to Interactors, the available models can be used to annotate any objects. Labeled objects are automatically tracked when you move to the next frame.
CVAT’s documentation mentions SiamMask as one of the available deep learning models for Object Tracking and Segmentation.
To summarize everything we’ve covered so far, let’s have a quick look at some of CVAT’s pros and cons:
Of course, apart from annotating bounding boxes for object detection, CVAT also allows you to annotate your data for image classification, semantic segmentation, instance segmentation, and object tracking.
Here are some of the use cases (courtesy of Andrey Zhavoronkov from Delta-Course.org)
You can draw bounding boxes, polylines, polygons, and keypoints on both image and video data.
You can apply tags with attributes, including boolean, choice, and number or text.
You can segment the image and manipulate the shapes.
If you’re curious to learn more about real-life applications of computer vision and AI, feel free to check out:
As we mentioned at the beginning of this article, CVAT is not the only data annotation tool out there.
It’s undoubtedly one of the most popular ones, but as you’ve probably already figured—it’s also quite basic.
If you’re serious about labeling large amounts of data and doing it efficiently, you might want to upgrade to a much sleeker and more powerful platform than CVAT.
Andrew Achkar, Technical Director at Miovision, did exactly that and switched from Miovision’s internal tool built on top of CVAT to V7, saying:
“We chose V7 because we wanted to build new types of workflows. We had our own system, but we wanted it to accomplish additional tasks like creating other annotations types, re-annotations, annotations on videos—activities that would be a lot of effort in development. V7 met our needs."
However, there are also other open-source options out there. Here’s a shortlist of the most popular (and free) annotation platforms:
LabelMe is a free online annotation tool created by the MIT Computer Science and Artificial Intelligence Laboratory. It supports six annotation types: polygon, rectangle, circle, line, point, and line strip.
A graphical image annotation tool, written in Python, for labeling objects in images with bounding boxes. You can export your annotations as XML files in PASCAL VOC format.
VoTT (Visual Object Tagging Tool) is a free and open-source image annotation and labeling tool developed by Microsoft.
ImgLab is an open-source, web-based image annotation tool. It provides multiple label types, such as points, circles, bounding boxes, and polygons.
Finally, let’s train your computer vision model on V7. Get your CVAT-labeled data ready!
To begin, you need to sign up for a 14-day free trial to get access to our tool. And once you are in, here's what comes next.
V7 also allows you to upload your data via API and CLI SDK.
Head over to the “Neural Networks” tab to pick the model you want to train.
Depending on how you labeled your data, you can choose to train an Instance Segmentation model (polygons), an Object Detector (bounding boxes), or an Image Classifier (tags).
Name your model, and click “Continue.”
Pick your labeled dataset and check whether your class distribution is balanced. Avoid situations where some classes are heavily overrepresented or underrepresented, since this kind of imbalance will hinder your model’s performance.
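If you’d like to check the balance yourself before uploading, here’s a minimal sketch. It takes the number of annotations per class and flags any class whose share falls below a threshold. The 10% threshold and the example counts are arbitrary assumptions for illustration, not a V7 rule.

```python
def class_shares(counts):
    """Map each class name to its fraction of all annotations."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def underrepresented(counts, min_share=0.1):
    """Class names whose share of annotations is below `min_share`, sorted by name."""
    shares = class_shares(counts)
    return sorted(name for name, share in shares.items() if share < min_share)


# Hypothetical annotation counts per class.
counts = {"car": 900, "person": 80, "bicycle": 20}
print(class_shares(counts))      # {'car': 0.9, 'person': 0.08, 'bicycle': 0.02}
print(underrepresented(counts))  # ['bicycle', 'person']
```

Classes that get flagged are good candidates for collecting and labeling more examples before you train.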
Next, V7 will show you the split between your training, validation, and test set. It will also calculate the time and cost of this training session.
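V7 handles the split for you, but if you’re curious what a training, validation, and test split looks like, here’s a rough sketch of a random split. The 80/10/10 ratios and the fixed seed are our assumptions for illustration. They are not V7’s actual settings.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle `items` and split them into (train, val, test) lists.

    Whatever is left after the train and val portions goes to the test set.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is reproducible
    n_train = int(len(items) * train)
    n_val = int(len(items) * val)
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],
    )


# Hypothetical file names standing in for your labeled images.
images = [f"image_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))  # 80 10 10
```

The model learns from the training set, the validation set is used to monitor training, and the test set is held back to measure how well the final model generalizes.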
All you have to do is click “Start training” and voila—
You trained your first computer vision model! You can go ahead and work with it or keep re-training your model to improve its performance.
V7 supports model-assisted labeling where your model can constantly learn on its own and help you annotate new batches of data even 10x faster.
Got questions? Let us know or head over to V7 Academy.
We hope to see you training your models on V7.