The Complete Guide to CVAT - Pros & Cons [2024]
11 min read
Feb 9, 2022
What is Computer Vision Annotation Tool (CVAT), how does it work, and what are its pros and cons? Follow this tutorial to label your data on CVAT and train your computer vision model on V7.

Alberto Rizzoli
Co-founder & CEO
Training deep learning models for solving computer vision tasks requires feeding the algorithms with meaningful data they can learn from.
The data needs to be processed, cleaned, and, most importantly, properly labeled.
Therefore, your first job is to collect and annotate training data that includes examples of the objects you want to train onโand you need a tool to help with that.
This, as youโve probably guessed, brings us to CVAT.
CVAT is one of the most popular free image and video annotation tools that you can use to label your data.
Itโs used by computer vision amateurs and professional data annotation teams alike, and our tutorial will explore the ins and outs of the data annotation process using this tool.

Source: GitHub
Butโ
Thatโs not all!
In the last section, weโll also show you how to train a computer vision model on V7 using your dataset labeled with CVAT. Trust me, itโs much easier and faster than you thinkโstick with us to see for yourself.
Hereโs what weโll cover:
What is CVAT?
Getting started with CVAT
CVAT Auto-Annotation
Pros and cons of CVAT
CVAT use cases
CVAT alternatives
Bonus: How to train your computer vision model?
As youโll learn in a few minutes, CVAT is relatively easy to use and quite flexible, but it does have plenty of limitations, too. Luckilyโ
We now have many more advanced data annotation tools that address those limitations and let you annotate your data up to 10x faster (no joke!).
And while weโre not here to advertise ourselves and brag about V7โs 5-star reviews and extensive functionalities, we canโt help but let you know that CVAT is not the only option out there.
In fact, hereโs a list of the 13 Best Image Annotation Tools you might want to check.
Now, letโs talk about all-things-CVAT (finally, right?!)
What is CVAT?
CVAT (Computer Vision Annotation Tool) is a popular open-source image & video annotation tool developed by Intel. You can either use it online (with some limitations) or install it on your local machine, which weโll explain in a moment.
CVAT is used for labeling data for computer vision tasks such as image classification, object detection, semantic and instance segmentation, and object tracking.
It supports multiple annotation formats like YOLO, Pascal VOC, or MS COCO, to name a few, and if you want to dig deeper, you can check CVAT's source code on GitHub. It's distributed under the MIT license.
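These formats differ mainly in how they encode boxes: YOLO stores normalized center coordinates, while Pascal VOC stores pixel corners. As a rough sketch of the difference (the function name is ours, not part of any of these tools), converting a YOLO box to pixel corners looks like this:

```python
def yolo_to_pixel(box, img_w, img_h):
    """Convert a YOLO box (class_id, x_center, y_center, width, height,
    all normalized to [0, 1]) to Pascal VOC-style pixel corners
    (class_id, x_min, y_min, x_max, y_max)."""
    class_id, xc, yc, w, h = box
    x_min = round((xc - w / 2) * img_w)
    y_min = round((yc - h / 2) * img_h)
    x_max = round((xc + w / 2) * img_w)
    y_max = round((yc + h / 2) * img_h)
    return class_id, x_min, y_min, x_max, y_max

# A 640x480 image with a box centered at (0.5, 0.5) covering half of each axis:
print(yolo_to_pixel((0, 0.5, 0.5, 0.5, 0.5), 640, 480))
# → (0, 160, 120, 480, 360)
```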
Donโt forget to also visit CVATโs Documentation page in case something isnโt clear.
And now itโs finally time to roll up our sleeves and get some hands-on experience using CVAT for data annotation.
If you donโt have a dataset to work with, by the wayโworry not!
You can find and download hundreds of quality datasets from our Open Datasets repository completely free of charge.
Pro tip: Looking for some inspiration? Check out 15+ Top Computer Vision Project Ideas for Beginners
Getting started with CVAT (Tutorial)
Ready to begin?
Great.
There are two ways you can label your data on CVATโeither on the CVAT website (online) or by configuring it on your local machine.
Weโll discuss both options now, starting with the simpler oneโusing CVATโs web-based platform.
Annotating online
To start labeling, create an account on cvat.org. Once youโve done that, log in, and youโll land on this page.

CVAT's dashboard
To start annotating your data, you need to create a new labeling task.
Add the name of your dataset, the labels you want to use, and the attributes (if needed).
Next, upload your raw data either from your computer or the cloud. You can drag and drop your files or use CVATโs command line interface (CLI) if thatโs your preferred method.

CVAT's labeling task configuration page
You can also tweak some of the settings. Advanced configuration allows you to:
Configure your sorting and cache settings
Tweak your image or video quality parameters
Attach a repository to store your annotations
Choose the desired annotation format
Attach an issue tracker
After your task is created, you can find it under the โTasksโ tab, and by then, youโre pretty much ready to start annotating. CVAT will also automatically calculate some parameters (e.g., image quality score) and estimate the time needed to finish your task.

For the sake of this tutorial, weโll annotate our images using bounding boxes in order to create a dataset for training object detectors.
Pro tip: Check out our list of Best Practices for Annotating with Bounding Boxes.
Open the task, choose a โrectangleโ symbol from the menu on the left, pick your label and a drawing method, and draw your first box around the object you want to annotate.
Voila! Youโve just completed your first annotation. The class name and the annotation type will be visible on the right side of the interface for you to see.

Annotating cars with bounding boxes on CVAT
Apart from bounding boxes, you can also annotate images using polygons, polylines, keypoints, cuboids, and tags.
And heyโ
Letโs not forget that CVAT also comes equipped with features such as interpolation (between keyframes for bounding boxes and polygons) and automatic annotation (more about it later).
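To get a feel for what interpolation does, here is a minimal sketch (the function name is ours): given a box on two keyframes, the intermediate frames are filled in linearly.

```python
def interpolate_box(box_a, box_b, frame, frame_a, frame_b):
    """Linearly interpolate a bounding box (x_min, y_min, x_max, y_max)
    between two keyframes, the way annotation tools fill in
    intermediate frames for you."""
    t = (frame - frame_a) / (frame_b - frame_a)
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# A box moves from (0, 0, 100, 100) at frame 0 to (100, 100, 200, 200) at frame 10;
# halfway through, at frame 5, it sits exactly in between:
print(interpolate_box((0, 0, 100, 100), (100, 100, 200, 200), 5, 0, 10))
# → (50.0, 50.0, 150.0, 150.0)
```

This means you only need to correct boxes on keyframes where the motion changes, not on every frame.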
Here are a couple of best practices to keep in mind when labeling with CVAT:
Remember to always click โSave Workโ, as CVAT does not save your progress automatically
Press โNโ every time you want to create a new annotation
Always ensure pixel-perfect tightness of your labels
Label all objects in each class first
Once youโve annotated all of your data, itโs time to export it!
Head over to the โTasksโ tab, choose the task youโve completed, and click โDump Annotations.โ CVAT allows you to export your annotations in multiple formats, including COCO, Pascal VOC, YOLO, LabelMe, and more.
Pro tip: If you want to import your labeled data to V7 to train your computer vision model, check out V7 Supported Formats to pick the right option.
Remember that CVAT's online platform allows you to add up to 10 tasks per user and upload only 500 MB of data.
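Once you have an exported COCO file, a few lines of Python are enough to sanity-check it before training. This is a minimal sketch (the function name is ours, and the export filename is an assumption), demonstrated on a hand-built stand-in rather than a real export:

```python
from collections import Counter

def summarize_coco(coco):
    """Count annotations per category name in a parsed COCO dict."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    return dict(Counter(id_to_name[a["category_id"]] for a in coco["annotations"]))

# In practice you would load a real export, e.g.:
#   with open("instances_default.json") as f:
#       coco = json.load(f)
# Here we use a minimal hand-built stand-in with two cars and one person:
coco = {
    "categories": [{"id": 1, "name": "car"}, {"id": 2, "name": "person"}],
    "annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}],
}
print(summarize_coco(coco))  # → {'car': 2, 'person': 1}
```

A quick check like this catches empty exports or missing classes before you spend time on a training run.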
Installing CVAT on your local machine
So, labeling on the CVAT web-based platformโpretty easy, right?
Butโ
If you need to annotate large amounts of data without limitations, you'll have to set up CVAT locally. Worry not, though: again, it's very easy!
Below is a snippet of the installation instructions for CVAT on Windows 10, taken from CVAT's documentation.
Before following this installation guide, make sure to:
Install WSL2
Download and install Docker Desktop for Windows
Download and install Git for Windows
Download and install Google Chrome. Itโs the only browser that CVAT fully supports.

CVAT's installation guide for Windows
Make sure to check the instructions specific to your operating system.
CVAT auto-annotation (Semi-Automatic Image Annotation Tools in CVAT)
Apart from manual labeling tools, CVAT is also optimized for semi-automatic annotation that can help you speed up the process significantlyโeven up to 4x.
Have a look at this video to get a better understanding of how it all works.
Remember: To use CVAT's AI tools, you need the corresponding deep learning models to be available in the Models section.
You can choose from Interactors, Detectors, and Trackers.
Interactors
You can use Interactors to create polygons semi-automatically. Available DL models from this category can be used to label any object. Depending on your use case, youโll have to use regular, positive, or negative points to create a polygon.

Source: GitHub
Here are some models you can use:
Deep extreme cut (DEXTR)
Feature backpropagating refinement scheme (f-BRS)
High Resolution Net (HRNet)
Inside-Outside-Guidance
Pro tip: Check out The Essential Guide to Neural Network Architectures.
Detectors
You can use Detectors to annotate one frame automatically. Supported models, such as YOLO v3, can only annotate the labels they were trained on.

Source: GitHub
Other models:
Mask R-CNN
Faster R-CNN
Trackers
You can use Trackers to annotate your objects with bounding boxes. Similar to Interactors, the available models can be used to annotate any objects. Labeled objects are automatically tracked when you move to the next frame.

Source: GitHub
CVATโs documentation mentions SiamMask as one of the available deep learning models for Object Tracking and Segmentation.
Pro tip: Ready to train your models? Have a look at Mean Average Precision (mAP) Explained: Everything You Need to Know.
Pros & Cons of CVAT
To summarize everything weโve covered so far, letโs have a quick look at some of the CVATโs pros and cons:

CVAT's pros and cons
CVAT use cases
Of course, apart from annotating bounding boxes for object detection, CVAT also allows you to annotate your data for image classification, semantic segmentation, instance segmentation, and object tracking.
Here are some of the use cases (courtesy of Andrey Zhavoronkov from Delta-Course.org).
Object detection
You can draw bounding boxes, polylines, polygons, and keypoints on both image and video data.

Source: Delta-Course.org
Image classification
You can apply tags with attributes, including boolean, choice, number, and text.

Source: Delta-Course.org
Semantic and Instance Segmentation
You can segment the image and manipulate the shapes.

Source: Delta-Course.org
Pro tip: Have a look at our Complete Guide to Panoptic Segmentation [+V7 Tutorial].
If you're curious to learn more about real-life applications of computer vision and AI, feel free to check out the articles listed at the end of this guide.
CVAT alternatives
As we mentioned at the beginning of this article, CVAT is not the only data annotation tool out there.
Itโs undoubtedly one of the most popular ones, but as youโve probably already figuredโitโs also quite basic.
If youโre serious about labeling large amounts of data and doing it efficiently, you might want to upgrade to a much sleeker and more powerful platform than CVAT.
Andrew Achkar, Technical Director at Miovision, did exactly that, switching from Miovision's internal tool built on top of CVAT to V7, saying:
โWe chose V7 because we wanted to build new types of workflows. We had our own system, but we wanted it to accomplish additional tasks like creating other annotation types, re-annotations, annotations on videosโactivities that would be a lot of effort in development. V7 met our needs.โ
However, there are also other open-source options out there. Hereโs a shortlist of the most popular (and free) annotation platforms:
1. LabelMe
LabelMe is a free online annotation tool created by the MIT Computer Science and Artificial Intelligence Laboratory. LabelMe supports six annotation types: polygon, rectangle, circle, line, point, and line strip.
2. LabelImg
LabelImg is a graphical image annotation tool, written in Python, for labeling objects with bounding boxes. You can export your annotations as XML files in Pascal VOC format.
3. VoTT
VoTT (Visual Object Tagging Tool) is a free and open-source image annotation and labeling tool developed by Microsoft.
4. ImgLab
ImgLab is an open-source and web-based image annotation tool. It provides multiple label types such as points, circles, bounding boxes, and polygons.
Bonus: How to build a computer vision model?
Finally, letโs train your computer vision model on V7. Get your data labeled on CVAT ready!
To begin, you need to sign up for a 14-day free trial to get access to our tool. And once you are in, here's what comes next.
1. Upload your labeled data

Dataset tab view on V7
V7 also allows you to upload your data via its API, CLI, and SDK.
2. Choose your model
Head over to the โNeural Networksโ tab to pick the model you want to train.
Depending on how you labeled your data, you can choose to train an Instance Segmentation model (polygons), an Object Detector (bounding boxes), or an Image Classifier (tags).

Training a computer vision model on V7
Name your model, and click โContinue.โ
3. Choose your dataset and review class distribution
Pick your labeled dataset and check whether your class distribution is balanced. Avoid situations where some classes are heavily overrepresented or underrepresented, as this will hinder your model's performance.

Class distribution view
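If you'd like to run the same sanity check yourself before uploading, it only takes a few lines of Python (the function name is ours; the label list is a toy example):

```python
from collections import Counter

def class_distribution(labels):
    """Return the fraction of annotations belonging to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}

# A toy list of per-annotation class names:
labels = ["car"] * 70 + ["person"] * 25 + ["bike"] * 5
dist = class_distribution(labels)
print(dist)  # → {'car': 0.7, 'person': 0.25, 'bike': 0.05}
# With three classes, a uniform split would be ~0.33 each, so "bike"
# is heavily underrepresented here; consider labeling more examples of it.
```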
Next, V7 will show you the split between your training, validation, and test set. It will also calculate the time and cost of this training session.

Training, validation, and test set split
All you have to do is click โStart trainingโ and voilaโ
You trained your first computer vision model! You can go ahead and work with it or keep re-training your model to improve its performance.
V7 supports model-assisted labeling where your model can constantly learn on its own and help you annotate new batches of data even 10x faster.
Got questions? Let us know or head over to V7 Academy.
We hope to see you training your models on V7.
Good luck!
Read more:
The Beginner's Guide to Self-Supervised Learning
Overfitting vs. Underfitting: What's the Difference?
The Beginner's Guide to Deep Reinforcement Learning
The Complete Guide to Ensemble Learning
A Newbie-Friendly Guide to Transfer Learning
The Essential Guide to Zero-Shot Learning
Supervised vs. Unsupervised Learning: Whatโs the Difference?
9 Reinforcement Learning Real-Life Applications
Mean Average Precision (mAP) Explained: Everything You Need to Know