Data annotation is one of the most important parts of the machine learning pipeline: the success of a model depends on both the number of annotated samples and the quality of the annotations.
With labeled data being the only source of information the machine learning model has about our natural environment, it is no surprise that poor annotations quickly lead those models to perform poorly.
Data annotation is often incredibly tedious and time-consuming. As a result, more and more organizations outsource or crowdsource this process.
As an alternative to costly annotation services and software, open source annotation tools that enable easy and fast annotation are often used by researchers and students. In this article, we will talk about “LabelImg”, a lightweight and popular open source annotation tool often used for annotating image data for computer vision tasks like object detection and recognition.
Here’s what we’ll cover:
What is LabelImg?
LabelImg installation guide
Annotating images with LabelImg
LabelImg is an open-source graphical image annotation tool originally developed by TzuTa Lin and maintained by a community of developers in Label Studio. Currently hosted in a GitHub organization named heartexlabs, LabelImg is written in Python and uses Qt for its graphical interface.
As of now, LabelImg offers annotations only in the form of bounding boxes, which can be saved as XML files in PASCAL VOC format or exported to the YOLO (plain text) and CreateML (JSON) formats.
Features & limitations
If you are looking for a free tool for labeling data for your object detection projects, LabelImg might be just the perfect solution for your needs. It gets the job done.
Although LabelImg lets users label data with bounding boxes and export annotations to multiple formats, like any open-source tool it comes with several limitations that can slow you down.
Let’s have a look.
LabelImg saves annotations as XML files in PASCAL VOC format by default and can also save them in the YOLO and CreateML formats. Support for these formats, which are widely used in object detection pipelines, makes it a useful tool for annotating object detection data.
LabelImg is written in Python and uses Qt for its graphical interface, making it a great choice for Linux-based systems, which many annotation tools do not support. Furthermore, for Windows, LabelImg provides a standalone application that does not require installation and is just over 13 MB in size.
LabelImg provides hotkeys for fast navigation and annotation of multiple images.
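Because the PASCAL VOC files are plain XML, they are easy to read back in your own scripts. The snippet below is a minimal sketch, assuming the usual PASCAL VOC layout (an `annotation` root with a `size` element and one `object`/`bndbox` element per box). The sample XML and the `parse_voc` helper are made up for illustration and are not part of LabelImg.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down annotation in the usual PASCAL VOC layout.
SAMPLE_VOC = """<annotation>
  <filename>dog.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return the image size and a list of (label, xmin, ymin, xmax, ymax) boxes."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    boxes = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Pixel coordinates of the box corners, read in a fixed order.
        coords = [int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


annotation = parse_voc(SAMPLE_VOC)
print(annotation["boxes"])
```

To read a file saved on disk instead of a string, `ET.parse(path).getroot()` can replace `ET.fromstring(xml_text)`.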
Limitations of LabelImg
Because it allows only bounding box annotations, LabelImg is limited to object detection, face detection, and recognition tasks. Even for bounding boxes, LabelImg’s export support does not include popular formats like COCO and OpenImages. Tasks like image classification, segmentation, and pose recognition need additional annotation support in the form of image tagging, mask creation, and keypoint tagging, respectively. The restriction to rectangular bounding boxes is one of the biggest limitations of LabelImg as an image annotation tool.
LabelImg does not support any form of data augmentation or image manipulation, which limits its use as a tool to create datasets. Lack of augmentation at this stage means that the dataset has to be augmented during training with the help of functions provided by deep learning libraries like PyTorch and TensorFlow.
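When augmentation happens at training time, the bounding boxes have to be transformed together with the image. The snippet below is a minimal sketch of a horizontal flip in plain Python, assuming boxes are stored as `(xmin, ymin, xmax, ymax)` pixel edges and the image is a nested list of pixel rows. Both helper names are hypothetical, and in practice you would use the transform utilities of your deep learning library instead.

```python
def hflip_image(pixels):
    """Mirror an image (list of pixel rows) left to right."""
    return [row[::-1] for row in pixels]


def hflip_box(box, img_w):
    """Mirror an (xmin, ymin, xmax, ymax) box inside an image of width img_w."""
    xmin, ymin, xmax, ymax = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_w - xmax, ymin, img_w - xmin, ymax)


# Toy 4x2 "image" with a box covering the leftmost column.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
box = (0, 0, 1, 2)

flipped_image = hflip_image(image)
flipped_box = hflip_box(box, img_w=4)
print(flipped_image, flipped_box)
```

After the flip, the box covers the rightmost column, which is where the pixels it originally enclosed have moved.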
While installation is easy on Ubuntu and not needed on Windows, LabelImg can be very time-consuming to install on macOS, where the basic install command is prone to failure due to dependency issues.
In fact, one of the V7ers tried to install LabelImg on her Mac, but she encountered several issues and eventually went back to labeling on V7 ;-)
If you end up giving up on LabelImg, too, our team at V7 would be happy to help you label your data hassle-free ;-)
LabelImg installation guide
Installing LabelImg requires some technical skills, such as using the command line. Here are a couple of options depending on your operating system. You can find detailed installation instructions in the LabelImg GitHub documentation.
Linux and Mac users can install it from PyPI (requires Python 3.0 or above).
On Windows, use the standalone application available among the project’s release files.
If these options fail, you can build it from source. To do so, manually install Qt5 and then run the file “labelImg.py” as a regular Python 3 script.
Annotating images with LabelImg
LabelImg has a fairly intuitive user interface for annotating images for object detection. Ready to label some data? Have a look at our quick tutorial.
Go to View and check if Auto Save mode is enabled. Auto Save mode prevents you from losing your annotations mid-way.
Create a base directory and keep all the images you want to annotate there. Keeping them in one directory enables you to load all images at once and move through them as you annotate. Open the directory with LabelImg.
LabelImg will then ask you where to save your annotations. Select the directory you want to use.
Before labeling images, check that the export file format is correct. The default format is PASCAL VOC. Clicking on the format button in the toolbar on the left allows you to change it, for example to YOLO.
To start annotating, click on the Create RectBox button. Your mouse cursor changes to a crosshair, with which you can draw a rectangular bounding box on the image.
After drawing this bounding box, the tool will ask you to provide a label. You can either add a new label or provide one from the predefined list of labels that you can find in the drop-down menu.
Clicking on Create RectBox again for each additional object allows you to annotate multiple objects in the same image.
Click on Next Image or use the hotkeys “a” (previous image) and “d” (next image) on your keyboard to navigate through the images in your directory and keep annotating them. These annotations will be automatically saved in the previously defined annotation folder.
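If your annotations were saved in PASCAL VOC format but your training code expects YOLO, the conversion is simple arithmetic. The snippet below is a minimal sketch, assuming the common YOLO convention of one `class_id x_center y_center width height` line per box, with all four values normalized by the image size. The function names, the class id, and the sample box are hypothetical.

```python
def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert pixel corner coordinates to normalized center/size values."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_line(class_id, box, img_w, img_h):
    """Format one box as a YOLO-style text line with six decimal places."""
    values = voc_box_to_yolo(*box, img_w, img_h)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# A 200x240 pixel box in a 640x480 image, assigned to class 0.
line = yolo_line(0, (100, 120, 300, 360), img_w=640, img_h=480)
print(line)  # 0 0.312500 0.500000 0.312500 0.500000
```

The class id is an index into your list of labels, so the same label order has to be used for every image in the dataset.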
Best practices for labeling with LabelImg
Below are some of the best practices for labeling your images using LabelImg.
Bounding box annotations should be tight. This means that a bounding box should not include anything that is not part of the corresponding object and should be as small as possible while still fitting the entire object. Take care, however, that the entire object is inside the box and that no part of it sticks out.
The annotations should be complete in nature. All the objects with their corresponding tags should be annotated so that the machine learning model is not penalized for detecting an unannotated object in the image along with the annotated one.
Objects that are partially blocked in the image should be annotated nevertheless. Annotation should proceed as if the occluded object were in full view. Similarly, objects that are partially absent from the image frame should be annotated nevertheless.
In case the annotation task is outsourced to an organization or freelancers, make sure to always include specific and crystal clear instructions to prevent ambiguity in annotations.
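Tightness and completeness need a human reviewer, but a few mechanical mistakes can be caught automatically before training. The snippet below is a minimal sketch of such a check, assuming boxes are stored as `(label, xmin, ymin, xmax, ymax)` tuples in pixel coordinates. The `check_box` helper and the label set are hypothetical and are not part of LabelImg.

```python
def check_box(box, img_w, img_h, allowed_labels):
    """Return a list of problems found in one annotated box (empty if none)."""
    label, xmin, ymin, xmax, ymax = box
    problems = []
    if label not in allowed_labels:
        problems.append("unknown label")
    if xmin >= xmax or ymin >= ymax:
        problems.append("empty or inverted box")
    if xmin < 0 or ymin < 0 or xmax > img_w or ymax > img_h:
        problems.append("box outside image")
    return problems


labels = {"dog", "person"}
good = check_box(("dog", 100, 120, 300, 360), 640, 480, labels)
bad = check_box(("cat", 300, 120, 100, 360), 640, 480, labels)
print(good, bad)
```

Running a check like this over every annotation file is a cheap way to spot typos in labels or boxes drawn by accident, especially when the labeling was done by several people.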
While LabelImg is a great beginner annotation tool, it might not be powerful enough for advanced projects requiring you to label large quantities of data for tasks such as semantic segmentation.
Luckily, LabelImg has various alternatives, including both free and paid options.
Free and open source alternatives to LabelImg include CVAT and LabelMe, which are both easy to use. For researchers and students, V7’s Free Education Plan serves as a great alternative, not only for labeling images for object detection but doing much more with the auto-annotate tool for superfast annotations and the inbuilt training pipeline for training on your annotated data.
While the process of labeling image data is tedious and often requires manual work, quality image annotation is necessary for machine learning models to perform optimally.
As an open-source tool for image annotation, LabelImg gives researchers and students hands-on experience annotating datasets and a way to learn the best practices for data annotation themselves, without having to bear the costs that come with annotation software and services.
Although it is simple and easy to use, LabelImg lacks the range of tools that image annotation software needs for annotating and maintaining large-scale datasets.
Hmrishav Bandyopadhyay studies Electronics and Telecommunication Engineering at Jadavpur University. He previously worked as a researcher at the University of California, Irvine, and Carnegie Mellon University. His deep learning research revolves around unsupervised image de-warping and segmentation.