There are plenty of image annotation platforms out there, and a bounding box tool seems like simple enough functionality.
But here’s the thing—
The accuracy and quality of your bounding boxes determine your model’s performance, and you may need millions of them to bring the most accurate model for your use case to market.
Have you taken the time to consider every feature that will help you achieve this?
We spoke to hundreds of teams labeling data with bounding boxes and listed the features (all implemented in V7) that we believe every bounding box annotation tool should offer:
But before exploring each feature in more depth, let’s quickly cover the basics of bounding box annotation.
Here’s how you can perform bounding box annotations using V7.
Bounding box annotations store the coordinates that describe where an object is located in an image or video frame. They are best suited to uniformly shaped objects, low-compute projects, and objects that don’t overlap.
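Under the hood, a box is just four numbers, but tools disagree on which four. Here is a minimal sketch, assuming pixel coordinates with the origin at the top-left corner, that converts between corner format (x_min, y_min, x_max, y_max) and the [x, y, width, height] order used by COCO. The `Box` class and its methods are hypothetical illustrations, not V7’s export schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates, origin at the top-left (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def to_xywh(self) -> list[float]:
        # COCO-style [x, y, width, height], where (x, y) is the top-left corner
        return [self.x_min, self.y_min, self.x_max - self.x_min, self.y_max - self.y_min]

    @classmethod
    def from_xywh(cls, x: float, y: float, w: float, h: float) -> "Box":
        # Inverse conversion: width/height back to the bottom-right corner
        return cls(x, y, x + w, y + h)

    def area(self) -> float:
        # Clamp at zero so a degenerate (inverted) box never reports negative area
        return max(0.0, self.x_max - self.x_min) * max(0.0, self.y_max - self.y_min)


box = Box(10, 20, 110, 70)
print(box.to_xywh())  # [10, 20, 100, 50]
```

Mixing up the two conventions is a classic source of silently shifted boxes, so it pays to check which one your export format and training code expect.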
V7 allows you to draw bounding boxes with pixel-perfect precision, add attributes, copy-paste boxes of similar size, interpolate them in the video, and easily convert polygon masks to bounding boxes.
The first thing to look out for is your class structure.
Making a box is easy, but—
How is that data stored?
Classes are the names of objects in a dataset. If you’re building a service to detect dents and scratches, you will want to make sure these two entries can be reused in new projects, or branched out hierarchically as your data grows.
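One way to picture a reusable, hierarchical class structure is a class that is defined once, referenced by several projects, and optionally nested under a parent. A minimal sketch, assuming a hypothetical `AnnotationClass` type and plain project dictionaries; real platforms store this server-side and in their own format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotationClass:
    """A class defined once and shared across projects (hypothetical type)."""
    name: str
    parent: Optional["AnnotationClass"] = None

    def path(self) -> str:
        # Walk up the hierarchy to build a full path such as "damage/dent"
        return self.name if self.parent is None else f"{self.parent.path()}/{self.name}"


damage = AnnotationClass("damage")
dent = AnnotationClass("dent", parent=damage)
scratch = AnnotationClass("scratch", parent=damage)
deep_scratch = AnnotationClass("deep_scratch", parent=scratch)  # branched out as data grows

# Both projects reference the same class objects instead of redefining them
car_inspection = {"name": "car-inspection", "classes": [dent, scratch]}
truck_inspection = {"name": "truck-inspection", "classes": [dent, deep_scratch]}

print(deep_scratch.path())  # damage/scratch/deep_scratch
```

Because projects point at shared definitions, adding a subclass like `deep_scratch` later doesn’t require touching the classes that already exist.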
Here are a few must-have functionalities:
Below is the class creation experience on V7.
We kept our design language consistent and added rich tooltips that explain what each feature does, because we understand that not everyone is familiar with computer vision terminology.
Before building V7, we tested several bounding box tools on the market and found that most didn’t prioritize interaction design.
Placing and editing millions of bounding boxes requires a very smooth user experience.
Here are the things to look out for:
V7 supports video and several types of sequential data, such as volumetric MRI or CT scans and time-lapses.
All of these allow you to interpolate boxes throughout a sequence smoothly.
We spent six months prototyping our video annotation features to ensure a seamless video labeling experience. We wanted an experience that required minimal tweaks on the timeline and generated keyframes automatically, which you can then edit manually or with models.
We’ve also separated position keyframes from attribute keyframes, allowing bounding boxes to gain or lose attributes or other sub-annotations throughout the video while remaining part of the same instance.
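Interpolation between keyframes can be sketched as follows, assuming linear interpolation of box corners between position keyframes and a step function for attribute keyframes (an attribute holds until the next attribute keyframe changes it). The function names and data layout are hypothetical; this is not V7’s implementation, and production tools may use other interpolation schemes.

```python
from bisect import bisect_right

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def interpolate_box(keyframes: dict[int, Box], frame: int) -> Box:
    """Linearly interpolate a box between the two position keyframes around `frame`."""
    if not keyframes:
        raise ValueError("no position keyframes")
    frames = sorted(keyframes)
    # Before the first or after the last keyframe, hold the nearest keyframe
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    i = bisect_right(frames, frame)
    f0, f1 = frames[i - 1], frames[i]
    t = (frame - f0) / (f1 - f0)  # 0.0 at f0, approaching 1.0 at f1
    a, b = keyframes[f0], keyframes[f1]
    return tuple(a_k + t * (b_k - a_k) for a_k, b_k in zip(a, b))


def attributes_at(keyframes: dict[int, set[str]], frame: int) -> set[str]:
    """Attributes change only at attribute keyframes: hold the most recent value."""
    past = [f for f in keyframes if f <= frame]
    return keyframes[max(past)] if past else set()


position = {0: (0, 0, 10, 10), 10: (10, 0, 20, 10)}
attrs = {0: set(), 6: {"occluded"}}
print(interpolate_box(position, 5))  # (5.0, 0.0, 15.0, 10.0)
print(attributes_at(attrs, 8))       # {'occluded'}
```

Keeping the two keyframe tracks independent means moving a box never resets its attributes, and toggling an attribute never creates a position keyframe.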
Here are a few things to look out for:
💡 Note: Object tracking isn’t as helpful for annotation as it may initially sound. Trackers follow individual features (usually near the center of an object), while bounding boxes depend on the object’s edges being pixel perfect. As a result, trackers can create more work than they save, since you may need to adjust the box edges in every frame.
Most importantly: Is the video system frame accurate?
HTML5 video players aren’t: they can have timing errors of up to 200 ms. Most video labeling tools rely on browser-based video players, so the timecodes of exported boxes may not match the original files. This can happen at any point in a video and is most prevalent in CCTV footage.
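To see why 200 ms matters, convert time to frames: at 25 fps each frame lasts 40 ms, so a 200 ms error moves a box by 5 frames. The sketch below uses exact fractions so that NTSC rates like 30000/1001 fps don’t introduce floating-point drift. `frame_at` is a hypothetical helper that assumes a constant frame rate and frame 0 starting at t = 0.

```python
import math
from fractions import Fraction


def frame_at(timestamp_s: Fraction, fps: Fraction) -> int:
    """Index of the frame on screen at `timestamp_s` (constant frame rate, frame 0 at t = 0)."""
    return math.floor(timestamp_s * fps)


PAL = Fraction(25)
NTSC = Fraction(30000, 1001)  # "29.97 fps"

true_t = Fraction(12)                 # the moment the annotator meant to label
reported_t = true_t + Fraction(1, 5)  # the same moment, reported 200 ms late by the player

for name, fps in [("25 fps", PAL), ("29.97 fps", NTSC)]:
    off_by = frame_at(reported_t, fps) - frame_at(true_t, fps)
    print(f"{name}: box lands {off_by} frames away")
# 25 fps: box lands 5 frames away
# 29.97 fps: box lands 6 frames away
```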
Got a few similar objects to annotate using bounding boxes?
Copy-pasting boxes can speed up your annotation process considerably. It also keeps your annotations consistent for the same objects appearing in different areas of an image or video.
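Pasting a copied box usually means keeping its width and height while changing its position. A minimal sketch, assuming (x_min, y_min, x_max, y_max) pixel coordinates and a box no larger than the image; `paste_box` is a hypothetical helper that clamps the pasted copy so it stays inside the image.

```python
Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def paste_box(box: Box, x_min: float, y_min: float, image_w: float, image_h: float) -> Box:
    """Paste a copy of `box` with its top-left corner at (x_min, y_min), keeping its size.

    The copy is clamped so it stays fully inside the image
    (assumes the box is no larger than the image).
    """
    w, h = box[2] - box[0], box[3] - box[1]
    x = min(max(x_min, 0), image_w - w)
    y = min(max(y_min, 0), image_h - h)
    return (x, y, x + w, y + h)


car = (10, 10, 60, 40)
print(paste_box(car, 100, 200, 1000, 800))  # (100, 200, 150, 230)
print(paste_box(car, 980, 5, 1000, 800))    # (950, 5, 1000, 35): clamped at the right edge
```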
Are hotkeys a priority in your annotation tool?
Your labeling team should aim to turn everyone into a power user. Keyboard shortcuts help annotators produce more training data with less fatigue, and fatigue causes some of the hardest training data errors to spot.
Shortcuts to consider include switching classes, cycling between boxes, and cycling between the points of a box. V7 also offers keyboard shortcuts for moving whole annotations and individual box points.
Some projects require copying all annotations from one image to another. This is common when the images in your dataset are sequential.
We added a button to V7 that carries annotations over from one image to another.
Here are things to look out for in power user shortcuts:
We’ve added a handy list of shortcuts to every page and display each shortcut next to its button, which encourages users to learn them while working with a mouse.
Attributes are simply annotation tags that can define the specific features of a given object.
Many object detection projects require labelers to add label attributes on top of the bounding box annotation—it helps describe a given object in greater detail.
For example, it’s common to add attributes such as occluded, truncated, and crowded, which indicate that an object is partly hidden by other objects, cut off by the image border, or part of a dense group of similar objects.
V7 allows you to add attributes to your bounding box annotations. You can annotate an image and add as many tags as you need to describe an object on a much more granular level.
We also included the ability to add other sub-annotation types, such as free text, directional vector, a custom ID (used in object re-identification, multi-camera setups, or other edge cases), and there are many more to come.
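A bounding box with attributes and sub-annotations can be modeled as a single record. This is a hypothetical schema for illustration only (it is not V7’s export format); field names such as `instance_id` and `direction` are assumptions standing in for the custom ID and directional vector sub-annotations described above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class BoxAnnotation:
    """One box plus its tags and sub-annotations (hypothetical schema)."""
    class_name: str
    bbox: tuple[float, float, float, float]             # (x, y, width, height) in pixels
    attributes: list[str] = field(default_factory=list)  # e.g. "occluded", "truncated"
    text: Optional[str] = None                           # free-text sub-annotation
    direction: Optional[tuple[float, float]] = None      # directional vector (dx, dy)
    instance_id: Optional[int] = None                    # ID for re-identification across frames or cameras


ann = BoxAnnotation(
    "pedestrian",
    (120, 45, 40, 110),
    attributes=["occluded"],
    direction=(1.0, 0.0),
    instance_id=17,
)
print(json.dumps(asdict(ann)))  # tuples serialize as JSON arrays
```

Keeping optional sub-annotations as nullable fields lets every box share one schema, whether or not a given project uses them.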
Here are things to watch out for in sub-annotations:
💡 Note: Few annotation tools have dataset management capabilities. Without them, if you change an attribute or class name after creating it, you may have to propagate the change to every annotation file with a script to avoid breaking changes.
This can be incredibly frustrating, so it's always best to invest in a dataset management solution before you start any labeling project.
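For tools without dataset management, a propagation script of this kind looks something like the sketch below. It assumes a hypothetical JSON layout where each file holds an `annotations` list whose items store their class in a `name` field; adapt the keys to your actual export format, and back up your files before running anything like it.

```python
import json
from pathlib import Path


def rename_class(annotation_dir: Path, old: str, new: str) -> int:
    """Rename a class in every JSON annotation file under `annotation_dir`.

    Assumes a hypothetical schema: {"annotations": [{"name": <class name>, ...}, ...]}.
    Returns the number of annotations that were renamed.
    """
    changed = 0
    for path in sorted(annotation_dir.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        hits = 0
        for ann in data.get("annotations", []):
            if ann.get("name") == old:
                ann["name"] = new
                hits += 1
        # Only rewrite files that actually changed
        if hits:
            path.write_text(json.dumps(data, indent=2), encoding="utf-8")
            changed += hits
    return changed
```

A single rename means reading and rewriting every affected file, and any file the script misses (or any copy held elsewhere) now disagrees with the rest of the dataset.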
Drawing and editing boxes is only one part of the challenge.
How easily can you see them and the image below them?
In V7, every annotation has an editable Z-value. You simply drag an annotation to reorder it.
The same can be done in the video timeline, with an option to automatically adjust this order to save vertical space.
This one is especially useful when you have hundreds of annotations—such as in sports analytics.
You can also adjust the box opacity, border opacity, and visual features of the image.
V7 also has windowing and color map options, which let you see elements of the image that aren’t visible to the naked eye on regular RGB monitors.
In the example above, this x-ray has over 6,000 possible greyscale values per pixel, whilst standard 8-bit monitors can only display 256.
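Windowing maps a chosen slice of that range (a window center and width) onto the 256 levels a monitor can show, so subtle differences inside the window become visible while everything outside it saturates to black or white. Below is a simplified linear version; it is not the exact DICOM VOI LUT formula, and `window_to_8bit` is a hypothetical helper working on a flat list of pixel values (in practice you would vectorize this with NumPy).

```python
def window_to_8bit(pixels: list[int], center: float, width: float) -> list[int]:
    """Map high-bit-depth greyscale values to 0-255 using a linear window (center/width)."""
    low = center - width / 2
    high = center + width / 2
    out = []
    for v in pixels:
        if v <= low:
            out.append(0)    # below the window: black
        elif v >= high:
            out.append(255)  # above the window: white
        else:
            # Stretch the window's range across all 256 display levels
            out.append(round((v - low) / width * 255))
    return out


raw = [0, 1000, 1100, 1200, 4000]  # values from a high-bit-depth scan
print(window_to_8bit(raw, center=1100, width=200))  # [0, 0, 128, 255, 255]
```

A narrow window spends all 256 display levels on a small intensity band, which is how tissue differences invisible in a full-range view become distinguishable.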
Make sure that your tool is tested for performance when hundreds of bounding boxes enter the scene. This is especially important in videos where annotations must be kept in memory to ensure smooth playback.
At V7, we set our maximum at 10,000 annotations per image. It’s the highest on the market by an order of magnitude. The same goes for polygons and other annotation types.
The GIF below shows over 500 polygons with over 50 coordinate points each.
You can technically add more than 10,000 annotations per frame on V7, but you will start seeing performance issues unless your machine is top-notch.
Here are things to consider:
Some annotation formats, such as COCO, expect a bounding box around each polygon. Models like Mask R-CNN also benefit from this detector/segmenter approach.
We give you this option out of the box.
Since it’s easy to make polygons on V7 using Auto-Annotate, you can export these as bounding boxes.
Moreover, you won’t have to draw a box “around” a polygon; you can simply draw a polygon and use its “free” surrounding box to train a detector.
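Deriving the “free” box from a polygon is just a min/max over its vertices. A minimal sketch, assuming COCO’s flat [x1, y1, x2, y2, ...] polygon layout and [x, y, width, height] box order; `polygon_to_bbox` is a hypothetical helper name.

```python
def polygon_to_bbox(points: list[tuple[float, float]]) -> list[float]:
    """Tightest axis-aligned box around a polygon, in COCO [x, y, width, height] order."""
    if not points:
        raise ValueError("polygon has no points")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    return [x_min, y_min, max(xs) - x_min, max(ys) - y_min]


# COCO stores polygons as flat lists: [x1, y1, x2, y2, ...]
flat = [12.0, 30.0, 48.0, 22.0, 60.0, 55.0, 20.0, 70.0]
points = list(zip(flat[0::2], flat[1::2]))  # pair up x and y coordinates
print(polygon_to_bbox(points))  # [12.0, 22.0, 48.0, 48.0]
```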
Ultimately, nothing is more dangerous than committing to a tool and then encountering breaking bugs in its API halfway through your project.
Here are the most common bugs and feature failures we’ve encountered across image annotation tools, in order of frequency:
These are all issues we’ve heard of at least once from customers who were switching from their internal tools or other labeling platforms.