Auto-Annotate
Annotation Tool for automatically generating polygon and pixel-wise masks.

Automated Annotation with V7 Darwin

Generating a well-supervised dataset can be very time-consuming. Auto-Annotate speeds up your annotation time by up to 80% by using AI models to supervise your training data.

An automated annotation tool that works for all data.

One of V7's research goals is to enable the generalization of vision AI. This means machine learning models should identify parts and objects across multiple domains, including domains that look different from their training data, bringing them one step closer to the way we humans interpret the world.

When creating Darwin's Auto-Annotate AI, our priority was to enable users to generate masks of whichever object they were interested in, without needing to provide hundreds of training examples. Something V7 engineers paid particular attention to is the pixel-level accuracy of the generated masks, which should be close enough to perfection not to require more than a few seconds of manual correction, if any. The result is a class-agnostic object segmentation neural network that works on any data - from dense histological sample slides to in-the-wild images of busy intersections.

The images above were annotated using the same model. No manual corrections were made; each mask is a first guess.

Auto-Annotate generates a segmentation mask around any object or part of an object in under one second. You define a region of interest where your object is present, and the model identifies the most salient object or part visible and segments it. If part of the object has been omitted, or too much has been included, you can click on that region to add or subtract it, and Auto-Annotate will correct its previous prediction.
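As a rough illustration of this interaction loop (not V7's actual SDK or API), the region-plus-clicks workflow can be thought of as follows, where `segment_fn` and `refine_fn` are hypothetical stand-ins for the model calls:

```python
# A minimal sketch of the Auto-Annotate interaction loop described above.
# `segment_fn` and `refine_fn` are hypothetical stand-ins for the model calls;
# they are NOT part of V7's actual SDK.

def annotate_instance(image, roi, clicks, segment_fn, refine_fn):
    """Return a binary mask for the most salient object inside `roi`.

    image:  H x W x 3 array
    roi:    (x0, y0, x1, y1) box drawn by the annotator
    clicks: list of ((x, y), "add" | "subtract") correction clicks
    """
    mask = segment_fn(image, roi)  # first guess: H x W boolean mask
    for point, mode in clicks:
        # Each click asks the model to grow ("add") or shrink ("subtract")
        # its previous prediction around that point.
        mask = refine_fn(image, roi, mask, point, positive=(mode == "add"))
    return mask
```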

This isn't a simple edge-detection or superpixel approach, but a generalized object and part segmentation model that works at any scale or domain. For example, defining a region around a human nose will annotate only the person's nose, while capturing their face will segment the face, leaving out hair, neck, and any surrounding objects with the same skin tone. In medical and scientific imaging, given a boxed region, Auto-Annotate can segment most types of cells, organs, or abnormalities even if they aren't present in the model's original training data.

There are also domains where Auto-Annotate does not yet work better than manual approaches, such as capillaries, branches, or other elongated strands, as well as certain elements of "stuff" used in panoptic segmentation such as the sky, ground, or wall surfaces.

Benchmarking - Auto-Annotate vs. Manual Annotation

We tested Auto-Annotate on a person segmentation task in a crowded scene, and on the instance segmentation of French fries. Each poses a different computer vision challenge for the model:
Human imagery is composed of parts, such as clothing with distinct colors and borders, accessories, and items held. While clothing, skin, and hair should be considered "human" for detection purposes, surrounding items such as backpacks, skateboards, or bicycles should not.
French fries are analogous to most 2D slides or scientific imagery, where multitudes of a similar object are cluttered together on a seemingly flat surface, mostly within the same hue. More importantly, Auto-Annotate's training set does not contain French fries. A completed annotation looks like a tangled web of sticks:

Benchmarking Setup

Five densely populated (20+ instances) images were annotated by three separate people instructed to focus on quality. The ground truth was annotated by a fourth person who was given unlimited time to annotate the images at a pixel-perfect level, where possible.
The images were annotated twice: once using Auto-Annotate, and later using a manual polygon annotation tool. Annotators could correct the Auto-Annotate output only by clicking to add or subtract content (one of the tool's functionalities) and could not switch to a brush tool.

The results are shown below:

Average Time per Instance - Person Annotation

Person - Manual: 41.7 seconds
Person - Auto: 13.1 seconds
Improvement: 68%

Average Time per Instance - French Fries Annotation

French Fry - Manual: 23.9 seconds
French Fry - Auto: 10.5 seconds
Improvement: 56%
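For reference, the improvement percentages follow directly from the averaged per-instance times; a quick arithmetic check (the published 68% figure was presumably rounded or computed from unrounded per-instance data):

```python
def improvement(manual_s, auto_s):
    """Relative time saved when switching from manual annotation to Auto-Annotate."""
    return (manual_s - auto_s) / manual_s * 100

print(f"Person:     {improvement(41.7, 13.1):.1f}% faster")  # ~68.6%, reported as 68%
print(f"French fry: {improvement(23.9, 10.5):.1f}% faster")  # ~56.1%, reported as 56%
```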

Person annotation required on average 41.7 seconds per instance when done manually, with some instances being a full human body and others a partial one occluded by another person in the crowd. Auto-Annotate scored 13.1 seconds, with on average 2.7 correction clicks per instance needed to match the accuracy of the manual label.
French fries are considerably faster to annotate manually than human figures (23.9 vs. 41.7 seconds per instance), mainly because their roughly rectangular shape means fewer points are needed to mask them. Nonetheless, Auto-Annotate still outperforms manual work by 56%. Moreover, the masks qualitatively improve on the manual ground truth in terms of IoU by adhering to pixel borders better than manual annotations, as shown in the example below (Red: Manual, Yellow: Auto-Annotate).
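IoU here refers to the standard intersection-over-union overlap between a predicted mask and the ground truth. A minimal sketch with NumPy binary masks (a generic illustration, not V7's evaluation code):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return intersection / union if union else 1.0

# Example: intersection covers 4 pixels, union covers 6 pixels -> IoU ~= 0.67.
a = np.zeros((4, 4), dtype=bool); a[1:3, 1:3] = True
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:4] = True
print(mask_iou(a, b))
```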

Overall, if pixel-perfection is not the main priority of a dataset, Auto-Annotate is a superior choice in speed and, in some cases, quality. Where pixel perfection is needed, it can still provide an initial mask with some pixel-perfect edges that can be manually corrected; for this, we recommend using pixel-wise masks rather than polygons.
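If your existing annotations are polygons, one common route to pixel-level correction is to rasterize each polygon into a binary mask first. A generic sketch using scikit-image (not a V7-specific export format; the function name and example coordinates are illustrative):

```python
import numpy as np
from skimage.draw import polygon

def polygon_to_mask(points, height, width):
    """Rasterize a polygon given as [(x, y), ...] into a binary pixel mask."""
    xs = np.array([p[0] for p in points])
    ys = np.array([p[1] for p in points])
    rr, cc = polygon(ys, xs, shape=(height, width))  # skimage expects (row, col)
    mask = np.zeros((height, width), dtype=bool)
    mask[rr, cc] = True
    return mask

# Example: a small triangle rasterized onto a 100x100 canvas.
mask = polygon_to_mask([(10, 10), (80, 20), (40, 70)], height=100, width=100)
print(mask.sum(), "foreground pixels")
```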

You can try Auto-Annotate for free today; it is included in V7's free, academia, and standard plans. The enterprise tier offers additional fine-tuning options.

Ready to get started?

Create an account or schedule a demo.