Creating datasets for diagnostic use requires support for uncommon file formats, along with a series of features to maintain accountability of annotations.
The first step towards generating a dataset of medical images is being able to import the original filetype, or a close representation of it. The original image may exceed 50,000 by 50,000 pixels and be composed of multiple files stitched together. Often the easiest way of supporting such files is to convert them to a common format such as .png and display that for annotation. This allows Darwin to preserve the original file resolution, but it also means compression artifacts may be visible during annotation. Sometimes this does not present a problem: most image compression has no documented effect on annotation quality. However, it may affect regulatory approvals.
The color intensity recorded by certain instrumentation sensors may exceed 256 units, and its range may go beyond RGB or greyscale. In most cases, color intensities outside the ordinary are supported by Darwin and normalized for display on a regular screen. If a filetype's viewer requires a slider to switch between different wavelengths of light, Darwin will not be able to support it without dedicated work from our team. Other image adjustments used in medical imaging viewers, such as contrast or brightness changes, are available.
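To make the normalization step above concrete, here is a minimal sketch of mapping high-bit-depth sensor intensities (for example, a 12- or 16-bit range) down to the 0–255 range a regular screen can display. The function name and the window/level behavior are illustrative assumptions, not Darwin's actual implementation.

```python
def normalize_to_8bit(pixels, lo=None, hi=None):
    """Linearly rescale raw intensities into the 0-255 display range.

    `lo`/`hi` act like a window/level setting: values outside the
    window are clipped before rescaling. Defaults to the data range.
    """
    if lo is None:
        lo = min(pixels)
    if hi is None:
        hi = max(pixels)
    span = max(hi - lo, 1)  # avoid division by zero on flat images
    out = []
    for p in pixels:
        clipped = min(max(p, lo), hi)  # clip to the display window
        out.append(round((clipped - lo) * 255 / span))
    return out

# A 12-bit sample (values up to 4095) mapped onto 0-255:
raw = [0, 1024, 2048, 4095]
print(normalize_to_8bit(raw))  # → [0, 64, 128, 255]
```

A narrower `lo`/`hi` window is how viewers expose contrast and brightness adjustments over the same raw data.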
Darwin supports certain non-proprietary medical imaging file formats. If you need to annotate a dataset of uncommon filetypes that requires a specialized viewer, we may be able to convert it to a close equivalent, or develop specific functionality to support it.
Image datasets used in clinical diagnosis need an accurate history of who took part in developing each annotation. This form of accountability is integrated within Darwin and available on all spatial annotation types (anything but tags) across all image formats.
Auto-Annotate generates a segmentation mask around any object, or part of an object, in under one second. You define a region of interest where your object is present, and the model identifies the most salient object or part visible and segments it. If part of the object has been omitted, or too much has been included, you can click on the region to add or subtract area, and Auto-Annotate will correct its previous prediction.
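The correction loop described above can be sketched as simple mask arithmetic. In the real tool each click re-runs model inference; here the "prediction" is stubbed out as plain set operations over pixel coordinates, and all names are hypothetical rather than Darwin's actual API.

```python
def correct_mask(mask, clicked_region, mode):
    """Return a new segmentation mask after a corrective click.

    mask, clicked_region: sets of (x, y) pixel coordinates.
    mode: "add" to include the clicked region, "subtract" to remove it.
    """
    if mode == "add":
        return mask | clicked_region        # union: fill in what was missed
    elif mode == "subtract":
        return mask - clicked_region        # difference: remove the excess
    raise ValueError(f"unknown mode: {mode}")

# Toy example: the model missed two pixels and included one spurious one.
prediction = {(0, 0), (0, 1), (1, 0)}
missed_part = {(1, 1), (2, 1)}
spurious = {(0, 1)}

fixed = correct_mask(prediction, missed_part, "add")
fixed = correct_mask(fixed, spurious, "subtract")
print(sorted(fixed))  # → [(0, 0), (1, 0), (1, 1), (2, 1)]
```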
This isn't a simple edge-detection or superpixel approach, but a generalized object and part segmentation model that works at any scale and in any domain. For example, defining a region around a human nose will annotate only the person's nose, while capturing their face will segment the whole face, leaving out hair, neck, and any surrounding objects that share the face's skin color. In medical and scientific imaging, Auto-Annotate can segment most types of cell, organ, or abnormality from a boxed region, even if they aren't present in the model's original training data.
There are also domains where Auto-Annotate does not yet outperform manual approaches, such as capillaries, branches, and other elongated strands, as well as certain "stuff" categories used in panoptic segmentation such as the sky, ground, or wall surfaces.
We tested Auto-Annotate on a person segmentation task in a crowded scene, and on instance segmentation of French fries. Each poses a different computer vision challenge for the model:
Human imagery is composed of parts, such as clothing with distinct colors and borders, accessories, and items held. While clothing, skin, and hair should be considered "human" for detection purposes, surrounding items such as backpacks, skateboards, or bicycles should not.
French fries are analogous to microscopy slides, where multitudes of a similar object are cluttered together on a seemingly flat surface, mostly within the same hue.
Five densely populated (20+ instances) images were annotated by three separate people instructed to focus on quality. The ground truth was annotated by a fourth person who was given unlimited time to annotate the images at a pixel-perfect level, where possible.
The images were annotated twice: once using Auto-Annotate, and later using a manual polygon annotation tool. Users could correct the Auto-Annotate output either by clicking to add/subtract content (one of the tool's functionalities) or by switching to a brush tool.
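A standard way to score annotations against pixel-perfect ground truth is intersection-over-union (IoU). The source does not state the exact metric used in this experiment, so the following is a hedged illustration with hypothetical data, with masks again represented as sets of pixel coordinates.

```python
def iou(pred, truth):
    """Intersection-over-union between two masks given as sets of (x, y) pixels."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly by convention
    inter = len(pred & truth)
    union = len(pred | truth)
    return inter / union

# Hypothetical example: ground truth is a 4x4 square, the annotation
# covers only three of its four columns.
truth = {(x, y) for x in range(4) for y in range(4)}   # 16 pixels
pred = {(x, y) for x in range(3) for y in range(4)}    # 12 pixels, all inside
print(round(iou(pred, truth), 3))  # → 0.75
```

Averaging IoU per instance across annotators gives one comparable quality score for the Auto-Annotate and manual-polygon passes.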
The results are shown below: