Building scalable datasets necessary for high-quality AI products requires stellar data management pipelines.
V7 and Voxel51, two best-in-class AI data platforms, have joined forces to help their customers fine-tune how they prepare training data.
This integration enables joint customers to get the most value from their training data and maximize annotation efficiency while decreasing data volumes. Joint users can curate and improve datasets, annotate and reannotate data, and transfer it seamlessly between the platforms. All of this helps improve model accuracy, minimize annotation costs, and speed up time-to-market.
Machine learning products are only as good as the data they’re trained on. Particularly in the crowded AI product space, cutting corners can cost companies their competitive advantage. In fact, according to the 2022 IBM Global AI Adoption Index, 24% of businesses cite “too much data complexity” as one of the top barriers to AI adoption.
One deceptively simple way to boost model accuracy is to increase the number of training samples. However, large datasets can be a double-edged sword. The time and resources needed for data collection and annotation, along with the heavy infrastructure demands of storage and compute, can hinder any project by inflating budgets and extending time-to-production.
That’s why it can be better to train models on smaller, carefully curated datasets. However, the data selection process has its own challenges. Selecting the highest-quality data, ensuring a balanced class distribution, and monitoring existing datasets for mistakes place an enormous strain on resources unless you’re supported by the right software.
To tackle these challenges, V7 and Voxel51 have combined their efforts to help customers build smart, scalable, and high-quality dataset management pipelines.
This integration makes it easy to explore, visualize, and understand datasets, as well as streamline annotation—to improve efficiency, build better-performing models, and maximize the ROI of your training data operations. Now, users can easily curate new datasets, calibrate, correct, and augment existing ones, and leverage active learning workflows.
V7 is a powerful AI data engine that helps better AI products reach the market faster. It is used by enterprise customers worldwide, including Continental, Wanzl, and Boston Scientific, and its unique workflows enable faster and more accurate labeling. Features such as auto-annotation, model visualization, advanced video labeling, bespoke workflow design, intelligent QA, and elite labeling task forces combine into a scalable solution that prioritizes impactful AI development.
Voxel51 is the company behind FiftyOne, the leading open-source toolkit for building high-quality datasets and computer vision models. AI teams around the world rely on FiftyOne and FiftyOne Teams to visualize, curate, manage, and QA data, and automate the workflows that support enterprise machine learning.
Together, these two platforms provide customers with cutting-edge solutions primed to deliver top-tier AI products.
FiftyOne by Voxel51 helps users identify the most relevant samples in a dataset to send to V7 for annotation. It provides a variety of curation tools and workflows for this. They enable the creation of diverse, representative data subsets while minimizing data volume, which gets you the most out of your annotation budget and boosts model performance.
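As one concrete example of such a workflow, the sketch below uses the FiftyOne Brain's uniqueness score to pick a small, diverse subset for annotation. It assumes fiftyone is installed, since the Brain ships with it. The helper name select_unique_subset is hypothetical, and uniqueness is only one of several possible selection criteria.

```python
def select_unique_subset(dataset, num_samples, uniqueness_field="uniqueness"):
    """Return a view containing the `num_samples` most unique samples.

    Assumes fiftyone (which includes the FiftyOne Brain) is installed. On
    first use, compute_uniqueness() downloads a default embedding model.
    """
    if not dataset.has_sample_field(uniqueness_field):
        import fiftyone.brain as fob  # lazy import: only needed to compute scores

        # Higher scores mean a sample is less similar to the rest of the dataset
        fob.compute_uniqueness(dataset, uniqueness_field=uniqueness_field)

    # Most unique samples first, then keep only the top `num_samples`
    return dataset.sort_by(uniqueness_field, reverse=True).limit(num_samples)
```

For example, select_unique_subset(dataset, 500) would yield a 500-sample view that could be sent to V7 in place of the full dataset.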
This integration clears the way for efficiently optimizing and augmenting existing datasets to boost model performance even further.
FiftyOne enables powerful image- and object-level annotation review and QA workflows. The platform’s embedding visualization, which works with both off-the-shelf and custom models, helps you analyze the quality of the annotations and the dataset, weed out mistakes, and find areas for improvement.
By adding model predictions to a dataset for comparison with ground truth, or using the built-in similarity search features and vector database integrations, you can identify the difficult samples with targeted precision and pinpoint inaccurate annotations. FiftyOne’s sample- and label-level tags, as well as saved views, make it easy to mark samples for reannotation back in V7.
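Here is a minimal sketch of the tagging step. It assumes the samples to fix have already been identified, for instance by selecting them in the FiftyOne App, whose session.selected property returns the selected sample IDs. The helper name, the tag "reannotate", and the view name are illustrative choices, not conventions of the integration.

```python
def mark_for_reannotation(dataset, sample_ids, tag="reannotate",
                          view_name="to-reannotate"):
    """Tag the given samples and store them as a named saved view.

    `sample_ids` could come from `session.selected` after clicking samples
    in the FiftyOne App. save_view() may raise an error if a saved view
    with `view_name` already exists on the dataset.
    """
    view = dataset.select(sample_ids)  # restrict to the chosen samples
    view.tag_samples(tag)              # sample-level tag, visible in the App
    dataset.save_view(view_name, view) # persist the view for later reuse
    return view
```

The saved view can later be reloaded with dataset.load_saved_view("to-reannotate") and sent to V7 again through annotate().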
Data can be sent back and forth between FiftyOne and V7 via an API, which reduces the time and effort needed for transfers and keeps data secure. The integration converts data between the two platforms while retaining existing annotations, including labels made in other tools.
V7 supports data in its native format, be it images, videos, or medical imagery. With auto-annotation, specialized video labeling, and Segment Anything (SAM) integration, labeling teams can annotate data faster without sacrificing quality.
Notably, V7 supports DICOM, NIfTI, and WSI imagery, showcasing its commitment to delivering fit-for-purpose infrastructure for industries of all types.
The integration between Voxel51’s FiftyOne and V7 Darwin is provided by darwin_fiftyone.
It enables FiftyOne users to send subsets of their datasets to Darwin for annotation and review. The annotated data can then be imported back into FiftyOne.
Let’s go through a quick rundown of how to set it up.
To start, install FiftyOne, an open-source tool for building high-quality datasets and computer vision models, for example with pip install fiftyone.
Connect FiftyOne with V7’s Darwin to start annotating files. Here’s how to integrate with the Darwin backend:
1. Install the backend (the darwin-fiftyone package, for example with pip install darwin-fiftyone)
2. Configure FiftyOne to use darwin-fiftyone
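One way to complete step 2 is to register the backend in FiftyOne's annotation config file, which lives at ~/.fiftyone/annotation_config.json by default. The sketch below assumes the backend key names shown in the darwin_fiftyone README: a config_cls pointing at darwin_fiftyone.DarwinBackendConfig, plus your V7 API key. Verify them against the version you install. The helper function itself is hypothetical.

```python
import json
from pathlib import Path

# Assumption: key names follow the darwin_fiftyone README; check them
# against the version of the backend you install.
DARWIN_BACKEND = {"config_cls": "darwin_fiftyone.DarwinBackendConfig"}


def write_darwin_annotation_config(api_key, config_path=None):
    """Add a "darwin" backend entry to FiftyOne's annotation config.

    Existing keys and other backends in the file are preserved. Returns the
    resulting config dict.
    """
    path = (Path(config_path) if config_path
            else Path.home() / ".fiftyone" / "annotation_config.json")
    config = json.loads(path.read_text()) if path.exists() else {}

    # Register (or overwrite) only the "darwin" entry
    backends = config.setdefault("backends", {})
    backends["darwin"] = {**DARWIN_BACKEND, "api_key": api_key}

    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4))
    return config
```

For example, write_darwin_annotation_config("<your V7 API key>") adds the entry without touching any other backends already configured in the file.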
Let’s start by loading example data into FiftyOne.
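The loading step could look like the following, using FiftyOne's dataset zoo. The "quickstart" zoo dataset comes with ground_truth and predictions detection fields, which the later steps build on. The wrapper function is our own.

```python
def load_example_dataset(name="quickstart"):
    """Load a FiftyOne Zoo dataset and open it in the FiftyOne App."""
    import fiftyone as fo       # lazy imports keep the sketch importable
    import fiftyone.zoo as foz  # without fiftyone installed

    dataset = foz.load_zoo_dataset(name)  # downloads on first use
    session = fo.launch_app(dataset)      # in a plain script, call session.wait()
    return dataset, session
```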
Now let’s load this data into V7 for annotation and refinement.
To illustrate, let's upload all samples from this dataset into a Darwin dataset named "quickstart-example".
If the dataset doesn't already exist in Darwin, it will be created.
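Here is a sketch of the upload step. annotate(), backend, and label_field are part of FiftyOne's standard annotation API. dataset_slug is assumed here to be the darwin_fiftyone-specific argument that names the Darwin dataset, so check the backend's documentation for its exact argument list. The annotation key must be unique within the FiftyOne dataset.

```python
def send_to_darwin(samples, anno_key="quickstart_example",
                   dataset_slug="quickstart-example",
                   label_field="ground_truth"):
    """Upload `samples` (a dataset or view) to Darwin for annotation.

    Assumption: `dataset_slug` is the darwin_fiftyone argument naming the
    Darwin dataset to use or create.
    """
    return samples.annotate(
        anno_key,                   # identifies this annotation run in FiftyOne
        backend="darwin",           # backend name registered in the config
        label_field=label_field,    # existing labels to upload for editing
        dataset_slug=dataset_slug,  # backend-specific (see assumption above)
    )
```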
After the annotations and reviews are completed in Darwin, you can fetch the updated data as follows:
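Fetching the results uses FiftyOne's standard load_annotations() call with the same annotation key used for the upload. In the sketch below, the key name and helper are illustrative. Setting cleanup=True asks the backend to delete the run's remote data after download, which you may not want if the data should stay in V7.

```python
def fetch_from_darwin(dataset, anno_key="quickstart_example", cleanup=False):
    """Merge annotations completed in Darwin back into `dataset`.

    Returns a view containing only the samples from this annotation run.
    """
    # Download the edited labels and merge them into the FiftyOne dataset
    dataset.load_annotations(anno_key, cleanup=cleanup)
    return dataset.load_annotation_view(anno_key)
```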
In addition to the standard arguments provided by dataset.annotate(), we also support:
The data annotated in V7 can now be further reviewed and refined before it’s used for model training.
FiftyOne has a variety of tools for integrating with your model training pipelines. You can export your data in common formats, such as COCO or YOLO, that most training packages accept. FiftyOne also provides workflows for popular model training libraries such as PyTorch, PyTorch Lightning Flash, and TensorFlow.
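Here is a minimal export sketch using FiftyOne's Dataset.export() with the built-in COCO detection format. The helper name and default label field are assumptions. For YOLO-style output, fo.types.YOLOv5Dataset is the analogous dataset type, and exporter-specific options can be passed through export_kwargs.

```python
def export_for_training(dataset, export_dir, label_field="ground_truth",
                        dataset_type=None, **export_kwargs):
    """Export `label_field` to `export_dir` in a training-ready format.

    Defaults to COCO detection format; pass another fiftyone.types class
    (e.g. fo.types.YOLOv5Dataset) via `dataset_type` to change it.
    """
    if dataset_type is None:
        import fiftyone as fo  # lazy import: only needed for the default type

        dataset_type = fo.types.COCODetectionDataset

    dataset.export(
        export_dir=export_dir,
        dataset_type=dataset_type,
        label_field=label_field,
        **export_kwargs,  # exporter-specific options, e.g. a data split
    )
```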
What’s more, with FiftyOne’s new plugins architecture, custom training workflows can be directly integrated into the FiftyOne App and become available at the click of a button. The delegated execution feature lets you process the workflows on dedicated compute nodes.
Once a model is trained, you can easily run inference, load the model predictions back into FiftyOne, and evaluate them against the ground truth annotation.
FiftyOne’s evaluation and filtering capabilities make it easy to spot discrepancies between model predictions and ground truth, including annotation errors or difficult samples where your model underperforms.
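For object detection, the comparison step might look like the sketch below. It assumes the dataset has predictions and ground_truth detection fields, as the quickstart dataset does. It uses FiftyOne's evaluate_detections(), which records per-sample true positive, false positive, and false negative counts in fields prefixed with the eval key. Sorting by false positives is just one heuristic for surfacing likely annotation errors or hard samples, and the helper name is hypothetical.

```python
def evaluate_and_rank(dataset, pred_field="predictions",
                      gt_field="ground_truth", eval_key="eval"):
    """Evaluate detections and return samples ordered by false positives."""
    results = dataset.evaluate_detections(
        pred_field, gt_field=gt_field, eval_key=eval_key, compute_mAP=True
    )
    results.print_report()  # per-class precision/recall summary

    # With eval_key="eval", per-sample counts land in eval_tp/eval_fp/eval_fn.
    # Many confident false positives can point to missing ground truth labels.
    worst_first = dataset.sort_by(f"{eval_key}_fp", reverse=True)
    return results, worst_first
```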
Tag annotation mistakes in FiftyOne for reannotation in V7, and add the difficult samples to the next iteration of your training set. Take a snapshot of your dataset and move on to the next round of improvements.
If you’re a FiftyOne Teams customer and work with cloud-backed files, you will be able to load items directly from your cloud storage.
The two steps you need to complete are:
The name will be identical to the name field in the V7 external storage settings.
Once you follow these steps, you are ready to connect your cloud-backed media to FiftyOne Teams and V7 Darwin.
FiftyOne Teams also includes dataset versioning, which means every annotation run and model run can now be captured and versioned in a history of dataset snapshots. Dataset snapshots in FiftyOne Teams can be created, browsed, linked to, and re-materialized with ease, without complex naming conventions or manual tracking of versions.
Voxel51's platform, FiftyOne, supercharges your machine learning workflows by helping you visualize datasets and interpret models faster and more effectively. With V7 added to the equation, you can streamline data annotation even further and train more accurate models.
We are excited about the continuing collaboration between Voxel51 and V7, and we look forward to exploring new capabilities and adding even more value to the AI product development of V7 and Voxel51’s joint customers.
Do you have any feedback, comments, or questions regarding the integration? Let us know. Our mission is to help you move the needle in the AI space—so we want to offer you the best-in-class solutions.