In the process of building machine learning models, creating high-quality training data is essential for achieving accurate results. However, reviewing every data point in a dataset manually can be impractical and time-consuming, particularly when dealing with large data volumes. To address this, V7 introduced the Sampling Stage, which enables users to incorporate efficient quality assurance into their workflows while minimizing manual review time.
In this article, we will explore what the sampling stage is, its benefits and use cases, and how you can set it up in V7.
The core functionality of the Sampling Stage enables you to send a randomly selected sample of items to the next stage in a workflow. You can set a sampling threshold, expressed as a percentage figure.
For example, by using the sampling stage, users can route only a small subset (e.g. 10% or 15%) of their data for review, building QA into the workflow without painstakingly checking the entire dataset or every item that carries a given class or tag.
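To make the routing mechanics concrete, here is a minimal Python sketch of threshold-based sampling. It is a conceptual illustration only (the function name `route_item` and the 10% threshold are our own choices, not part of V7's API): each item is independently sent for review with the configured probability.

```python
import random

def route_item(threshold: float = 0.10) -> str:
    """Route a single item: with probability `threshold` it is sent for
    review, otherwise it continues to the next stage unreviewed."""
    return "review" if random.random() < threshold else "complete"

# Route a batch of 1,000 items with a 10% sampling threshold
routes = [route_item(0.10) for _ in range(1_000)]
print(routes.count("review"))  # roughly 100, but rarely exactly 100
```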
The sampling stage feature provides a variety of advantages for ML teams and data scientists who work with large datasets. Here are some key benefits of sampling your data with this functionality:
The sampling stage feature provides users with an efficient way to conduct quality assurance (QA) on large datasets. Instead of manually reviewing every item in the dataset, the sampling stage enables users to select a small subset of data for QA. This feature is particularly valuable for teams working with specialists such as radiologists, who are paid on an hourly basis. By using the sampling stage, AI teams can reduce the costs associated with manual review by selecting only the amount of data that needs to be reviewed by specialists.
The sampling stage is also a valuable feature for auditing AI models. By combining it with the model and logic stages, users can review model performance and identify areas of underperformance. You can take a look at the example in the use cases section below.
With the V7 sampling stage feature, users can easily conduct flexible quality assurance on specific data classes. By combining the logic stage with a sampling stage and setting the condition to route only a particular type of annotation for review, the user can set up a QA system for a subset of a given class/tag. This makes it easy to review only the necessary data and minimize the time and effort required for manual review.
The V7 sampling stage feature provides a valuable method for measuring the consistency of annotations created by individual labelers. By combining it with the consensus stage, users can send a subset of data (e.g. 90%) to be annotated by a labeler, and the remaining 10% to be labeled independently by other labelers using the consensus stage, which helps analyze inter-reader variability. In this way, users can review whether the annotations created by a given annotator stay consistent over time and provide feedback when needed.
Setting up a Sampling Stage on V7 is easy and extremely intuitive. Here’s a step-by-step tutorial to help you navigate this process seamlessly.
1. Add and connect a Sampling Stage to your workflow in the workflows view.
2. Specify the sampling threshold to decide what percentage of data will be routed to the next stages of the workflow.
Note: The sampling algorithm uses random number generation to decide whether each item belongs to Sample A or Sample B, so the split between the two samples won't be exact. For instance, if you set a 20% / 80% ratio and then send through 100 items, you might get a split like 17/83 rather than exactly 20/80. As the amount of data grows, the observed split converges on your chosen ratio, so in practice you will only see small divergences (see the short simulation after these steps).
3. Connect your Sampling Stage (or multiple ones!) with other workflow stages and voilà! You are ready to start annotating.
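If you want to see this convergence for yourself, the short simulation below mimics the random assignment described in the note above. It is a standalone sketch rather than V7's actual implementation, and the sample output in the comments is one possible run, not a guaranteed result.

```python
import random

def simulate_split(n_items: int, threshold: float = 0.20) -> float:
    """Return the observed fraction of items routed to Sample A when each
    item is assigned independently at random."""
    sample_a = sum(random.random() < threshold for _ in range(n_items))
    return sample_a / n_items

for n in (100, 1_000, 100_000):
    print(n, round(simulate_split(n, threshold=0.20), 3))

# One possible run (your numbers will differ):
# 100 0.17      <- noticeable deviation from the 20% threshold
# 1000 0.208
# 100000 0.2    <- converges towards the configured ratio
```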
Let’s take a look at a couple of real-life use cases of the sampling stage.
As an example of the sampling stage's utility, consider a case where a medical team must annotate and review large volumes of data, such as pathology or radiology images. Hiring medical specialists to manually review all data items is expensive and time-consuming.
The workflow below demonstrates how the sampling stage can streamline the QA process by routing only a small subset of data for manual review by board-certified radiologists. With annotators (Residents) who are well-trained in the labeling task, 90% of annotated data can be sent straight to the Complete stage. The remaining 10% is manually reviewed, saving time and reducing the costs associated with reviewing every file. This approach ensures that the critical data is reviewed by the specialist team, while minimizing the time and effort required for manual review.
In this example, let’s look at how you can set up a sampling stage to measure the consistency of annotations over time.
By setting up a sampling stage and adding the consensus stage to the workflow, users can route 80% of their data to be annotated by a trusted labeler and 20% of data to be labeled independently in the consensus stage by two additional annotators.
This approach allows for continual monitoring of the consistency of annotations from each annotator and ensures that the accuracy of annotations is not compromised (e.g. due to annotator fatigue) as new data is added to the project.
Here’s another version of how you might want to set up your workflow with two sampling stages to measure the consistency of annotations for not one, but two annotators.
Another powerful use case for the V7 sampling stage feature is auditing the performance of AI models and identifying errors that can help improve their accuracy. In the example below, the Sampling Stage is used to route sample A to the Logic Stage, where annotations with the bounding box class “Nodule” are reviewed by a Senior Reviewer. The remaining data is sent to a regular Review Stage.
By leveraging the sampling stage in combination with the Logic Stage, users can easily identify areas where an AI model may be underperforming for specific classes or types of annotations. If underperformance is identified for a particular class, users can improve the model's accuracy by feeding it more relevant data and re-training it.
To take this further, consider a case where you initially review 100% of all nodules to identify where your training data needs improvement; as you gain confidence in your model, you can decrease this percentage over time until you are only reviewing 10% of these annotations.
Finally, you can leverage the sampling stage for granular QA of a chosen class in your dataset. This is particularly useful when dealing with rare, difficult, or subjective classes that require specialist knowledge for accurate labeling.
In the example below, we used the Logic Stage to route annotations with the classes “Nodule” and “Pneumonia” to the Sampling Stage; 50% of them are then sent to the Senior Reviewer stage to ensure they are labeled correctly. This approach enables users to set up an efficient and flexible QA system that ensures high-quality annotations of difficult or specialized classes within the dataset.
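As a rough illustration of how this Logic-plus-Sampling combination behaves, here is a small Python sketch. The class names mirror the example above, but the function and constants are hypothetical and not part of V7's workflow engine.

```python
import random

REVIEW_CLASSES = {"Nodule", "Pneumonia"}  # classes singled out by the Logic Stage
SAMPLING_THRESHOLD = 0.50                 # 50% of matching items go to senior review

def route(item_classes: set) -> str:
    """Conceptual routing: a logic-style filter first checks whether the item
    contains any of the target classes; a sampling step then decides whether
    it goes down the senior review path."""
    if item_classes & REVIEW_CLASSES and random.random() < SAMPLING_THRESHOLD:
        return "senior_review"
    return "complete"

print(route({"Nodule"}))    # "senior_review" roughly half the time
print(route({"Fracture"}))  # always "complete"
```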