There’s no doubt that machine learning has the power to transform the healthcare industry.
The potential applications are wide-ranging and include the entirety of the medical imaging life cycle—from image creation and analysis to diagnosis and outcome prediction.
However, medical professionals face a multitude of obstacles that prevent them from successfully implementing AI in clinical practice.
In this article, we’ll address those obstacles and explain how to navigate them.
Let’s get started.
And if you landed here looking to roll up your sleeves and get some hands-on experience annotating medical data—look no further!
Have a look at this quick tutorial on labeling MRI and CT images.
Medical image annotation is the process of labeling medical imaging data such as X-Ray, CT, MRI scans, Mammography, or Ultrasound.
It is used to train AI algorithms for medical image analysis and diagnostics, helping doctors save time, make better-informed decisions, and improve patient outcomes.
However, as you’ll soon learn, the process is not as easy as it seems.
Limited access to medical image data is a substantial problem that explains current limitations related to the development of robust machine learning models.
Small sample sizes from small geographic areas and the time-consuming (and costly) process of data preparation create bottlenecks that result in algorithms with limited utility.
Here are a few things to keep in mind when preparing data for medical imaging annotation.
Your dataset needs to be representative with respect to the environment in which the model will be deployed—this will ensure the model’s accuracy.
Using images from multiple diverse datasets (e.g., different imaging machines, different populations, and medical centers) is ideal for lowering the risk of bias. Most commonly, the ratio of the training, validation, and testing data is close to 80:10:10.
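The 80:10:10 split above can be sketched in a few lines. A minimal example, assuming case-level splitting (the `split_dataset` helper and seed are illustrative, not from any particular library); splitting by case rather than by individual image avoids leaking slices of the same patient across sets:

```python
import random

def split_dataset(case_ids, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle case IDs and split into train/val/test by the given ratios.

    Splitting at the case (patient) level, not the image level, prevents
    slices from one patient appearing in both training and testing data.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    n = len(ids)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 800 100 100
```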
After collecting your data and training your model, it’s time to use the validation set to check for overfitting or underfitting, and adjust parameters accordingly.
Finally, the model’s performance is evaluated against a testing set.
Acquiring a quality testing dataset is critical because it functions as the reference standard and will determine whether your trained model earns regulatory approval.
Although you can still train a relatively reliable model for specific targeted applications using smaller datasets, it is better to collect large sample sizes.
As such, the larger and more diverse your dataset is, the more accurate your model will be.
Large, relevant datasets are especially important when the differences between imaging phenotypes are subtle, or when you collect data on populations with substantial heterogeneity.
To develop generalizable ML algorithms in medical imaging, you need statistically powered datasets with millions of images.
Most medical imaging will be in DICOM format.
What is DICOM?
DICOM (Digital Imaging and Communications in Medicine) is the standard for the communication and management of medical imaging information and related data. A DICOM file represents a case that may contain one or more images.
From a machine learning perspective, the DICOM file will be converted to another lossless image format during training; therefore, using DICOM files for AI research is not a necessity.
However, preserving the DICOM image’s integrity can be helpful in the data labeling phase, particularly as radiologists are familiar with how DICOM viewers work after operating them for years.
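The conversion step mentioned above usually means applying a window/level to the raw pixel data before scaling it to 8-bit. A minimal sketch, assuming the pixel array has already been read (in practice you would get it from pydicom's `ds.pixel_array` after applying RescaleSlope/RescaleIntercept; the synthetic array below is a stand-in):

```python
import numpy as np

def to_8bit(pixels: np.ndarray, window_center: float, window_width: float) -> np.ndarray:
    """Apply a DICOM-style window/level, then scale to 0-255.

    Values outside the window are clipped; the remaining range is
    stretched linearly to the full 8-bit range.
    """
    lo = window_center - window_width / 2
    hi = window_center + window_width / 2
    clipped = np.clip(pixels.astype(np.float32), lo, hi)
    return ((clipped - lo) / (hi - lo) * 255).astype(np.uint8)

# Synthetic CT-like slice in Hounsfield units, windowed for lung tissue.
ct = np.random.randint(-1000, 400, size=(512, 512))
png_ready = to_8bit(ct, window_center=-600, window_width=1500)
# Image.fromarray(png_ready).save("slice.png")  # lossless PNG via Pillow
```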
Multi-layer TIF files are also used. These contain slices of an image—often from microscopy—and are notoriously sparsely supported. Much like other TIF files, the acronym is jokingly said to stand for “Thousands of Incompatible Formats.” While V7 supports most TIF files, we routinely encounter new variants to add support for.
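Multi-page TIFs can be read one frame at a time. A small sketch using Pillow's multi-frame support (the tiny generated stack is a stand-in for a real microscopy file; the filename is illustrative):

```python
from PIL import Image, ImageSequence

# Create a small 3-slice multi-page TIFF as a stand-in for a microscopy stack.
slices = [Image.new("L", (64, 64), color=i * 60) for i in range(3)]
slices[0].save("stack.tif", save_all=True, append_images=slices[1:])

# Read the slices back one frame at a time.
with Image.open("stack.tif") as tif:
    frames = [frame.copy() for frame in ImageSequence.Iterator(tif)]
print(len(frames))  # 3
```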
Finally, some research will use ultra-high-resolution images that require tiling, such as Leica or Aperio's SVS.
These are often used in pathology. While many viewers support these high-resolution images, which may exceed 100,000 pixels squared and several gigabytes, very few allow you to add any markups or annotations on them that deep learning frameworks can read.
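Because slides this large can't be loaded whole, they are processed region by region. Whole-slide readers such as openslide-python expose a `read_region` call for this; the tiling arithmetic itself can be sketched on an in-memory array (the `tile_image` helper below is illustrative, not a library function):

```python
import numpy as np

def tile_image(image: np.ndarray, tile: int = 1024):
    """Yield (x, y, patch) tiles covering the full image.

    Edge tiles are smaller than `tile` when the image size is not an
    exact multiple of the tile size.
    """
    h, w = image.shape[:2]
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            yield x, y, image[y:y + tile, x:x + tile]

# Stand-in for one resolution level of a whole-slide image.
slide = np.zeros((3000, 5000), dtype=np.uint8)
tiles = list(tile_image(slide, tile=1024))
print(len(tiles))  # 3 rows x 5 cols = 15 tiles
```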
If your ultimate goal is to train machine learning models, there are a few differences between annotating a medical image versus a regular PNG or JPEG.
Here are a few things to consider about medical imaging that do not apply to other vision data.
Let’s explore some of them in more detail.
“Garbage in, garbage out” is a popular machine learning quip that highlights how important the quality of the data is when training ML models.
You need quality data to build clinically meaningful models.
Access to medical imaging data through the picture archiving and communication system (PACS) is restricted to accredited medical professionals, and obtaining all legal documents and permissions is very time-consuming.
Additionally, most healthcare institutions don’t have the proper infrastructure to share large amounts of medical images.
Finally, collected data often requires anonymization (de-identification), which further complicates the whole process.
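At its simplest, de-identification means blanking the header attributes that carry direct identifiers. A minimal sketch using a plain dict (real projects should use pydicom together with a vetted profile such as the DICOM PS3.15 basic de-identification profile, and dates often need shifting rather than blanking; the tag names below are standard DICOM attributes):

```python
# Direct-identifier attributes to blank; a production list is much longer.
PHI_TAGS = {
    "PatientName", "PatientID", "PatientBirthDate",
    "InstitutionName", "ReferringPhysicianName", "AccessionNumber",
}

def deidentify(header: dict) -> dict:
    """Blank out direct identifiers, keep all clinically useful fields."""
    return {k: ("" if k in PHI_TAGS else v) for k, v in header.items()}

header = {"PatientName": "DOE^JANE", "Modality": "CT", "StudyDate": "20230401"}
clean = deidentify(header)
print(clean["PatientName"], clean["Modality"])  # prints:  CT
```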
Image datasets used in a clinical environment need an accurate record of who contributed to each annotation.
Annotation authorship, dataset integrity, and a history of data reviews are required for regulatory approval.
The US FDA and European CE provide guidelines on how datasets should appear when developing models for clinical diagnostics. Working on a platform that already covers those guidelines is a good start.
The other part involves ensuring that the right data processor agreements are in place with whoever will host, process, and perform the annotations.
What we mean by “medical imaging contains transparencies” is that occlusions must be treated differently.
Objects in front of one another may appear behind one another. It's no secret that AI handles occlusion poorly—models have no innate sense that hidden objects persist—and transparent objects can be even harder.
Luckily, though, an organ, cell or bone appearing transparent is far more obvious to an AI than a pane of glass.
See the chest X-Ray below and decide for yourself—are the lungs behind or in front of the diaphragm?
A chest x-ray displaying the lower portion of the lungs, extending in front of the diaphragm posteriorly and behind it anteriorly
The answer is … both! Traditional computer vision methods cannot perceive the occluded portion of the lungs; however, a deep neural network can easily learn to spot it.
A case may contain 2D or 3D imaging.
In both examples, more than one view is often necessary to assess what's happening. For example, the X-Ray of a hand may only reveal a fracture when the hand is in a certain pose or angle.
Nonetheless, it is standard to capture a frontal view of the hand.
A small fracture at the 3rd and 4th middle phalanx base is mostly only visible on the right image.
It's important not to include unusable data in your machine learning dataset.
If a view such as the one above is useful for reference purposes but cannot be labeled and turned into training data, it's best to discard it.
Similarly, volumetric data such as MRI, CT, or OCT can be browsed by the sagittal, coronal, or axial planes.
For browsing and reference purposes, these are useful as they give a better sense of anatomy. From a machine learning perspective, unless you wish for a model to process all three planar views, it's best to stick to one and reconstruct those annotations in the other two planes.
It provides more consistent results across cases.
For example, a team of 10 annotator radiologists labeling 100 brain CT cases axially, and another team of 10 labeling another 100 cases sagittally, will achieve slightly differing results. These can introduce bias to your model and lead to both plane modalities performing worse than if the team had consistently applied labels to one series.
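Re-slicing a volume (and its labels) along the three planes is just array indexing. A minimal sketch, assuming the volume is stored with (axial, coronal, sagittal) axis ordering, i.e. `volume[z, y, x]` (axis conventions vary by format, so verify them against your data):

```python
import numpy as np

# Stand-in for a CT/MRI volume: 120 axial slices of 256x256 pixels.
volume = np.random.rand(120, 256, 256)

axial    = volume[60, :, :]   # fixed z: one axial slice
coronal  = volume[:, 128, :]  # fixed y: one coronal slice
sagittal = volume[:, :, 128]  # fixed x: one sagittal slice

# An annotation mask drawn axially lives in the same 3D array, so the
# other two planes are reconstructions of the same labels, not new ones.
mask = np.zeros(volume.shape, dtype=bool)
mask[50:70, 100:150, 100:150] = True
print(axial.shape, coronal.shape, sagittal.shape)
```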
HIPAA guidelines are not something to take lightly.
When searching for a platform to process your data, ensure it provides a clear answer to the following:
The six questions above are basic technical compliance requirements.
Below are a few data-access-related ones, which you will want to pay attention to when adding users to your training data platform:
A good reference point to start with for understanding HIPAA requirements is a checklist like this one.
Often the strictness of data security requirements scales with the size of your project, and the fines for infringing HIPAA requirements are high.
If HIPAA compliance is a requirement for your project, it's always recommended to have a professional legal audit of the firm you are working with to ensure everything is in place. The last thing you want is for the company handling all of your data to incur enormous penalties due to a small security oversight.
Radiologists annotate (or markup) medical images on a daily basis.
This can be done in DICOM viewers, which contain basic annotation capabilities such as bounding boxes, arrows, and sometimes polygons.
Machine learning (ML) may sometimes leverage these labels; however, their format is often inconsistent with the needs of ML research—for example, they lack instance IDs, attributes, a labeling queue, or the export formats expected by deep learning frameworks like PyTorch or TensorFlow.
For example, you can't develop a neural network analyzing pulmonary fibrosis from radiologist DICOM markup. Instead, you will have to carefully label slices using a professional tool.
Here are a few questions you should ask when choosing the medical image annotation software.
It's always good to start by partnering with a company that has already invested the time and effort required to comply with the various data formats, regulatory requirements, and user-experience needs of a successful medical AI project.
V7 is one of them.
If you’d like to discuss your medical data annotation project, don’t hesitate to schedule a call or send us an email today.
“Collecting user feedback and using human-in-the-loop methods for quality control are crucial for improving AI models over time and ensuring their reliability and safety. Capturing data on the inputs, outputs, user actions, and corrections can help filter and refine the dataset for fine-tuning and developing secure ML solutions.”