We are proud to announce the partnership of V7 and Databricks!
V7’s integration into the Databricks platform will allow customers to transform their unstructured data into training data and ground truth quickly and accurately. V7 fills a relevant and growing need among Databricks’ customers, with a particular focus on the Healthcare and Industrials verticals and unique video and workflow management features.
The collaboration will enable customers to create a dataset in Databricks, annotate the data in V7 and then load the annotations back into Databricks for easy querying and model training. V7 will also support pre-annotations performed in Databricks, which you’ll be able to push into the platform. In the future, you’ll also be able to connect Databricks models to V7.
Combining the powers of both tools will enable customers to better manage machine learning workflows and their underlying data—at the scale required to build better AI products.
Preparing data for large-scale machine learning projects is often challenging. Paired with concerns about security, tiresome data transfers, and annotation effort, it can significantly slow product delivery.
This partnership aims to solve these problems, helping joint customers manage data securely and efficiently, and build powerful machine learning workflows—ensuring maximum quality of training data and significantly speeding up time-to-market.
V7 is the AI data engine enabling better AI products to reach the market faster, with reduced costs and less risk. Its key features include auto-labeling, dataset management, customizable workflows, and the highest security standards.
Used by enterprise customers worldwide, V7’s unique workflows enable 10x faster labeling. The platform balances automation with human input in a beautiful and intuitive UX, allowing customers to transform raw data into value and innovate with speed and accuracy.
Databricks is a data and AI company. The Databricks Lakehouse Platform combines the best elements of data lakes and data warehouses to deliver the reliability, strong governance, and performance of data warehouses with the openness, flexibility, and machine learning support of data lakes.
Built on an open lakehouse architecture, AI and Machine Learning on Databricks empowers ML teams to prepare and process data, streamlines cross-team collaboration and standardizes the full ML lifecycle from experimentation to production, including for generative AI and large language models.
Together, these two powerful and fast-growing players in the data and machine learning world deliver fresh capabilities to their joint customers.
Take a look at the quick video below introducing the integration:
And here’s a reference architecture diagram that showcases how the interaction works:
The partnership enables joint users to fit V7 and Databricks seamlessly together, and into their wider technical stack.
Customers can utilize Databricks’ Delta Lake features, such as ACID transactions and Parquet-based columnar storage. They can then effortlessly and securely move this data to and from V7.
It’s possible thanks to our darwinpyspark library, a wrapper around the V7 API that lets users push data from Databricks to V7 and pull annotations back the other way.
Let’s go through a quick rundown of how to set it up.
This framework is designed to be used alongside darwin-py, V7’s Python SDK. You can see examples of darwin-py in the V7 docs here.
To get started with DarwinPyspark, you'll first need to create a DarwinPyspark instance with your V7 API key, team slug, and dataset slug:
To upload a PySpark DataFrame to V7, use the upload_items method:
The upload_items method expects a PySpark DataFrame with two columns: 'object_url' (an openly accessible or pre-signed URL for the image) and 'file_name' (the name the file should be listed under in V7).
V7 lets you annotate files such as images, videos, or DICOM, train and test models, set up custom ML workflows, and more.
If you’re new to the tool, take a look at this quick rundown:
Or jump to the documentation to see how V7 can help your specific use case! You can also check the GitHub repo or our PyPI page for more information.
To download data from V7 as a PySpark DataFrame, use the download_export method:
We are delighted to have this partnership live, and we look forward to exploring how it can add value to your AI projects.
Do you have any feedback, comments, or questions regarding the integration? Let us know. We’d like to hear from you and we promise to respond quickly to any cool idea!
Keep an eye out for upcoming webinars and materials by following V7 on LinkedIn.
We look forward to helping you transform your data into software. Connect with us on LinkedIn or book a chat using the links below: