We explain how to export datasets, using the V7 UI, command line interface, SDK, and REST API.
In this Darwin Advanced session, we take a close look at Exports. We tackle the four ways to export a dataset you’ve created within Darwin - the V7 user interface, CLI, SDK, or REST API - and provide a step-by-step guide on how to use each of these methods to export your data. Looking for something simpler? Head to our Darwin Fundamentals session on Exports.
V7’s Darwin is built to flex around your needs, which is why you’ll find Exports an intuitive process. When exporting from Darwin, you have a host of supported file formats, including COCO, Darwin JSON, CVAT, Instance PNG, YOLO, and many others.
In this tutorial, we take you through the export process beginning with the V7 UI. This is a simple-to-use interface that allows you to export complete annotations, specifically selected files, or apply filters to export specific data.
Next, we explore the SDK. We discuss the necessity to import your dependencies - as with any script or code you write - and outline how to connect to V7 using an API key. After the SDK, we explore how to use the REST API to export data, making note of dependencies, targeting datasets, and defining parameters for your export version.
By the end of this video, you’ll have a comprehensive understanding of how to export datasets within V7, backed by best-practice tips and tricks to effectively leverage your data for training models.
If you want to export a dataset you have created using V7 for model training, there are four ways of doing so. These are: using the V7 UI, command line interface, SDK, or REST API. Let's go ahead and look at how to use all four of these methods to export any data in the supported formats.
When looking at my “birds” dataset here, we can see that some images are completed and have annotations and some don't. Out of these 79 complete annotations, I have samples of African grey parrots, European robins, and Nightingales. Now to export this dataset, we simply need to click on the export data button in the upper right corner.
We can now create a new export version. Here we need to specify the version name and select one annotation format from the supported ones. We then need to decide whether we want to export only the data that has complete annotations, only specifically selected files, or all data matching a specific filter.
We can of course create multiple different export versions of the same dataset. These can be used for model training as a way of creating training experiments between different versions, or as a form of version control. From here on we can directly download the annotations with one click from the browser or we can copy the command line interface command and download the whole dataset, including images. We now have pulled our dataset into our Darwin datasets directory on our local machine, and it's as simple as that!
Let's now look at how to export data using the SDK.
As with any script or code that you write, the first thing that you want to do is import your dependencies.
Now, in this example, we'll look at the functionality of the specific modules as we actually use them. So let's go ahead and import our dependencies, and we arrive at this specific cell here. Now we need to connect to V7 and authorize ourselves using our API key. There are multiple ways to do this, of course. For example, you could provide your API key directly when creating the client.
In this case, we are using our local authorization. This can be set up very simply using the command line interface. There you will be asked to provide your API key and everything will be stored in a config file on your local machine.
So let's create our client module with our local authorization to have an interface with V7.
Now that we have our client set up, we can actually get access to our dataset. We're therefore going to use the method called “get remote dataset”. We need to provide our team name or team slug: in this case, Boris Mainatus. That's me.
We also need to provide the dataset that we want to access. We'll again use our “birds” dataset. What we end up with is a dataset manager that gives us access to our remote dataset. We can then use this to create our specific export version.
We therefore again just specify a release name. In this case, we will call it v1-all-sdk, just to differentiate between the different versions and to know that this one was created using the SDK.
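The steps so far can be sketched roughly as follows. This is a minimal sketch, assuming the darwin-py package is installed and that local authorization was set up beforehand with the command line interface. The team slug "my-team" in the usage note is a hypothetical placeholder, and the helper names connect and create_export are made up for illustration.

```python
def connect(api_key=None):
    """Return a darwin-py client.

    With no key, fall back to the local config file written by the CLI
    when you authenticated there.
    """
    # Imported lazily so the helper below can be read and exercised
    # without darwin-py installed (assumption: darwin-py provides Client).
    from darwin.client import Client

    if api_key is None:
        return Client.local()
    return Client.from_api_key(api_key)


def create_export(client, team_slug, dataset_slug, release_name):
    """Request a new export version and return the dataset manager."""
    # The dataset is addressed as "<team slug>/<dataset slug>".
    dataset = client.get_remote_dataset(
        dataset_identifier=f"{team_slug}/{dataset_slug}"
    )
    # This only asks V7 to build the export; nothing is downloaded yet.
    dataset.export(name=release_name)
    return dataset
```

Calling create_export(connect(), "my-team", "birds", "v1-all-sdk") would then request the export described here. Like the cell in the video, it prints nothing.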
If we run the cell, nothing will print out, but we'll have created a release. We can actually verify that by looking into our UI. If we go to our export data, we'll see that we have actually created an export called v1-all-sdk.
Now we have created this export but we haven't actually pulled anything. That is what we'll do in the next cell. In this cell, we'll be using a “while true” loop and waiting 10 seconds at a time for V7 to actually create the export version, because depending on the filter complexity or the size of the dataset, it might take a bit on V7's side.
Once everything is set up, we can again use our dataset manager and get the release by providing the specific release name. If we don't provide a release name, it will just take the latest export version of this dataset. In our case, in this specific instance, our latest version is again our v1-all-sdk version, but you know, you get the point.
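The waiting step can be sketched like this. It is a bounded variant of the “while true” loop rather than the exact cell from the video: it gives up after a fixed number of attempts instead of looping forever. The helper name wait_for_release and its parameters are hypothetical, and it assumes that get_release raises an exception while the export is not ready yet.

```python
import time


def wait_for_release(dataset, release_name, not_ready_exc,
                     interval=10.0, max_attempts=30, sleep=time.sleep):
    """Poll dataset.get_release until the export exists, then return it."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return dataset.get_release(release_name)
        except not_ready_exc:
            # Export not built yet: re-raise on the last attempt,
            # otherwise wait and try again.
            if attempt == max_attempts:
                raise
            sleep(interval)
```

With darwin-py, the exception to pass in would presumably be NotFound from darwin.exceptions, for example wait_for_release(dataset, "v1-all-sdk", NotFound). Treat that exception name as an assumption and check it against the version of the SDK you have installed.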
Okay.
So now we have our interface to our release, and therefore we can actually download it into a zip file. Let me just run this cell and we'll see the results. Let's wait for a second for this to happen and we'll see each other again in a bit.
So that went pretty fast and we can also verify that we actually have downloaded our data by going into our directory. We can see here that we have our zip file, we can unzip our data, and here we have a directory with all our annotations. Great.
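The download and unzip steps might look like the sketch below. It assumes the release object exposes a download_zip method that writes the archive to the given path, as darwin-py releases do. The helper name download_and_extract and the paths in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def download_and_extract(release, zip_path, target_dir):
    """Download a release as a zip, unpack it, and list the extracted files."""
    zip_path = Path(zip_path)
    target_dir = Path(target_dir)
    # Assumption: release.download_zip(path) writes the archive to `path`.
    release.download_zip(zip_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    # Return the files only, sorted so the result is stable.
    return sorted(p for p in target_dir.rglob("*") if p.is_file())
```

For example, download_and_extract(release, "v1-all-sdk.zip", "birds-export") would leave the zip file next to a directory containing the annotations.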
So now let's look at how we can do pretty much the same, but using the REST API.
So again, let's import our dependencies. As we are working with the REST API, we only need our requests module and our API key, which I have stored in a separate file so that I don't share it.
Okay, after that is done, we need to define which dataset we actually want to target. For that, we again use the exact same team name or team slug as before, and we'll again be targeting the exact same dataset as before.
Those two parameters will be plugged into the URL that we'll be then requesting from and we'll continue to the next cell. As you can see, the amount of code that we need is really not that much. We again specify our release name, which in this case will be v1-all-REST to specify that we're using the REST API and that we are now using a filter, which was a fun little thing.
Okay, so the header is pretty much always the same. We need to specify which format we accept in our responses. We also define which content type we are sending ourselves. Then we need to provide the API key for authorization. Now in the payload, we again have a few parameters that we can specify, for example, the format we are working with and whether we want to include authorship. If this is set to true, the export will include the metadata of every annotator for the respective annotations. We can also specify whether or not we want to include export tokens. What are export tokens? Each image URL is appended with a specific export token that allows pretty much any user, whether they have V7 or not, to access the original image. When setting this to true, be aware that your data is not encrypted anymore.
Okay. We now also have, of course, the release name and the filters. In this case, I have just selected the filter “item name contains”, which just says that the name of the specific image contains this specific sequence of numbers or characters. This is really helpful if the images in your dataset already follow a specific naming convention. That way you can directly filter through your images.
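Put together, the request described above could be assembled roughly like this. Treat it as a sketch under assumptions, not as the exact cell from the video: the base URL, the endpoint path, the "ApiKey" authorization prefix, and the payload key names (in particular "filters" and "item_name_contains") are assumptions to be checked against the V7 REST API reference. The helper names are hypothetical.

```python
BASE_URL = "https://darwin.v7labs.com/api"  # assumed base URL


def build_export_request(team_slug, dataset_slug, api_key, release_name,
                         name_contains, export_format="json",
                         include_authorship=False,
                         include_export_token=False):
    """Return (url, headers, payload) for creating an export version."""
    # Team and dataset slugs are plugged into the URL (assumed path layout).
    url = f"{BASE_URL}/teams/{team_slug}/datasets/{dataset_slug}/exports"
    headers = {
        "accept": "application/json",        # format we accept in responses
        "content-type": "application/json",  # format we send
        "Authorization": f"ApiKey {api_key}",  # assumed auth scheme
    }
    payload = {
        "name": release_name,
        "format": export_format,
        "include_authorship": include_authorship,
        # Leave this False unless anyone holding the URL may see the image.
        "include_export_token": include_export_token,
        # Assumed key names for the "item name contains" filter.
        "filters": {"item_name_contains": name_contains},
    }
    return url, headers, payload


def send_export_request(url, headers, payload):
    """POST the export request; requires the third-party requests package."""
    import requests  # imported lazily so the builder works without it

    response = requests.post(url, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response
```

Keeping the builder separate from the network call makes it easy to print and inspect the payload before anything is sent.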
That’s pretty much it. Let's just run the cell and look at the export version that we have created. Let's pull up our UI again, refresh, and we can see we have our v1-all-REST with the filter applied.
There's only one image in this dataset, or in this export version, rather, because only one image actually contains the string that I've provided in the filter. Now, this hasn't really pulled any data yet. It has only created the export version. To pull this data, you can use any of the methods that we talked about: pulling directly from the UI with the download button, copying the command line interface command, using the SDK, or using the REST API itself.
So if you use the REST API, we simply want to get a list of all the export versions that we have. Those export versions will include a download URL. So as you can see, we have now received our list of all the different export versions. In this case, the first one it lists out is V1 All.
What we'd want to do is just click on the link or access the link, and that will automatically download our zip file, as we have done in the SDK approach. And that's it. We have pulled our export version.
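Picking the right link out of that list can be sketched as below. The response shape is an assumption: the sketch treats the list as dictionaries with "name" and "download_url" keys, and assumes the list comes from a GET request against the same exports URL used to create the export. Both helper names are hypothetical.

```python
def list_exports(url, headers):
    """GET the export versions; requires the third-party requests package."""
    import requests  # imported lazily so the helper below works without it

    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


def find_download_url(exports, release_name):
    """Return the download URL of the named export, or None if absent."""
    for export in exports:
        # Assumed keys: "name" and "download_url".
        if export.get("name") == release_name:
            return export.get("download_url")
    return None
```

Opening the returned URL, in a browser or from code, then downloads the same kind of zip file as in the SDK approach.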
Actually that was it for the whole video. You now know how to quickly export datasets from V7.
I hope this video helped you out with getting started with V7.