Official library for managing datasets on the V7 Darwin annotation platform.
Darwin-py can be used both from the command line and as a Python library.
Main features include (but are not limited to):
- Client authentication
- Listing local and remote datasets
- Create/remove datasets
- Upload/download data to/from remote datasets
- Direct integration with PyTorch dataloaders
Tested with Python 3.8.
pip install darwin-py
You can now type darwin in your terminal to access the command line interface.
If you wish to use the PyTorch bindings, use the ml extra to install all the additional requirements:
pip install darwin-py[ml]
To run the tests, first install the test extra:
pip install darwin-py[test]
Once installed, darwin is accessible as a command line tool.
A useful way to navigate the CLI is the help option -h/--help, which provides additional information for each available command.
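When driving the CLI from a script, it can help to confirm that the darwin executable is on the PATH before calling it. The following is a minimal sketch using only the Python standard library; the helper name `darwin_help` is hypothetical and not part of darwin-py, and the only CLI option it relies on is the `--help` option described above.

```python
import shutil
import subprocess


def darwin_help(executable="darwin"):
    """Return the CLI help text, or None if the executable is not on the PATH.

    `darwin_help` is an illustrative helper, not a darwin-py function.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    # check=False: return whatever the CLI printed, even on a non-zero exit code.
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


help_text = darwin_help()
print(help_text if help_text is not None else "darwin was not found on the PATH")
```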
To perform remote operations on Darwin you first need to authenticate. This requires a team-specific API-key. If you do not already have a Darwin account, you can contact us and we can set one up for you.
To start the authentication process:
$ darwin authenticate
API key:
Make example-team the default team? [y/N] y
Datasets directory [~/.darwin/datasets]:
Authentication succeeded.
You will then be prompted to enter your API key, to choose whether to set the corresponding team as the default, and finally to pick the location on the local file system for that team's datasets.
This process creates a configuration file at ~/.darwin/config.yaml.
This file will be updated when you later authenticate for other teams.
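A script can check whether authentication has already been performed by looking for the configuration file mentioned above. This is a minimal sketch that only tests for the file's presence and makes no assumption about its contents; `darwin_config_path` and its `home` parameter are hypothetical names, not part of darwin-py.

```python
from pathlib import Path


def darwin_config_path(home=None):
    """Return the location where `darwin authenticate` writes its configuration.

    `home` is a hypothetical override (useful in tests); by default the
    current user's home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".darwin" / "config.yaml"


config = darwin_config_path()
if config.is_file():
    print(f"Found Darwin configuration at {config}")
else:
    print("No configuration found; run `darwin authenticate` first.")
```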
Lists a summary of existing local datasets.
$ darwin dataset local
NAME IMAGES SYNC_DATE SIZE
mydataset 112025 yesterday 159.2 GB
Lists a summary of remote datasets accessible by the current user.
$ darwin dataset remote
NAME IMAGES PROGRESS
example-team/mydataset 112025 73.0%
To create an empty dataset remotely:
$ darwin dataset create test
Dataset 'test' (example-team/test) has been created.
Access at https://darwin.v7labs.com/datasets/579
The dataset will be created in the team you're authenticated for.
To delete the dataset on the server:
$ darwin dataset remove test
About to delete example-team/test on darwin.
Do you want to continue? [y/N] y
Uploads data to an existing remote dataset. It takes two parameters: the dataset name, and a single image or a directory of images/videos to upload.
The -e/--exclude argument specifies one or more file extensions to be ignored in data_dir, e.g. -e .jpg
For videos, the frame extraction rate can be specified by adding --fps <frame_rate>
Supported extensions:
- Video files: .mp4, .bpm, .mov
- Image files: .jpg, .jpeg, .png
$ darwin dataset push test /path/to/folder/with/images
100%|████████████████████████| 2/2 [00:01<00:00, 1.27it/s]
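To preview which files in a folder would survive an extension filter such as -e .jpg, the idea can be sketched in a few lines of Python. This illustrates the filtering concept only and is not darwin-py's implementation: `select_files` is a hypothetical helper, and treating extensions case-insensitively is an assumption.

```python
from pathlib import Path


def select_files(paths, supported, exclude=()):
    """Keep paths whose extension is supported and not excluded.

    Matching is case-insensitive, which is an assumption of this sketch.
    """
    allowed = {ext.lower() for ext in supported} - {ext.lower() for ext in exclude}
    return [p for p in paths if Path(p).suffix.lower() in allowed]


# Image extensions taken from the supported list above.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")

candidates = ["a.jpg", "b.PNG", "notes.txt", "c.jpeg"]
kept = select_files(candidates, IMAGE_EXTENSIONS, exclude=[".jpg"])
print(kept)  # ['b.PNG', 'c.jpeg']
```

Note that excluding .jpg does not exclude .jpeg, since the two are compared as distinct extensions.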
Before a dataset can be downloaded, a release needs to be generated:
$ darwin dataset export test 0.1
Dataset test successfully exported to example-team/test:0.1
This release is immutable: if new images or annotations have been added, you will have to create a new release to include them.
To list all available releases:
$ darwin dataset releases test
NAME IMAGES CLASSES EXPORT_DATE
example-team/test:0.1 4 0 2019-12-07 11:37:35+00:00
Finally, to download a release:
$ darwin dataset pull test:0.1
Dataset example-team/test:0.1 downloaded at /directory/choosen/at/authentication/time.
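The commands above refer to releases with identifiers such as test:0.1 or example-team/test:0.1. When scripting around the CLI, it can be convenient to split such an identifier into its parts. The sketch below assumes the form [team/]dataset[:version], which is inferred from the example output above rather than from a documented grammar; `ReleaseRef` and `parse_release_ref` are hypothetical names, not part of darwin-py.

```python
from typing import NamedTuple, Optional


class ReleaseRef(NamedTuple):
    team: Optional[str]
    dataset: str
    version: Optional[str]


def parse_release_ref(ref):
    """Split '[team/]dataset[:version]' into a ReleaseRef.

    The identifier format is an assumption based on the CLI examples.
    """
    name, _, version = ref.partition(":")
    team, sep, dataset = name.rpartition("/")
    if not dataset:
        raise ValueError(f"missing dataset name in {ref!r}")
    return ReleaseRef(team if sep else None, dataset, version or None)


print(parse_release_ref("example-team/test:0.1"))
print(parse_release_ref("test"))
```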
The framework is designed to be usable as a standalone Python library.
Usage can be inferred from the operations performed in darwin/cli_functions.py.
A minimal example to download a dataset is provided below; a more extensive one can be found in darwin_demo.py.
from darwin.client import Client
client = Client.local() # use the configuration in ~/.darwin/config.yaml
dataset = client.get_remote_dataset("example-team/test")
dataset.pull() # downloads annotations and images for the latest exported version
Follow this guide to integrate Darwin datasets directly with PyTorch.