69 changes: 69 additions & 0 deletions docs/scripting/Endpoints.md
# GirderClient and DIVE REST Endpoints

DIVE can be interacted with programmatically using both Girder and DIVE endpoints to upload and download data, modify annotations, and run pipelines and training.

## Main DIVE Endpoints

Going to `{URL}/api/v1`, like [viame.kitware.com/api/v1](https://viame.kitware.com/api/v1), will provide a Swagger description of all of the Girder endpoints.

### dive_dataset/

These endpoints operate directly on DIVE datasets for higher-level information about the media, attributes, and configuration.

#### `dive_dataset/`
- **Method:** GET
- **Usage:** Lists the DIVE datasets in the system that the current user has access to. By default it uses a limit of 50 to prevent listing a very large number of datasets at once.
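
A minimal paging sketch, assuming `gc` is an authenticated `girder_client.GirderClient` (see the GirderClient docs); the helper names are hypothetical, and the `limit`/`offset` names follow standard Girder paging:

```python
def list_dive_datasets(gc, limit=50, offset=0):
    """Fetch one page of DIVE datasets visible to the current user."""
    return gc.get('dive_dataset', parameters={'limit': limit, 'offset': offset})


def all_dive_datasets(gc, page_size=50):
    """Walk the paged listing until an empty page comes back."""
    datasets, offset = [], 0
    while True:
        page = list_dive_datasets(gc, limit=page_size, offset=offset)
        if not page:
            return datasets
        datasets += page
        offset += page_size
```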

#### `dive_dataset/`
- **Method:** POST
- **Usage:** Creates a new dataset in the system by generating a clone of an existing DIVE dataset.

#### `dive_dataset/export`
- **Method:** GET
- **Usage:** Exports entire datasets, including annotations and media files, as a zip file. The `folderIds` input parameter is an array of DIVE dataset folder IDs.
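
A download sketch, assuming `gc` is an authenticated client; encoding `folderIds` as a JSON array string is an assumption about how the endpoint expects the parameter:

```python
import json


def export_dataset_zip(gc, folder_ids, out_path='datasets.zip'):
    """Download a zip of one or more DIVE datasets via dive_dataset/export."""
    response = gc.sendRestRequest(
        'GET',
        'dive_dataset/export',
        # folderIds passed as a JSON-encoded array (assumption)
        parameters={'folderIds': json.dumps(folder_ids)},
        jsonResp=False,  # the response body is a zip file, not JSON
    )
    with open(out_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return out_path
```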

#### `dive_dataset/{id}/configuration`
- **Method:** GET
- **Usage:** Gets the configuration for the given DIVE dataset ID and returns a downloadable JSON file containing the configuration options.

### dive_annotation/

These endpoints operate on the annotations for a specific DIVE dataset ID, including retrieving and modifying the DIVE dataset annotation values.

#### `dive_annotation/track`
- **Method:** GET
- **Usage:** Gets detailed information about specific tracks within a dataset, including their attributes and associated detections. There are options to retrieve the annotations at specific revisions.
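
For example, assuming an authenticated `gc`; the `folderId` parameter name mirrors the sample scripts in this repository, and `revision` is an assumption based on the description above:

```python
def get_tracks(gc, folder_id, revision=None):
    """Fetch track JSON for a dataset, optionally pinned to a saved revision."""
    params = {'folderId': folder_id}
    if revision is not None:
        params['revision'] = revision
    return gc.get('dive_annotation/track', parameters=params)
```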

#### `dive_annotation/revision`
- **Method:** GET
- **Usage:** Accesses the list of revisions for a dataset's annotations, i.e. every time a user modified the annotations through a pipeline or by saving changes.

#### `dive_annotation/rollback`
- **Method:** POST
- **Usage:** Rolls back the annotations to a specific revision.
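
A rollback sketch, assuming `gc` is authenticated; the parameter names here are assumptions modeled on the other annotation endpoints:

```python
def rollback_annotations(gc, folder_id, revision):
    """Restore a dataset's annotations to an earlier revision number."""
    return gc.sendRestRequest(
        'POST',
        'dive_annotation/rollback',
        parameters={'folderId': folder_id, 'revision': revision},
    )
```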

#### `dive_annotation`
- **Method:** PATCH
- **Usage:** This endpoint is used to modify existing annotations, such as updating track information or adding new attributes.


#### `dive_annotation/export`
- **Method:** GET
- **Usage:** Exports annotations for a dataset in a specified format (e.g., CSV or JSON). This endpoint differs from `dive_annotation/track` in that it returns a file rather than direct JSON.
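
A sketch mirroring the `exportAnnotations.py` sample script: because the endpoint returns a file, `jsonResp=False` is required and the body is streamed to disk:

```python
import os


def download_annotations(gc, folder_id, fmt='viame_csv', out_dir='.'):
    """Download annotations for one dataset as a CSV or JSON file."""
    response = gc.sendRestRequest(
        'GET',
        'dive_annotation/export',
        parameters={'folderId': folder_id, 'format': fmt},
        jsonResp=False,  # the endpoint returns a file, not JSON
    )
    extension = 'csv' if fmt == 'viame_csv' else 'json'
    out_path = os.path.join(out_dir, f'annotations.{extension}')
    with open(out_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    return out_path
```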

### dive_rpc/

These are remote procedure calls that run jobs or perform actions that may take longer than a simple request. This is where pipelines and training are run, and where the initial transcoding of videos/images can be kicked off.

#### `dive_rpc/postprocess/{id}`
- **Method:** POST
- **Usage:** Triggers postprocessing tasks on a dataset. After new data is uploaded, this endpoint must be called to transcode the media and to process any uploaded CSV or JSON files into attributes and base annotations. After uploading annotation data, call this endpoint with `skipJobs = True` to process the annotation file and update the attributes.
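
A sketch of the postprocess call, assuming `gc` is authenticated; `skipJobs` comes from the text above and `skipTranscoding` appears in the sample scripts:

```python
def postprocess(gc, folder_id, skip_jobs=True, skip_transcoding=False):
    """Ask DIVE to ingest freshly uploaded media/annotation files."""
    return gc.sendRestRequest(
        'POST',
        f'dive_rpc/postprocess/{folder_id}',
        parameters={'skipJobs': skip_jobs, 'skipTranscoding': skip_transcoding},
    )
```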

#### `dive_rpc/pipeline`
- **Method:** POST
- **Usage:** This endpoint is used to execute a specified pipeline on a dataset, which can include tasks like object detection, tracking, and classification.

#### `dive_rpc/train`
- **Method:** POST
- **Usage:** This endpoint is used to train a machine learning model using the annotations and media in a dataset, allowing for the creation of custom models for specific tasks.
51 changes: 51 additions & 0 deletions docs/scripting/GirderClient.md
# Girder Client

Girder Client is a Python client library that makes it easier to work with Girder/DIVE endpoints.

[Girder Client Documentation](https://girder.readthedocs.io/en/latest/python-client.html#the-python-client-library)

## Initialization

```python
import girder_client

apiURL = "viame.kitware.com"  # can also use localhost for local development
port = 443  # 8010 for local development


def login():
    gc = girder_client.GirderClient(apiURL, port=port, apiRoot="girder/api/v1")
    gc.authenticate(interactive=True)
    return gc
```

This code snippet creates a login context for girder-client by interactively asking for the username and password in the Python script. If you want to use an API key instead of the username/password prompt, replace `interactive=True` with the following:

```python
gc.authenticate(apiKey=apiKeyVar)
```

The API key can be accessed by going to the Girder endpoint `/token/current` while logged into the system, e.g. for viame.kitware.com this would be [viame.kitware.com/api/v1/token/current](https://viame.kitware.com/api/v1#/token/token_currentSession_current).

## Common Girder Client Functions

### [girder_client.get(path, parameters=None, jsonResp=True)](https://girder.readthedocs.io/en/latest/python-client.html#girder_client.GirderClient.get)

Sends a simplified GET request to a specific endpoint path such as `/dive_dataset`.
The parameters can be provided as a dictionary of values.
`jsonResp` defaults to `True`; if an endpoint returns binary or other non-JSON data, set `jsonResp` to `False`.
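
Two thin wrappers (hypothetical helper names) illustrating the `jsonResp` switch, assuming `gc` is an authenticated client:

```python
def get_json(gc, path, **params):
    """GET with dict parameters; the response is parsed JSON."""
    return gc.get(path, parameters=params or None)


def get_raw(gc, path, **params):
    """For endpoints returning files or other binary data, skip JSON parsing."""
    return gc.get(path, parameters=params or None, jsonResp=False)
```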


### [girder_client.sendRestRequest(method, path, parameters=None, data=None, files=None, json=None, headers=None, jsonResp=True)](https://girder.readthedocs.io/en/latest/python-client.html#girder_client.GirderClient.sendRestRequest)

Sends a request with a specific method (GET, PATCH, DELETE, POST, ...) to the path with the given parameters and possible files. This allows more fine-grained control over the REST request sent to the endpoint.

### [girder_client.listFolder(parentId, parentFolderType='folder', name=None, limit=None, offset=None)](https://girder.readthedocs.io/en/latest/python-client.html#girder_client.GirderClient.listFolder)

Given a parent folder ID, this lists the folders beneath it according to the other filter parameters. This is useful for listing subfolders within a parent folder.
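
A recursive walk sketch along the lines of the `exportAnnotations.py` sample, collecting folders whose metadata marks them as DIVE datasets:

```python
def find_dive_datasets(gc, parent_id):
    """Recursively collect subfolders with annotate=True in their metadata."""
    found = []
    for folder in gc.listFolder(parent_id):
        if folder.get('meta', {}).get('annotate'):
            found.append(folder)
        found += find_dive_datasets(gc, folder['_id'])
    return found
```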

### [girder_client.addMetadataToFolder(folderId, metadata)](https://girder.readthedocs.io/en/latest/python-client.html#girder_client.GirderClient.addMetadataToFolder)

Allows adding metadata to a specific folder. This can be useful to change the attribute specification for a folder or to mark a folder with `annotate=True` to indicate that it is a DIVE dataset folder.
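
For instance (the `fps` key is an assumption based on the `setAnnotationFPS.py` sample; `annotate` comes from the text above):

```python
def mark_as_dive_dataset(gc, folder_id, fps=30):
    """Flag a plain Girder folder as a DIVE dataset and set its annotation FPS."""
    gc.addMetadataToFolder(folder_id, {'annotate': True, 'fps': fps})
```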


### [girder_client.uploadFileToFolder(folderId, filepath, reference=None, mimeType=None, filename=None, progressCallback=None)](https://girder.readthedocs.io/en/latest/python-client.html#girder_client.GirderClient.uploadFileToFolder)

Uploads a file to a parent folder. This would typically be used in conjunction with `girder_client.sendRestRequest('POST', 'dive_rpc/postprocess/{id}', parameters={'skipJobs': True, 'skipTranscoding': True})`, where the ID is the folder ID. This way you can upload a CSV/JSON annotation file to a folder and then call postprocess to process that data.
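
Putting the two calls together, a sketch of uploading an annotation file and postprocessing it, assuming `gc` is an authenticated client:

```python
def upload_annotations(gc, folder_id, csv_path):
    """Upload a VIAME CSV/JSON annotation file, then ask DIVE to ingest it."""
    gc.uploadFileToFolder(folder_id, csv_path)
    gc.sendRestRequest(
        'POST',
        f'dive_rpc/postprocess/{folder_id}',
        parameters={'skipJobs': True, 'skipTranscoding': True},
    )
```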

25 changes: 25 additions & 0 deletions docs/scripting/Scripting.md
# Scripting

The data management interface provided through DIVE and Girder may not be enough for very large or complicated datasets, in which case scripting the uploading, downloading, and processing of data may become necessary. This can be done using the DIVE/Girder REST endpoints and a Python package called `girder-client` that helps with interfacing with these endpoints.





## [Endpoints](Endpoints.md)

The endpoints documentation provides a more comprehensive list of commonly used endpoints when scripting. These are endpoints for uploading/downloading data as well as running postprocess, pipelines and training on DIVE Datasets.

## [GirderClient](GirderClient.md)

The documentation for GirderClient, with some introductory scripts for authentication along with commonly used functions for DIVE scripting.



There are several example scripts provided in the repository including:

* [userCount.py](https://github.com/Kitware/dive/blob/main/samples/scripts/userCount.py) - A script used by admins to download information about all of the datasets in the system and determine the total number of users.
* [setAnnotationFPS.py](https://github.com/Kitware/dive/blob/main/samples/scripts/setAnnotationFPS.py) - Sets the annotation FPS on a sample folder by modifying the metadata on the DIVE dataset folder.
* [uploadScript](https://github.com/Kitware/dive/blob/main/samples/scripts/uploadScript.py) - Uploads a new JSON or VIAME CSV formatted file to a DIVE dataset folder and runs the `postprocess` endpoint to add the new annotations.
* [syncAnnotationsScript.py](https://github.com/Kitware/dive/blob/main/samples/scripts/syncAnnotationsScript.py) - Syncs a folder hierarchy of annotation files with a similar folder structure within DIVE.
* [exportAnnotations.py](https://github.com/Kitware/dive/blob/main/samples/scripts/exportAnnotations.py) - Takes a list of base folder IDs, recursively finds the DIVE datasets under the hierarchy, and downloads all annotations into a folder structure mimicking the Girder structure.
4 changes: 4 additions & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
@@ -78,5 +78,9 @@ nav:
- Running with Docker Compose: Deployment-Docker-Compose.md
- Cloud Storage Integration: Deployment-Storage.md
- REST API Specification: https://viame.kitware.com/api/v1
- Scripting:
- Introduction: scripting/Scripting.md
- Endpoints: scripting/Endpoints.md
- GirderClient: scripting/GirderClient.md
- Frequently Asked Questions: FAQ.md
- Support: Support.md
66 changes: 66 additions & 0 deletions samples/scripts/exportAnnotations.py
import os

import click
import girder_client

apiURL = "viame.kitware.com"
port = 443
baseFolderIds = ["66d76d42a71db63d08401aa7"]  # Sample folder girder Id
export_format = 'viame_csv'  # viame_csv or dive_json


# Login to the girder client, interactive means it will prompt for username and password
def login():
    gc = girder_client.GirderClient(apiURL, port=port, apiRoot="girder/api/v1")
    gc.authenticate(interactive=True)
    return gc


# Function to recursively search for DIVE datasets
def find_dive_datasets(gc, parent_id):
    export_folders = []
    folders = gc.listFolder(parent_id)
    for folder in folders:
        if folder['meta'].get('annotate'):
            export_folders.append({'id': folder['_id'], 'name': folder['name']})
        export_folders += find_dive_datasets(gc, folder['_id'])

    return export_folders


# Function to export annotations
def export_annotations(gc, datasets):
    for folder_info in datasets:
        folder_id = folder_info['id']
        folder_name = folder_info['name']
        export_path = f'./exports/{folder_name}'
        os.makedirs(export_path, exist_ok=True)

        # The endpoint returns a file, so request the raw response for download
        response = gc.sendRestRequest(
            "GET",
            "dive_annotation/export",
            parameters={'folderId': folder_id, 'format': export_format},
            jsonResp=False,  # Ensure we get the raw response for file handling
        )

        extension = 'csv' if export_format == 'viame_csv' else 'json'
        output_file = os.path.join(export_path, f'annotations.{extension}')
        with open(output_file, 'wb') as f:  # Open the file in binary write mode
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)


@click.command(name="ExportAnnotations", help="Export annotations from DIVE datasets")
def main():
    gc = login()
    datasets = []
    for parent_id in baseFolderIds:
        datasets += find_dive_datasets(gc, parent_id)

    export_annotations(gc, datasets)


if __name__ == '__main__':
    main()