Skip to content

[DE-4880] Update annotation related endpoints for multiple ground truth sets #444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jan 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,11 @@ jobs:
pkg-manager: poetry
args: -E metrics -E launch
include-python-in-cache-key: false
- run:
name: Black Formatting Check # Only validation, without re-formatting
command: |
poetry run black --check .
# - run:
# name: Black Formatting Check # Only validation, without re-formatting
# command: |
# poetry show black
# poetry run black --check .
Comment on lines -37 to +41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add this back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see thread: https://scaleapi.slack.com/archives/C05Q7DSPQF9/p1734546043282449, i think we need to up the black versioning, then python dependency version but not sure about how to still support testing for python versions less than 3.8

- run:
name: Ruff Lint Check # See pyproject.toml [tool.ruff]
command: |
Expand Down
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,11 @@ All notable changes to the [Nucleus Python Client](https://github.com/scaleapi/n
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.17.8](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.17.7) - 2024-11-05

Comment on lines +8 to +9
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The date this wrong, and the URL is wrong, please update.

### Added
- Adding `only_most_recent_tasks` parameter for `dataset.scene_and_annotation_generator()` and `dataset.items_and_annotation_generator()` to accommodate for multiple sets of ground truth caused by relabeled tasks. Also returns the task_id in the annotation results.

## [0.17.7](https://github.com/scaleapi/nucleus-python-client/releases/tag/v0.17.7) - 2024-11-05

### Added
Expand Down
22 changes: 22 additions & 0 deletions nucleus/annotation.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,7 @@
POLYGON_TYPE,
POSITION_KEY,
REFERENCE_ID_KEY,
TASK_ID_KEY,
TAXONOMY_NAME_KEY,
TRACK_REFERENCE_ID_KEY,
TYPE_KEY,
Expand Down Expand Up @@ -158,6 +159,7 @@ class BoxAnnotation(Annotation): # pylint: disable=R0902
metadata: Optional[Dict] = None
embedding_vector: Optional[list] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand All @@ -178,6 +180,7 @@ def from_json(cls, payload: dict):
metadata=payload.get(METADATA_KEY, {}),
embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -195,6 +198,7 @@ def to_payload(self) -> dict:
METADATA_KEY: self.metadata,
EMBEDDING_VECTOR_KEY: self.embedding_vector,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}

def __eq__(self, other):
Expand All @@ -209,6 +213,7 @@ def __eq__(self, other):
and sorted(self.metadata.items()) == sorted(other.metadata.items())
and self.embedding_vector == other.embedding_vector
and self.track_reference_id == other.track_reference_id
and self.task_id == other.task_id
)


Expand Down Expand Up @@ -275,6 +280,7 @@ class LineAnnotation(Annotation):
annotation_id: Optional[str] = None
metadata: Optional[Dict] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand Down Expand Up @@ -304,6 +310,7 @@ def from_json(cls, payload: dict):
annotation_id=payload.get(ANNOTATION_ID_KEY, None),
metadata=payload.get(METADATA_KEY, {}),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -317,6 +324,7 @@ def to_payload(self) -> dict:
ANNOTATION_ID_KEY: self.annotation_id,
METADATA_KEY: self.metadata,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}
return payload

Expand Down Expand Up @@ -367,6 +375,7 @@ class PolygonAnnotation(Annotation):
metadata: Optional[Dict] = None
embedding_vector: Optional[list] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand Down Expand Up @@ -397,6 +406,7 @@ def from_json(cls, payload: dict):
metadata=payload.get(METADATA_KEY, {}),
embedding_vector=payload.get(EMBEDDING_VECTOR_KEY, None),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -411,6 +421,7 @@ def to_payload(self) -> dict:
METADATA_KEY: self.metadata,
EMBEDDING_VECTOR_KEY: self.embedding_vector,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}
return payload

Expand Down Expand Up @@ -507,6 +518,7 @@ class KeypointsAnnotation(Annotation):
annotation_id: Optional[str] = None
metadata: Optional[Dict] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata or {}
Expand Down Expand Up @@ -559,6 +571,7 @@ def from_json(cls, payload: dict):
annotation_id=payload.get(ANNOTATION_ID_KEY, None),
metadata=payload.get(METADATA_KEY, {}),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -574,6 +587,7 @@ def to_payload(self) -> dict:
ANNOTATION_ID_KEY: self.annotation_id,
METADATA_KEY: self.metadata,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}
return payload

Expand Down Expand Up @@ -678,6 +692,7 @@ class CuboidAnnotation(Annotation): # pylint: disable=R0902
annotation_id: Optional[str] = None
metadata: Optional[Dict] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand All @@ -694,6 +709,7 @@ def from_json(cls, payload: dict):
annotation_id=payload.get(ANNOTATION_ID_KEY, None),
metadata=payload.get(METADATA_KEY, {}),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand Down Expand Up @@ -926,6 +942,7 @@ class CategoryAnnotation(Annotation):
taxonomy_name: Optional[str] = None
metadata: Optional[Dict] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand All @@ -938,6 +955,7 @@ def from_json(cls, payload: dict):
taxonomy_name=payload.get(TAXONOMY_NAME_KEY, None),
metadata=payload.get(METADATA_KEY, {}),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -948,6 +966,7 @@ def to_payload(self) -> dict:
REFERENCE_ID_KEY: self.reference_id,
METADATA_KEY: self.metadata,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}
if self.taxonomy_name is not None:
payload[TAXONOMY_NAME_KEY] = self.taxonomy_name
Expand All @@ -963,6 +982,7 @@ class MultiCategoryAnnotation(Annotation):
taxonomy_name: Optional[str] = None
metadata: Optional[Dict] = None
track_reference_id: Optional[str] = None
task_id: Optional[str] = None

def __post_init__(self):
self.metadata = self.metadata if self.metadata else {}
Expand All @@ -975,6 +995,7 @@ def from_json(cls, payload: dict):
taxonomy_name=payload.get(TAXONOMY_NAME_KEY, None),
metadata=payload.get(METADATA_KEY, {}),
track_reference_id=payload.get(TRACK_REFERENCE_ID_KEY, None),
task_id=payload.get(TASK_ID_KEY, None),
)

def to_payload(self) -> dict:
Expand All @@ -985,6 +1006,7 @@ def to_payload(self) -> dict:
REFERENCE_ID_KEY: self.reference_id,
METADATA_KEY: self.metadata,
TRACK_REFERENCE_ID_KEY: self.track_reference_id,
TASK_ID_KEY: self.task_id,
}
if self.taxonomy_name is not None:
payload[TAXONOMY_NAME_KEY] = self.taxonomy_name
Expand Down
8 changes: 4 additions & 4 deletions nucleus/annotation_uploader.py
Original file line number Diff line number Diff line change
Expand Up @@ -214,19 +214,19 @@ def fn():

@staticmethod
def check_for_duplicate_ids(annotations: Iterable[Annotation]):
"""Do not allow annotations to have the same (annotation_id, reference_id) tuple"""
"""Do not allow annotations to have the same (annotation_id, reference_id, task_id) tuple"""

# some annotations like CategoryAnnotation do not have annotation_id attribute, and as such, we allow duplicates
tuple_ids = [
(ann.reference_id, ann.annotation_id) # type: ignore
(ann.reference_id, ann.annotation_id, ann.task_id) # type: ignore
for ann in annotations
if hasattr(ann, "annotation_id")
if hasattr(ann, "annotation_id") and hasattr(ann, "task_id")
]
tuple_count = Counter(tuple_ids)
duplicates = {key for key, value in tuple_count.items() if value > 1}
if len(duplicates) > 0:
raise DuplicateIDError(
f"Duplicate annotations with the same (reference_id, annotation_id) properties found.\n"
f"Duplicate annotations with the same (reference_id, annotation_id, task_id) properties found.\n"
f"Duplicates: {duplicates}\n"
f"To fix this, avoid duplicate annotations, or specify a different annotation_id attribute "
f"for the failing items."
Expand Down
1 change: 1 addition & 0 deletions nucleus/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,7 @@
SUCCESS_STATUS_CODES = [200, 201, 202]
SLICE_TAGS_KEY = "slice_tags"
TAXONOMY_NAME_KEY = "taxonomy_name"
TASK_ID_KEY = "task_id"
TRACK_REFERENCE_ID_KEY = "track_reference_id"
TRACK_REFERENCE_IDS_KEY = "track_reference_ids"
TRACKS_KEY = "tracks"
Expand Down
7 changes: 6 additions & 1 deletion nucleus/dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -1450,13 +1450,14 @@ def items_and_annotations(
return convert_export_payload(api_payload[EXPORTED_ROWS])

def scene_and_annotation_generator(
self, slice_id=None, page_size: int = 10
self, slice_id=None, page_size: int = 10, only_most_recent_tasks=True
):
"""Provides a generator of all Scenes and Annotations in the dataset grouped by scene.

Args:
slice_id: Optional slice ID to filter the scenes and annotations.
page_size: Number of scenes to fetch per page. Default is 10.
only_most_recent_tasks: If True, only the annotations corresponding to the most recent task for each item is returned.

Returns:
Generator where each element is a nested dict containing scene and annotation information of the dataset structured as a JSON.
Expand Down Expand Up @@ -1509,6 +1510,7 @@ def scene_and_annotation_generator(
result_key=EXPORT_FOR_TRAINING_KEY,
page_size=page_size,
sliceId=slice_id,
onlyMostRecentTask=only_most_recent_tasks,
)

for data in json_generator:
Expand All @@ -1518,12 +1520,14 @@ def items_and_annotation_generator(
self,
query: Optional[str] = None,
use_mirrored_images: bool = False,
only_most_recent_tasks: bool = True,
) -> Iterable[Dict[str, Union[DatasetItem, Dict[str, List[Annotation]]]]]:
"""Provides a generator of all DatasetItems and Annotations in the dataset.

Args:
query: Structured query compatible with the `Nucleus query language <https://nucleus.scale.com/docs/query-language-reference>`_.
use_mirrored_images: If True, returns the location of the mirrored image hosted in Scale S3. Useful when the original image is no longer available.
only_most_recent_tasks: If True, only the annotations corresponding to the most recent task for each item is returned.

Returns:
Generator where each element is a dict containing the DatasetItem
Expand All @@ -1550,6 +1554,7 @@ def items_and_annotation_generator(
page_size=10000, # max ES page size
query=query,
chip=use_mirrored_images,
onlyMostRecentTask=only_most_recent_tasks,
)
for data in json_generator:
for ia in convert_export_payload([data], has_predictions=False):
Expand Down
2 changes: 1 addition & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ ignore = ["E501", "E741", "E731", "F401"] # Easy ignore for getting it running

[tool.poetry]
name = "scale-nucleus"
version = "0.17.7"
version = "0.17.8"
description = "The official Python client library for Nucleus, the Data Platform for AI"
license = "MIT"
authors = ["Scale AI Nucleus Team <nucleusapi@scaleapi.com>"]
Expand Down
Loading