Skip to content

Conversation

@sergei-encord
Copy link
Contributor

Showing the possible implementation of paginated label rows call
We definitely don't want to switch returning type on list_label_rows_v2, as it would ruin the existing client though. So this pr just doing it for the sake of the demo.

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

Unit test report (Python 3.8.18, Pydantic 1.10.22)

1 tests   - 254   0 ✅  - 255   0s ⏱️ -9s
1 suites ±  0   0 💤 ±  0 
1 files   ±  0   0 ❌ ±  0   1 🔥 +1 

For more details on these errors, see this check.

Results for commit 28969ab. ± Comparison against base commit 8e38420.

This pull request removes 255 and adds 1 tests. Note that renamed tests count towards both.
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-12]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-3]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-7]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[0]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[11]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[2]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[5]
tests.common.test_datetime_parser ‑ test_parse_datetime_custom_string
tests.common.test_datetime_parser ‑ test_parse_datetime_long_string[GMT]
tests.common.test_datetime_parser ‑ test_parse_datetime_long_string[UTC]
…
Unknown test

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

Unit test report (Python 3.8.18, Pydantic 2.10.6)

1 tests   - 254   0 ✅  - 255   0s ⏱️ -9s
1 suites ±  0   0 💤 ±  0 
1 files   ±  0   0 ❌ ±  0   1 🔥 +1 

For more details on these errors, see this check.

Results for commit 28969ab. ± Comparison against base commit 8e38420.

This pull request removes 255 and adds 1 tests. Note that renamed tests count towards both.
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-12]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-3]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[-7]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[0]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[11]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[2]
tests.common.test_datetime_parser ‑ test_format_datetime_to_long_string[5]
tests.common.test_datetime_parser ‑ test_parse_datetime_custom_string
tests.common.test_datetime_parser ‑ test_parse_datetime_long_string[GMT]
tests.common.test_datetime_parser ‑ test_parse_datetime_long_string[UTC]
…
Unknown test

@sergei-encord sergei-encord changed the title [WIP] Label row pagination demo [WIP][PLA-1168] Label row pagination demo Aug 4, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @sergei-encord, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request serves as a demonstration of how to implement pagination for the list_label_rows_v2 API call. It introduces new client-side pagination logic and refactors the existing method to return an iterable, allowing for more efficient processing of large datasets by fetching results in chunks rather than all at once. This is presented as a proof-of-concept, with a note that the return type change for list_label_rows_v2 is for demo purposes only and would not be applied to existing clients.

Highlights

  • Client-Side Pagination API: A new get_label_row_page method has been added to encord/client.py to facilitate fetching label row metadata in paginated responses, including support for page_token and limit parameters.
  • list_label_rows_v2 as a Generator: The list_label_rows_v2 method in encord/project.py has been transformed into a generator, yielding LabelRowV2 objects iteratively as pages are fetched, improving memory efficiency for large result sets.
  • New Paginated Response DTO: A LabelRowMetadataPage data transfer object has been introduced in encord/orm/label_row.py to model the structure of paginated API responses, containing a list of results and a next_page_token.
  • Flexible Type Handling for LabelRowV2: The LabelRowV2 constructor and internal parsing logic in encord/objects/ontology_labels_impl.py have been updated to accept both full LabelRowMetadata and the new LabelRowMetadataDTO, ensuring compatibility with the paginated response format.
  • Demonstration-Specific Logging: Debugging print statements and a hardcoded page_size have been added in encord/http/v2/api_client.py and encord/project.py to illustrate the pagination process during the demo.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a proof-of-concept for paginated fetching of label rows. The changes are centered around a new get_label_row_page method in the client and a new implementation for list_label_rows_v2 in the Project class that uses it to yield results lazily.

The implementation looks like a good starting point for the feature. My main concerns are around leftover debugging code (print statements and timing logic) which should be removed. I've also pointed to a few areas for improvement, such as a hardcoded API limit, commented-out code, and an inaccurate docstring. Given this is a Work-In-Progress pull request, these are likely known, but it's good to flag them for cleanup before this is finalized.

Comment on lines +89 to +100
print("Getting page...")

start = time.perf_counter()
params.page_size = 1000

page = self.get(
path,
params=params,
result_type=Page[result_type], # type: ignore[valid-type]
allow_none=allow_none,
)

print(f" page received: {time.perf_counter() - start:.3f}s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block contains debugging code for measuring performance (the print statements and time logic). Please remove it before this PR is merged. Also, the page size is hardcoded, which might be acceptable for a demo, but should be made configurable for the final implementation.

            params.page_size = 1000

            page = self.get(
                path,
                params=params,
                result_type=Page[result_type],  # type: ignore[valid-type]
                allow_none=allow_none,
            )

Comment on lines +282 to +284
print(
f"Got page of {page_items_count} in {time.perf_counter() - time_start:.2f} s . Overall got: {total_count}"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This print statement and the associated timing logic appear to be for debugging. Please remove them before merging.

"include_workflow_graph_node": include_workflow_graph_node,
"include_all_label_branches": include_all_label_branches,
"branch_name": branch_name,
"limit": 1000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A hardcoded limit of 1000 has been added to list_label_rows. This method doesn't appear to support pagination, so this will cap the results silently. If a limit is needed, it should be exposed as a parameter to the method to avoid unexpected behavior for users of this function.

page_token: Optional[str] = None,
limit: Optional[int] = 1000,
) -> LabelRowMetadataPage:
"""This function is documented in :meth:`encord.project.Project.list_label_rows`."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The docstring for get_label_row_page seems to be copied from another method and is not accurate. This new method should have its own docstring that explains its purpose and documents the pagination-specific parameters like page_token and limit.

Suggested change
"""This function is documented in :meth:`encord.project.Project.list_label_rows`."""
"""Get a page of label row metadata. This is the paginated version for listing label rows."""

Comment on lines +293 to +296
# label_rows = [
# LabelRowV2(label_row_metadata, self._client, self._ontology) for label_row_metadata in label_row_metadatas
# ]
# return label_rows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This commented-out code should be removed. Version control can be used to see the previous implementation if needed.

@github-actions
Copy link

github-actions bot commented Aug 4, 2025

SDK integration test report

284 tests  ±0   249 ✅  - 31   20m 25s ⏱️ -14s
  1 suites ±0     4 💤 ± 0 
  1 files   ±0    31 ❌ +31 

For more details on these failures, see this check.

Results for commit 28969ab. ± Comparison against base commit 8e38420.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants