-
Notifications
You must be signed in to change notification settings - Fork 15
[WIP][PLA-1168] Label row pagination demo #957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Unit test report (Python 3.8.18, Pydantic 1.10.22)1 tests - 254 0 ✅ - 255 0s ⏱️ -9s For more details on these errors, see this check. Results for commit 28969ab. ± Comparison against base commit 8e38420. This pull request removes 255 and adds 1 tests. Note that renamed tests count towards both. |
Unit test report (Python 3.8.18, Pydantic 2.10.6)1 tests - 254 0 ✅ - 255 0s ⏱️ -9s For more details on these errors, see this check. Results for commit 28969ab. ± Comparison against base commit 8e38420. This pull request removes 255 and adds 1 tests. Note that renamed tests count towards both. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @sergei-encord, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request serves as a demonstration of how to implement pagination for the list_label_rows_v2 API call. It introduces new client-side pagination logic and refactors the existing method to return an iterable, allowing for more efficient processing of large datasets by fetching results in chunks rather than all at once. This is presented as a proof-of-concept, with a note that the return type change for list_label_rows_v2 is for demo purposes only and would not be applied to existing clients.
Highlights
- Client-Side Pagination API: A new
get_label_row_pagemethod has been added toencord/client.pyto facilitate fetching label row metadata in paginated responses, including support forpage_tokenandlimitparameters. list_label_rows_v2as a Generator: Thelist_label_rows_v2method inencord/project.pyhas been transformed into a generator, yieldingLabelRowV2objects iteratively as pages are fetched, improving memory efficiency for large result sets.- New Paginated Response DTO: A
LabelRowMetadataPagedata transfer object has been introduced inencord/orm/label_row.pyto model the structure of paginated API responses, containing a list of results and anext_page_token. - Flexible Type Handling for LabelRowV2: The
LabelRowV2constructor and internal parsing logic inencord/objects/ontology_labels_impl.pyhave been updated to accept both fullLabelRowMetadataand the newLabelRowMetadataDTO, ensuring compatibility with the paginated response format. - Demonstration-Specific Logging: Debugging
printstatements and a hardcodedpage_sizehave been added inencord/http/v2/api_client.pyandencord/project.pyto illustrate the pagination process during the demo.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a proof-of-concept for paginated fetching of label rows. The changes are centered around a new get_label_row_page method in the client and a new implementation for list_label_rows_v2 in the Project class that uses it to yield results lazily.
The implementation looks like a good starting point for the feature. My main concerns are around leftover debugging code (print statements and timing logic) which should be removed. I've also pointed to a few areas for improvement, such as a hardcoded API limit, commented-out code, and an inaccurate docstring. Given this is a Work-In-Progress pull request, these are likely known, but it's good to flag them for cleanup before this is finalized.
| print("Getting page...") | ||
|
|
||
| start = time.perf_counter() | ||
| params.page_size = 1000 | ||
|
|
||
| page = self.get( | ||
| path, | ||
| params=params, | ||
| result_type=Page[result_type], # type: ignore[valid-type] | ||
| allow_none=allow_none, | ||
| ) | ||
|
|
||
| print(f" page received: {time.perf_counter() - start:.3f}s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block contains debugging code for measuring performance (the print statements and time logic). Please remove it before this PR is merged. Also, the page size is hardcoded, which might be acceptable for a demo, but should be made configurable for the final implementation.
params.page_size = 1000
page = self.get(
path,
params=params,
result_type=Page[result_type], # type: ignore[valid-type]
allow_none=allow_none,
)| print( | ||
| f"Got page of {page_items_count} in {time.perf_counter() - time_start:.2f} s . Overall got: {total_count}" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "include_workflow_graph_node": include_workflow_graph_node, | ||
| "include_all_label_branches": include_all_label_branches, | ||
| "branch_name": branch_name, | ||
| "limit": 1000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| page_token: Optional[str] = None, | ||
| limit: Optional[int] = 1000, | ||
| ) -> LabelRowMetadataPage: | ||
| """This function is documented in :meth:`encord.project.Project.list_label_rows`.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring for get_label_row_page seems to be copied from another method and is not accurate. This new method should have its own docstring that explains its purpose and documents the pagination-specific parameters like page_token and limit.
| """This function is documented in :meth:`encord.project.Project.list_label_rows`.""" | |
| """Get a page of label row metadata. This is the paginated version for listing label rows.""" |
| # label_rows = [ | ||
| # LabelRowV2(label_row_metadata, self._client, self._ontology) for label_row_metadata in label_row_metadatas | ||
| # ] | ||
| # return label_rows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SDK integration test report284 tests ±0 249 ✅ - 31 20m 25s ⏱️ -14s For more details on these failures, see this check. Results for commit 28969ab. ± Comparison against base commit 8e38420. |
Showing the possible implementation of paginated label rows call
We definitely don't want to switch returning type on list_label_rows_v2, as it would ruin the existing client though. So this pr just doing it for the sake of the demo.