Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v2] Paginate metadata records in controller #914

Merged
merged 1 commit into from
Nov 21, 2024

Conversation

ian-shim
Copy link
Contributor

Why are these changes needed?

Instead of querying all available metadata to process, limit the number of metadata to process per iteration and keep the cursor for the next iteration.

Checks

  • I've made sure the tests are passing. Note that there might be a few flaky tests, in that case, please comment that they are not relevant.
  • I've checked the new test coverage and the coverage percentage didn't drop.
  • Testing Strategy
    • Unit tests
    • Integration tests
    • This PR is not tested :(

@ian-shim ian-shim marked this pull request as ready for review November 20, 2024 05:38
@ian-shim ian-shim force-pushed the dispatcher-paginated branch from 5cff769 to d45c2a2 Compare November 20, 2024 06:44
Comment on lines 53 to 56
type StatusIndexCursor struct {
PK string
UpdatedAt uint64
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low priority ask, could you add a short godoc for PK, it's not obvious at first glance what it means.

Copy link
Contributor Author

@ian-shim ian-shim Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, you made me realize PK is probably not the right field in the exported cursor. Will change it to BlobKey

@ian-shim ian-shim requested a review from anupsv November 20, 2024 18:39
@ian-shim ian-shim force-pushed the dispatcher-paginated branch from d45c2a2 to d7c6f25 Compare November 20, 2024 21:14
// even when there are no more results or there are no results at all.
// This cursor can be used to get new set of results when they become available.
// Therefore, it's possible to get an empty result from a request with exclusive start key returned from previous response.
func (s *BlobMetadataStore) GetBlobMetadataByStatusPaginated(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about actually build a cursor that streams the data?
The StatusIndexCursor is really a position or offset; and an actual Cursor will take a cursor position/offset, and keep iterating forward (and internalizing the bookkeeping of moving position).
A Cursor (or Iterator) will provide Next(), Item(), Position() etc. so the client (encoder manager and dispatcher) can stream through the data.
It'll be a better abstraction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that's a better abstraction for querying individual blobs one by one.
However, there isn't a use case for that right now as the controller will always request a batch of the blobs by status.
We can probably make your cursor abstraction work with batches, but I'm not sure if the usability will significantly improve vs. effort to introduce this new abstraction

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My cursor abstraction can take in status which serves as a filter of blobs to iterate over

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main point wasn't that it can't take in status. It was more about the need to support iteration one by one vs. batches

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No the underlying mechanism isn't one by one -- it's the same by a small batch and buffer the results. Consuming that buffer is one by one, which is no difference than what's written here with a for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punting this as discussed offline

Comment on lines 260 to 265
metadata[i], err = UnmarshalBlobMetadata(item)
if err != nil {
return nil, nil, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are ever unable to unmarshal blob metadata due to corrupt field, does that completely halt our ability to process other valid blobs? Seems like we would need to delete the corrupt blob metadata to restore processing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated it to it logs and skips those instead

@ian-shim ian-shim force-pushed the dispatcher-paginated branch from d7c6f25 to 8618e8e Compare November 21, 2024 05:33
@ian-shim ian-shim merged commit 41e80d5 into Layr-Labs:master Nov 21, 2024
6 checks passed
Copy link
Contributor

@anupsv anupsv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants