[v2] Paginate metadata records in controller #914
Conversation
Force-pushed 5cff769 to d45c2a2 (Compare)
type StatusIndexCursor struct {
	PK        string
	UpdatedAt uint64
}
Low priority ask: could you add a short godoc for PK? It's not obvious at first glance what it means.
Actually, you made me realize PK is probably not the right field in the exported cursor. Will change it to BlobKey.
Force-pushed d45c2a2 to d7c6f25 (Compare)
// even when there are no more results or there are no results at all.
// This cursor can be used to get a new set of results when they become available.
// Therefore, it's possible to get an empty result from a request with the exclusive start key returned from the previous response.
func (s *BlobMetadataStore) GetBlobMetadataByStatusPaginated(
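The cursor semantics described in the comment (a cursor is always returned, even for an empty page, so the caller can poll again later) can be sketched with a hypothetical, simplified fetch function; `cursor`, `fakeStore`, and `getPage` are stand-ins for illustration, not the real store API:

```go
package main

import "fmt"

// cursor is a hypothetical position marker, standing in for StatusIndexCursor.
type cursor struct{ offset int }

// fakeStore simulates a paginated metadata store over a fixed slice.
type fakeStore struct{ items []string }

// getPage returns up to limit items after the cursor, plus a new cursor.
// The new cursor is non-nil even when no items remain, mirroring the
// semantics above: the caller keeps it and retries once new items arrive.
func (s *fakeStore) getPage(c *cursor, limit int) ([]string, *cursor) {
	start := 0
	if c != nil {
		start = c.offset
	}
	end := start + limit
	if end > len(s.items) {
		end = len(s.items)
	}
	return s.items[start:end], &cursor{offset: end}
}

func main() {
	store := &fakeStore{items: []string{"a", "b", "c", "d", "e"}}
	var c *cursor
	var all []string
	for {
		page, next := store.getPage(c, 2)
		all = append(all, page...)
		c = next
		if len(page) == 0 {
			break // empty page: keep the cursor for the next poll
		}
	}
	fmt.Println(all) // [a b c d e]
}
```

Note that the final request returns an empty page together with a usable cursor, which is exactly why "an empty result from a request with the exclusive start key" is possible.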
How about actually building a cursor that streams the data? The StatusIndexCursor is really a position or offset; an actual Cursor would take a cursor position/offset and keep iterating forward (internalizing the bookkeeping of moving the position). A Cursor (or Iterator) would provide Next(), Item(), Position(), etc., so the clients (encoder manager and dispatcher) can stream through the data. It would be a better abstraction.
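A minimal sketch of the suggested abstraction, assuming a hypothetical `BlobCursor` over strings and a pluggable page fetcher (the real implementation would wrap the DynamoDB query and return blob metadata):

```go
package main

import "fmt"

// BlobCursor is a hypothetical streaming cursor: it internalizes batched
// fetching and position bookkeeping while the caller consumes item by item.
type BlobCursor struct {
	fetch func(pos, limit int) []string // page fetcher (stands in for the DB query)
	batch []string                      // buffered current page
	idx   int                           // index into batch
	pos   int                           // number of items consumed so far
	limit int                           // page size used for fetches
}

// Next advances to the next item, fetching a new batch once the buffer is
// drained. It returns false when no more items are available.
func (c *BlobCursor) Next() bool {
	if c.idx+1 < len(c.batch) {
		c.idx++
		c.pos++
		return true
	}
	c.batch = c.fetch(c.pos, c.limit)
	if len(c.batch) == 0 {
		return false
	}
	c.idx = 0
	c.pos++
	return true
}

// Item returns the current item; Position returns the absolute offset,
// i.e. how many items have been consumed so far.
func (c *BlobCursor) Item() string  { return c.batch[c.idx] }
func (c *BlobCursor) Position() int { return c.pos }

func main() {
	items := []string{"a", "b", "c"}
	cur := &BlobCursor{
		fetch: func(pos, limit int) []string {
			if pos >= len(items) {
				return nil
			}
			end := pos + limit
			if end > len(items) {
				end = len(items)
			}
			return items[pos:end]
		},
		limit: 2,
	}
	for cur.Next() {
		fmt.Println(cur.Item())
	}
}
```

The client loop never sees page boundaries; batching and the moving offset stay inside the cursor, which is the bookkeeping the comment refers to.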
I agree that's a better abstraction for querying individual blobs one by one.
However, there isn't a use case for that right now, as the controller will always request a batch of blobs by status.
We could probably make your cursor abstraction work with batches, but I'm not sure the usability improvement justifies the effort of introducing the new abstraction.
My cursor abstraction can take in a status, which serves as a filter for the blobs to iterate over.
My main point wasn't that it can't take in a status. It was more about the need to support iteration one by one vs. in batches.
No, the underlying mechanism isn't one by one -- it fetches a small batch and buffers the results. Consuming that buffer is one by one, which is no different from what's written here with a for loop.
Punting on this, as discussed offline.
metadata[i], err = UnmarshalBlobMetadata(item)
if err != nil {
	return nil, nil, err
}
If we are ever unable to unmarshal blob metadata due to a corrupt field, does that completely halt our ability to process other valid blobs? It seems like we would need to delete the corrupt blob metadata to restore processing.
Updated it to log and skip those instead.
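The log-and-skip behavior might look roughly like the following sketch; `item`, `blobMetadata`, and `unmarshal` are hypothetical stand-ins for the real DynamoDB item type and UnmarshalBlobMetadata:

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// item stands in for a raw store record; corrupt marks an undecodable one.
type item struct {
	key     string
	corrupt bool
}

type blobMetadata struct{ key string }

// unmarshal stands in for UnmarshalBlobMetadata.
func unmarshal(it item) (blobMetadata, error) {
	if it.corrupt {
		return blobMetadata{}, errors.New("corrupt field")
	}
	return blobMetadata{key: it.key}, nil
}

// decodeAll logs and skips records that fail to unmarshal instead of
// returning an error, so one corrupt record cannot halt processing of
// the remaining valid blobs.
func decodeAll(items []item) []blobMetadata {
	metadata := make([]blobMetadata, 0, len(items))
	for _, it := range items {
		m, err := unmarshal(it)
		if err != nil {
			log.Printf("skipping blob metadata %q: %v", it.key, err)
			continue
		}
		metadata = append(metadata, m)
	}
	return metadata
}

func main() {
	items := []item{{key: "a"}, {key: "b", corrupt: true}, {key: "c"}}
	fmt.Println(len(decodeAll(items))) // 2: the corrupt record is skipped
}
```

The trade-off versus the earlier review question: corrupt records no longer block the pipeline, but they remain in the table and are re-logged on every pass until cleaned up.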
Force-pushed d7c6f25 to 8618e8e (Compare)
lg
Why are these changes needed?
Instead of querying all available metadata to process, limit the number of metadata records processed per iteration and keep a cursor for the next iteration.
Checks