Remove lastReadSequenceNumber.isEmpty condition #90
elainearbaugh
left a comment
Makes sense to me. I don't quite understand the comment about getting stuck in the loop when we have data near the tip of the stream but are not spending enough time to read it. Why would reading data at the beginning be any different from reading it elsewhere?
Agreed. Here is an example of a consumer that is working from a set of empty shards running this code (these are the persisted checkpoint offsets):
Notice at batch 4, data appears and we move to the sequence number of the incoming data.
In the code, I am handling the following scenario. Say we have started reading from TRIM_HORIZON. We then need to make multiple GetRecords API calls to reach a point where Kinesis has data in it (unfortunately, unlike other data sources, Kinesis streams won't give the first available record in one API call). And if we don't reach the point where Kinesis has data within the specified
I agree that the current approach violates the meaning of maxFetchTimeInMs and will lead to AWS throttling when we are already at the tip of the stream and there is no new data to read. Do you have any good ideas for handling the above-mentioned scenario?
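To make the scenario concrete, here is a minimal Python sketch, not the connector's code. `FakeShard` and `fetch_with_budget` are hypothetical names, and a call-count budget stands in for the time budget so that the result is deterministic. It only illustrates that a reader starting far from the data needs several empty reads before it sees a record, and that a small budget can end the batch before that happens.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FakeShard:
    """Hypothetical stand-in for a shard: the first `empty_calls` reads return nothing."""
    empty_calls: int
    records: List[str] = field(default_factory=lambda: ["r1", "r2"])
    calls_made: int = 0

    def get_records(self) -> List[str]:
        # Each call models one GetRecords request that advances through the shard.
        self.calls_made += 1
        if self.calls_made <= self.empty_calls:
            return []
        return self.records


def fetch_with_budget(shard: FakeShard, max_calls: int) -> Tuple[List[str], int]:
    """Read until data shows up or the call budget (a proxy for the time budget) runs out."""
    for attempt in range(1, max_calls + 1):
        batch = shard.get_records()
        if batch:
            return batch, attempt
    return [], max_calls


# Data sits 5 empty reads away from the starting position.
print(fetch_with_budget(FakeShard(empty_calls=5), max_calls=3))   # ([], 3)
print(fetch_with_budget(FakeShard(empty_calls=5), max_calls=10))  # (['r1', 'r2'], 6)
```

With a budget of 3 calls the batch ends empty, while a budget of 10 reaches the data on the sixth call.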
Thanks for merging this @itsvikramagr. I believe the answer to your question is that in a call at
Fixes bug #87.
This condition causes an infinite loop and throttling from AWS when the shard is empty. In my example on the ticket, I show it hitting the Kinesis API iteratively for minutes before giving up on the shard. We should more gracefully handle empty shards, which admit no lastReadSequenceNumber no matter how many times you hit them sequentially.
Moreover, I believe the condition is unnecessary, because one can just increase maxReadTimeInMs if you want to spend a longer time reading on the shard.
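The effect of removing the condition can be sketched with a small simulation. This is an illustration under assumptions, not the connector's actual code. It assumes the old loop condition had roughly the shape "still within the time budget, or no sequence number read yet". The function names, the simulated clock, and the safety cap are all hypothetical.

```python
from typing import Callable, Optional


def old_condition(elapsed_ms: int, max_fetch_ms: int, last_seq: Optional[str]) -> bool:
    # Assumed shape of the removed check: keep going while nothing has been read.
    return elapsed_ms < max_fetch_ms or last_seq is None


def new_condition(elapsed_ms: int, max_fetch_ms: int, last_seq: Optional[str]) -> bool:
    # After the change, only the time budget decides.
    return elapsed_ms < max_fetch_ms


def count_calls(
    keep_going: Callable[[int, int, Optional[str]], bool],
    max_fetch_ms: int,
    call_cost_ms: int = 100,
    safety_cap: int = 1000,
) -> int:
    """Count simulated reads against a shard that is always empty."""
    elapsed_ms = 0
    last_seq: Optional[str] = None  # an empty shard never yields a sequence number
    calls = 0
    # The safety cap exists only so the simulation itself terminates.
    while keep_going(elapsed_ms, max_fetch_ms, last_seq) and calls < safety_cap:
        calls += 1
        elapsed_ms += call_cost_ms  # simulated clock: each call costs a fixed time
    return calls


print(count_calls(old_condition, max_fetch_ms=1000))  # 1000, stopped only by the cap
print(count_calls(new_condition, max_fetch_ms=1000))  # 10
```

Under the old shape, the empty shard is polled until the artificial cap stops it. Under the new shape, the number of calls is bounded by the time budget, so spending longer on a shard becomes a matter of raising that budget.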