Handle Kinesis partial failures #359

Open
binarylogic opened this issue May 14, 2019 · 7 comments
Labels
domain: reliability Anything related to Vector's reliability have: should We should have this feature, but is not required. It is medium priority. provider: aws Anything `aws` service provider related sink: aws_kinesis_streams Anything `aws_kinesis_streams` sink related type: enhancement A value-adding code change that enhances its existing functionality.

Comments

@binarylogic
Contributor

binarylogic commented May 14, 2019

Since we use the PutRecords endpoint, we can experience partial failures. The set of possible per-record errors is small (ProvisionedThroughputExceededException or InternalFailure), but it still opens the door to data loss. For example, if a specific shard encounters a ProvisionedThroughputExceededException error, any records going to that shard will be lost.
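Because PutRecords reports errors per record instead of failing the whole request, a partial failure is only visible by inspecting each entry of the response, which lists one result per request record in the same order. A minimal sketch, assuming hypothetical `Record` and `RecordResult` stand-ins rather than the real AWS SDK types:

```rust
// Hypothetical stand-in for one entry of a PutRecords request.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub partition_key: String,
    pub data: Vec<u8>,
}

// Hypothetical stand-in for one entry of the PutRecords response.
// `error_code` is None on success, otherwise e.g.
// "ProvisionedThroughputExceededException" or "InternalFailure".
#[derive(Debug, Clone, PartialEq)]
pub struct RecordResult {
    pub error_code: Option<String>,
}

/// Pairs each sent record with its positional result and returns the ones
/// that failed, together with their error code. Returns None if the number
/// of results does not match the number of records (a malformed response),
/// so the caller can decide how to handle it instead of silently
/// treating unmatched records as delivered.
pub fn failed_records(sent: &[Record], results: &[RecordResult]) -> Option<Vec<(Record, String)>> {
    if sent.len() != results.len() {
        return None;
    }
    let failed = sent
        .iter()
        .zip(results.iter())
        .filter_map(|(record, result)| {
            result
                .error_code
                .as_ref()
                .map(|code| (record.clone(), code.clone()))
        })
        .collect();
    Some(failed)
}
```

An empty vector means the whole batch was accepted; anything else is exactly the set of records that would be lost if the sink treated the request as a success.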

For this particular sink, I think we should collect any failed records and run them through the configured retry policy.

Related #140

@binarylogic binarylogic added sink: aws_kinesis_streams Anything `aws_kinesis_streams` sink related type: enhancement A value-adding code change that enhances its existing functionality. labels May 14, 2019
@ckdarby

ckdarby commented May 19, 2020

@binarylogic This would be nice to see before 1.0, and not just for Kinesis: handling partial failures in sinks in general.

We bumped into this today. It kept us from deploying to production as a replacement for our existing custom tooling, because we use Kinesis and do get partial failures back.

@binarylogic binarylogic added the have: should We should have this feature, but is not required. It is medium priority. label May 19, 2020
@binarylogic
Contributor Author

Thanks @ckdarby. I agree. I'm curious what you think about the following behavior:

  1. The request partially fails.
  2. Pluck individual failed records.
  3. Retry with failed records only.
  4. Continue retrying in accordance with the configured policy (request.retry_* options).
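The four steps above can be sketched as a loop that resends only the records that are still failing. This is a minimal sketch under stated assumptions: `send` is a hypothetical closure standing in for the PutRecords call (one optional error code per input record, in order), and `RetryPolicy` with capped exponential backoff is an illustrative policy, not the actual shape of Vector's `request.retry_*` options.

```rust
use std::time::Duration;

// Hypothetical stand-in for one entry of a PutRecords request.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub partition_key: String,
    pub data: Vec<u8>,
}

// Hypothetical retry policy; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct RetryPolicy {
    pub max_attempts: usize,
    pub initial_backoff: Duration,
    pub max_backoff: Duration,
}

/// Sends `records`, then resends only the failed ones until every record is
/// accepted or the policy's attempts are exhausted. Returns the records that
/// never succeeded (empty on full success).
pub fn send_with_partial_retry<F, S>(
    mut records: Vec<Record>,
    policy: &RetryPolicy,
    mut send: F,
    mut sleep: S,
) -> Vec<Record>
where
    F: FnMut(&[Record]) -> Vec<Option<String>>,
    S: FnMut(Duration),
{
    let mut backoff = policy.initial_backoff;
    for attempt in 1..=policy.max_attempts {
        // Step 1: send the current batch; results are positional.
        let mut results = send(&records).into_iter();
        // Steps 2-3: keep only records whose result carries an error code.
        // A missing result is treated as a failure rather than a success.
        records.retain(|_| results.next().map_or(true, |r| r.is_some()));
        if records.is_empty() {
            return records;
        }
        // Step 4: back off before the next attempt, doubling up to a cap.
        if attempt < policy.max_attempts {
            sleep(backoff);
            backoff = (backoff * 2).min(policy.max_backoff);
        }
    }
    records
}
```

Injecting `sleep` keeps the loop testable; in real use it could be `std::thread::sleep`, or an async timer in an async sink. Whatever is returned at the end is what the sink would have to report as dropped.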

@ckdarby

ckdarby commented May 19, 2020

@binarylogic That makes sense to me and is the behaviour I would expect.

I've asked the people on my side who ran into this issue for their two cents. If they have different opinions, I expect they'll post them here.

@binarylogic binarylogic added domain: reliability Anything related to Vector's reliability provider: aws Anything `aws` service provider related labels Aug 7, 2020
@abrahamchaibi

Looks like this is also an issue for us!

@binarylogic binarylogic added this to the 2020-12-21 Kryptek Yeti milestone Dec 17, 2020
@jamtur01 jamtur01 removed this from the 2020-12-21 Kryptek Yeti milestone Dec 21, 2020
@jasongoodwin
Contributor

jasongoodwin commented Mar 3, 2023

This still seems to be an issue.
#140 has some design discussion for Elasticsearch.
It looks like this may not be the easiest item to tackle without retrying the entire payload.

kevinburke pushed a commit to kevinburke/vector that referenced this issue Mar 29, 2023
I assumed that these lines meant that failures would be retried, but in
many cases they are not, so let's make that explicit and describe the
plans to address the behavior.

Updates vectordotdev#359.
Updates vectordotdev#7659.
Updates vectordotdev#16954.
spencergilbert pushed a commit that referenced this issue Mar 30, 2023
@kuzaxak

kuzaxak commented Jul 20, 2024

Hey, are there no plans to implement this?

@jszwedko
Member

We'd like to, but no immediate plans. Contributions, of course, are always welcome 🙂

Development

No branches or pull requests

7 participants