
InvalidSequenceTokenException: The given sequenceToken is invalid. The next expected sequenceToken is: #137

Open
nish20490 opened this issue Sep 23, 2020 · 15 comments

Comments

@nish20490

In our project we have 6 microservices running, and we use Seneca for them.
We have a centralized logger implemented using winston.
Recently we tried to add logs to CloudWatch, so we added winston-cloudwatch.
But whenever I start my development server, each microservice starts up and writes a log to CloudWatch.
Each time this happens, I get this InvalidSequenceTokenException for the startup logs of at least 2 of the microservices.
I am not sure why this is happening.
Any help would be appreciated.

@nish20490 (Author)

Because of this exception, some logs are not reaching CloudWatch. We are facing the same issue on staging as well as on prod.
Is there a way to retry these missing logs?

@lazywithclass (Owner)

Are you using 6 different streams? If I remember correctly (it's been years since I've used this library in production), you should assign a different stream to each source of logs.

There should be a best practice somewhere in AWS Cloudwatch documentation explaining this.
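
For illustration, a minimal sketch of what one-stream-per-service could look like (the group and service names here are made up; check the exact transport options against the winston-cloudwatch README):

```js
// Hypothetical per-service setup: each microservice writes to its own log
// stream, so no two writers compete for the same sequence token.
const winston = require('winston');
const WinstonCloudWatch = require('winston-cloudwatch');

function createLogger(serviceName) {
  return winston.createLogger({
    transports: [
      new WinstonCloudWatch({
        logGroupName: 'my-app',     // hypothetical group name
        logStreamName: serviceName, // one stream per service
        awsRegion: 'us-east-1',     // assumed region
      }),
    ],
  });
}

// e.g. in one of the six microservices:
const logger = createLogger('orders-service');
logger.info('service started');
```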

@Lisenish commented Aug 9, 2021

@nish20490 @lazywithclass

The same happens to me. We have just two services (a worker and an API) in AWS, each of them with 2 instances.
Initially the services used different streams that their instances shared, and everything worked fine, but after some time I noticed that the services stop logging anything ~6-24 hours after the last deploy. A restart or deploy fixes it for the next ~6-24 hours.

I saw DataAlreadyAcceptedException and InvalidSequenceTokenException errors in the logs and decided the reason was the shared streams, so I reconfigured them so that each instance has its own stream (see the sketch below) - but unfortunately it didn't help; I continue to see exactly the same behaviour and the same errors in the logs.
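
A per-instance stream can be arranged by putting something instance-specific into the stream name; this is just a sketch with a hypothetical group name, using the hostname to distinguish instances:

```js
// Hypothetical per-instance stream naming: embedding the hostname (or an
// instance id) in logStreamName gives every running instance its own stream.
const os = require('os');
const WinstonCloudWatch = require('winston-cloudwatch');

const transport = new WinstonCloudWatch({
  logGroupName: 'my-app/worker',            // hypothetical group name
  logStreamName: `worker-${os.hostname()}`, // one stream per instance
  awsRegion: 'us-east-1',                   // assumed region
});
```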

I also think this closed ticket may be related: #59.
There is also a similar issue in another logging library, mirkokiefer/bunyan-cloudwatch#20 (they also provided info on why DataAlreadyAcceptedException may happen instead of InvalidSequenceTokenException).

UPD:
I've created this PR, which should at least fix the issue with DataAlreadyAcceptedException: #158.
I also noticed that in the InvalidSequenceTokenException error handling we don't try to use the expectedSequenceToken from the error payload. Maybe we should add this in a separate PR; it would let us get the correct token without an extra API call.
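
A sketch of that idea (not the library's actual code; depending on the AWS SDK version the token may be exposed on the error object, or may have to be parsed out of the error message):

```js
// Illustrative only: retry a PutLogEvents call by reusing the expected
// sequence token carried by InvalidSequenceTokenException, instead of
// issuing an extra DescribeLogStreams call.
const AWS = require('aws-sdk');
const cloudwatchlogs = new AWS.CloudWatchLogs({ region: 'us-east-1' }); // assumed region

async function putEventsWithTokenRecovery(params) {
  try {
    return await cloudwatchlogs.putLogEvents(params).promise();
  } catch (err) {
    if (err.code === 'InvalidSequenceTokenException') {
      // The token may be on the error object, or embedded in the message
      // "The next expected sequenceToken is: ...".
      const token =
        err.expectedSequenceToken ||
        (err.message.match(/sequenceToken is: (\S+)/) || [])[1];
      if (token) {
        return cloudwatchlogs
          .putLogEvents({ ...params, sequenceToken: token })
          .promise();
      }
    }
    throw err;
  }
}
```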

@lazywithclass (Owner)

Did the proposed PR solve your issue? If so, I will review and test it in my environment later this week.

@Lisenish

@lazywithclass I tested it yesterday and unfortunately it didn't :(.
Maybe I did something wrong in the PR, but after I deployed this fix I stopped seeing DataAlreadyAcceptedException or InvalidSequenceTokenException errors in the logs, yet the logs still suddenly stopped appearing after some time.

@lazywithclass (Owner)

Ok I will have to give this a look, but I am studying for September's exam session, so I cannot guarantee a prompt fix.

@kubakrzempek

Right now, upon an InvalidSequenceTokenException the lib will try to get a new sequence token and retry sending the logs. However, if there are many resources racing for the same stream (e.g., 4 hosts running the same app behind a load balancer), the InvalidSequenceTokenException can reappear, in which case the error is thrown and the logs from that batch are discarded.

How about introducing an option, retryCount or something like it, where we could decide how many times the lib should try to get the correct token (for example, I'm okay with it trying, say, 30 times; see the sketch below)? Would that work? If so, I can submit a PR.
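
A sketch of what that bounded retry could look like (retryCount is the suggested option, not part of the current winston-cloudwatch API):

```js
// Sketch of the proposed bounded retry: refresh the sequence token via
// DescribeLogStreams and retry up to retryCount times before giving up.
async function putEventsWithRetry(cloudwatchlogs, params, retryCount = 30) {
  let attempt = params;
  for (let i = 0; i <= retryCount; i++) {
    try {
      return await cloudwatchlogs.putLogEvents(attempt).promise();
    } catch (err) {
      if (err.code !== 'InvalidSequenceTokenException' || i === retryCount) {
        throw err; // out of retries: this batch is dropped by the caller
      }
      // Look up the stream's current upload token and try again.
      const { logStreams } = await cloudwatchlogs
        .describeLogStreams({
          logGroupName: attempt.logGroupName,
          logStreamNamePrefix: attempt.logStreamName,
        })
        .promise();
      const stream = (logStreams || []).find(
        (s) => s.logStreamName === attempt.logStreamName
      );
      attempt = { ...attempt, sequenceToken: stream && stream.uploadSequenceToken };
    }
  }
}
```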

@symbianm

Any updates? I have the same issue.

@kallekj commented Jun 2, 2022

I'm also experiencing this issue; it occurs when we have multiple tests running in parallel, which overloads the stream or makes it reuse the same sequence token.

@sporteman

Any news on that? I'm getting the same issue here.

@sowmenappd

Hey guys, has anyone worked on this issue yet? We're getting a lot of InvalidSequenceTokenException errors because, even though our services use separate streams, most of them run several instances, which eventually hits the same race condition mentioned by @kubakrzempek.

@Lisenish commented Nov 2, 2022

@sowmenappd In our case, we just migrated to pino, with the logs emitted to stdout and streamed to CW by AWS. I think it's a more reliable and stable approach, but in case you want to stream them yourself, there's the pino-cloudwatch transport.
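
The stdout approach is essentially this (a minimal sketch; shipping to CloudWatch is left to the platform, e.g. the awslogs driver on ECS or a CloudWatch agent):

```js
// The app only writes JSON lines to stdout; nothing in the process talks to
// the CloudWatch API, so there are no sequence tokens to race over.
const pino = require('pino');
const logger = pino(); // logs to stdout by default

logger.info({ service: 'api' }, 'service started');
```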

@sowmenappd

@Lisenish thank you, yes this definitely seems like a far better approach. However, wouldn't the pino-cloudwatch method also suffer from the same issue if there are multiple instances of the same service?

@kubakrzempek

I completely agree with you! Streaming the logs is definitely the way to go, especially when it comes to scaling. It just makes everything so much easier to manage and maintain.

@Lisenish commented May 2, 2023

@sowmenappd I didn't actually use pino-cloudwatch, so I can't tell for sure; I just used the default stdout and streamed stdout to CW by AWS without using any libraries. I didn't encounter any issues with this approach (although IIRC it creates a separate stream for each machine, which is actually the correct way too).
