fix: Add snuba outcomes consumers to setup #426
Conversation
Yay!
@fpacifici @tkaemming can one of you review what I did with the
Could you expand on the reasoning for using `earliest`? I am not sure I get what is broken exactly, so I cannot say whether using `earliest` is the right solution.
Yeah, I think this is the best option in the absence of any real migration strategy. The biggest downsides that I can see here would be:

1. A previously committed offset could unexpectedly become invalid, triggering the reset policy and replaying retained data.
2. After an extended consumer outage, resetting to `earliest` replays everything still retained, double counting data that was already processed.
Both of these are pretty unlikely (but not impossible), and the second one really comes down to a decision between potential data duplication with resetting to `earliest` and data loss with resetting to `latest`. Thanks for bringing this to our attention. 👍
@fpacifici we were aiming to do the TSDB migration for the onpremise setup that we did in prod. We found that we had already used the redissnuba backend in onpremise, which means we stopped writing some keys to Redis while at the same time the snuba consumers were not set up. That means the TSDB data (for the models that can be derived from outcomes) is gone. However, the default topic retention is 7 days, so by adding the snuba consumer now and setting its auto-offset-reset to `earliest`, we may be able to recover some data in the onpremise setup.
The second case makes sense, but why would this one happen?
Unexpected implementation error. It shouldn't happen, but Kafka does (or at least used to) allow you to set locally and commit invalid offsets. I think it's also theoretically possible on an unclean failover, where an out-of-sync replica gets promoted to leader for a partition.
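For context, here is a minimal sketch of the decision Kafka's reset policy models. This is a simplification (real brokers track this per partition), and the function and parameter names are purely illustrative, not any real client API:

```python
def resolve_start_offset(committed, log_start, log_end, auto_offset_reset="earliest"):
    """Simplified model of Kafka's auto.offset.reset behaviour.

    The reset policy only applies when there is no usable committed offset:
    a brand-new consumer group, an offset that aged out of retention, or an
    invalid committed offset as described above.
    """
    if committed is not None and log_start <= committed <= log_end:
        return committed  # normal case: resume from the committed position
    if auto_offset_reset == "earliest":
        return log_start  # replay everything still retained (possible duplicates)
    return log_end        # "latest": skip ahead, losing anything unconsumed

# A consumer whose committed offset fell out of the retained range, e.g. it
# was down longer than the 7-day retention window:
resolve_start_offset(committed=100, log_start=500, log_end=900)  # -> 500 with "earliest"
```

With `earliest` the consumer gets back the full retained window at the cost of possible duplicates; with `latest` it silently skips to the head of the log.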
```diff
@@ -15,6 +15,7 @@ x-sentry-defaults: &sentry_defaults
     - smtp
     - snuba-api
     - snuba-consumer
+    - snuba-outcomes-consumer
```
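For reference, the new service itself might be defined roughly like this. This is a hypothetical sketch modeled on the existing snuba-consumer service, assuming a shared `&snuba_defaults` anchor analogous to `&sentry_defaults`; the exact command and flags depend on the snuba version in use:

```yaml
# Hypothetical sketch only: the dataset/storage selector flag and other
# options vary across snuba versions. The key part is the offset reset.
snuba-outcomes-consumer:
  <<: *snuba_defaults
  command: consumer --dataset outcomes --auto-offset-reset=earliest
```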
Do we really need this?
This will force the snuba-outcomes-consumer service to be running for any of the web, cron, worker, event-consumer, post-process-forwarder, and sentry-cleanup services.
Generally speaking, we should run it. I am not sure how our docker-compose is structured, but it seems like we only have two dependency lists: one for any snuba container and one for any sentry container.
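A rough sketch of that structure, assuming the `&sentry_defaults` anchor shown in the diff above (service names here are illustrative, not the full file):

```yaml
# One shared dependency list: every container that merges in
# *sentry_defaults depends on all of these services, which is why adding
# snuba-outcomes-consumer here affects web, worker, cron, etc. at once.
x-sentry-defaults: &sentry_defaults
  depends_on:
    - smtp
    - snuba-api
    - snuba-consumer
    - snuba-outcomes-consumer

services:
  web:
    <<: *sentry_defaults
  worker:
    <<: *sentry_defaults
```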
@untitaker we can have a dependency list for any service, so if this can be a dependency of something else, like relay, we can put it there too.
It's up to you how granular you want this to be, but I can't imagine this consumer being optional in any setup.
I think using `earliest` makes sense here as:
- It would allow us to recover the last 7 days' worth of data for already-broken Sentry 10 installations.
- Double counting is better than zero counting, as double counting leaves a possibility of recovering the underlying actual data. When you lose data, you have nothing to work from.
Question: should we do the same for the events consumer too, since the above two points apply there as well? I don't think double-processing events would create more records, just more work during recovery.
Potentially, but I would only cross that road once we know that we had a bug that caused us to drop events.
Now in the case of events, if the consumer ends up going to the earliest available offset, it means one of the two cases Ted mentioned above, which imply an extended downtime of the main feature of Sentry; during that downtime pretty much nothing works. At that point I am not sure whether recovering from `latest` or `earliest` makes a big difference.
* feat: Add snuba outcomes consumers to setup
* fix: Rename all references of snuba-consumer
* ref: Rename back to snuba-consumer
* fix: Change auto-offset-reset
* fix: Attempt to fix CI
TSDB is currently broken because we've been using the redissnuba backend without running the consumers. Add outcomes consumers with an odd value for `auto-offset-reset` to attempt to recover TSDB data from outcomes.