Add a mechanism to enable leader-election when handling migrations #20265
Labels
customer-ufa
~engineering-initiated
Engineering-initiated story, such as a bug, refactor, or contributor experience improvement.
Problem
We are facing challenges with the current database migration strategy for Fleet device management in a complex deployment environment. Our infrastructure requires that services remain online at all times, making it difficult to scale services down to zero instances for migrations. The existing procedure mandates taking the servers offline to run migrations, which conflicts with our operational requirements and can lead to service disruptions.
Summary
This issue arises from a discussion with a customer regarding database migrations for Fleet device management. The customer has a complex deployment strategy where scaling services down to zero is difficult and undesirable. The current upgrade strategy for Fleet involves taking the existing servers offline and running database migrations using the Fleet application. However, the customer's infrastructure requires services to be up at all times, making it challenging to follow this procedure.
Context
The current upgrade strategy for Fleet involves taking the existing servers offline and running database migrations with fleet prepare db.
The customer outlined the following concerns and limitations:
Discussion Highlights
Customer's Understanding and Challenges:
Current Workarounds:
Feature Request:
Proposed Solution
Implement a distributed locking mechanism to coordinate database migrations. This could be achieved using:
Note
https://redis.io/docs/latest/develop/use/patterns/distributed-locks/
https://redis.io/glossary/redis-lock/
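For illustration, the single-instance lock pattern described at the Redis links above can be sketched roughly as follows. This is a minimal sketch, not Fleet code: it assumes `client` is a redis-py `redis.Redis` instance, and the key name `fleet:migration-lock`, the default TTL, and the helper names `try_acquire` and `release` are hypothetical.

```python
import uuid

# Compare-and-delete script in the style of the Redis distributed-locks
# pattern: only the holder whose token matches may delete the key.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

LOCK_KEY = "fleet:migration-lock"  # hypothetical key name


def try_acquire(client, ttl_ms: int = 600_000):
    """Return a unique token if the lock was acquired, else None."""
    token = uuid.uuid4().hex
    # Equivalent to: SET key token NX PX ttl_ms
    # NX makes the write succeed only if the key does not exist yet.
    if client.set(LOCK_KEY, token, nx=True, px=ttl_ms):
        return token
    return None


def release(client, token: str) -> bool:
    """Delete the lock only if it still holds our token."""
    return client.eval(RELEASE_SCRIPT, 1, LOCK_KEY, token) == 1
```

An instance would run migrations only when `try_acquire` returns a token, and call `release` with that token afterwards. The TTL is a trade-off: if a migration outlives it, the key expires and another instance could acquire the lock, so the value would need to exceed the longest expected migration.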
MySQL: use the SKIP LOCKED feature (available from version 8.0) to implement a distributed lock. Instances would attempt to acquire a lock by querying a specific table/row with the SKIP LOCKED clause. The instance that successfully acquires the lock would proceed with the migration.
Important
SKIP LOCKED is only available at or above MySQL 8.0.
Important Considerations
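To make the SKIP LOCKED approach described above more concrete, here is a minimal Python sketch of how an instance might decide whether it is the one that runs migrations. It is an illustration under assumptions, not Fleet's implementation: the `migration_lock` table, the `run_migrations_if_leader` helper, and the `migrate` callback are hypothetical, and `conn` is assumed to be any PEP 249 (DB-API) connection to MySQL 8.0 or later with autocommit disabled.

```python
from typing import Callable

# Hypothetical single-row InnoDB table, created ahead of time:
#   CREATE TABLE migration_lock (id INT PRIMARY KEY);
#   INSERT INTO migration_lock (id) VALUES (1);
LOCK_QUERY = "SELECT id FROM migration_lock WHERE id = 1 FOR UPDATE SKIP LOCKED"


def run_migrations_if_leader(conn, migrate: Callable[[], None]) -> bool:
    """Run migrate() only if this instance wins the row lock.

    Returns True if this instance ran the migrations, False if another
    instance already holds the lock.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(LOCK_QUERY)
        if cursor.fetchone() is None:
            # The row was skipped, so another transaction holds its lock.
            conn.rollback()
            return False
        # migrate() is assumed to use a separate connection: DDL statements
        # in MySQL implicitly commit the current transaction, which would
        # release this row lock early if they ran on conn.
        migrate()
        conn.commit()  # ends the transaction and releases the row lock
        return True
    except Exception:
        conn.rollback()
        raise
    finally:
        cursor.close()
```

Instances that get `False` back would still need some way to wait until the schema is up to date before serving traffic; that coordination is outside this sketch.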
Benefits