We are in the middle of a migration project where we want to migrate millions of documents from one Mongo collection to another (or, in certain cases, add a new field to existing documents). During testing we are seeing that when the operation takes longer than 10 minutes (which is expected), Kubernetes kills the pod because the health check did not pass. Am I missing something? Are there any best practices to follow for such large migrations? We expect the migration to run for as long as 3-4 hours in a single stretch.
Hello @vallishk, I will answer this question, and I think I can provide an interesting approach to solve your issue. This question is really interesting and will benefit a lot of people who face the same problem, so in order to give it more visibility, could you please raise it on Stack Overflow? I will answer straightaway! 😃 Regards!
I put this on Stack Overflow:

I understand that you want to deploy your pod so that it runs the migration at startup, keeps the pod alive while the migration is running, but does not make it available for consumption until the process is complete, meaning Kubernetes won't send traffic to it.

First of all, I want to let you know that the next major version of Mongock will include a feature specifically designed to handle long migrations. However, since this feature is not currently available, I suggest the following approach.

The key idea is to use the two main Kubernetes probes (liveness and readiness) to control the pod's state. Set up the readiness probe to return "NOT READY" until the migration has completed, while the liveness probe continues to return "ALIVE" unless Mongock encounters a failure. This means that, technically, your API will be running before the migration is complete, but it will not actually receive traffic, because Kubernetes only considers the pod ready once the migration finishes successfully; in practice, your API is unavailable until then.

To implement this, you will likely need to run Mongock asynchronously, so that the API and the liveness and readiness endpoints are available for Kubernetes to check. During this process, monitor Mongock's state to determine the correct responses for the liveness and readiness probes; you can track that state using Mongock's events.

You might consider using a startup probe instead, as it seems suitable for this scenario. However, that would require setting a very long failureThreshold, which is not ideal: it is unreliable, it could affect other deployments, and it introduces a potential security risk.
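To make the idea concrete, here is a minimal plain-Java sketch of the probe state logic described above. The class and method names are hypothetical (they are not part of Mongock's API); in a real application you would flip these flags from Mongock's success/failure event listeners and expose the two status methods behind your readiness and liveness HTTP endpoints.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical probe-state holder: readiness reports NOT READY (503)
// until the migration completes; liveness reports ALIVE (200) unless
// the migration has failed.
public class MigrationProbeState {
    private final AtomicBoolean migrationComplete = new AtomicBoolean(false);
    private final AtomicBoolean migrationFailed = new AtomicBoolean(false);

    // Call these from Mongock's success/failure event listeners.
    public void onMigrationSuccess() { migrationComplete.set(true); }
    public void onMigrationFailure() { migrationFailed.set(true); }

    // HTTP status for the readiness probe: ready only after success,
    // so Kubernetes withholds traffic while the migration runs.
    public int readinessStatus() { return migrationComplete.get() ? 200 : 503; }

    // HTTP status for the liveness probe: alive unless Mongock failed,
    // so Kubernetes restarts the pod only on a genuine failure.
    public int livenessStatus() { return migrationFailed.get() ? 503 : 200; }

    public static void main(String[] args) {
        MigrationProbeState state = new MigrationProbeState();
        System.out.println(state.readinessStatus()); // 503 while migrating
        System.out.println(state.livenessStatus());  // 200 while migrating
        state.onMigrationSuccess();
        System.out.println(state.readinessStatus()); // 200 after success
    }
}
```

The important property is that a failing readiness probe never restarts the pod; it only keeps the pod out of the Service endpoints, so the migration can run for hours without Kubernetes killing it, while a liveness failure still triggers a restart when Mongock genuinely fails.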
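On the Kubernetes side, the deployment would wire the two probes to those endpoints. A sketch of the probe configuration, assuming the approach above (paths, port, and timings are illustrative, not defaults of any tool):

```yaml
livenessProbe:
  httpGet:
    path: /liveness     # 200 unless Mongock reported a failure
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readiness    # 503 until the migration completes
    port: 8080
  periodSeconds: 30
  # A failing readiness probe does not restart the pod; it only keeps
  # the pod out of Service endpoints, so no failureThreshold tuning is
  # needed for a 3-4 hour migration.
```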