We are in the middle of a migration project where we want to migrate millions of documents from one Mongo collection to another (or, in certain cases, add a new field to existing documents). During testing we are seeing that when the operation takes longer than 10 minutes (which is expected), Kubernetes kills the pod because the health check did not pass. Am I missing something? Are there any best practices to follow for such large migrations? We expect the migration to run for as long as 3-4 hours in a single stretch.
Hello @vallishk, I will answer this question, and I think I can provide an interesting approach to solve your issue. This question is really interesting and will benefit a lot of people who face the same problem, so in order to give it more visibility, could you please raise it on Stack Overflow? I will answer straightaway! 😃 Regards!
I put this on Stack Overflow:

I understand that you want to deploy your pod so that it runs the migration at startup, keeps the pod alive while the migration is running, but does not make it available for consumption until the process is complete, meaning Kubernetes won't send traffic to it.

First of all, I want to let you know that the next major version of Mongock will include a feature specifically designed to handle long migrations. However, since this feature is not currently available, I suggest the following approach.

The key idea is to use the two main Kubernetes probes (liveness and readiness) to control the pod's state. Set up the readiness probe to return "NOT READY" until the migration has completed, while the liveness probe continues to return "ALIVE" unless Mongock encounters a failure. This means that, technically, your API will be running before the migration is complete, but it will not actually receive traffic, because Kubernetes only considers the pod ready once the migration finishes successfully; in practice, your API is unavailable until then.

To implement this, you will likely need to run Mongock asynchronously, so that the API and the liveness and readiness endpoints are available for Kubernetes to check. During this process, monitor Mongock's state to determine the correct responses for the liveness and readiness probes; you can track that state using Mongock's events.

You might consider using a startup probe instead, as it seems suitable for this scenario. However, that would require setting a very long failureThreshold, which is not ideal: it is unreliable, it could affect other deployments, and it introduces a potential security risk.
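To make the idea concrete, here is a minimal plain-Java sketch of the probe state logic described above. The class and method names are hypothetical (they are not part of Mongock's API); in a real application you would flip these flags from Mongock's success/failure event listeners and expose the two status methods behind your readiness and liveness HTTP endpoints.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical probe-state holder: readiness reports NOT READY (503)
// until the migration completes; liveness reports ALIVE (200) unless
// the migration has failed.
public class MigrationProbeState {
    private final AtomicBoolean migrationComplete = new AtomicBoolean(false);
    private final AtomicBoolean migrationFailed = new AtomicBoolean(false);

    // Call these from Mongock's success/failure event listeners.
    public void onMigrationSuccess() { migrationComplete.set(true); }
    public void onMigrationFailure() { migrationFailed.set(true); }

    // HTTP status for the readiness probe: ready only after success,
    // so Kubernetes withholds traffic while the migration runs.
    public int readinessStatus() { return migrationComplete.get() ? 200 : 503; }

    // HTTP status for the liveness probe: alive unless Mongock failed,
    // so Kubernetes restarts the pod only on a genuine failure.
    public int livenessStatus() { return migrationFailed.get() ? 503 : 200; }

    public static void main(String[] args) {
        MigrationProbeState state = new MigrationProbeState();
        System.out.println(state.readinessStatus()); // 503 while migrating
        System.out.println(state.livenessStatus());  // 200 while migrating
        state.onMigrationSuccess();
        System.out.println(state.readinessStatus()); // 200 after success
    }
}
```

The important property is that a failing readiness probe never restarts the pod; it only keeps the pod out of the Service endpoints, so the migration can run for hours without Kubernetes killing it, while a liveness failure still triggers a restart when Mongock genuinely fails.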
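On the Kubernetes side, the deployment would wire the two probes to those endpoints. A sketch of the probe configuration, assuming the approach above (paths, port, and timings are illustrative, not defaults of any tool):

```yaml
livenessProbe:
  httpGet:
    path: /liveness     # 200 unless Mongock reported a failure
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /readiness    # 503 until the migration completes
    port: 8080
  periodSeconds: 30
  # A failing readiness probe does not restart the pod; it only keeps
  # the pod out of Service endpoints, so no failureThreshold tuning is
  # needed for a 3-4 hour migration.
```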