Skip to content

Conversation

@pschoenfelder
Copy link
Contributor

If an agent re-registers with mesos, but the task has already started being removed from the load balancer, don't reactivate it. Instead, clean it gracefully.

taskManager.createTaskCleanup(
new SingularityTaskCleanup(
null,
TaskCleanupType.PRIORITY_KILL,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this is the type we actually want here. Nothing really fit perfectly, so this seemed reasonable. Or we could add a type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

priority kill is an inta-kill, probably too harsh. We can have this shut down gracefully similar to when we replace it with a bounce. Could easily fall under DECOMISSIONING ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forget this actually, saw your follow up question below. if we return a StatusUpdateResult.KILL_TASK in the end, that will take care of it all for us

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though not sure if it will take care of the cleaning + pending request. Would have to double check that. If not we'll need to stick to the cleanup being created + returning DONE

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returning the KILL_TASK will put a timeout of 1 minute for graceful termination on the task, so I think we want to go with the task cleanup + DONE here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Optional.empty()
)
);
return StatusUpdateResult.IGNORED;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also not sure of this, but since we want to clean gracefully I figure we don't want mesos to touch it.

return StatusUpdateResult.KILL_TASK;
} else if (
maybeTaskHistory.isPresent() &&
status.getReason() == Reason.REASON_AGENT_REREGISTERED
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is already covered by the enclosing isRecoveryStatusUpdate if statement

Copy link
Contributor Author

@pschoenfelder pschoenfelder Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anything specific to re-registration in that method - will leave as is for now

.anyMatch(
lbUpdate ->
lbUpdate.getMethod() == LoadBalancerMethod.CANCEL ||
lbUpdate.getMethod() == LoadBalancerMethod.DELETE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the methods we cal in baragon, not a representation of an add or remove. What we want to be checking is for a LoadBalancerRequestType of REMOVE. taskManager has a getLoadBalancerState method for this

@ssalinas ssalinas added the staging Merged to staging branch label Sep 22, 2020
@ssalinas
Copy link
Member

🚢

@ssalinas ssalinas merged commit ce866fb into master Sep 22, 2020
@ssalinas ssalinas deleted the tasklostfix branch September 22, 2020 12:41
@ssalinas ssalinas added this to the 1.3.0 milestone Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

staging Merged to staging branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants