Liveness Readiness probes #65
Let's start by defining those 2 probes:
Let's try to list out the requirements. @rymsha you are probably better suited to do this than me, but I can try to start. For a pod to be READY, all of these conditions must be met:
For a pod to be ALIVE, all of these conditions must be met:
These are the things I can think of off the top of my head. What do you guys think? |
It is truly hard to find good official docs, but not hard to google good explanations for the statement:
It feels like the liveness probe was designed with Node.js in mind (where a deadlock would mean its single thread is stuck). Anyway, app-alive is one of the worst choices to use for a liveness probe. Practically, we could skip the liveness probe altogether. |
I have not been part of this discussion, but we need a check that can be customized, e.g. to visit a list of URLs before a pod is opened for traffic. |
This may be overkill for "readiness", because
Warm-up could be part of container initialisation, I guess. |
The thing is that (unless the infrastructure can provide a "slow start") when a cold node is hit with serious traffic, it may "hang" for several minutes until it is able to respond, and the result is a lot of unhappy users. |
Unfortunately there is no happiness probe in k8s yet. @gbbirkisson ALIVE need not list any READY factors, due to the same note from the official docs
|
This has an interesting idea. Dedicated master nodes are READY even when the ES cluster is YELLOW. There is one potential problem though - we run applications on masters. Worse - the elected master does most of the background tasks. Another problem is our inability to distinguish between initial startup and ongoing readiness checks. |
I think we might be able to use the startup probe. 1.16 is available on the GKE regular channel and in Azure, so let's assume that is possible. That should help us a lot. Then we have:
So my attempt to revise the previous list would be something like: For the startup probe to succeed, all of these conditions must be met:
For the readiness probe to succeed, all of these conditions must be met:
For the liveness probe to succeed, all of these conditions must be met:
|
Just one question, @rymsha. We want the new alive app (let's call it app-k8s-probe) to be started as soon as possible. Would it be sufficient to put it in the deploy folder, or do we have to put it in the system folders? It probably depends on ES, Hazelcast and Jetty at least. |
Anything put into the deploy folder … Anything put into the system folders … So, just putting an app in this or that folder won't help much. It depends on what we want from that app and how we write it. But I'm still trying to figure out this puzzle in my head. I believe app-k8s-probe should be available as late as possible, but not later. It sounds like the opposite of what you wrote, but in practice it repeats what you meant. |
I'm not familiar with the startup probe yet. Need to try it first. Hopefully it holds back a rolling update the same way as the readiness probe. If so, then: For the startup probe to succeed, all of these conditions must be met:
For the readiness probe to succeed, all of these conditions must be met:
For the liveness probe to succeed, all of these conditions must be met: ??? I still can't find a good reason to automatically kill a pod. Anything that depends on Jetty depends on Hazelcast as well (for distributed sessions). So my idea about the Jetty connector is maybe not so good either. We should skip the liveness probe for now. Leave it for non-legacy and fancy, but deadlock-prone frameworks (like Node.js). |
OK, we can drop the liveness probe for now. I think the checks ...
... should be in the readiness probe, simply because the startup probe will kill the pods and the readiness probe will not. So then we do not risk the startup probe killing a node that is replicating indexes. |
I was thinking we could expose the app-k8s-probe on port 2609. Also I was thinking it would serve as an endpoint for all the probes. This is a small example of how that could look:
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
// Framework imports (@Component, StatusReporter, StatusContext, MediaType) are omitted here

@Component
public class K8sProbeReporter
    implements StatusReporter
{
    @Override
    public String getName()
    {
        return "k8s.probes";
    }

    @Override
    public MediaType getMediaType()
    {
        return MediaType.PLAIN_TEXT_UTF_8;
    }

    @Override
    public void report( final StatusContext context )
        throws IOException
    {
        Optional<String> type = context.getParameter( "type" );
        switch ( type.orElseThrow( () -> new ProbeException( "Missing 'type' parameter" ) ) )
        {
            case "startup":
                doStartupCheck();
                break;
            case "readiness":
                doReadinessCheck();
                break;
            case "liveness":
                doLivenessCheck();
                break;
            default:
                throw new ProbeException( "Invalid 'type' parameter" );
        }
        context.getOutputStream().write( "OK".getBytes( StandardCharsets.UTF_8 ) );
    }

    private void doStartupCheck()
        throws ProbeException
    {
        // We know jetty is running because otherwise we would not be serving this request
        // Also because jetty is running, we know that hazelcast is OK
        checkIfReposAreInitialized(); // This should tell us elastic search is ready
        checkIfEssentialComponentsAreRunning();
        // No checks threw exception, exit ...
    }

    private void doReadinessCheck()
        throws ProbeException
    {
        if ( isClustered() && isDataNode() && thisNodeHasNeverSeenGreenState() )
        {
            checkIfClusterInGreenState();
        }
        checkIfLocalElasticSearchAcceptsRequests();
        if ( !isClustered() || isFrontendNode() )
        {
            checkIfHasDoneWarmup();
            checkIf8080Responds();
        }
        // No checks threw exception, exit ...
    }

    private void doLivenessCheck()
    {
        // Jetty is responding, so no need to do other checks
    }

    @Override
    public void report( final OutputStream stream )
        throws IOException
    {
        throw new ProbeException( "This method should not be called!" );
    }

    public static class ProbeException
        extends IOException
    {
        public ProbeException( String message )
        {
            super( message );
        }
    }
} |
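For illustration, here is a rough, hypothetical sketch of how the checkIfClusterInGreenState() helper referenced above might be implemented against the embedded Elasticsearch client. The Client instance and the exact package names (which vary between ES versions) are assumptions, not actual XP APIs:

import java.io.IOException;

import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.health.ClusterHealthStatus;
import org.elasticsearch.common.unit.TimeValue;

public class ClusterGreenCheck
{
    // Assumed to be the embedded node's ES client, however XP exposes it
    private final Client client;

    public ClusterGreenCheck( final Client client )
    {
        this.client = client;
    }

    public void checkIfClusterInGreenState()
        throws IOException
    {
        // Ask the cluster for its health, with a short timeout so the probe itself stays fast
        final ClusterHealthResponse health =
            client.admin().cluster().prepareHealth().setTimeout( TimeValue.timeValueSeconds( 2 ) ).get();

        if ( health.getStatus() != ClusterHealthStatus.GREEN )
        {
            throw new IOException( "ES cluster is " + health.getStatus() + ", expected GREEN" );
        }
    }
}

In the reporter above this would throw ProbeException instead, so the probe endpoint reports a failure back to the kubelet.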
I see how you attempt to solve it with |
As you noticed, I think the opposite. Warmup (happiness probe): nowadays software is made to give "good-enough" performance from the start, instead of "maximum possible" performance only after a few iterations.
So, my input on this: fix slow starts as much as possible, and don't waste time figuring out how to write warmups and introducing even more entropy into our already quite complex cluster startup process. |
If we could drop lazy loading of bundles in the server distro, but keep it for the SDK, initial performance might be better. Also, warmup could be really easy to achieve: just invoke the root entry for each configured vhost before starting to serve requests. |
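A minimal sketch of that warm-up idea, assuming the list of root URLs (one per configured vhost) can be obtained from configuration; the class and method names here are hypothetical:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.List;

public class VhostWarmup
{
    // Hit the root entry of each configured vhost once before the pod is reported ready,
    // so lazily initialized bundles and caches are populated before real traffic arrives
    public void warmup( final List<String> rootUrls )
    {
        for ( final String rootUrl : rootUrls )
        {
            try
            {
                final HttpURLConnection conn = (HttpURLConnection) new URL( rootUrl ).openConnection();
                conn.setConnectTimeout( 5_000 );
                conn.setReadTimeout( 30_000 );
                final int status = conn.getResponseCode(); // executes the request
                conn.disconnect();
                System.out.println( "Warmed up " + rootUrl + " -> HTTP " + status );
            }
            catch ( final IOException e )
            {
                // A failed warm-up request should not block startup; just log it
                System.out.println( "Warm-up of " + rootUrl + " failed: " + e.getMessage() );
            }
        }
    }
}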
Let's imagine we have 3 new nodes and 1 fails during startup (early on, we have seen that can happen). The cluster will then be in a YELLOW state for an hour. No node will be "READY" since they are all technically still in startup, waiting for that one node to join the cluster. They will then all be restarted at the same time when all their startup probes fail simultaneously. This is not good.
If replication takes more time than the startup probe timeout, it could result in data loss. As you said, it can take a long time, and the solution is not simply to increase the timeout to huge amounts. Like I said earlier: "... the startup probe will kill the pods, and the readiness probe will not." So I think it is safer to check the replication in the readiness probe. Remember, during a rolling update the next node will not be bounced until the readiness probe says it's OK.
This is highly dependent on the way we implement the readiness probe. In the code snippet I put in this thread, the readiness probe does not care if the cluster goes to a yellow state. So in that case shrinking the cluster will not affect the LB. |
1 out of 3 failed is, surprisingly, GREEN - because there will be a master elected and all shards will find their replicas |
Killing a pod which is not yet considered started is not an issue - because there is certainly a full replica of it (it needs to get the data from somewhere). No data loss is possible |
Would be nice to have a meeting about this - go through all the steps and then agree on a solution? |
The conclusion of the meeting was:
|
Now if you have:
During a rolling update, all the data nodes will become NOT READY because the cluster is yellow. This is desired behavior. But this will yank all of the nodes from the LB, and that is not good. How to deal with this? This will only happen with 1-2 data nodes, not with 3+. Obviously this would not be a problem if you have dedicated frontend nodes. |
HealthCheck endpoint: enonic/xp#9199 |
Umbrella ticket to discuss and implement liveness/readiness probes correctly, because the current approach with app-alive doesn't work well.
Current problems
app-alive reports "ready" too early: when Elasticsearch is still replicating shards (yellow). This leads to data loss during rolling updates.
app-alive reports "not alive" on ALL nodes just because the Elasticsearch cluster lost its master due to a restart. Currently this is an XP bug, but we are still looking for a better "alive" check than "ES cluster is not red".