Timed out waiting for scheduling events during reinstall #778

Open

Description

During a reinstall triggered by changes to installFlags, k0sctl 0.19.0 timed out while waiting for scheduling events and exited with an error:

failed to observe scheduling events after api start-up, you can ignore this check by using --force: context deadline exceeded\ndidn't find any 'Scheduled' kube-system events after ...

It seems that during the reinstall phase, k0sctl looks for fresh scheduling-related events in the kube-system namespace:

k0sctl/phase/reinstall.go

Lines 120 to 126 in 9246ddc

log.Infof("%s: waiting for the scheduler to become ready", h)
if err := retry.Timeout(context.TODO(), retry.DefaultTimeout, node.ScheduledEventsAfterFunc(h, time.Now())); err != nil {
	if !Force {
		return fmt.Errorf("failed to observe scheduling events after api start-up, you can ignore this check by using --force: %w", err)
	}
	log.Warnf("%s: failed to observe scheduling events after api start-up: %s", h, err)
}

// ScheduledEventsAfterFunc returns a function that returns an error unless a kube-system 'Scheduled' event has occurred after the given time
// The returned function is intended to be used with pkg/retry.
func ScheduledEventsAfterFunc(h *cluster.Host, since time.Time) retryFunc {
	return func(_ context.Context) error {
		output, err := h.ExecOutput(h.Configurer.KubectlCmdf(h, h.K0sDataDir(), "-n kube-system get events --field-selector reason=Scheduled -o json"), exec.HideOutput(), exec.Sudo(h))
		if err != nil {
			return fmt.Errorf("failed to get kube system events: %w", err)
		}
		events := &statusEvents{}
		if err := json.Unmarshal([]byte(output), &events); err != nil {
			return fmt.Errorf("failed to decode kubectl output for kube-system events: %w", err)
		}
		for _, e := range events.Items {
			if e.EventTime.Before(since) {
				log.Tracef("%s: skipping prior event for %s: %s < %s", h, e.InvolvedObject.Name, e.EventTime.Format(time.RFC3339), since.Format(time.RFC3339))
				continue
			}
			log.Debugf("%s: found a 'Scheduled' event occuring after %s", h, since)
			return nil
		}
		return fmt.Errorf("didn't find any 'Scheduled' kube-system events after %s", since)
	}
}

This particular cluster has an almost empty kube-system namespace: there are just four CoreDNS pods in it. Nothing really happens there and it can take a while for new scheduling events to show up.
