-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ml-pipeline-persistenceagent fails a few times. #624
Comments
Let me take a look... |
@neuromage I get the same when deploying v0.4.0. and it does not get recoved. |
Hi There, I'm also facing issue with kubeflow: v0.4.1 Here is the log,
Any pointers / suggestions will be really helpful. Thanks |
That log message is harmless, in that it just means we're running on a cluster (we should probably re-word that). Are you seeing crashes/restarts for this process? What's the output when you run:
You should see something like this:
|
related issue: #676 |
Thanks for helping out. |
Thanks, this looks like #676. I will update that issue when I get to the bottom of the problem. |
@nareshganesan did you try kubeflow 0.4.1 ? I can't reproduce ml-pipeline-persistenceagent crashes with that version. I think it's been fixed (0.4.0 may have had the issue you're seeing though). |
They fixed this in 0.4.1 afaik |
Thanks for your time. I'm still facing the issue. Here are my steps.
logs from ml-pipeline-persistenceagent pod
Its restarted couple of times on a fresh cluster already.
I manually hit the health check url from one of my pod, and it works!
Please let me know. |
From your error, looks like this is unrelated to this current issue. Please create a new, separate issue for your problem: https://github.com/kubeflow/pipelines/issues/new |
…the cluster (kubeflow#625) * Use Anthos Config Management to continually sync the label config to the cluster * Define a kustomize package to generate the config map from the labels * ACM requires the directory structure follow a certain format so we created the directory label_sync/acm_repo * A simple shell script runs kustomize to dump the manifests to the directory. Related to kubeflow#624 * Address comments.
Kubeflow v0.4.0-rc.3
On GKE (with official doc instructions for CLI setup)
Everything sets up well, but the
ml-pipeline-persistenceagent
has failed 4x. On the fifth retry, the pod is up and running. Here's a log of one of its failures.If this is nothing to worry about, feel free to just close this issue.
The text was updated successfully, but these errors were encountered: