studyjob-controller start failed #546

Closed
hackenzheng opened this issue Dec 17, 2018 · 8 comments

Comments

@hackenzheng

All of the other pods run successfully, but the studyjob-controller pod keeps restarting. Its logs report: no matches for kind "TFJob" in version "kubeflow.org/v1beta1". I have confirmed that tf-operator is installed correctly: I can start training with a TFJob.
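That log message indicates the controller's lookup found no TFJob kind registered under kubeflow.org/v1beta1 at that moment. A first diagnostic is to check which versions the TFJob CustomResourceDefinition actually serves, for example from the output of kubectl get crd tfjobs.kubeflow.org -o yaml (assuming the CRD uses the standard plural.group name). The helper below is a hypothetical sketch, not part of Kubeflow: it reads a parsed CRD manifest and handles both the multi-version spec.versions list and the older single spec.version field.

```python
from typing import Any, Dict, List


def served_versions(crd: Dict[str, Any]) -> List[str]:
    """Return the API versions that a CustomResourceDefinition marks as served."""
    spec = crd.get("spec", {})
    versions = spec.get("versions")
    if versions:
        # Multi-version CRDs list every version with its own "served" flag.
        return [v["name"] for v in versions if v.get("served", False)]
    # Older single-version CRDs only set spec.version.
    single = spec.get("version")
    return [single] if single else []


def serves(crd: Dict[str, Any], group: str, version: str) -> bool:
    """True if the CRD belongs to `group` and serves `version`."""
    return crd.get("spec", {}).get("group") == group and version in served_versions(crd)


if __name__ == "__main__":
    # Hypothetical manifest fragment for illustration; not taken from this thread.
    crd = {
        "spec": {
            "group": "kubeflow.org",
            "names": {"kind": "TFJob"},
            "versions": [{"name": "v1alpha2", "served": True, "storage": True}],
        }
    }
    print(served_versions(crd))                    # ['v1alpha2']
    print(serves(crd, "kubeflow.org", "v1beta1"))  # False
```

A CRD that does not serve v1beta1 would be consistent with this error; if it does serve v1beta1, the cause lies elsewhere.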

@jinchihe
Member

The same problem happens in my environment. Is there a workaround or a fix? Thanks.

@hackenzheng
Author

I have tried deploying Kubeflow Pipelines on my local k8s cluster several times. The error appears only occasionally, and it does not block basic functionality.

@xiaozhouX
Contributor

I ran into the same problem a few days ago and solved it by replacing the image of the studyjob-controller deployment with katib/studyjob-controller@sha256:870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e.
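The replacement image is pinned by content digest (repository@sha256: followed by 64 hex characters) rather than by tag, so every node runs exactly the same image even if a tag is later moved. A minimal standard-library sketch of splitting and sanity-checking such a reference; the function name split_digest_ref is hypothetical, and it only accepts the sha256 form.

```python
import re
from typing import Tuple

# A sha256 digest is "sha256:" followed by exactly 64 lowercase hex characters.
_SHA256_DIGEST = re.compile(r"sha256:[0-9a-f]{64}")


def split_digest_ref(ref: str) -> Tuple[str, str]:
    """Split 'repository@sha256:<hex>' into (repository, digest)."""
    repo, sep, digest = ref.partition("@")
    if not sep or not repo:
        raise ValueError(f"not a digest-pinned reference: {ref!r}")
    if not _SHA256_DIGEST.fullmatch(digest):
        raise ValueError(f"malformed sha256 digest: {digest!r}")
    return repo, digest


if __name__ == "__main__":
    ref = ("katib/studyjob-controller@sha256:"
           "870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e")
    print(split_digest_ref(ref)[0])  # katib/studyjob-controller
```

A plain tag such as katib/studyjob-controller:latest is rejected, which is the point: a digest cannot silently change underneath a running deployment.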

@hackenzheng
Author

ok, thanks!

@zoux86

zoux86 commented Jan 1, 2019

@hackenzheng Hi, I'm new to Kubeflow and I hit the same problem, but I don't know how to replace the studyjob-controller deployment's image.
I would really appreciate it if you could explain in detail how to do that.
Thanks!

@KaranKhirsariya

> I met the same problem few days ago, and solved it by replace deployment studyjob-controller's image with katib/studyjob-controller@sha256:870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e.

Hi @xiaozhouX,
I am working through the end-to-end ML codelab and have the same issue. I don't understand how to make the change above to the deployment. Could you please elaborate, or point me to any references on how to do it?

@hackenzheng
Author

Why not retag the Docker image? @zoux86 @KaranKhirsariya
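Retagging presumably means pulling the fixed image and giving it the name the deployment already references, so the deployment spec itself does not change. A minimal sketch that only builds the docker pull and docker tag command lines without running them; the target name is a hypothetical placeholder, because the thread does not say which image name the deployment uses. This only affects the node(s) where the commands are run, and it assumes the kubelet will not re-pull the tag from the registry, i.e. imagePullPolicy is IfNotPresent (the Kubernetes default for tags other than latest).

```python
import shlex
from typing import List


def retag_commands(source_ref: str, target_ref: str) -> List[List[str]]:
    """Build (but do not run) the docker commands that retag source_ref as target_ref."""
    return [
        ["docker", "pull", source_ref],             # fetch the fixed image
        ["docker", "tag", source_ref, target_ref],  # give it the name the deployment expects
    ]


if __name__ == "__main__":
    source = ("katib/studyjob-controller@sha256:"
              "870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e")
    # Hypothetical placeholder: replace with the image name your deployment references.
    target = "example.registry/studyjob-controller:current"
    for argv in retag_commands(source, target):
        print(shlex.join(argv))
```

Returning argv lists rather than shell strings keeps the commands safe to pass to subprocess.run if you later decide to execute them.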

@jonathan1920

Hi @KaranKhirsariya and @zoux86,

To replace the studyjob-controller image from the GCP console:

1. In the Kubernetes section of GCP, open Workloads and click studyjob-controller.
2. Near the top of the studyjob-controller page, click the YAML tab to open the YAML file.
3. Click the Edit button at the top.
4. Change the value of image to katib/studyjob-controller@sha256:870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e
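The console steps edit a single field of the studyjob-controller Deployment: spec.template.spec.containers[].image. The same edit can be sketched on a parsed manifest (a Python dict, for example loaded from kubectl get deployment studyjob-controller -o yaml). The helper name and the container name studyjob-controller are assumptions, so check the real container name in your manifest. From the command line, kubectl set image deployment/studyjob-controller <container>=<image> -n <namespace> makes the equivalent change.

```python
import copy
from typing import Any, Dict


def set_container_image(deployment: Dict[str, Any], container: str, image: str) -> Dict[str, Any]:
    """Return a copy of a Deployment manifest with one container's image replaced."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    for c in patched["spec"]["template"]["spec"]["containers"]:
        if c.get("name") == container:
            c["image"] = image
            return patched
    raise KeyError(f"no container named {container!r} in the pod template")


if __name__ == "__main__":
    # Hypothetical, trimmed Deployment; the container name and old image are assumptions.
    deployment = {
        "kind": "Deployment",
        "metadata": {"name": "studyjob-controller"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "studyjob-controller", "image": "katib/studyjob-controller:old-tag"},
        ]}}},
    }
    fixed = ("katib/studyjob-controller@sha256:"
             "870c260af5caa8823f9a64fa126a4ddb6ffd3e911417fe73aa924c3ee144ad8e")
    patched = set_container_image(deployment, "studyjob-controller", fixed)
    print(patched["spec"]["template"]["spec"]["containers"][0]["image"] == fixed)  # True
```

Because the change is inside the pod template, Kubernetes rolls out new pods that use the new image.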

Please let me know if this helps
