Error running pipeline: cannot create tfjobs.kubeflow.org 403 #294
Comments
I think the issue is that you need to run this command or similar before running any pipelines:
Here's the related issue: #220
Thanks!
Hmm, I have not seen this (and can't repro). :( Jeremy, I have a vague memory of your saying there was some recent issue with tf-job -- could this be related? @jlewi
@qimingj @gaoning777, could you please have a look?
@qimingj @gaoning777 any update on this?
The RBAC issue should be resolved in the latest version. @lluunn Could you give it another try? If the issue still persists, could you share the pipeline definition and the Kubeflow version with me?
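For readers hitting the same 403: the error in the title suggests that the service account running the pipeline step is not authorized to create `tfjobs` in the `kubeflow.org` API group. Below is a minimal sketch of the kind of RBAC objects that would grant that permission. It is not the fix that shipped in the release mentioned above. The role name `tfjob-creator`, the service account name `pipeline-runner`, and the namespace `kubeflow` are illustrative assumptions, not values taken from this thread.

```python
import json


def tfjob_cluster_role(name="tfjob-creator"):
    """Build a ClusterRole manifest that allows managing TFJob resources.

    The role name is a hypothetical placeholder.
    """
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [
            {
                # The resource named in the 403 message: tfjobs.kubeflow.org
                "apiGroups": ["kubeflow.org"],
                "resources": ["tfjobs"],
                "verbs": ["create", "get", "list", "watch", "delete"],
            }
        ],
    }


def bind_to_service_account(role_name, sa_name, namespace):
    """Build a ClusterRoleBinding that grants the role to one service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{role_name}-{sa_name}"},
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
    }


if __name__ == "__main__":
    # Assumed names; substitute the ones used by your deployment.
    role = tfjob_cluster_role()
    binding = bind_to_service_account("tfjob-creator", "pipeline-runner", "kubeflow")
    print(json.dumps([role, binding], indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.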
@IronPan Is there a pipelines test that covers firing off a TFJob from pipelines? If not, can we open an issue to add such a test and use it to verify that the fix is working?
We currently don't have any samples covering tf-job except for @amygdala's sample. The sample's trainer container (gcr.io/google-samples/ml-pipeline-kubeflow-tf-taxi) was contributed by Amy and is also outside the pipelines repo. @amygdala, does your sample still work in the latest Kubeflow deployment? I am up for covering tf-job, since it is the key component in Kubeflow.
Barbara reported that there is a credentials issue (see below) with running my tf-job step in her setup, which was created using the launcher (in contrast to my original instructions, in which I created the GKE cluster nodes with (I remember that previously, there was some situation where I needed to run:
Update part 1: I had no problems with tf-job using a cluster created with 'cloud-platform' scoped nodes and this pipelines bootstrapper: gcr.io/ml-pipeline/bootstrapper:0.1.7. Next I'll play around with a launcher-created cluster. Maybe it needs some additional cluster role during setup. (cc @BasiaFusinska)
I think the issue is that the tf-job pod fails to talk to GCP services because it doesn't have the right GCP service account set up in launcher-created clusters.
We need to mount the kubeflow-user GCP service account here.
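As a rough illustration of what mounting a GCP service account into a pod can look like, here is a minimal sketch that adds a secret volume and the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable to a pod spec. The secret name `user-gcp-sa`, the key file name, and the mount path are assumptions made for illustration. The actual change discussed in this thread may differ.

```python
import copy


def with_gcp_secret(
    pod_spec,
    secret_name="user-gcp-sa",          # assumed secret name
    key_file="user-gcp-sa.json",        # assumed key file inside the secret
    mount_path="/secret/gcp-credentials",  # assumed mount path
):
    """Return a copy of pod_spec with a GCP credentials secret mounted.

    Every container gets the secret volume mounted read-only and has
    GOOGLE_APPLICATION_CREDENTIALS pointed at the key file, which is the
    variable Google client libraries consult for default credentials.
    """
    spec = copy.deepcopy(pod_spec)  # leave the caller's spec untouched
    spec.setdefault("volumes", []).append(
        {"name": "gcp-credentials", "secret": {"secretName": secret_name}}
    )
    for container in spec.get("containers", []):
        container.setdefault("volumeMounts", []).append(
            {"name": "gcp-credentials", "mountPath": mount_path, "readOnly": True}
        )
        container.setdefault("env", []).append(
            {
                "name": "GOOGLE_APPLICATION_CREDENTIALS",
                "value": f"{mount_path}/{key_file}",
            }
        )
    return spec


if __name__ == "__main__":
    # Hypothetical trainer container, used only to exercise the helper.
    pod = {"containers": [{"name": "trainer", "image": "example/trainer:latest"}]}
    print(with_gcp_secret(pod))
```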
You're probably right, I'll try that next.
Closing this issue as a duplicate of #677