Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Backend] Use pod namespace as default when not in multi user mode #3009

Merged
merged 5 commits into from
Feb 8, 2020

Conversation

Bobgy
Copy link
Contributor

@Bobgy Bobgy commented Feb 7, 2020

Fixes #2908

The fix is the same as comment: #2908 (comment)

Problem of the bug is that, pod client need to default to pod namespace when it is not provided.

My fix: I think it's better to be explicit about which namespace we want to use, so I added getting namespace logic in handlers and removed default namespace logic in workflow client.
In addition to that, now I did a little more sanity checking depending on multi user mode or not:

  1. When in multi user mode, report error when namespace is not provided.
  2. Otherwise, use pod namespace as default
  3. Workflow client no longer default to pod namespace when provided namespace is empty.
  4. ReportWorkflow endpoint is a little different, because it should be called by persistence agent who gets data from argo workflows. I changed it to always expect namespace in the workflow.

Verification: I built an image and patched to kfp 0.2.2 standalone deployment, verified now pods can be retried successfully.

/assign @jingzhang36
/assign @IronPan


This change is Reviewable

@k8s-ci-robot k8s-ci-robot removed the lgtm label Feb 7, 2020
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Feb 7, 2020
@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 7, 2020

/retest

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 7, 2020

/retest

1 similar comment
@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 7, 2020

/retest

@@ -599,6 +607,9 @@ func (r *ResourceManager) ReportWorkflowResource(workflow *util.Workflow) error
}
runId := workflow.ObjectMeta.Labels[util.LabelKeyWorkflowRunId]
jobId := workflow.ScheduledWorkflowUUIDAsStringOrEmpty()
if len(workflow.Namespace) == 0 {
return util.NewInvalidInputError("Workflow missing namespace")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?
Are we sure this would work for "default" Kubernetes namespace?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. But I think it is clearer, because I think persistence agent is the only instance calling this. And it gets data using workflow client, so it will always carry namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified retry works. That includes coverage for persistence agent. Also, e2e tests pass.

@IronPan
Copy link
Member

IronPan commented Feb 8, 2020

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: IronPan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 8, 2020

/retest

2 similar comments
@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 8, 2020

/retest

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 8, 2020

/retest

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 8, 2020

/retest
Cloud build flakiness

@Bobgy
Copy link
Contributor Author

Bobgy commented Feb 8, 2020

strange, a lot of samples failed
/retest

@k8s-ci-robot k8s-ci-robot merged commit dd96b46 into kubeflow:master Feb 8, 2020
@Bobgy Bobgy deleted the be_fix_retry_failure branch March 30, 2020 03:19
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this pull request Dec 9, 2020
…ubeflow#3009)

* Use pod namespace as default when not in multi user mode

* Report error when POD_NAMESPACE not set

* Change report workflow endpoint to always require namespace

* Fix integration tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Failed to retry run
4 participants