
Deleting first pipeline version of a pipeline and attempting pipeline run crashes ml-pipeline #4389

Closed
lyschoening opened this issue Aug 19, 2020 · 0 comments · Fixed by #4439
Labels
kind/bug · status/triaged

Comments

@lyschoening
Contributor

lyschoening commented Aug 19, 2020

What steps did you take:

There may be multiple bugs here.

  1. I ran a pipeline on a specific pipeline version. The pipeline had multiple versions, and the first version had the same name as the pipeline itself. I used the SDK with a code snippet roughly like this:
import kfp

client = kfp.Client()
experiment = client.create_experiment(EXPERIMENT_NAME)
pipeline = client.get_pipeline(client.get_pipeline_id(PIPELINE_NAME))
client.run_pipeline(
    experiment_id=experiment.id,
    job_name='test run',
    params={},
    pipeline_id=pipeline.id,
    version_id=pipeline.default_version.id,  # pin the run to the pipeline's default version
)
  2. According to the pipeline UI, the run used the correct pipeline version; however, the Argo workflow used the template of the first pipeline version.
  3. To debug, I deleted the first version of the pipeline via the UI.
  4. I attempted to run the pipeline again, using the same code as before.

What happened:

When calling client.run_pipeline(), the ml-pipeline pod appears to crash instantly; this happens every time. Via the SDK I get the following error message:

ApiException: (503)
Reason: Service Unavailable
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'ea84b152-4123-429f-aa15-b5d8adff5b8d', 'Date': 'Wed, 19 Aug 2020 07:15:10 GMT', 'Transfer-Encoding': 'chunked'})
HTTP response body: Error: 'EOF'
Trying to reach: 'http://10.24.0.35:8888/apis/v1beta1/runs'

The pod fails with the following error:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x138 pc=0x11b8766]

goroutine 110 [running]:
github.com/kubeflow/pipelines/backend/src/common/util.(*Workflow).VerifyParameters(0xc000907cf8, 0xc000553740, 0x0, 0xc000553740)
	backend/src/common/util/workflow.go:66 +0x96
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).CreateRun(0xc00024c540, 0xc000914b60, 0xc0008e64e0, 0xc000326140, 0x2)
	backend/src/apiserver/resource/resource_manager.go:269 +0x227
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).CreateRun(0xc000904010, 0x1941ba0, 0xc0008e64e0, 0xc0008e6510, 0xc000904010, 0x1, 0x1)
	backend/src/apiserver/server/run_server.go:43 +0xc5
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler.func1(0x1941ba0, 0xc0008e64e0, 0x1696ce0, 0xc0008e6510, 0x1, 0x0, 0xc000072700, 0x0)
	bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1269 +0x86
main.apiServerInterceptor(0x1941ba0, 0xc0008e64e0, 0x1696ce0, 0xc0008e6510, 0xc0002784c0, 0xc000278500, 0x15a5ec0, 0x25ad330, 0x171cca0, 0xc000188e00)
	backend/src/apiserver/interceptor.go:30 +0xf4
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler(0x16c95e0, 0xc000904010, 0x1941ba0, 0xc0008e64e0, 0xc000185810, 0x18060e0, 0x0, 0x0, 0xc0008b4140, 0x91)
	bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1271 +0x158
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00090c000, 0x194ea40, 0xc00090c180, 0xc000188e00, 0xc0008cc270, 0x2527f80, 0x0, 0x0, 0x0)
	external/org_golang_google_grpc/server.go:966 +0x4a2
google.golang.org/grpc.(*Server).handleStream(0xc00090c000, 0x194ea40, 0xc00090c180, 0xc000188e00, 0x0)
	external/org_golang_google_grpc/server.go:1245 +0xd61
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc00046e070, 0xc00090c000, 0x194ea40, 0xc00090c180, 0xc000188e00)
	external/org_golang_google_grpc/server.go:685 +0x9f
created by google.golang.org/grpc.(*Server).serveStreams.func1
	external/org_golang_google_grpc/server.go:683 +0xa1
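
The trace shows the panic happening inside util.(*Workflow).VerifyParameters, before any run is actually created. For illustration only, the minimal Go sketch below (simplified, assumed type and method names, not the actual kubeflow/pipelines source) reproduces the mechanism: a wrapper type embeds a pointer to the parsed workflow manifest, and when no manifest was ever loaded for the selected pipeline version, the method dereferences a nil pointer and the process dies with a SIGSEGV, just like the pod above.

```
package main

import "fmt"

// Simplified stand-in for the parsed Argo workflow manifest.
type ArgoWorkflow struct {
	Parameters map[string]string
}

// Workflow wraps the manifest via an embedded pointer, mirroring the wrapper pattern.
type Workflow struct {
	*ArgoWorkflow
}

// VerifyParameters dereferences the embedded manifest. If the manifest was never
// loaded, w.ArgoWorkflow is nil and the field access below panics with a nil
// pointer dereference.
func (w *Workflow) VerifyParameters(desired map[string]string) error {
	for name := range desired {
		if _, ok := w.Parameters[name]; !ok { // panics when the embedded pointer is nil
			return fmt.Errorf("unknown parameter %q", name)
		}
	}
	return nil
}

func main() {
	w := &Workflow{} // no manifest behind the wrapper
	_ = w.VerifyParameters(map[string]string{"lr": "0.1"}) // panics, crashing the process
}
```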

What did you expect to happen:

  1. Running a pipeline should use the pipeline definition of the default pipeline version.
  2. Running a pipeline with a specific pipeline version should use the pipeline definition of that version.
  3. Deleting an older pipeline version should not cause future pipeline runs to crash ml-pipeline.

Environment:

How did you deploy Kubeflow Pipelines (KFP)?

KFP version: 1.0.0

KFP SDK version: master branch

Anything else you would like to add:


/kind bug

@Ark-kun added the status/triaged label Aug 19, 2020
ekesken added a commit to ekesken/pipelines that referenced this issue Aug 31, 2020
ekesken added a commit to ekesken/pipelines that referenced this issue Sep 1, 2020
ekesken added a commit to ekesken/pipelines that referenced this issue Sep 2, 2020
k8s-ci-robot pushed a commit that referenced this issue Sep 2, 2020
…#4389 (#4439)

Fixes #4389 (partially).

When the workflow manifest file was deleted from S3 due to the retention policy, we were getting this segmentation fault on the next CreateRun attempt for that pipeline:

```
I0831 06:36:53.916141       1 interceptor.go:29] /api.RunService/CreateRun handler starting
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x148 pc=0x156e140]

goroutine 183 [running]:
github.com/kubeflow/pipelines/backend/src/common/util.(*Workflow).VerifyParameters(0xc000010610, 0xc00036b6b0, 0x0, 0xc00036b6b0)
        backend/src/common/util/workflow.go:66 +0x90
github.com/kubeflow/pipelines/backend/src/apiserver/resource.(*ResourceManager).CreateRun(0xc00088b5e0, 0xc00088b880, 0xc0009c3c50, 0xc000010450, 0x1)
        backend/src/apiserver/resource/resource_manager.go:326 +0x27c
github.com/kubeflow/pipelines/backend/src/apiserver/server.(*RunServer).CreateRun(0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c3c80, 0xc0000b8718, 0x2ddc6e9, 0xc00014e070)
        backend/src/apiserver/server/run_server.go:43 +0xce
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler.func1(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc0008cbb40, 0x1, 0x1, 0x7f9e4d6466d0)
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1399 +0x86
main.apiServerInterceptor(0x1e7bc20, 0xc0009c3c50, 0x1aa80e0, 0xc0009c3c80, 0xc000778ca0, 0xc000778cc0, 0xc0004dcbd0, 0x4e7bba, 0x1a98e00, 0xc0009c3c50)
        backend/src/apiserver/interceptor.go:30 +0xf8
github.com/kubeflow/pipelines/backend/api/go_client._RunService_CreateRun_Handler(0x1ac4a20, 0xc0000b8718, 0x1e7bc20, 0xc0009c3c50, 0xc0009c6e40, 0x1c6bd70, 0x1e7bc20, 0xc0009c3c50, 0xc0004321c0, 0x66)
        bazel-out/k8-opt/bin/backend/api/linux_amd64_stripped/go_client_go_proto%/github.com/kubeflow/pipelines/backend/api/go_client/run.pb.go:1401 +0x158
google.golang.org/grpc.(*Server).processUnaryRPC(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0xc00071ab70, 0x2e14040, 0x0, 0x0, 0x0)
        external/org_golang_google_grpc/server.go:995 +0x466
google.golang.org/grpc.(*Server).handleStream(0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700, 0x0)
        external/org_golang_google_grpc/server.go:1275 +0xda6
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0004e9084, 0xc00064eb00, 0x1ea2840, 0xc00061cd80, 0xc00046c700)
        external/org_golang_google_grpc/server.go:710 +0x9f
created by google.golang.org/grpc.(*Server).serveStreams.func1
        external/org_golang_google_grpc/server.go:708 +0xa1
```

The same happened in CreateJob calls.

The scenario described in #4389 also seems to cause the same issue.

With this PR, we at least aim to avoid the segmentation fault, because in our case
manifest files are expected to be deleted after some time due to the retention policy.

The other problems described in #4389 around picking the right pipeline version
still need to be addressed.
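
For illustration of the approach the commit message describes, here is a hedged sketch (same simplified types as the sketch earlier in this issue; assumed names, not the actual #4439 diff): check that a manifest is actually present before verifying parameters, and return an error to the API caller instead of letting a nil dereference kill the ml-pipeline pod.

```
package main

import (
	"errors"
	"fmt"
)

type ArgoWorkflow struct {
	Parameters map[string]string
}

type Workflow struct {
	*ArgoWorkflow
}

// verifyParametersSafe guards the nil case: a missing or deleted manifest
// becomes a normal error instead of a SIGSEGV.
func verifyParametersSafe(w *Workflow, desired map[string]string) error {
	if w == nil || w.ArgoWorkflow == nil {
		return errors.New("pipeline version has no workflow manifest; cannot verify parameters")
	}
	// With a manifest present, the usual parameter check can proceed safely.
	for name := range desired {
		if _, ok := w.Parameters[name]; !ok {
			return fmt.Errorf("unknown parameter %q", name)
		}
	}
	return nil
}

func main() {
	missing := &Workflow{} // manifest deleted from storage, never parsed
	fmt.Println(verifyParametersSafe(missing, map[string]string{"lr": "0.1"})) // prints an error, no crash
}
```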
Bobgy pushed a commit to Bobgy/pipelines that referenced this issue Sep 4, 2020
Bobgy pushed a commit that referenced this issue Sep 4, 2020
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this issue Dec 9, 2020