[backend] Archiving experiment -> 524 error #9821

Closed
gkcalat opened this issue Aug 5, 2023 · 3 comments
Labels: area/backend, kind/bug, lifecycle/stale

Comments

@gkcalat
Member

gkcalat commented Aug 5, 2023

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    kfp-standalone-1 cluster
  • KFP version:
    2.0.0
  • KFP SDK version:
    2.0.1

Steps to reproduce

  1. Run the functional test with the experiment archiving call enabled:
host is https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com
Creating experiment
Experiment details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/experiments/details/40b49e68-5808-47e4-a64a-12884b697890
Experiment with id 40b49e68-5808-47e4-a64a-12884b697890 created
Creating Run from Pipeline Func
Experiment details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/experiments/details/40b49e68-5808-47e4-a64a-12884b697890
Run details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/runs/details/7f4a0f90-b9e8-44d9-ab30-aa0a3d2bf9d3
Run 7f4a0f90-b9e8-44d9-ab30-aa0a3d2bf9d3 created
Run succeeded in 63 seconds
Archiving experiment
Traceback (most recent call last):
  File "/home/ablai_kubeflow_org/src/pipelines/./test/kfp-functional-test/run_kfp_functional_test.py", line 92, in <module>
    main()
  File "/home/ablai_kubeflow_org/src/pipelines/./test/kfp-functional-test/run_kfp_functional_test.py", line 87, in main
    client.archive_experiment(experiment_id)
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp/client/client.py", line 623, in archive_experiment
    return self._experiment_api.archive_experiment(
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 65, in archive_experiment
    return self.archive_experiment_with_http_info(experiment_id, **kwargs)  # noqa: E501
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api/experiment_service_api.py", line 145, in archive_experiment_with_http_info
    return self.api_client.call_api(
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 364, in call_api
    return self.__call_api(resource_path, method,
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 188, in __call_api
    raise e
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 181, in __call_api
    response_data = self.request(
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/api_client.py", line 407, in request
    return self.rest_client.POST(url,
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/rest.py", line 265, in POST
    return self.request("POST", url,
  File "/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp_server_api/rest.py", line 224, in request
    raise ApiException(http_resp=r)
kfp_server_api.exceptions.ApiException: (524)
Reason: status code 524
HTTP response headers: HTTPHeaderDict({'Content-Length': '1440', 'Content-Type': 'text/html; charset=utf-8', 'Date': 'Sat, 05 Aug 2023 00:28:11 GMT', 'Vary': 'Origin', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-Xss-Protection': '0', 'Set-Cookie': 'S=cloud_datalab_tunnel=I7lCyXLQueelexnHs44ORZeTaTV7NT4yU3g92AiQU6Q; Path=/; Max-Age=3600'})
HTTP response body: 
<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 524 ()!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/logos/errorpage/error_logo-150x54-2x.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
  </style>
  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>
  <p><b>524.</b> <ins>That’s an error.</ins>
  <p>  <ins>That’s all we know.</ins>
  2. Try archiving the experiment via the UI:
Failed to archive experiment: 40b49e68-5808-47e4-a64a-12884b697890 with error: "{"error":"Failed to archive experiment: InternalServerError: Failed to archive experiment 40b49e68-5808-47e4-a64a-12884b697890. error: 'Error 1205: Lock wait timeout exceeded; try restarting transaction': Error 1205: Lock wait timeout exceeded; try restarting transaction","code":13,"message":"Failed to archive experiment: InternalServerError: Failed to archive experiment 40b49e68-5808-47e4-a64a-12884b697890. error: 'Error 1205: Lock wait timeout exceeded; try restarting transaction': Error 1205: Lock wait timeout exceeded; try restarting transaction","details":[{"@type":"type.googleapis.com/google.rpc.Status","code":13,"message":"Internal Server Error"}]}"
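
The UI error above names the database as the source of the failure: MySQL error 1205 is a lock wait timeout. As a rough illustration of how a client could recognize this case, the sketch below pulls MySQL error numbers out of a payload shaped like the one above. The helper names (`extract_mysql_error_codes`, `is_lock_wait_timeout`) and the shortened `SAMPLE_PAYLOAD` are assumptions made for this example and are not part of KFP.

```python
import json
import re

# MySQL's error number for "Lock wait timeout exceeded".
MYSQL_LOCK_WAIT_TIMEOUT = 1205


def extract_mysql_error_codes(payload: str) -> set:
    """Return the MySQL error numbers mentioned in a JSON error payload."""
    body = json.loads(payload)
    if not isinstance(body, dict):
        return set()
    # The API server repeats the text in both "error" and "message".
    text = " ".join(str(body.get(key, "")) for key in ("error", "message"))
    return {int(code) for code in re.findall(r"Error (\d+):", text)}


def is_lock_wait_timeout(payload: str) -> bool:
    """True if the payload reports a MySQL lock wait timeout (error 1205)."""
    return MYSQL_LOCK_WAIT_TIMEOUT in extract_mysql_error_codes(payload)


# Shortened, hypothetical payload modeled on the UI error in this issue.
SAMPLE_PAYLOAD = json.dumps({
    "error": "Failed to archive experiment: InternalServerError: "
             "error: 'Error 1205: Lock wait timeout exceeded; "
             "try restarting transaction'",
    "code": 13,
    "message": "Failed to archive experiment: InternalServerError",
})

if __name__ == "__main__":
    print(is_lock_wait_timeout(SAMPLE_PAYLOAD))
```

This only classifies the error text. It does not explain why the lock is held, which is the server-side problem tracked in the linked issues.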

This is related to #9770 and #6845.

Possible solution:

We may need to release a patch version with #9680 and #9730 and upgrade the kfp-standalone-1 cluster.

Confirmed this after building the api-server image from the master branch and updating the deployment manifests:

githubusercontent.com/kubeflow/testing/master/test-infra/kfp/endpoint)"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    62  100    62    0     0    529      0 --:--:-- --:--:-- --:--:--   529
/home/ablai_kubeflow_org/src/pipelines/.venv/lib/python3.9/site-packages/kfp/client/client.py:158: FutureWarning: This client only works with Kubeflow Pipeline v2.0.0-beta.2 and later versions.
  warnings.warn(
host is https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com
Creating experiment
Experiment details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/experiments/details/553570b1-6dbf-4b14-a166-660c3cbb00ba
Experiment with id 553570b1-6dbf-4b14-a166-660c3cbb00ba created
Creating Run from Pipeline Func
Experiment details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/experiments/details/553570b1-6dbf-4b14-a166-660c3cbb00ba
Run details: https://4e18c21c9d33d20f-dot-datalab-vm-staging.googleusercontent.com/#/runs/details/59119a83-8119-40b8-b67a-00725bfe9e03
Run 59119a83-8119-40b8-b67a-00725bfe9e03 created
Run succeeded in 31 seconds
Archiving experiment
Archived experiment with id 553570b1-6dbf-4b14-a166-660c3cbb00ba
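
For clusters that cannot be upgraded right away, a client-side retry may serve as a stopgap. It does not fix the underlying lock contention. The sketch below is a generic retry helper with exponential backoff. `call_with_retry` and `is_gateway_timeout` are hypothetical names, the attempt count and delays are arbitrary, and the check assumes the raised exception exposes the HTTP status as a `status` attribute, as the `ApiException: (524)` in the traceback suggests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    attempts: int = 3,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying with exponential backoff on retryable errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            last_attempt = attempt == attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Wait base_delay_s, then 2x, 4x, ... before the next try.
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")


def is_gateway_timeout(exc: Exception) -> bool:
    """Assumption: the exception carries the HTTP status in `.status`."""
    return getattr(exc, "status", None) == 524


# Hypothetical usage with the call from the traceback:
#   call_with_retry(lambda: client.archive_experiment(experiment_id),
#                   is_gateway_timeout)
```

A 524 from the proxy does not say whether the server eventually completed the request, so a retry like this is only reasonable for calls that are safe to repeat.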

Impacted by this bug? Give it a 👍.

@difince
Member

difince commented Sep 5, 2023

@gkcalat Is this issue still relevant after fixing this by PR?

github-actions bot commented Dec 5, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Dec 5, 2023

github-actions bot commented Mar 4, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
