
add hotfix for uploading kubeflow pipelines #40

Merged: 3 commits into main, Apr 25, 2023

Conversation

@PhilippeMoussalli (Contributor) commented Apr 25, 2023

Kubeflow requires unique pipeline names when uploading a pipeline.

Previous implementations relied on deleting the pipeline via the SDK: checking whether it exists using either `client.list_pipelines()` filtered on the pipeline name, or `client.get_pipeline_id(pipeline_name)`, then finding the existing versions from the id and deleting those before deleting the pipeline itself.
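Roughly, the name-based lookup worked like this (a sketch against the kfp v1 `Client` API; function name and structure are illustrative, not the exact original code):

```python
def get_pipeline_id_by_name(client, pipeline_name):
    """Find a pipeline's id by listing all pipelines and matching on name.

    `client` is expected to expose the kfp v1 `Client.list_pipelines()`
    method. This name-based call path is the one that started failing
    with the SQL error shown in the description.
    """
    response = client.list_pipelines(page_size=100)
    for pipeline in response.pipelines or []:
        if pipeline.name == pipeline_name:
            return pipeline.id
    return None
```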

Recent issues started appearing when pipelines were compiled with the v2 SDK. Both functions mentioned above stopped working. Error message:

`Failed to list pipelines with context &{0xc0001ee9a0}, options &{100 0xc0012d1880}: InternalServerError: Failed to execute SQL for listing pipelines: Error 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(PARTITION BY PipelineId ORDER BY CreatedAtInSec DESC) rn FROM pipeline_versions' at line 1`

The issue is not related to the connection to the SQL server backing the pipelines, since methods like `client.get_experiments()` work just fine. The kfp v2 SDK is also still able to return the pipeline versions. I suspect something changed internally in the database while we were testing things with the v2 SDK, but it's not clear what. The issue should be resolved by starting from a clean slate (new kfp cluster and database).

For now, I implemented a fix that requires passing the `pipeline_id` (found in the UI) in order to delete existing pipelines before uploading. This circumvents the API calls that were causing the issue. Not an ideal permanent solution, but a temporary workaround.
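A minimal sketch of such an id-based workaround (function name is illustrative, not the actual Fondant code; it assumes the kfp v1 `Client` methods `list_pipeline_versions`, `delete_pipeline_version`, and `delete_pipeline`):

```python
def delete_pipeline_by_id(client, pipeline_id):
    """Delete all versions of a pipeline, then the pipeline itself.

    Only calls keyed on the pipeline id are used, avoiding the
    name-based lookups (`list_pipelines` / `get_pipeline_id`) that
    trigger the SQL error.
    """
    response = client.list_pipeline_versions(pipeline_id, page_size=100)
    for version in response.versions or []:
        client.delete_pipeline_version(version.id)
    client.delete_pipeline(pipeline_id)
```

The upload itself can then proceed with `client.upload_pipeline()` under the original pipeline name, since no pipeline with that name exists anymore.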

@RobbeSneyders (Member)

Would adding a timestamp to the pipeline name not be an easier solution?
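For reference, such a timestamp suffix could be as simple as the following sketch (function name is illustrative):

```python
from datetime import datetime

def unique_pipeline_name(base_name: str) -> str:
    # Append a second-resolution timestamp so every upload gets a
    # unique name and no pre-delete step is needed.
    return f"{base_name}-{datetime.now():%Y%m%d-%H%M%S}"
```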

@PhilippeMoussalli (Contributor, Author)

> Would adding a timestamp to the pipeline name not be an easier solution?

Mainly to avoid bloating the UI with too many submitted pipelines, especially when debugging.

@RobbeSneyders (Member)

Is redeploying the cluster a lot of work? Then we can rename it to Fondant as well 😅

@PhilippeMoussalli (Contributor, Author)

> Is redeploying the cluster a lot of work? Then we can rename it to Fondant as well 😅

Not really, just need to make sure no one is working on it :)

@NielsRogge (Contributor) left a comment

Temporary fix is fine for me, but it's definitely weird as I'm still using the v1 SDK to compile pipelines.

Looks like `existing_pipelines = client.list_pipelines(page_size=100).pipelines` triggers the error, so for some reason we're not able to programmatically get the existing pipelines. Isn't this an authorization issue?

@PhilippeMoussalli (Contributor, Author)

> Temporary fix is fine for me, but it's definitely weird as I'm still using the v1 SDK to compile pipelines.
>
> Looks like `existing_pipelines = client.list_pipelines(page_size=100).pipelines` triggers the error, so for some reason we're not able to programmatically get the existing pipelines. Isn't this an authorization issue?

I think we're sticking with the v1 SDK for the meanwhile; we will not update it. I don't think it's an authorization issue, since other API calls for listing experiments and runs work fine. I think something has gone wrong internally, either in Kubeflow's internal registry or in the database. Redeploying might be faster than attempting to fix the issue.

@PhilippeMoussalli PhilippeMoussalli merged commit 18ff584 into main Apr 25, 2023
@RobbeSneyders RobbeSneyders deleted the hotfix/pipeline-upload branch May 4, 2023 07:34
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023