Example TFX taxi sample: support on-prem clusters #721
@gaoning777 The current Pipeline samples rely heavily on GCP; for example, the end-to-end TFX taxi example imports GCP-specific pieces. I think we should enhance the existing sample, or create a new one, to show how to run a Kubeflow Pipeline on an on-prem cluster using a PV/PVC to store data. I have nearly gotten the case working on an on-prem cluster by updating the TFX Python class files manually. What do you think: should we update the old sample or create a new one? Is there any plan for this? Thanks a lot!
@IronPan I saw you agreed to use a PersistentVolume to decouple the actual storage in #708. In fact, I have already run a TFX taxi sample successfully using a PersistentVolume. What do you think about creating a new TFX taxi sample for on-prem clusters that uses a PersistentVolume? If you agree, I will create it (including a README file). Thanks.
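For reference, an on-prem variant of the sample could read and write its data through a PersistentVolumeClaim instead of a GCS bucket. Below is a minimal sketch of such a claim; the name, access mode, and size are assumptions for illustration, not part of any existing sample:

```yaml
# Hypothetical claim the on-prem taxi sample could mount for input/output data.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tfx-taxi-data        # assumed name, referenced by the pipeline steps
spec:
  accessModes:
    - ReadWriteMany          # so multiple pipeline steps can share the volume
  resources:
    requests:
      storage: 10Gi
```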
@jinchihe @gaoning777 @IronPan I think updating the existing documents is good enough. Perhaps we can update the Pipeline samples one by one to cover how to run them on an on-prem cluster rather than only on GKE. The documents could be structured with parallel sections, e.g.: GKE: xxxx / On-prem: xxxx
@gyliu513 Thanks a lot. I agree with you, but I think we at least need a new taxi-cab-classification-pipeline.py; it may be better to use a new name, such as taxi-cab-classification-pipeline_on_prem.py. @gaoning777, WDYT? Thanks
It might make sense to create a new example if the code is sufficiently different from the current example. However, can we try to write the example in a way that works both on-prem and on clouds? The fact that the current example is GCP-only is a bug and something that should be fixed. Here's a related issue, #677, about creating better patterns for working with TFJob and making it cloud agnostic. Related to this: I'm wondering whether there's a better K8s pattern for writing this example. Currently the pattern is
Would it be better to instead orchestrate the steps more directly using K8s resources? For example, we could define a K8s Job for each of these steps. Pipelines would then just run some Python code to create the K8s Job and wait for it. There's some discussion here about how we could make this lightweight. |
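The "create a K8s Job and wait for it" idea above can be sketched as a small Python helper. This is only an illustration: `make_job_manifest`, the step names, and the image are assumptions, not from any existing sample. A real implementation would submit the manifest with the Kubernetes client (e.g. `BatchV1Api.create_namespaced_job`) and then poll the Job's status until completion.

```python
def make_job_manifest(name, image, command, backoff_limit=0):
    """Build a minimal Kubernetes Job manifest for one pipeline step.

    Hypothetical helper: a pipeline could generate one such Job per step,
    submit it to the cluster, and block until the Job reports completion.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": backoff_limit,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": name, "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Example: one Job per TFX step (step name and image are assumptions).
preprocess = make_job_manifest(
    "taxi-preprocess", "tfx/example:latest", ["python", "preprocess.py"]
)
```

Because each step is an ordinary Job, the same manifest works on GKE and on-prem; only the storage the containers mount differs.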
@swiftdiaries @johnugeorge @ramdootp for the on-prem CUJ work. We have an on-prem CUJ in the works (landing very shortly) where we had to tackle the same problem, so we should be able to solve this issue there. |
We have a set of on-prem / local-specific components in the works as part of the on-prem CUJ. As for the TFX example, I think having separate example code is better: the on-prem version wouldn't use GCP secrets, so a separate tfx_taxi_cab_onprem.py makes sense. To attach volumes to ContainerOps, we did. I agree that there should be a better way to handle K8s resources; if we could wrap this inside the DSL it would be really cool. It becomes messy when you're attaching two or more volumes to one op. We're addressing issues across all components in the on-prem CUJ. I'll try to upload the slides today. It's very much a WIP, and we would love to have some feedback based on your experiences. |
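The volume-attachment verbosity described above can be illustrated with plain manifest fragments. In the KFP DSL this is typically done with `ContainerOp.add_volume` and `ContainerOp.add_volume_mount` using `kubernetes.client` objects; the stdlib-only helper below is a hypothetical sketch that just shows why every attachment needs two pieces of spec, and why two or more volumes per op gets repetitive:

```python
def attach_pvc(pod_spec, container, claim_name, mount_path):
    """Attach a PVC-backed volume to a container spec (hypothetical helper).

    pod_spec and container are plain dicts shaped like the corresponding
    Kubernetes API objects. Each attachment needs both a pod-level volume
    entry and a container-level volumeMount, which is why doing this by
    hand for every op quickly becomes boilerplate.
    """
    pod_spec.setdefault("volumes", []).append(
        {"name": claim_name, "persistentVolumeClaim": {"claimName": claim_name}}
    )
    container.setdefault("volumeMounts", []).append(
        {"name": claim_name, "mountPath": mount_path}
    )


# Attaching two volumes to one step (claim names and paths are assumptions).
pod, ctr = {}, {"name": "preprocess", "image": "tfx/example:latest"}
attach_pvc(pod, ctr, "tfx-taxi-data", "/mnt/data")
attach_pvc(pod, ctr, "tfx-taxi-output", "/mnt/output")
```

Wrapping this pairing inside the DSL, as suggested above, would hide the duplication from sample authors.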
Thanks @jlewi for keeping up with the issue. |
I think the team has been focused on building new features, and we should allocate some engineering effort to making this a cross-platform product.
@gaoning777, I am assigning the DSL sample issues to illustrate features to you. Please reassign to the most appropriate person as needed. |
Currently, all of the pipeline examples at https://github.com/kubeflow/pipelines/tree/master/samples use GCP, so we cannot run them on on-prem clusters. It would be better to add some examples so that end users can test cases with local storage.
FYI @jinchihe