An error occurs when running the TFX example in a local Kubeflow cluster #703

Closed
zoux86 opened this issue Jan 18, 2019 · 10 comments

@zoux86 commented Jan 18, 2019

Because I am unable to access Google, I want to run the pipeline examples in my local Kubeflow cluster.

I followed the guide to run the pipeline examples:
https://www.kubeflow.org/docs/guides/pipelines/pipelines-quickstart/

and I succeeded in running the basic pipeline.

[screenshot]

But when I ran the ML pipeline example (the KubeFlow pipeline using TFX OSS components), there was an error.

Because I want to run this example in my local environment, I replaced the gs:// paths with my local directories like this (I downloaded these files in advance):
[screenshot]

The error is here; it seems the pipeline cannot find these local files:
[screenshot]

So I ran this code:
[screenshot]

It works.

So I think the files are in the right place, and I need to do something else to run this example.
My tentative guess is that I should replace the "GcsUri", but I don't know what to replace it with.
[screenshot]
I hope someone can help me; I would really appreciate it!

In short, I have two questions:

(1) Can the ML pipeline examples run in a local Kubeflow environment (without GCP)?
(2) If so, how should the code be modified to use local files?

@gaoning777 (Contributor)

Kubeflow Pipelines is a cross-platform product and can certainly be deployed and run in local environments. However, the files are not found because each component is a containerized operator; in other words, the code can only read files inside its own container.
There are two options:

  1. copy your files to some place (your own cloud, a shared file system) that the Kubernetes containers have access to, or
  2. build a new image containing your local files.

@hongye-sun (Contributor)

It's a known issue that most of the current ML components assume the input paths are GCS paths. We are working on a solution to pass artifacts to containers in a cloud-provider-agnostic way.

Though it's not tested yet, the container code should be able to work with a local path if the file is accessible inside the container. Other than Ning's suggestion, if you just want to make it work locally, you might try using a hostPath volume to make your local files visible to the container. You can use add_volume and add_volume_mount to add a hostPath volume to the ContainerOp and change the path to the mounted path.
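
To make this concrete, here is a minimal sketch of the hostPath approach against the `ContainerOp` API named above, assuming the 2019-era KFP SDK; the image, command, and both paths are placeholder assumptions:

```python
import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(
    name='local-data-example',
    description='Make files on the node visible to a pipeline step.')
def local_data_pipeline():
    op = dsl.ContainerOp(
        name='read-local-file',
        image='alpine:3.8',                    # placeholder image
        command=['sh', '-c', 'ls /mnt/data'])  # placeholder command
    # Attach a hostPath volume pointing at a directory on the node ...
    op.add_volume(k8s_client.V1Volume(
        name='local-data',
        host_path=k8s_client.V1HostPathVolumeSource(path='/nfs-data')))
    # ... and mount it inside the step's container.
    op.add_volume_mount(k8s_client.V1VolumeMount(
        name='local-data',
        mount_path='/mnt/data'))
```

Note that a hostPath volume reads from the file system of whichever node the pod is scheduled on, so the files must exist at that path on that node (or on every node).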

@zoux86 (Author) commented Jan 20, 2019

@gaoning777 I followed your advice and built a new image. It works, but it doesn't solve the problem completely, because the second container cannot find the first container's output.
So I want to mount a volume in the ContainerOp.

In this question:

#477

I found that you gave an example of how to mount a volume with both add_volume and add_volume_mount, but that page no longer exists. Could you post the example again if you can find it?

@zoux86 (Author) commented Jan 20, 2019

@hongye-sun
Hi, I really appreciate your advice. I followed it and used add_volume and add_volume_mount, but it didn't work.
So I ran an example that mounts a volume in a ContainerOp, to test how to use add_volume and add_volume_mount. I hope you can give me some advice again if you know what is wrong.

Here are my steps:

First, I created a PersistentVolume using:
[screenshot]

and it succeeded:
[screenshot]

Then I ran an example; the code is here:
[screenshot]

The result is here:
[screenshot]
[screenshot]

It seems I didn't mount the volume in the ContainerOp successfully, and the status of the PV (tfx-pv) is always Available.
[screenshot]
(PS: I created '/nfs-data/tfx-pv/train.csv' in advance.)

I have no idea how to solve this; I hope you can give some advice. Thanks!

@jinchihe (Member)

I made that work successfully before. I created the PV/PVC first, and then edited the sample taxi-cab-classification-pipeline.py to attach the PVC. I copied the related data, such as train.csv, to the local storage before running the TFX sample.
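
For illustration, "attach the PVC" might look something like this in the pipeline file; this is a sketch against the same 2019-era ContainerOp API, with the claim name, volume name, and mount path as assumptions:

```python
import kfp.dsl as dsl
from kubernetes import client as k8s_client

def attach_pvc(op):
    """Mount the claim 'pipeline-pvc' (assumed name) at /mnt in a step."""
    op.add_volume(k8s_client.V1Volume(
        name='pipeline-nfs',
        persistent_volume_claim=k8s_client.V1PersistentVolumeClaimVolumeSource(
            claim_name='pipeline-pvc')))
    op.add_volume_mount(k8s_client.V1VolumeMount(
        name='pipeline-nfs',
        mount_path='/mnt'))
    return op
```

Every step in taxi-cab-classification-pipeline.py would then be passed through a helper like this, with the gs:// input paths rewritten to paths under the mount point.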

@zoux86 (Author) commented Jan 21, 2019

@jinchihe Thanks for your comment.
I followed your steps, but didn't succeed. By the way, I only created the PV. Should I create both a PV and a PVC?
I'm new to k8s, so I would really appreciate it if you could describe in detail how to create the PVC and attach it. Thanks!

@jinchihe (Member)

@zoux86 Yes, I think you should create both the PV and the PVC manually (a PV stays in the Available state until a PVC binds to it, which is why your tfx-pv never changes status). There is no specific configuration in the PVC definition file.

# kubectl describe pvc pipeline-pvc -n kubeflow 
Name:          pipeline-pvc
Namespace:     kubeflow
StorageClass:  
Status:        Bound
Volume:        pipeline-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed=yes
               pv.kubernetes.io/bound-by-controller=yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      10Gi
Access Modes:  RWX
Events:        <none>
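
For reference, here is a sketch of an equivalent claim created with the Kubernetes Python client, with the name, namespace, access mode, and size taken from the kubectl describe output above; the empty storageClassName is an assumption for binding to a manually created PV:

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name='pipeline-pvc', namespace='kubeflow'),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=['ReadWriteMany'],
        storage_class_name='',  # assumed: skip dynamic provisioning, bind a pre-created PV
        resources=client.V1ResourceRequirements(requests={'storage': '10Gi'})))
client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace='kubeflow', body=pvc)
```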

@zoux86 (Author) commented Jan 22, 2019

@jinchihe Thank you very much. I finally ran the example successfully.
I created the PVC, and I found that in my environment I should use an NFS volume rather than hostPath.
I will close this issue.
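
For completeness, a sketch of the kind of NFS-backed PV this refers to, again via the Python client; the server address and export path are placeholders:

```python
from kubernetes import client, config

config.load_kube_config()
pv = client.V1PersistentVolume(
    metadata=client.V1ObjectMeta(name='pipeline-pv'),
    spec=client.V1PersistentVolumeSpec(
        capacity={'storage': '10Gi'},
        access_modes=['ReadWriteMany'],
        nfs=client.V1NFSVolumeSource(
            server='10.0.0.1',    # placeholder: NFS server address
            path='/nfs-data')))   # placeholder: exported directory
client.CoreV1Api().create_persistent_volume(body=pv)
```

Unlike hostPath, an NFS volume is reachable from whichever node the pod lands on, which matches what zoux86 found.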

zoux86 closed this as completed Jan 22, 2019
@gaoning777 (Contributor)

@jinchihe thanks for explaining the volume-mount steps. We will add instructions on how to mount volumes and share them among components.

@jinchihe (Member)

@gaoning777 I would like to discuss this in #721, thanks.
