Create README.md #909
TFX OSS instructions for running Taxi example.
/lgtm
Install TFX and Kubeflow Pipelines SDK
```
!pip3 install https://storage.googleapis.com/ml-pipeline/tfx/tfx-0.12.0rc0-py2.py3-none-any.whl
!pip3 install https://storage.googleapis.com/ml-pipeline/release/0.1.10/kfp.tar.gz --upgrade
```
JFYI: The latest version is 0.1.11, and 0.1.12 will have improved experiment creation.
/approve
/lgtm
```
conda create -n tfx-kfp pip python=3.5.3
```
then activate the environment.
Maybe we can add how to activate:
conda activate tfx-kfp
- GCS storage bucket name (replace "my-bucket")
- GCP project ID (replace "my-gcp-project")
- Make sure the path to taxi_utils.py is correct
- Set the limit on the BigQuery query. The original dataset has 100M rows, which can take a long time to process. Set it to 20000 to run a sample test.
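The checklist above could be verified before launching the pipeline. The following is a minimal sketch, not part of the README: the function name `check_settings` and its parameters are hypothetical, and only the placeholder values "my-bucket" and "my-gcp-project" and the file name taxi_utils.py come from the instructions. It avoids f-strings so that it runs on the Python 3.5 environment created above.

```python
import os


def check_settings(bucket, project_id, module_path, row_limit):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration only.
    """
    problems = []
    # The README placeholders must be replaced with real values.
    if not bucket or bucket == "my-bucket":
        problems.append("replace the placeholder GCS bucket name")
    if not project_id or project_id == "my-gcp-project":
        problems.append("replace the placeholder GCP project ID")
    # taxi_utils.py must be reachable from where the pipeline is compiled.
    if not os.path.isfile(module_path):
        problems.append("taxi_utils.py not found at " + module_path)
    # A non-positive limit would produce an empty sample.
    if row_limit <= 0:
        problems.append("row limit must be positive")
    return problems


if __name__ == "__main__":
    # With the untouched README defaults, the placeholders are reported.
    for problem in check_settings("my-bucket", "my-gcp-project",
                                  "taxi_utils.py", 20000):
        print("-", problem)
```

Returning a list rather than raising on the first problem lets all misconfigured values be reported in one pass.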
Right now, I changed it to use RAND() < 0.01. So we can say:
"Change the sampling rate, or alternately, replace it with a LIMIT clause to process a smaller dataset. We recommend using at least 20000 rows in your sample."
or something like this.
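For reference, the two options in the suggested wording (a sampling rate via RAND(), or a LIMIT clause) might look like the sketch below. The helper name `build_taxi_query`, the `SELECT *` projection, and the table name are assumptions for illustration; the query actually used by the example may differ.

```python
def build_taxi_query(table, sample_rate=None, limit=None):
    """Build a BigQuery SQL string that reads a subset of `table`.

    sample_rate: keep each row with this probability (RAND() < rate).
    limit: keep at most this many rows (LIMIT clause).
    Hypothetical helper for illustration only.
    """
    if sample_rate is not None and not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    if limit is not None and limit <= 0:
        raise ValueError("limit must be positive")

    query = "SELECT * FROM `{}`".format(table)
    if sample_rate is not None:
        query += " WHERE RAND() < {}".format(sample_rate)
    if limit is not None:
        query += " LIMIT {}".format(limit)
    return query


# Assumed table name, for illustration only.
TABLE = "bigquery-public-data.chicago_taxi_trips.taxi_trips"
print(build_taxi_query(TABLE, sample_rate=0.01))
print(build_taxi_query(TABLE, limit=20000))
```

Note that RAND() sampling returns a different, approximately sized subset on each run, whereas LIMIT caps the row count exactly. Given the roughly 100M rows mentioned in the README, a rate of 0.01 would keep about 1M rows, and a rate of about 0.0002 would be needed to get close to 20000.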
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Ark-kun
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |