Dataflow SDK to support launch beam python code or template #833
Conversation
/lgtm
# limitations under the License.
import re


def is_gcs_path(path):
Since this function is in a storage directory, maybe we should add verification.
There are naming requirements for GCS paths: https://cloud.google.com/storage/docs/naming
For example, the following two requirements are important and easy to check (a sketch of such a check appears after this list):
- Bucket names must start and end with a number or letter.
- Bucket names must contain 3 to 63 characters.
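A minimal sketch of what the suggested validation could look like, assuming a hypothetical helper named is_valid_gcs_bucket_name (illustrative only, not part of this PR) that enforces just the two rules above:

```python
import re

# Bucket names must start and end with a lowercase letter or number, may
# contain dots, dashes, and underscores in between, and must be 3 to 63
# characters long overall (two end characters plus 1 to 61 in the middle).
_BUCKET_NAME_RE = re.compile(r'^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$')


def is_valid_gcs_bucket_name(bucket_name):
    """Return True if bucket_name passes the two checks described above."""
    return bool(_BUCKET_NAME_RE.match(bucket_name))
```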
I purposely skip those checks in a lot of places because we pass user-provided data to the GCP APIs, which validate it and report a bad-request error to the user, and we surface that error in the log. Client-side validation would duplicate what the server code already does, and we would have to maintain it if the rules ever change. The check I have here exists only for logic in my code that needs to distinguish a GCS path from a local path.
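For context, a minimal sketch of the prefix-only check the author describes: it just routes GCS paths and local paths to different code, leaving all naming validation to the GCP APIs (illustrative; the PR's actual code may differ).

```python
def is_gcs_path(path):
    """Return True if path refers to GCS rather than the local filesystem."""
    # A gs:// prefix is the only distinction needed here; the GCP APIs
    # validate the path itself and report bad requests back to the caller.
    return path.startswith('gs://')
```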
This looks good to me once @gaoning777's comments are addressed.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: vicaire. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test kubeflow-pipeline-e2e-test
/retest |
/retest |
/retest |
…#833)

* initial files for dataflow commands
* dataflow commands
* add launch python command
* Support launch_python command
* Use display API and default outputs to /tmp/output folder