fix: added cli functionality to dataproc quickstart example #2734
Changes from 2 commits. Commits in this pull request: be445cf, f9de7dd, bd579fe, b0b8299, 3c3da27, 1a516fd, c78dbb4
@@ -15,25 +15,24 @@
# limitations under the License.

# [START dataproc_quickstart]
"""This quickstart sample walks a user through creating a Cloud Dataproc
cluster, submitting a PySpark job from Google Cloud Storage to the
cluster, reading the output of the job and deleting the cluster, all
using the Python client library.

Usage:
    python quickstart.py --project_id <PROJECT_ID> --region <REGION> \
        --cluster_name <CLUSTER_NAME> --job_file_path <GCS_JOB_FILE_PATH>
"""

import argparse
import time

from google.cloud import dataproc_v1 as dataproc
from google.cloud import storage


def quickstart(project_id, region, cluster_name, job_file_path):
    """This quickstart sample walks a user through creating a Cloud Dataproc
    cluster, submitting a PySpark job from Google Cloud Storage to the
    cluster, reading the output of the job and deleting the cluster, all
    using the Python client library.

    Args:
        project_id (string): Project to use for creating resources.
        region (string): Region where the resources should live.
        cluster_name (string): Name to use for creating a cluster.
        job_file_path (string): Job in GCS to execute against the cluster.
    """

    # Create the cluster client.
    cluster_client = dataproc.ClusterControllerClient(client_options={
        'api_endpoint': '{}-dataproc.googleapis.com:443'.format(region)
@@ -125,4 +124,23 @@ def quickstart(project_id, region, cluster_name, job_file_path):
    operation.result()

    print('Cluster {} successfully deleted.'.format(cluster_name))
# [END dataproc_quickstart]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
Review comment: This isn't tested at all - if we are going to add CLI functionality, we should have some tests verifying that it still works. Alternatively, I'd rather avoid CLI and instead give the user somewhere at the top of the code to set the variables instead.

Reply: I can add a test for this, but the motivation for CLI is to make this a runnable tool as-is without needing to modify code.
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument('--project_id', type=str,
                        help='Project to use for creating resources.')
    parser.add_argument('--region', type=str,
                        help='Region where the resources should live.')
    parser.add_argument('--cluster_name', type=str,
                        help='Name to use for creating a cluster')
    parser.add_argument('--job_file_path', type=str,
                        help='Job in GCS to execute against the cluster.')

    args = parser.parse_args()
    quickstart(args.project_id, args.region,
               args.cluster_name, args.job_file_path)
# [END dataproc_quickstart]
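
A minimal sketch of the kind of test the review thread above asks for, assuming pytest and that the test file sits next to quickstart.py; the file name quickstart_cli_test.py and the test body are illustrative only, not part of this PR. It checks that the argparse wiring parses and documents the expected flags without creating any real Dataproc resources.

# quickstart_cli_test.py - hypothetical file name, not part of this change.
import os
import subprocess
import sys

SAMPLE = os.path.join(os.path.dirname(__file__), 'quickstart.py')


def test_cli_help_runs_cleanly():
    # argparse exits with code 0 on --help and prints the flags we expect
    # to be wired up, without creating a cluster or submitting a job.
    result = subprocess.run([sys.executable, SAMPLE, '--help'],
                            capture_output=True, text=True, check=False)
    assert result.returncode == 0
    assert '--project_id' in result.stdout
    assert '--job_file_path' in result.stdout

With the flags in place, the sample can also be run end to end without editing the code, for example (placeholder values): python quickstart.py --project_id my-project --region us-central1 --cluster_name my-quickstart-cluster --job_file_path gs://my-bucket/my-job.py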