
Convert dataproc-workflow REST API (POST) request into equivalent python (dataproc_v1) sdk code. #11086

Open
Rstar1998 opened this issue Jul 19, 2022 · 1 comment
Labels: api: dataproc (Issues related to the Dataproc API.) · type: question (Request for information or clarification. Not an issue.)

Rstar1998 commented Jul 19, 2022

I want to automate the creation of workflows in Dataproc. I manually created a workflow and captured its equivalent REST API (POST) request from the GCP console UI, and now I want to do the same thing via the Dataproc Python SDK. I looked at the documentation, but it was very complex to understand, and the examples were few and too simple. Can someone help me translate the following API request into its Python SDK equivalent?

    "method": "POST",
    "body": {
        "id": "load1",
        "name": "",
        "labels": {},
        "placement": {
            "managedCluster": {
                "clusterName": "cluster-1",
                "config": {
                    "configBucket": "bucket1",
                    "gceClusterConfig": {
                        "serviceAccountScopes": [
                            "https://www.googleapis.com/auth/cloud-platform"
                        ],
                        "networkUri": "",
                        "subnetworkUri": "",
                        "internalIpOnly": false,
                        "zoneUri": "",
                        "metadata": {},
                        "tags": [],
                        "shieldedInstanceConfig": {
                            "enableSecureBoot": false,
                            "enableVtpm": false,
                            "enableIntegrityMonitoring": false
                        }
                    },
                    "masterConfig": {
                        "numInstances": 1,
                        "machineTypeUri": "n1-standard-4",
                        "diskConfig": {
                            "bootDiskType": "pd-standard",
                            "bootDiskSizeGb": "150",
                            "numLocalSsds": 0,
                            "localSsdInterface": "SCSI"
                        },
                        "minCpuPlatform": "",
                        "imageUri": ""
                    },
                    "softwareConfig": {
                        "imageVersion": "2.0-ubuntu18",
                        "properties": {
                            "dataproc:dataproc.allow.zero.workers": "true"
                        },
                        "optionalComponents": []
                    },
                    "initializationActions": []
                },
                "labels": {}
            }
        },
        "jobs": [
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://temp.py",
                    "pythonFileUris": [],
                    "jarFileUris": [],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                    ]
                },
                "stepId": "start_job",
                "labels": {},
                "prerequisiteStepIds": []
            },
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://temp1.py",
                    "pythonFileUris": [],
                    "jarFileUris": [
                        "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"
                    ],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                    ]
                },
                "stepId": "tb1",
                "labels": {},
                "prerequisiteStepIds": [
                    "start_job"
                ]
            },
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://temp1.py",
                    "pythonFileUris": [],
                    "jarFileUris": [
                        "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"
                    ],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                    ]
                },
                "stepId": "tb2",
                "labels": {},
                "prerequisiteStepIds": [
                    "start_job"
                ]
            },
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://temp1.py",
                    "pythonFileUris": [],
                    "jarFileUris": [
                        "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"
                    ],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                       
                    ]
                },
                "stepId": "tb3",
                "labels": {},
                "prerequisiteStepIds": [
                    "start_job"
                ]
            },
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://temp1.py",
                    "pythonFileUris": [],
                    "jarFileUris": [
                        "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"
                    ],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                    ]
                },
                "stepId": "tb4",
                "labels": {},
                "prerequisiteStepIds": [
                    "start_job"
                ]
            },
            {
                "pysparkJob": {
                    "mainPythonFileUri": "gs://end_job.py",
                    "pythonFileUris": [],
                    "jarFileUris": [],
                    "fileUris": [],
                    "archiveUris": [],
                    "properties": {},
                    "args": [
                        "arg1"
                    ]
                },
                "stepId": "end_job",
                "labels": {},
                "prerequisiteStepIds": [
                    "tb1",
                    "tb2",
                    "tb3",
                    "tb4"
                ]
            }
        ],
        "parameters": [],
        "dagTimeout": "1800s"
    },
    "path": "/v1/projects/project1/regions/region-name/workflowTemplates/",
    "params": {}
}
```
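For reference, here is a minimal, untested sketch of what the equivalent `dataproc_v1` call might look like. The SDK uses the snake_case forms of the camelCase REST fields, and the template can be passed as a plain dict (the client converts it). `project1` and `region-name` are the placeholders from the request path above; empty strings, empty lists, and default values from the REST body are omitted, since the SDK fills in the same defaults.

```python
from google.cloud import dataproc_v1

project_id = "project1"   # placeholder from the request path above
region = "region-name"    # placeholder from the request path above

# Workflow template operations must go through the regional endpoint.
client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# snake_case translation of the REST body above.
template = {
    "id": "load1",
    "placement": {
        "managed_cluster": {
            "cluster_name": "cluster-1",
            "config": {
                "config_bucket": "bucket1",
                "gce_cluster_config": {
                    "service_account_scopes": [
                        "https://www.googleapis.com/auth/cloud-platform"
                    ],
                },
                "master_config": {
                    "num_instances": 1,
                    "machine_type_uri": "n1-standard-4",
                    "disk_config": {
                        "boot_disk_type": "pd-standard",
                        "boot_disk_size_gb": 150,  # int in the SDK, not "150"
                    },
                },
                "software_config": {
                    "image_version": "2.0-ubuntu18",
                    "properties": {
                        "dataproc:dataproc.allow.zero.workers": "true"
                    },
                },
            },
        }
    },
    "jobs": [
        {
            "step_id": "start_job",
            "pyspark_job": {
                "main_python_file_uri": "gs://temp.py",
                "args": ["arg1"],
            },
        },
        {
            "step_id": "tb1",
            "prerequisite_step_ids": ["start_job"],
            "pyspark_job": {
                "main_python_file_uri": "gs://temp1.py",
                "jar_file_uris": [
                    "gs://spark-lib/bigquery/spark-bigquery-latest_2.12.jar"
                ],
                "args": ["arg1"],
            },
        },
        # ...repeat the tb1 pattern for step_ids tb2, tb3, tb4...
        {
            "step_id": "end_job",
            "prerequisite_step_ids": ["tb1", "tb2", "tb3", "tb4"],
            "pyspark_job": {
                "main_python_file_uri": "gs://end_job.py",
                "args": ["arg1"],
            },
        },
    ],
    "dag_timeout": {"seconds": 1800},  # "1800s" in the REST body
}

# Equivalent of POST /v1/projects/{project}/regions/{region}/workflowTemplates
created = client.create_workflow_template(
    parent=f"projects/{project_id}/regions/{region}",
    template=template,
)
print(f"Created workflow template: {created.name}")
```

Note that `create_workflow_template` only stores the template; running it is a separate call, `client.instantiate_workflow_template(name=created.name)`, which returns a long-running operation you can wait on with `.result()`.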
product-auto-label bot added the api: dataproc label on Jul 19, 2022
meredithslota added the type: question label on Jul 25, 2022
parthea (Contributor) commented Apr 17, 2023

I'm going to transfer this issue to the google-cloud-python repository as we are preparing to move the code for google-cloud-dataproc to that repository in the next 1-2 weeks.

parthea transferred this issue from googleapis/python-dataproc on Apr 17, 2023