
'ImportError: No module named gcloud' error with Dataproc and Pyspark #1826

Closed
@sudeepag

Description


When I try to run a PySpark job on a Cloud Dataproc cluster, I get the following error:

ImportError: No module named gcloud

I have gcloud installed on all the nodes in the cluster (the master, as well as the worker nodes). Here are the version numbers:

Google Cloud SDK 111.0.0
bq 2.0.24
bq-nix 2.0.24
core 2016.05.20
core-nix 2016.05.05
gcloud
gsutil 4.19
gsutil-nix 4.19
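Note that the components listed above belong to the Google Cloud SDK command-line tools; the `gcloud` Python library is a separate pip package and is not installed by the SDK. A minimal sketch of installing it on a node, assuming the Spark executors use the system Python (the `sudo` and exact package name are assumptions, not something confirmed in this issue):

```shell
# Run on every node (master and all workers).
# Installs the legacy "gcloud" Python library into the system Python,
# which is assumed here to be the interpreter PySpark executors use.
sudo pip install gcloud
```

On Dataproc this kind of per-node setup is typically done with an initialization action so new or re-created nodes get the package automatically.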

However, when I ssh into the master node and run $ spark-submit filename.py, the job runs perfectly fine.
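Since the job works when submitted locally from the master but not through Dataproc, one way to narrow this down is to check which interpreter runs the job in each case and whether that interpreter can see the package. A small diagnostic sketch (run it the same two ways; `module_available` is a hypothetical helper, not part of any library):

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the top-level module `name` is importable
    by the interpreter running this script."""
    # find_spec returns None when the module cannot be found,
    # without actually importing it.
    return importlib.util.find_spec(name) is not None

# Which Python is executing this job, and can it see gcloud?
print(sys.executable)
print(module_available("gcloud"))
```

If the two submission paths print different interpreters, the package is likely installed for one Python but not the other; the same check can also be run inside a Spark task to test the worker-side interpreters.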
