'ImportError: No module named gcloud' error with Dataproc and Pyspark #1826

Closed
@sudeepag

Description

When I try to run a Pyspark job on a Cloud Dataproc cluster, I get the following error:

ImportError: No module named gcloud

I have gcloud installed on all the nodes in the cluster (the master as well as the worker nodes). Here are the version numbers:

Google Cloud SDK 111.0.0
bq 2.0.24
bq-nix 2.0.24
core 2016.05.20
core-nix 2016.05.05
gcloud
gsutil 4.19
gsutil-nix 4.19
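Worth noting: the components listed above belong to the Cloud SDK (the `gcloud` CLI), which is separate from the `gcloud` Python package that the failing import refers to; having the SDK installed does not make the package importable. A minimal check, using only the standard library, of whether the current interpreter can see the package:

```python
import importlib.util

# Check whether this interpreter can import the `gcloud` Python package;
# this is distinct from the Cloud SDK's `gcloud` CLI listed above.
spec = importlib.util.find_spec("gcloud")
print("gcloud package found" if spec else "gcloud package missing")
```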

However, if I ssh into the master node and run $ spark-submit filename.py there, the job runs perfectly fine.
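One plausible cause of this difference (an assumption, not confirmed here) is that jobs submitted through the Dataproc API run under a different Python interpreter than an interactive `spark-submit` on the master, so a package installed for one interpreter is invisible to the other. A quick diagnostic sketch, run both ways, to compare the two environments:

```python
import sys

# Print which interpreter and search path this job is using; comparing the
# output of a local `spark-submit` on the master with the output of a job
# submitted through the Dataproc API can reveal an interpreter mismatch.
print("executable:", sys.executable)
print("version:", sys.version.split()[0])
print("first sys.path entries:", sys.path[:3])
```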

