When creating a Google Cloud Dataproc cluster, you can specify initialization actions: executables or scripts that Cloud Dataproc will run on all nodes in your cluster immediately after the cluster is set up.
Initialization actions are stored in a Google Cloud Storage bucket and can be passed as a parameter to the gcloud command or the clusters.create API when creating a Dataproc cluster. For example, to specify an initialization action when creating a cluster with the gcloud command, you can run:
```bash
gcloud dataproc clusters create CLUSTER-NAME \
    [--initialization-actions [GCS_URI,...]] \
    [--initialization-action-timeout TIMEOUT]
```
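For example, a concrete invocation might look like the following (the cluster name is a placeholder, and the conda action is just one of the scripts in this repository):

```bash
# Illustrative example: run the conda initialization action on every node
# of a new cluster, allowing up to 10 minutes for it to complete.
gcloud dataproc clusters create example-cluster \
    --initialization-actions gs://dataproc-initialization-actions/conda/bootstrap-conda.sh \
    --initialization-action-timeout 10m
```

The rough clusters.create equivalent is a POST request whose cluster config contains an `initializationActions` entry (a sketch; PROJECT and REGION are placeholders for your own values):

```bash
curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://dataproc.googleapis.com/v1/projects/PROJECT/regions/REGION/clusters" \
    -d '{
      "projectId": "PROJECT",
      "clusterName": "example-cluster",
      "config": {
        "initializationActions": [{
          "executableFile": "gs://dataproc-initialization-actions/conda/bootstrap-conda.sh",
          "executionTimeout": "600s"
        }]
      }
    }'
```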
For convenience, a copy of the initialization actions in this repository is stored in the following publicly accessible Cloud Storage bucket:

```
gs://dataproc-initialization-actions
```
The folder structure of this Cloud Storage bucket mirrors this repository. You should be able to use this Cloud Storage bucket (and the initialization scripts within it) for your clusters.
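For example, you can browse the bucket with gsutil, or copy an action into a bucket you control so that your clusters run a version you have reviewed (a sketch; `my-bucket` is a placeholder):

```bash
# List the initialization actions available in the public bucket.
gsutil ls gs://dataproc-initialization-actions/

# Copy an action into your own bucket (my-bucket is a placeholder) so
# you control the exact version your clusters run.
gsutil cp gs://dataproc-initialization-actions/conda/bootstrap-conda.sh \
    gs://my-bucket/conda/bootstrap-conda.sh
```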
These samples are provided to show how various packages and components can be installed on Cloud Dataproc clusters. You should understand how these samples work before running them on your clusters. The initialization actions in this repository are provided without support, and you use them at your own risk.
This repository presently offers the following actions for use with Cloud Dataproc clusters (a minimal sketch of what an initialization action looks like follows the list):
- Install packages/software on the cluster
- Configure the cluster
- Configure a nice shell environment
- Share an NFS consistency cache
- Share a Google Cloud SQL Hive Metastore
- Set up Google Stackdriver monitoring for a cluster
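As a minimal, hypothetical sketch of what an initialization action looks like, the script below installs a placeholder package on every node and then performs master-only setup by reading the `dataproc-role` instance metadata key, a pattern many of the actions in this repository follow:

```bash
#!/bin/bash
# Hypothetical initialization action sketch; the package name below is a
# placeholder, not one of the actions in this repository.
set -euxo pipefail

# Dataproc sets the dataproc-role metadata key to "Master" on master
# nodes and "Worker" on worker nodes.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)

# This part runs on every node in the cluster.
apt-get update
apt-get install -y some-package  # placeholder package name

if [[ "${ROLE}" == "Master" ]]; then
  # Master-only configuration would go here.
  echo "Running master-only setup"
fi
```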
For more information, review the Dataproc documentation. You can also pose questions to the Stack Overflow community with the tag `google-cloud-dataproc`.
See our other Google Cloud Platform GitHub repos for sample applications and scaffolding for other frameworks and use cases.
- See CONTRIBUTING.md
- See LICENSE