Add dataproc executor resource config #1160
Conversation
/retest

1 similar comment

/retest
staging_location: str,
region: str,
project_id: str,
executor_instances: str,
Why positional arguments? Are these required? We don't have defaults?
How do I, as a user, know how to set this configuration and what the possible options are?
Force-pushed from 66095dc to bbf7ba1.
@woop I think it's not for users, but rather for Feast admins, and those options will be configured on the jobservice side.
Force-pushed from bbf7ba1 to 037cd6e.
Yeah, I know, but my question is: how do I know what configuration can be set? We need to start documenting the configuration options.
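For illustration, a hedged sketch of how an admin might set these options through the SDK, assuming the option keys mirror the constant names added in this PR (the authoritative key strings and defaults live in sdk/python/feast/constants.py and may differ):

```python
from feast import Client

# Hypothetical example -- the option keys below are assumptions derived
# from the constant names in this PR; check sdk/python/feast/constants.py
# for the real key strings and their defaults.
client = Client(
    core_url="core.feast.example.com:6565",
    spark_launcher="dataproc",
    dataproc_executor_instances="2",  # number of Spark executors per job
    dataproc_executor_cores="2",      # cores per executor
    dataproc_executor_memory="2g",    # memory per executor
)
```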
/retest
Force-pushed from 037cd6e to a2ca39b.
staging_location: str,
region: str,
project_id: str,
executor_instances: str = "2",
Can we rather rely on the defaults in constants.py than here?
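For example, a minimal sketch of the pattern being suggested, with the default living in constants.py rather than in the constructor signature (the name of the defaults mapping is hypothetical):

```python
# sdk/python/feast/constants.py -- sketch only, not verbatim source.
# The option key string and the defaults-mapping name are assumptions;
# the point is that "2" is defined once, here, and nowhere else.
CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES = "dataproc_executor_instances"

CONFIG_FEAST_DEFAULTS = {
    CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES: "2",
}
```

Config.get() would then fall back to this mapping, so the launcher constructor can take the argument without restating the default.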
sdk/python/feast/pyspark/launcher.py (outdated)
@@ -59,6 +62,9 @@ def _dataproc_launcher(config: Config) -> JobLauncher:
    config.get(CONFIG_SPARK_STAGING_LOCATION),
    config.get(CONFIG_SPARK_DATAPROC_REGION),
    config.get(CONFIG_SPARK_DATAPROC_PROJECT),
    config.get(CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES),
Please use named arguments when the number of arguments is this high.
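For illustration, a sketch of the named-argument form. The launcher class and most constant names are taken from the surrounding diff; the cluster-name constant is an assumption:

```python
# Keyword arguments keep the call readable and safe against accidental
# reordering as the parameter list grows. Parameter names follow the
# constructor signature quoted earlier in this review.
return DataprocClusterLauncher(
    cluster_name=config.get(CONFIG_SPARK_DATAPROC_CLUSTER_NAME),
    staging_location=config.get(CONFIG_SPARK_STAGING_LOCATION),
    region=config.get(CONFIG_SPARK_DATAPROC_REGION),
    project_id=config.get(CONFIG_SPARK_DATAPROC_PROJECT),
    executor_instances=config.get(CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES),
)
```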
Force-pushed from f13bde2 to 8c809a4.
Force-pushed from 8c809a4 to ff534fd.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: pyalex, terryyylim. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/lgtm
* Add dataproc executor resource config
* Add default spark job executor values
* Fix e2e tests
* Shift spark configurations
* Update constants and docstrings

Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
What this PR does / why we need it:
Currently a single job can allocate too many resources (by default it takes a number of executors equal to the number of partitions), which can break parallelization. This PR allows configuring resource allocation per job, so that many jobs can share a cluster.
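For context, a hedged sketch of the standard Spark properties such a setting ultimately corresponds to on the Dataproc side (the property keys are standard Spark configuration; exactly how the launcher wires the Feast options into the job request is an assumption here):

```python
# Standard Spark properties that cap a single job's footprint so a
# shared cluster is not monopolized. Without an explicit
# spark.executor.instances, dynamic allocation can request roughly
# one executor per partition.
job_properties = {
    "spark.executor.instances": "2",
    "spark.executor.cores": "2",
    "spark.executor.memory": "2g",
}
```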
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: