Add dataproc executor resource config #1160
Conversation
/retest

1 similar comment

/retest
staging_location: str,
region: str,
project_id: str,
executor_instances: str,
Why positional arguments? Are these required? We don't have defaults?
How do I, as a user, know how to set this configuration and what the possible options are?
Force-pushed from 66095dc to bbf7ba1.
@woop I think it's not for users, but rather for Feast admins, and those options will be configured on the jobservice side.
Force-pushed from bbf7ba1 to 037cd6e.
Yeah, I know, but my question is: how do I know what configuration can be set? We need to start documenting the configuration options.
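For illustration, a hedged sketch of how an admin might set these options through the SDK, assuming the option keys mirror the constant names added in this PR (the authoritative key strings and defaults live in sdk/python/feast/constants.py and may differ):

```python
from feast import Client

# Hypothetical example -- the option keys below are assumptions derived
# from the constant names in this PR; check sdk/python/feast/constants.py
# for the real key strings and their defaults.
client = Client(
    core_url="core.feast.example.com:6565",
    spark_launcher="dataproc",
    dataproc_executor_instances="2",  # number of Spark executors per job
    dataproc_executor_cores="2",      # cores per executor
    dataproc_executor_memory="2g",    # memory per executor
)
```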
/retest
Force-pushed from 037cd6e to a2ca39b.
staging_location: str,
region: str,
project_id: str,
executor_instances: str = "2",
Can we rather rely on the defaults in constants.py than here?
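For example, a minimal sketch of the pattern being suggested, with the default living in constants.py rather than in the constructor signature (the name of the defaults mapping is hypothetical):

```python
# sdk/python/feast/constants.py -- sketch only, not verbatim source.
# The option key string and the defaults-mapping name are assumptions;
# the point is that "2" is defined once, here, and nowhere else.
CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES = "dataproc_executor_instances"

CONFIG_FEAST_DEFAULTS = {
    CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES: "2",
}
```

Config.get() would then fall back to this mapping, so the launcher constructor can take the argument without restating the default.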
sdk/python/feast/pyspark/launcher.py (outdated)
@@ -59,6 +62,9 @@ def _dataproc_launcher(config: Config) -> JobLauncher:
    config.get(CONFIG_SPARK_STAGING_LOCATION),
    config.get(CONFIG_SPARK_DATAPROC_REGION),
    config.get(CONFIG_SPARK_DATAPROC_PROJECT),
    config.get(CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES),
Please use named arguments when the number of arguments is this high.
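For illustration, a sketch of the named-argument form. The launcher class and most constant names are taken from the surrounding diff; the cluster-name constant is an assumption:

```python
# Keyword arguments keep the call readable and safe against accidental
# reordering as the parameter list grows. Parameter names follow the
# constructor signature quoted earlier in this review.
return DataprocClusterLauncher(
    cluster_name=config.get(CONFIG_SPARK_DATAPROC_CLUSTER_NAME),
    staging_location=config.get(CONFIG_SPARK_STAGING_LOCATION),
    region=config.get(CONFIG_SPARK_DATAPROC_REGION),
    project_id=config.get(CONFIG_SPARK_DATAPROC_PROJECT),
    executor_instances=config.get(CONFIG_SPARK_DATAPROC_EXECUTOR_INSTANCES),
)
```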
Force-pushed from f13bde2 to 8c809a4.
Force-pushed from 8c809a4 to ff534fd.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: pyalex, terryyylim. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/lgtm
* Add dataproc executor resource config
* Add default spark job executor values
* Fix e2e tests
* Shift spark configurations
* Update constants and docstrings

Signed-off-by: Terence <terencelimxp@gmail.com>
Signed-off-by: Terence <terencelimxp@gmail.com>
What this PR does / why we need it:
Currently a single job can allocate too many resources (by default it takes a number of executors equal to the number of partitions), which can break parallelization. This PR allows configuring resource allocation per job, so that many jobs can share a cluster.
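For context, a hedged sketch of the standard Spark properties such a setting ultimately corresponds to on the Dataproc side (the property keys are standard Spark configuration; exactly how the launcher wires the Feast options into the job request is an assumption here):

```python
# Standard Spark properties that cap a single job's footprint so a
# shared cluster is not monopolized. Without an explicit
# spark.executor.instances, dynamic allocation can request roughly
# one executor per partition.
job_properties = {
    "spark.executor.instances": "2",
    "spark.executor.cores": "2",
    "spark.executor.memory": "2g",
}
```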
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: