[SPARK-37137][PYTHON] Inline type hints for python/pyspark/conf.py #34411
Conversation
@xinrong-databricks @ueshin Can you help me take a look? I refer to

Thanks, @ByronHsu! It is expected that type hints in

@zero323 @xinrong-databricks Thanks for your advice. I have revised my code.
Also, I am curious about the advantage of inline type hints rather than stub files.

Both have pros and cons. The biggest advantage here is that inline hints enable checking the actual implementation (some type checkers can check code even when stubs are present), so it makes it easier to detect discrepancies, and you get additional coverage for your code as a bonus.
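As a minimal illustration of that point, consider a toy class (a stand-in, not the real SparkConf): with inline annotations, a checker such as mypy validates the method bodies themselves, whereas a `.pyi` stub only describes the interface and leaves the implementation unchecked.

```python
from typing import Dict, Optional

class TinyConf:
    """Toy stand-in for SparkConf, used only to illustrate inline hints."""

    def __init__(self) -> None:
        # With the annotation inline, mypy checks every use of _conf below.
        self._conf: Dict[str, str] = {}

    def set(self, key: str, value: str) -> "TinyConf":
        # A body-level mistake such as `self._conf[key] = 42` would be flagged
        # here; with only a stub file, just the callers of set() are checked.
        self._conf[key] = value
        return self

    def get(self, key: str, defaultValue: Optional[str] = None) -> Optional[str]:
        return self._conf.get(key, defaultValue)
```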
     self._conf = {}

-    def set(self, key, value):
+    def set(self, key: str, value: str) -> "SparkConf":
I guess we could be less restrictive here, but I am not sure if that's a good idea, since it would depend on the __str__ or __repr__ implementation of the value. This has some weird consequences, like:

>>> key, value = "foo", None
>>> conf = sc.getConf()
>>> conf.set(key, value)
<pyspark.conf.SparkConf object at 0x7f4870aa5ee0>
>>> conf.get(key) == value
False
So do we need to change the type of value from str to Optional[str]? Or could we open another ticket for this issue?
@ueshin @HyukjinKwon WDYT?
@ueshin @HyukjinKwon Could you help me review this patch? Thanks!
Tough call. I would keep it as just str for now though; for None, it should ideally be mapped to null on the JVM.
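For context, here is a hedged sketch of what the Optional[str] alternative could look like if None were passed through rather than stringified (the `set_conf` helper and plain dict are illustrative only, not PySpark internals):

```python
from typing import Dict, Optional

def set_conf(conf: Dict[str, Optional[str]], key: str, value: Optional[str]) -> None:
    # str(None) would store the literal string "None"; keeping None as-is lets
    # a lower layer translate it to a real JVM null instead.
    conf[key] = None if value is None else str(value)
```

This is the crux of the `False` result in the REPL session above: with the current `str`-only signature, `str(None)` produces the string `"None"`, so the round trip through set/get does not preserve None.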
ok to test

add to whitelist

Test build #144842 has finished for PR 34411 at commit

Kubernetes integration test starting

Kubernetes integration test status failure
python/pyspark/conf.py (Outdated)

     if key not in cast(Dict[str, str], self._conf):
         return None
-    return self._conf[key]
+    return cast(Dict[str, str], self._conf)[key]
Unrelated note ‒ shouldn't we use get here?
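The suggestion above refers to the `dict.get` idiom, which collapses the membership check and the lookup into one call (a plain dict is shown here for illustration):

```python
conf = {"spark.app.name": "demo"}

# Equivalent to: `if key not in conf: return None` followed by `return conf[key]`
present = conf.get("spark.app.name")   # "demo"
missing = conf.get("spark.master")     # None
```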
Also, same as above ‒ single assert might be a better option.
python/pyspark/conf.py (Outdated)

         self._jconf.set(key, str(value))
     else:
-        self._conf[key] = str(value)
+        cast(Dict[str, str], self._conf)[key] = str(value)
I'd probably go with assert here
diff --git a/python/pyspark/conf.py b/python/pyspark/conf.py
index 09c8e63d09..a8538b06e4 100644
--- a/python/pyspark/conf.py
+++ b/python/pyspark/conf.py
@@ -136,7 +136,8 @@ class SparkConf(object):
if self._jconf is not None:
self._jconf.set(key, str(value))
else:
- cast(Dict[str, str], self._conf)[key] = str(value)
+ assert self._conf is not None
+ self._conf[key] = str(value)
return self
     def setIfMissing(self, key: str, value: str) -> "SparkConf":

but I guess it is fine for now.
zero323
left a comment
In general, LGTM.
@zero323 I have updated the code. Thank you for your precious advice!

Test build #144894 has finished for PR 34411 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Merged to master.
What changes were proposed in this pull request?
Inline type hints for python/pyspark/conf.py
Why are the changes needed?
Currently, the stub file python/pyspark/conf.pyi doesn't support type checking within function bodies, so we inline the type hints to support that.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests.