[SPARK-2871] [PySpark] Add missing API #1791
Conversation
QA tests have started for PR 1791. This patch merges cleanly.
QA results for PR 1791:
Review context:

    }

    /**
     * Convert a RDD of Java objects to and RDD of serialized Python objects, that is usable by
"Convert a RDD of Java objects to and RDD of serialized Python objects"
=>
"Convert an RDD of Java objects to an RDD of serialized Python objects"?
histogram() has been reimplemented in pure Python; it handles integers better, and it also supports RDDs of strings and other comparable objects. This was inspired by #1783, and much improved.
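As a hedged sketch of the semantics described above (not the PR's actual code), bucketing amounts to counting values into half-open buckets, with the last bucket closed on the right; the names here are illustrative:

```python
import bisect

def histogram_counts(values, boundaries):
    # Count how many values fall into each half-open bucket
    # [b[i], b[i+1]), with the last bucket closed on the right.
    counts = [0] * (len(boundaries) - 1)
    for v in values:
        if v == boundaries[-1]:
            counts[-1] += 1          # right edge belongs to the last bucket
        elif boundaries[0] <= v < boundaries[-1]:
            counts[bisect.bisect_right(boundaries, v) - 1] += 1
    return counts
```

Because the comparison and bisection are plain Python, the same logic works for strings and other comparable objects, which matches what the comment above claims.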
Review context:

    inc = (maxv - minv) / buckets
    # keep them as integer if possible
    if inc * buckets != maxv - minv:
This was smart!
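The trick being praised here is trying an integer bucket width first and falling back to floats only when the range does not divide evenly. A minimal stand-alone sketch of that idea (Python 3 syntax, names illustrative, not the PR's code):

```python
def even_buckets(minv, maxv, buckets):
    # Try an integer bucket width first...
    inc = (maxv - minv) // buckets
    if inc * buckets != maxv - minv:
        # ...and fall back to a float width only when the
        # range does not split evenly into `buckets` pieces.
        inc = (maxv - minv) / buckets
    return [minv + i * inc for i in range(buckets)] + [maxv]
```

Keeping the boundaries as integers when possible avoids floating-point error at the bucket edges for integer data.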
meanApprox() and sumApprox()
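meanApprox() and sumApprox() return an estimate with a confidence interval rather than an exact answer. A rough pure-Python analogue using the normal approximation (illustrative only; this is not Spark's evaluator, and the names are made up for the sketch):

```python
import math

def mean_approx(sample, confidence=0.95):
    # Estimate the mean from a partial sample and attach a
    # confidence interval via the normal approximation.
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    stderr = math.sqrt(var / n)
    z = 1.96 if confidence == 0.95 else 2.576  # simplistic z lookup
    return (mean - z * stderr, mean, mean + z * stderr)
```

In Spark the point of the approximate actions is returning such an interval within a timeout, before all partitions have been processed.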
Jenkins, test this please
Review context:

    @@ -812,23 +842,39 @@ def func(iterator):

            return self.mapPartitions(func).fold(zeroValue, combOp)

    -    def max(self):
    +    def max(self, comp=None):
Maybe explain what "comp" is in the doc comment
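For illustration, a `comp` like the one added above is typically a two-argument comparator returning a negative, zero, or positive number; in plain Python it can be adapted with functools.cmp_to_key. A hedged sketch of that usage (not the PR's implementation):

```python
from functools import cmp_to_key

def rdd_max(values, comp=None):
    # `comp(a, b)` returns <0, 0, or >0, like a classic comparator.
    # When omitted, fall back to the natural ordering.
    if comp is None:
        return max(values)
    return max(values, key=cmp_to_key(comp))
```

Documenting the expected comparator contract in the doc comment, as requested here, would spare users from guessing whether `comp` is a comparator or a key function.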
@davies I looked over all of this now and made some comments, but you should have Josh check too. Just to be clear, though, I don't think this can make it into 1.1, so we can hold off on it for a while as we fix issues for 1.1. But these are great APIs to have.
1. implement lookup(), similar to that in Scala
2. handle None, nan, inf in histogram, and add many tests
3. remove collectPartitions()
4. improve docs
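lookup() in Scala returns all values associated with a given key. A plain-Python analogue of those semantics, just for illustration (the real implementation scans partitions on the cluster):

```python
def lookup(pairs, key):
    # Return the list of values whose key matches, preserving order,
    # as a local stand-in for scanning each partition of a pair RDD.
    return [v for k, v in pairs if k == key]
```

An empty list for a missing key (rather than an error) mirrors the Scala behavior.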
@mateiz thanks for reviewing this; I have addressed all your comments. @JoshRosen could you take a look at this again?
The description has been updated to list all the added APIs.
Conflicts:
    python/pyspark/rdd.py
    python/pyspark/tests.py

Conflicts:
    core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
    python/pyspark/rdd.py
QA tests have started for PR 1791 at commit
QA tests have finished for PR 1791 at commit
Most of the useful parts have been merged separately, so closing this.
Add the following APIs:
SparkContext.conf
SparkContext.isLocal
SparkContext.startTime