
[SPARK-2871] [PySpark] Add missing API #1791


Closed
wants to merge 24 commits into from

Conversation

davies
Contributor

@davies davies commented Aug 5, 2014

Add the following APIs:

SparkContext.conf
SparkContext.isLocal
SparkContext.startTime

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17953/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17953/consoleFull

@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17958/consoleFull

@davies davies changed the title from "[WIP] [PySpark] Add missing API" to "[PySpark] Add missing API" on Aug 5, 2014
@SparkQA

SparkQA commented Aug 5, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17959/consoleFull

@davies davies changed the title from "[PySpark] Add missing API" to "[SPARK-2871] [PySpark] Add missing API" on Aug 6, 2014
@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17958/consoleFull

}

/**
* Convert a RDD of Java objects to and RDD of serialized Python objects, that is usable by

Convert a RDD of Java objects to and RDD of serialized Python objects
=>
Convert an RDD of Java objects to an RDD of serialized Python objects ?

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull

@davies
Contributor Author

davies commented Aug 6, 2014

histogram() has been implemented in pure Python. It handles integers better and also supports RDDs of strings and other comparable objects.

This was inspired by #1783 et, and much improved.
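
For readers following the thread, here is a minimal pure-Python sketch of the bucketing idea described above. It is not the code from this PR: the function name `histogram` as a standalone function, its argument names, and the use of `bisect` are assumptions made for illustration. It follows the bucket convention of Spark's Scala `histogram`, where every bucket is half-open except the last, which is closed. Because it relies only on `<`, `>`, and `bisect`, it works for any mutually comparable values, including strings.

```python
from bisect import bisect_right


def histogram(values, buckets):
    """Count values into buckets given as a sorted list of boundaries.

    Hypothetical standalone helper, not the PySpark method itself.
    Buckets are [b0, b1), [b1, b2), ..., [b(n-1), bn]; the last one is closed.
    Values outside [b0, bn] are ignored.
    """
    buckets = list(buckets)
    if len(buckets) < 2 or buckets != sorted(buckets):
        raise ValueError("buckets must be sorted and contain at least two boundaries")

    counts = [0] * (len(buckets) - 1)
    low, high = buckets[0], buckets[-1]
    for v in values:
        if v < low or v > high:
            continue  # out of range
        i = bisect_right(buckets, v) - 1
        if i == len(counts):
            i -= 1  # v == high belongs to the last (closed) bucket
        counts[i] += 1
    return counts


int_counts = histogram([1, 2, 3, 10, 11, 50], [0, 10, 50])
str_counts = histogram(["a", "b", "c", "x"], ["a", "m", "z"])
```

In a distributed setting, the same counting function could run per partition and the per-partition count lists could then be summed element-wise.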

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18001/consoleFull


inc = (maxv - minv) / buckets
# keep them as integer if possible
if inc * buckets != maxv - minv:
Contributor


This was smart!
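
To illustrate the trick being praised here, the sketch below builds evenly spaced bucket boundaries and keeps them as integers whenever the range divides evenly. The helper name `even_buckets` and its parameters are hypothetical. The snippet under review dates from Python 2, where `/` on two integers truncated; this sketch uses `//` to get the same effect in Python 3.

```python
def even_buckets(minv, maxv, n):
    """Return n + 1 evenly spaced boundaries from minv to maxv.

    Hypothetical helper for illustration only.
    """
    span = maxv - minv
    # Try integer steps first when the inputs are integers.
    inc = span // n if isinstance(span, int) else span / n
    # Keep boundaries as integers if possible; otherwise fall back to floats.
    if inc * n != span:
        inc = span / n
    boundaries = [minv + i * inc for i in range(n)]
    boundaries.append(maxv)  # use the exact maximum to avoid rounding drift
    return boundaries


exact = even_buckets(0, 10, 5)
inexact = even_buckets(0, 10, 4)
```

With integer boundaries, integer inputs land in buckets without any floating-point rounding at the edges.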

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18032/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18032/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18055/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA results for PR 1791:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class BoundedFloat(float):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18055/consoleFull

@SparkQA

SparkQA commented Aug 6, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18058/consoleFull

@davies
Contributor Author

davies commented Aug 14, 2014

Jenkins, test this please

@SparkQA

SparkQA commented Aug 14, 2014

QA tests have started for PR 1791. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18498/consoleFull

@SparkQA

SparkQA commented Aug 14, 2014

QA results for PR 1791:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class BoundedFloat(float):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18498/consoleFull

@@ -812,23 +842,39 @@ def func(iterator):

return self.mapPartitions(func).fold(zeroValue, combOp)

- def max(self):
+ def max(self, comp=None):
Contributor


Maybe explain what "comp" is in the doc comment
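
As a rough illustration of what such a doc comment might describe, the sketch below assumes `comp` is a two-argument comparison function in the style of Python 2's `cmp` (negative, zero, or positive result). That reading is an assumption, not something this thread confirms, and `max_with_comp` is a hypothetical standalone function rather than the RDD method.

```python
from functools import cmp_to_key


def max_with_comp(items, comp=None):
    """Return the largest item, optionally ordered by a cmp-style function.

    Assumption: comp(a, b) returns a negative number if a < b,
    zero if they are equal, and a positive number if a > b.
    """
    items = list(items)
    if not items:
        raise ValueError("cannot take the max of an empty collection")
    if comp is None:
        return max(items)  # natural ordering
    return max(items, key=cmp_to_key(comp))


largest = max_with_comp([1, 5, 3])
longest = max_with_comp(["aa", "b", "cccc"], comp=lambda a, b: len(a) - len(b))
reversed_max = max_with_comp([1, 5, 3], comp=lambda a, b: b - a)
```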

@mateiz
Contributor

mateiz commented Aug 14, 2014

@davies I looked over all of this now and made some comments, but you should have Josh check too. Just to be clear though, I don't think this can make it into 1.1, so we can hold off on it while we fix issues for 1.1. But these are great APIs to have.

1. implement lookup(), similar to that in Scala
2. handle None, nan, inf in histogram, add many tests
3. remove collectPartitions()
4. improve docs
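
For context on item 1, Scala's `lookup(key)` returns all values stored under a key and, when the RDD has a known partitioner, searches only the partition that the key maps to. The sketch below models that behaviour on plain Python lists. The function signature, the `partition_func` parameter, and the list-of-lists representation of partitions are assumptions for illustration, not the code added in this PR.

```python
def lookup(partitions, key, partition_func=None):
    """Return every value paired with key.

    partitions: list of lists of (key, value) pairs (a stand-in for an RDD).
    partition_func: optional hypothetical function mapping a key to an int;
        if given, only the partition the key maps to is scanned.
    """
    if partition_func is not None:
        index = partition_func(key) % len(partitions)
        candidates = [partitions[index]]  # the key can only live here
    else:
        candidates = partitions  # unknown layout: scan everything
    return [v for part in candidates for k, v in part if k == key]


# Two partitions laid out by key modulo 2.
data = [[(0, "a"), (2, "b"), (0, "c")], [(1, "d")]]
full_scan = lookup(data, 0)
targeted = lookup(data, 0, partition_func=lambda k: k)
missing = lookup(data, 3)
```
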
@davies
Contributor Author

davies commented Aug 14, 2014

@mateiz thanks for reviewing this. I have addressed all your comments.

@JoshRosen could you take a look at this again?

@davies
Contributor Author

davies commented Aug 14, 2014

The description has been updated to list all the added APIs.

Conflicts:
	python/pyspark/rdd.py
	python/pyspark/tests.py
@davies
Contributor Author

davies commented Aug 22, 2014

@mateiz @JoshRosen some APIs have been split out into separate PRs: #2091, #2092, #2093, #2094, #2095

Conflicts:
	core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
	python/pyspark/rdd.py
@SparkQA

SparkQA commented Aug 26, 2014

QA tests have started for PR 1791 at commit 28fd368.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have started for PR 1791 at commit 1ac98d6.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have started for PR 1791 at commit 657a09b.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have finished for PR 1791 at commit 28fd368.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class BoundedFloat(float):

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have finished for PR 1791 at commit 1ac98d6.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2014

QA tests have finished for PR 1791 at commit 657a09b.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Contributor Author

davies commented Aug 27, 2014

Most of the useful parts have been merged separately, so I am closing this.


7 participants