@dorx dorx commented Jul 22, 2014

Utilities for generating random RDDs.

`RandomRDD` and `RandomVectorRDD` are created instead of using `sc.parallelize(range:Range)` because `Range` objects in Scala can only have `size <= Int.MaxValue`.

The object `RandomRDDGenerators` can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.
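
To illustrate the `Range` limitation mentioned in the description, here is a minimal sketch in plain Scala (no Spark dependency). The helper `partitionSizes` is hypothetical and not taken from this PR; it only shows one way a `Long` element count could be split across partitions without ever materializing a `Range` of that size.

```scala
import scala.util.Try

object RangeLimitSketch {
  // An Int-based Range holding more than Int.MaxValue elements cannot report its length.
  // Depending on the Scala version, the failure occurs at construction or when
  // `length` is called, so both steps are wrapped in Try.
  val oversizedRange: Try[Int] = Try((0 to Int.MaxValue).length)

  // Hypothetical helper (an assumption, not the PR's code): split a Long element
  // count into per-partition counts that differ by at most one.
  def partitionSizes(size: Long, numPartitions: Int): Seq[Long] = {
    require(size >= 0 && numPartitions > 0, "size must be >= 0 and numPartitions > 0")
    val base = size / numPartitions
    val remainder = (size % numPartitions).toInt
    (0 until numPartitions).map(i => if (i < remainder) base + 1 else base)
  }

  def main(args: Array[String]): Unit = {
    println(s"oversized Range failed: ${oversizedRange.isFailure}")
    println(s"partition sizes: ${partitionSizes(3000000000L, 4)}")
  }
}
```

Each partition can then generate its own share of values lazily, so the total size is bounded by `Long` rather than `Int`.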

SparkQA commented Jul 22, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16942/consoleFull

dorx commented Jul 22, 2014

@falaki @jkbradley @mengxr

SparkQA commented Jul 22, 2014

QA results for PR 1520:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator() extends DistributionGenerator {
class StandardNormalGenerator() extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16942/consoleFull
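
For readers unfamiliar with the classes listed above, the following is a rough, self-contained sketch of what a seedable distribution generator could look like. It is not the PR's implementation: the `*Sketch` names, the `nextValue()` and `copy()` methods, and the local stand-in for Spark's `Pseudorandom` trait are all assumptions made for illustration. The Poisson sampler uses Knuth's multiplication method, chosen here only for brevity.

```scala
import scala.util.Random

// Local stand-in for Spark's Pseudorandom trait so the sketch compiles without Spark.
trait PseudorandomSketch {
  def setSeed(seed: Long): Unit
}

// Assumed shape of a generator: draw one value at a time, and allow copies
// so each partition can hold its own independently seeded instance.
trait DistributionGeneratorSketch extends PseudorandomSketch with Serializable {
  def nextValue(): Double
  def copy(): DistributionGeneratorSketch
}

class UniformGeneratorSketch extends DistributionGeneratorSketch {
  private val random = new Random()
  override def nextValue(): Double = random.nextDouble() // uniform on [0, 1)
  override def setSeed(seed: Long): Unit = random.setSeed(seed)
  override def copy(): UniformGeneratorSketch = new UniformGeneratorSketch
}

class PoissonGeneratorSketch(val mean: Double) extends DistributionGeneratorSketch {
  require(mean > 0, "mean must be positive")
  private val random = new Random()

  // Knuth's method: multiply uniforms until the product drops to exp(-mean) or below.
  // Suitable for small means only, since the expected work grows with the mean.
  override def nextValue(): Double = {
    val limit = math.exp(-mean)
    var count = 0
    var product = random.nextDouble()
    while (product > limit) {
      count += 1
      product *= random.nextDouble()
    }
    count.toDouble
  }

  override def setSeed(seed: Long): Unit = random.setSeed(seed)
  override def copy(): PoissonGeneratorSketch = new PoissonGeneratorSketch(mean)
}
```

Seeding a generator and its copy with the same value yields the same sequence, which is the property that makes per-partition generation reproducible.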

Contributor

Change @return to Returns. Otherwise the summary will be empty in the generated docs.
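
As a small illustration of this review point: Scaladoc builds a member's summary from the first sentence of the main comment body, so a comment that contains only an `@return` tag leaves the summary blank. The object and method names below are hypothetical and do not come from the PR.

```scala
object ScaladocSummarySketch {
  /**
   * @return the square of `x`
   */
  // Tag-only comment: there is no body text, so the generated summary is empty.
  def squareTagOnly(x: Double): Double = x * x

  /**
   * Returns the square of `x`.
   */
  // Body sentence: this first sentence becomes the summary in the generated docs.
  def squareWithSummary(x: Double): Double = x * x
}
```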

SparkQA commented Jul 23, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17060/consoleFull

Contributor

i.i.d -> i.i.d. (here and in other places)

mengxr commented Jul 25, 2014

@dorx Besides the comments, could you mark the distribution generators, and the methods that require them, as @Experimental? Part of the reason is that we don't have this API in Python, and it is not clear whether we should implement the same thing there.

SparkQA commented Jul 25, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17197/consoleFull

SparkQA commented Jul 25, 2014

QA results for PR 1520:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator extends DistributionGenerator {
class StandardNormalGenerator extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17197/consoleFull

dorx commented Jul 25, 2014

Jenkins, retest this please.

SparkQA commented Jul 25, 2014

QA tests have started for PR 1520. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17205/consoleFull

SparkQA commented Jul 25, 2014

QA results for PR 1520:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
trait DistributionGenerator extends Pseudorandom with Serializable {
class UniformGenerator extends DistributionGenerator {
class StandardNormalGenerator extends DistributionGenerator {
class PoissonGenerator(val mean: Double) extends DistributionGenerator {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17205/consoleFull

mengxr commented Jul 27, 2014

LGTM. Merged into master. Thanks for adding random RDD generators!!

@asfgit asfgit closed this in 81fcdd2 Jul 27, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Utilities for generating random RDDs.

RandomRDD and RandomVectorRDD are created instead of using `sc.parallelize(range:Range)` because `Range` objects in Scala can only have `size <= Int.MaxValue`.

The object `RandomRDDGenerators` can be transformed into a generator class to reduce the number of auxiliary methods for optional arguments.

Author: Doris Xin <doris.s.xin@gmail.com>

Closes apache#1520 from dorx/randomRDD and squashes the following commits:

01121ac [Doris Xin] reviewer comments
6bf27d8 [Doris Xin] Merge branch 'master' into randomRDD
a8ea92d [Doris Xin] Reviewer comments
063ea0b [Doris Xin] Merge branch 'master' into randomRDD
aec68eb [Doris Xin] newline
bc90234 [Doris Xin] units passed.
d56cacb [Doris Xin] impl with RandomRDD
92d6f1c [Doris Xin] solution for Cloneable
df5bcff [Doris Xin] Merge branch 'generator' into randomRDD
f46d928 [Doris Xin] WIP
49ed20d [Doris Xin] alternative poisson distribution generator
7cb0e40 [Doris Xin] fix for data inconsistency
8881444 [Doris Xin] RandomRDDGenerator: initial design