@@ -25,7 +25,79 @@ displayTitle: <a href="mllib-guide.html">MLlib</a> - Statistics Functionality
2525\newcommand{\zero}{\mathbf{0}}
2626\] `
2727
28- ## Data Generators
28+ ## Random data generation
29+
30+ Random data generation is useful for randomized algorithms, prototyping, and performance testing.
31+ MLlib supports generating random RDDs with i.i.d. values drawn from a given distribution:
32+ uniform, standard normal, or Poisson.
33+
34+ <div class="codetabs">
35+ <div data-lang="scala" markdown="1">
36+ [`RandomRDDs`](api/scala/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
37+ methods to generate random double RDDs or vector RDDs.
38+ The following example generates a random double RDD whose values follow the standard normal
39+ distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
40+
41+ {% highlight scala %}
42+ import org.apache.spark.SparkContext
43+ import org.apache.spark.mllib.random.RandomRDDs._
44+
45+ val sc: SparkContext = ...
46+
47+ // Generate a random double RDD that contains 1 million i.i.d. values drawn from the
48+ // standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
49+ val u = normalRDD(sc, 1000000L, 10)
50+ // Apply a transform to get a random double RDD following `N(1, 4)`.
51+ val v = u.map(x => 1.0 + 2.0 * x)
52+ {% endhighlight %}
53+ </div>
54+
55+ <div data-lang="java" markdown="1">
56+ [`RandomRDDs`](api/java/index.html#org.apache.spark.mllib.random.RandomRDDs) provides factory
57+ methods to generate random double RDDs or vector RDDs.
58+ The following example generates a random double RDD whose values follow the standard normal
59+ distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
60+
61+ {% highlight java %}
62+ import org.apache.spark.api.java.JavaDoubleRDD;
63+ import org.apache.spark.api.java.JavaSparkContext;
64+ import org.apache.spark.api.java.function.DoubleFunction;
65+ import static org.apache.spark.mllib.random.RandomRDDs.*;
66+
67+ JavaSparkContext jsc = ...
68+
69+ // Generate a random double RDD that contains 1 million i.i.d. values drawn from the
70+ // standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
71+ JavaDoubleRDD u = normalJavaRDD(jsc, 1000000L, 10);
72+ // Apply a transform to get a random double RDD following `N(1, 4)`.
73+ JavaDoubleRDD v = u.mapToDouble(new DoubleFunction<Double>() {
74+   public double call(Double x) {
75+     return 1.0 + 2.0 * x;
76+   }
77+ });
78+ {% endhighlight %}
79+ </div>
80+
81+ <div data-lang="python" markdown="1">
82+ [`RandomRDDs`](api/python/pyspark.mllib.random.RandomRDDs-class.html) provides factory
83+ methods to generate random double RDDs or vector RDDs.
84+ The following example generates a random double RDD whose values follow the standard normal
85+ distribution `N(0, 1)`, and then maps it to `N(1, 4)`.
86+
87+ {% highlight python %}
88+ from pyspark.mllib.random import RandomRDDs
89+
90+ sc = ... # SparkContext
91+
92+ # Generate a random double RDD that contains 1 million i.i.d. values drawn from the
93+ # standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
94+ u = RandomRDDs.normalRDD(sc, 1000000, 10)
95+ # Apply a transform to get a random double RDD following `N(1, 4)`.
96+ v = u.map(lambda x: 1.0 + 2.0 * x)
97+ {% endhighlight %}
98+ </div>
99+
100+ </div>
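
As a quick sanity check on the `N(0, 1)` to `N(1, 4)` mapping used in the examples above, here is a minimal sketch in plain Python with no Spark dependency. It assumes only the standard library; the helper name `affine_normal_sample` and the fixed seed are illustrative and not part of MLlib. If `x` is standard normal, then `1.0 + 2.0 * x` has mean 1 and variance 2^2 = 4, so the sample statistics should land close to those values.

```python
import random


def affine_normal_sample(n, shift=1.0, scale=2.0, seed=42):
    """Draw n standard normal values and map each x to shift + scale * x.

    Hypothetical helper for illustration only; it mirrors the
    `u.map(lambda x: 1.0 + 2.0 * x)` step on a local list instead of an RDD.
    """
    rng = random.Random(seed)
    return [shift + scale * rng.gauss(0.0, 1.0) for _ in range(n)]


sample = affine_normal_sample(100000)

# Sample mean and (population) variance of the transformed values.
mean = sum(sample) / len(sample)
variance = sum((x - mean) ** 2 for x in sample) / len(sample)

print("mean=%.3f variance=%.3f" % (mean, variance))
```

The same check could be run on the RDD `v` itself with its `mean()` and `variance()` actions, at the cost of needing a live `SparkContext`.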
29101
30102## Stratified Sampling
31103