
Commit f5ff4a8

brkyvz authored and mengxr committed
[SPARK-7383] [ML] Feature Parity in PySpark for ml.features
Implemented python wrappers for Scala functions that don't exist in `ml.features`

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5991 from brkyvz/ml-feat-PR and squashes the following commits:

adcca55 [Burak Yavuz] add regex tokenizer to __all__
b91cb44 [Burak Yavuz] addressed comments
bd39fd2 [Burak Yavuz] remove addition
b82bd7c [Burak Yavuz] Parity in PySpark for ml.features
1 parent c796be7

File tree

5 files changed (+851, -43 lines)


mllib/src/main/scala/org/apache/spark/ml/feature/PolynomialExpansion.scala

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ import org.apache.spark.sql.types.DataType
   * which is available at [[http://en.wikipedia.org/wiki/Polynomial_expansion]], "In mathematics, an
   * expansion of a product of sums expresses it as a sum of products by using the fact that
   * multiplication distributes over addition". Take a 2-variable feature vector as an example:
 - * `(x, y)`, if we want to expand it with degree 2, then we get `(x, y, x * x, x * y, y * y)`.
 + * `(x, y)`, if we want to expand it with degree 2, then we get `(x, x * x, y, x * y, y * y)`.
   */
  @AlphaComponent
  class PolynomialExpansion extends UnaryTransformer[Vector, Vector, PolynomialExpansion] {
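The doc fix above corrects the order of the expanded terms: for `(x, y)` with degree 2, the output is `(x, x * x, y, x * y, y * y)`, not `(x, y, x * x, ...)`. As a sanity check, here is a minimal standalone Python sketch (my own illustrative re-implementation, not Spark's `expand` code) of a recursive expansion that reproduces exactly that ordering and, like Spark, drops the constant degree-0 term:

```python
def poly_expand(values, degree):
    """Polynomial expansion of a dense feature vector up to `degree`.

    Illustrative sketch mirroring the term ordering documented in
    PolynomialExpansion.scala; the pure-constant monomial is skipped.
    """
    out = []

    def rec(last, deg, mult, is_const):
        # Base case: no remaining degree budget, or no variables left.
        if deg == 0 or last < 0:
            if not is_const:      # skip the degree-0 (constant) term
                out.append(mult)
            return
        alpha = mult
        for i in range(deg + 1):  # i = power of values[last] in this monomial
            rec(last - 1, deg - i, alpha, is_const and i == 0)
            alpha *= values[last]

    rec(len(values) - 1, degree, 1.0, True)
    return out

# (x, y) = (3.0, 2.0), degree 2 -> x, x*x, y, x*y, y*y
print(poly_expand([3.0, 2.0], 2))  # [3.0, 9.0, 2.0, 6.0, 4.0]
```

Note how the result lists `x * x` immediately after `x` and before `y`, matching the corrected Scaladoc.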

mllib/src/main/scala/org/apache/spark/ml/feature/Tokenizer.scala

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ class Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer] {

  /**
   * :: AlphaComponent ::
 - * A regex based tokenizer that extracts tokens either by repeatedly matching the regex(default)
 + * A regex based tokenizer that extracts tokens either by repeatedly matching the regex (default)
   * or using it to split the text (set matching to false). Optional parameters also allow filtering
   * tokens using a minimal length.
   * It returns an array of strings that can be empty.
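The Scaladoc above describes two modes (repeatedly matching the regex, or splitting on it) plus a minimum-token-length filter. A minimal plain-Python sketch of that behavior, using the standard `re` module rather than PySpark's API (the parameter names here are hypothetical, chosen to echo the doc, not the actual RegexTokenizer params):

```python
import re

def regex_tokenize(text, pattern=r"\w+", matching=True, min_token_length=1):
    """Illustrative analogue of the regex tokenizer described above.

    matching=True : repeatedly match `pattern` to extract tokens (default)
    matching=False: use `pattern` to split the text instead
    Tokens shorter than min_token_length are dropped; the result may be empty.
    """
    tokens = re.findall(pattern, text) if matching else re.split(pattern, text)
    return [t for t in tokens if len(t) >= min_token_length]

print(regex_tokenize("Hello, world! hi", min_token_length=3))
# ['Hello', 'world']
print(regex_tokenize("a,b,,c", pattern=",", matching=False))
# ['a', 'b', 'c']
```

The second call shows the split mode, where the length filter also discards the empty string produced by the doubled comma; with no matches at all the function returns an empty list, consistent with "an array of strings that can be empty".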
