Commit 9425b1b

Merge remote-tracking branch 'upstream/master' into parquet-avro-hive
# Conflicts:
#   dev/deps/spark-deps-hadoop-2.7-hive-2.3
#   dev/deps/spark-deps-hadoop-3.2-hive-2.3
#   pom.xml
#   sql/core/src/test/scala/org/apache/spark/sql/ExplainSuite.scala
2 parents: 3a2a6ad + 71d261a

811 files changed (+27877 / -16124 lines)


.github/workflows/build_and_test.yml

Lines changed: 14 additions & 23 deletions
@@ -285,6 +285,8 @@ jobs:
   lint:
     name: Linters, licenses, dependencies and documentation generation
     runs-on: ubuntu-20.04
+    container:
+      image: dongjoon/apache-spark-github-action-image:20201025
     steps:
     - name: Checkout Spark repository
       uses: actions/checkout@v2
@@ -315,10 +317,6 @@ jobs:
         key: docs-maven-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
           docs-maven-
-    - name: Install Java 8
-      uses: actions/setup-java@v1
-      with:
-        java-version: 8
     - name: Install Python 3.6
       uses: actions/setup-python@v2
       with:
@@ -328,33 +326,24 @@ jobs:
       run: |
         # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
         # See also https://github.com/sphinx-doc/sphinx/issues/7551.
-        pip3 install flake8 'sphinx<3.1.0' numpy pydata_sphinx_theme ipython nbsphinx mypy numpydoc
-    - name: Install R 4.0
-      uses: r-lib/actions/setup-r@v1
-      with:
-        r-version: 4.0
+        python3.6 -m pip install flake8 'sphinx<3.1.0' numpy pydata_sphinx_theme ipython nbsphinx mypy numpydoc
     - name: Install R linter dependencies and SparkR
       run: |
-        sudo apt-get install -y libcurl4-openssl-dev
-        # dependencies for usethis 1.6.3.
-        sudo Rscript -e "install.packages(c('clipr', 'cli', 'crayon', 'desc', 'fs', 'gh', 'glue', 'purrr', 'rematch2', 'rlang', 'rprojroot', 'whisker', 'withr', 'yaml', 'git2r', 'rstudioapi'), repos='https://cloud.r-project.org/')"
-        sudo Rscript -e "install.packages('https://cran.r-project.org/src/contrib/Archive/usethis/usethis_1.6.3.tar.gz', repos=NULL, type='source')"
-        sudo Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
-        sudo Rscript -e "devtools::install_github('jimhester/lintr@v2.0.0')"
+        apt-get install -y libcurl4-openssl-dev libgit2-dev libssl-dev libxml2-dev
+        Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
+        Rscript -e "devtools::install_github('jimhester/lintr@v2.0.0')"
         ./R/install-dev.sh
-    - name: Install Ruby 2.7 for documentation generation
-      uses: actions/setup-ruby@v1
-      with:
-        ruby-version: 2.7
     - name: Install dependencies for documentation generation
       run: |
         # pandoc is required to generate PySpark APIs as well in nbsphinx.
-        sudo apt-get install -y libcurl4-openssl-dev pandoc
+        apt-get install -y libcurl4-openssl-dev pandoc
         # TODO(SPARK-32407): Sphinx 3.1+ does not correctly index nested classes.
         # See also https://github.com/sphinx-doc/sphinx/issues/7551.
-        pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc
+        python3.6 -m pip install 'sphinx<3.1.0' mkdocs numpy pydata_sphinx_theme ipython nbsphinx numpydoc
+        apt-get update -y
+        apt-get install -y ruby ruby-dev
         gem install jekyll jekyll-redirect-from rouge
-        sudo Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
+        Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
     - name: Scala linter
       run: ./dev/lint-scala
     - name: Java linter
@@ -370,6 +359,8 @@ jobs:
     - name: Run documentation build
       run: |
         cd docs
+        export LC_ALL=C.UTF-8
+        export LANG=C.UTF-8
         jekyll build
 
   java-11:
@@ -417,7 +408,7 @@ jobs:
     - name: Build with SBT
       run: |
         ./dev/change-scala-version.sh 2.13
-        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pscala-2.13 compile test:compile
+        ./build/sbt -Pyarn -Pmesos -Pkubernetes -Phive -Phive-thriftserver -Phadoop-cloud -Pkinesis-asl -Pdocker-integration-tests -Pkubernetes-integration-tests -Pspark-ganglia-lgpl -Pscala-2.13 compile test:compile
 
   hadoop-2:
     name: Hadoop 2 build with SBT

.github/workflows/publish_snapshot.yml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ on:
 
 jobs:
   publish-snapshot:
+    if: github.repository == 'apache/spark'
     runs-on: ubuntu-latest
     strategy:
       fail-fast: false

LICENSE-binary

Lines changed: 1 addition & 1 deletion
@@ -521,7 +521,6 @@ Common Development and Distribution License (CDDL) 1.1
 ------------------------------------------------------
 
 javax.el:javax.el-api https://javaee.github.io/uel-ri/
-javax.servlet:javax.servlet-api https://javaee.github.io/servlet-spec/
 javax.servlet.jsp:jsp-api
 javax.transaction:jta http://www.oracle.com/technetwork/java/index.html
 javax.xml.bind:jaxb-api https://github.com/javaee/jaxb-v2
@@ -553,6 +552,7 @@ Eclipse Public License (EPL) 2.0
 --------------------------------
 
 jakarta.annotation:jakarta-annotation-api https://projects.eclipse.org/projects/ee4j.ca
+jakarta.servlet:jakarta.servlet-api https://projects.eclipse.org/projects/ee4j.servlet
 jakarta.ws.rs:jakarta.ws.rs-api https://github.com/eclipse-ee4j/jaxrs-api
 org.glassfish.hk2.external:jakarta.inject

R/pkg/DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Authors@R: c(person("Shivaram", "Venkataraman", role = c("aut", "cre"),
                     email = "felixcheung@apache.org"),
              person(family = "The Apache Software Foundation", role = c("aut", "cph")))
 License: Apache License (== 2.0)
-URL: https://www.apache.org/ https://spark.apache.org/
+URL: https://www.apache.org https://spark.apache.org
 BugReports: https://spark.apache.org/contributing.html
 SystemRequirements: Java (>= 8, < 12)
 Depends:

R/pkg/R/DataFrame.R

Lines changed: 11 additions & 9 deletions
@@ -880,7 +880,7 @@ setMethod("toJSON",
 
 #' Save the contents of SparkDataFrame as a JSON file
 #'
-#' Save the contents of a SparkDataFrame as a JSON file (\href{http://jsonlines.org/}{
+#' Save the contents of a SparkDataFrame as a JSON file (\href{https://jsonlines.org/}{
 #' JSON Lines text format or newline-delimited JSON}). Files written out
 #' with this method can be read back in as a SparkDataFrame using read.json().
 #'
@@ -2277,16 +2277,17 @@ setMethod("mutate",
 
             # For named arguments, use the names for arguments as the column names
             # For unnamed arguments, use the argument symbols as the column names
-            args <- sapply(substitute(list(...))[-1], deparse)
             ns <- names(cols)
-            if (!is.null(ns)) {
-              lapply(seq_along(args), function(i) {
-                if (ns[[i]] != "") {
-                  args[[i]] <<- ns[[i]]
-                }
+            if (is.null(ns)) ns <- rep("", length(cols))
+            named_idx <- nzchar(ns)
+            if (!all(named_idx)) {
+              # SPARK-31517: deparse uses width.cutoff on wide input and the
+              # output is length>1, so need to collapse it to scalar
+              colsub <- substitute(list(...))[-1L]
+              ns[!named_idx] <- sapply(which(!named_idx), function(ii) {
+                paste(gsub("^\\s*|\\s*$", "", deparse(colsub[[ii]])), collapse = " ")
               })
             }
-            ns <- args
 
             # The last column of the same name in the specific columns takes effect
             deDupCols <- list()
@@ -3444,7 +3445,8 @@ setMethod("as.data.frame",
 #' @note attach since 1.6.0
 setMethod("attach",
           signature(what = "SparkDataFrame"),
-          function(what, pos = 2L, name = deparse(substitute(what), backtick = FALSE),
+          function(what, pos = 2L,
+                   name = paste(deparse(substitute(what), backtick = FALSE), collapse = " "),
                    warn.conflicts = TRUE) {
             args <- as.list(environment()) # capture all parameters - this must be the first line
             newEnv <- assignNewEnv(args$what)
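
Note on the SPARK-31517 comment in the mutate() hunk: base R's deparse() wraps its output at width.cutoff characters (60 by default), so a wide column expression deparses to a character vector with more than one element. A minimal base-R sketch of that behaviour and of the trim-and-collapse the patched mutate() and attach() apply (the column names below are invented for illustration):

    # deparse() wraps long expressions at width.cutoff characters (default 60),
    # so the result can have more than one element.
    expr <- quote(very_long_column_name_one + very_long_column_name_two + very_long_column_name_three)
    length(deparse(expr)) > 1   # TRUE: the deparsed text is wider than 60 characters

    # Trimming leading/trailing whitespace from each piece and collapsing them
    # back into a single string yields one usable column name.
    paste(gsub("^\\s*|\\s*$", "", deparse(expr)), collapse = " ")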

R/pkg/R/SQLContext.R

Lines changed: 1 addition & 1 deletion
@@ -374,7 +374,7 @@ setMethod("toDF", signature(x = "RDD"),
 #' Create a SparkDataFrame from a JSON file.
 #'
 #' Loads a JSON file, returning the result as a SparkDataFrame
-#' By default, (\href{http://jsonlines.org/}{JSON Lines text format or newline-delimited JSON}
+#' By default, (\href{https://jsonlines.org/}{JSON Lines text format or newline-delimited JSON}
 #' ) is supported. For JSON (one record per file), set a named property \code{multiLine} to
 #' \code{TRUE}.
 #' It goes through the entire dataset once to determine the schema.

R/pkg/R/install.R

Lines changed: 3 additions & 3 deletions
@@ -39,11 +39,11 @@
 #' version number in the format of "x.y" where x and y are integer.
 #' If \code{hadoopVersion = "without"}, "Hadoop free" build is installed.
 #' See
-#' \href{http://spark.apache.org/docs/latest/hadoop-provided.html}{
+#' \href{https://spark.apache.org/docs/latest/hadoop-provided.html}{
 #' "Hadoop Free" Build} for more information.
 #' Other patched version names can also be used, e.g. \code{"cdh4"}
 #' @param mirrorUrl base URL of the repositories to use. The directory layout should follow
-#' \href{http://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
+#' \href{https://www.apache.org/dyn/closer.lua/spark/}{Apache mirrors}.
 #' @param localDir a local directory where Spark is installed. The directory contains
 #'                 version-specific folders of Spark packages. Default is path to
 #'                 the cache directory:
@@ -64,7 +64,7 @@
 #'}
 #' @note install.spark since 2.1.0
 #' @seealso See available Hadoop versions:
-#'          \href{http://spark.apache.org/downloads.html}{Apache Spark}
+#'          \href{https://spark.apache.org/downloads.html}{Apache Spark}
 install.spark <- function(hadoopVersion = "2.7", mirrorUrl = NULL,
                           localDir = NULL, overwrite = FALSE) {
   sparkHome <- Sys.getenv("SPARK_HOME")

R/pkg/R/mllib_classification.R

Lines changed: 2 additions & 2 deletions
@@ -425,7 +425,7 @@ setMethod("write.ml", signature(object = "LogisticRegressionModel", path = "char
 #' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
 #' Only categorical data is supported.
 #' For more details, see
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html}{
 #' Multilayer Perceptron}
 #'
 #' @param data a \code{SparkDataFrame} of observations and labels for model fitting.
@@ -574,7 +574,7 @@ setMethod("write.ml", signature(object = "MultilayerPerceptronClassificationMode
 #' @rdname spark.naiveBayes
 #' @aliases spark.naiveBayes,SparkDataFrame,formula-method
 #' @name spark.naiveBayes
-#' @seealso e1071: \url{https://cran.r-project.org/package=e1071}
+#' @seealso e1071: \url{https://cran.r-project.org/web/packages/e1071/index.html}
 #' @examples
 #' \dontrun{
 #' data <- as.data.frame(UCBAdmissions)

R/pkg/R/mllib_clustering.R

Lines changed: 2 additions & 2 deletions
@@ -204,7 +204,7 @@ setMethod("write.ml", signature(object = "BisectingKMeansModel", path = "charact
 #' @return \code{spark.gaussianMixture} returns a fitted multivariate gaussian mixture model.
 #' @rdname spark.gaussianMixture
 #' @name spark.gaussianMixture
-#' @seealso mixtools: \url{https://cran.r-project.org/package=mixtools}
+#' @seealso mixtools: \url{https://cran.r-project.org/web/packages/mixtools/index.html}
 #' @examples
 #' \dontrun{
 #' sparkR.session()
@@ -483,7 +483,7 @@ setMethod("write.ml", signature(object = "KMeansModel", path = "character"),
 #' @return \code{spark.lda} returns a fitted Latent Dirichlet Allocation model.
 #' @rdname spark.lda
 #' @aliases spark.lda,SparkDataFrame-method
-#' @seealso topicmodels: \url{https://cran.r-project.org/package=topicmodels}
+#' @seealso topicmodels: \url{https://cran.r-project.org/web/packages/topicmodels/index.html}
 #' @examples
 #' \dontrun{
 #' text <- read.df("data/mllib/sample_lda_libsvm_data.txt", source = "libsvm")

R/pkg/R/mllib_recommendation.R

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ setClass("ALSModel", representation(jobj = "jobj"))
 #' to make predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
 #'
 #' For more details, see
-#' \href{http://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
+#' \href{https://spark.apache.org/docs/latest/ml-collaborative-filtering.html}{MLlib:
 #' Collaborative Filtering}.
 #'
 #' @param data a SparkDataFrame for training.

R/pkg/R/mllib_regression.R

Lines changed: 1 addition & 1 deletion
@@ -475,7 +475,7 @@ setMethod("write.ml", signature(object = "IsotonicRegressionModel", path = "char
 #' @param ... additional arguments passed to the method.
 #' @return \code{spark.survreg} returns a fitted AFT survival regression model.
 #' @rdname spark.survreg
-#' @seealso survival: \url{https://cran.r-project.org/package=survival}
+#' @seealso survival: \url{https://cran.r-project.org/web/packages/survival/index.html}
 #' @examples
 #' \dontrun{
 #' df <- createDataFrame(ovarian)

R/pkg/R/mllib_stat.R

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ setClass("KSTest", representation(jobj = "jobj"))
 #' @rdname spark.kstest
 #' @aliases spark.kstest,SparkDataFrame-method
 #' @name spark.kstest
-#' @seealso \href{http://spark.apache.org/docs/latest/mllib-statistics.html#hypothesis-testing}{
+#' @seealso \href{https://spark.apache.org/docs/latest/mllib-statistics.html#hypothesis-testing}{
 #' MLlib: Hypothesis Testing}
 #' @examples
 #' \dontrun{

R/pkg/R/mllib_tree.R

Lines changed: 6 additions & 6 deletions
@@ -127,9 +127,9 @@ print.summary.decisionTree <- function(x) {
 #' \code{write.ml}/\code{read.ml} to save/load fitted models.
 #' For more details, see
 # nolint start
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression}{
 #' GBT Regression} and
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier}{
 #' GBT Classification}
 # nolint end
 #'
@@ -343,9 +343,9 @@ setMethod("write.ml", signature(object = "GBTClassificationModel", path = "chara
 #' save/load fitted models.
 #' For more details, see
 # nolint start
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression}{
 #' Random Forest Regression} and
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier}{
 #' Random Forest Classification}
 # nolint end
 #'
@@ -568,9 +568,9 @@ setMethod("write.ml", signature(object = "RandomForestClassificationModel", path
 #' save/load fitted models.
 #' For more details, see
 # nolint start
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{
 #' Decision Tree Regression} and
-#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{
+#' \href{https://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{
 #' Decision Tree Classification}
 # nolint end
 #'

R/pkg/R/stats.R

Lines changed: 2 additions & 1 deletion
@@ -109,7 +109,8 @@ setMethod("corr",
 #'
 #' Finding frequent items for columns, possibly with false positives.
 #' Using the frequent element count algorithm described in
-#' \url{https://doi.org/10.1145/762471.762473}, proposed by Karp, Schenker, and Papadimitriou.
+#' \url{https://dl.acm.org/doi/10.1145/762471.762473}, proposed by Karp, Schenker,
+#' and Papadimitriou.
 #'
 #' @param x A SparkDataFrame.
 #' @param cols A vector column names to search frequent items in.

R/pkg/inst/worker/worker.R

Lines changed: 5 additions & 4 deletions
@@ -196,25 +196,26 @@ if (isEmpty != 0) {
       outputs <- list()
       for (i in seq_len(length(data))) {
         # Timing reading input data for execution
-        inputElap <- elapsedSecs()
+        computeStart <- elapsedSecs()
         output <- compute(mode, partition, serializer, deserializer, keys[[i]],
                           colNames, computeFunc, data[[i]])
         computeElap <- elapsedSecs()
         if (serializer == "arrow") {
           outputs[[length(outputs) + 1L]] <- output
         } else {
           outputResult(serializer, output, outputCon)
+          outputComputeElapsDiff <- outputComputeElapsDiff + (elapsedSecs() - computeElap)
         }
-        outputElap <- elapsedSecs()
-        computeInputElapsDiff <- computeInputElapsDiff + (computeElap - inputElap)
-        outputComputeElapsDiff <- outputComputeElapsDiff + (outputElap - computeElap)
+        computeInputElapsDiff <- computeInputElapsDiff + (computeElap - computeStart)
       }
 
       if (serializer == "arrow") {
         # See https://stat.ethz.ch/pipermail/r-help/2010-September/252046.html
         # rbind.fill might be an alternative to make it faster if plyr is installed.
+        outputStart <- elapsedSecs()
         combined <- do.call("rbind", outputs)
         SparkR:::writeSerializeInArrow(outputCon, combined)
+        outputComputeElapsDiff <- elapsedSecs() - outputStart
       }
     }
   } else {
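
The change above charges output time to outputComputeElapsDiff only when output is actually written (per record on the non-Arrow path, once around the combined write on the Arrow path) instead of unconditionally at the end of every loop iteration. A rough plain-R sketch of that accumulate-per-phase pattern, with stand-in work in place of SparkR's compute() and outputResult(); elapsedSecs below is a local stand-in, not SparkR's internal helper:

    elapsedSecs <- function() proc.time()[["elapsed"]]

    computeElapsTotal <- 0
    outputElapsTotal <- 0

    for (chunk in list(1:1e6, 1:2e6)) {
      computeStart <- elapsedSecs()
      result <- sum(as.numeric(chunk))               # stand-in for compute()
      computeEnd <- elapsedSecs()
      computeElapsTotal <- computeElapsTotal + (computeEnd - computeStart)

      outputStart <- elapsedSecs()
      writeLines(format(result), con = nullfile())   # stand-in for outputResult()
      outputElapsTotal <- outputElapsTotal + (elapsedSecs() - outputStart)
    }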

R/pkg/tests/fulltests/test_sparkSQL.R

Lines changed: 15 additions & 0 deletions
@@ -2884,6 +2884,15 @@ test_that("mutate(), transform(), rename() and names()", {
   expect_equal(nrow(result), 153)
   expect_equal(ncol(result), 2)
   detach(airquality)
+
+  # ensure long inferred names are handled without error (SPARK-26199)
+  # test implicitly assumes eval(formals(deparse)$width.cutoff) = 60
+  # (which has always been true as of 2020-11-15)
+  newDF <- mutate(
+    df,
+    df$age + 12345678901234567890 + 12345678901234567890 + 12345678901234
+  )
+  expect_match(tail(columns(newDF), 1L), "234567890", fixed = TRUE)
 })
 
 test_that("read/write ORC files", {
@@ -3273,6 +3282,12 @@ test_that("attach() on a DataFrame", {
   stat3 <- summary(df[, "age", drop = F])
   expect_equal(collect(stat3)[8, "age"], "30")
   expect_error(age)
+
+  # attach method uses deparse(); ensure no errors from a very long input
+  abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop <- df # nolint
+  attach(abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop)
+  expect_true(any(grepl("abcdefghijklmnopqrstuvwxyz", search())))
+  detach("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnop")
 })
 
 test_that("with() on a DataFrame", {
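
The new mutate() test notes that it assumes deparse()'s default width.cutoff is 60; that assumption can be checked directly in a plain R session:

    # Default value of the width.cutoff argument of base::deparse()
    eval(formals(deparse)$width.cutoff)
    # [1] 60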

R/pkg/vignettes/sparkr-vignettes.Rmd

Lines changed: 1 addition & 1 deletion
@@ -1007,7 +1007,7 @@ perplexity
 
 #### Alternating Least Squares
 
-`spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](https://dl.acm.org/citation.cfm?id=1608614).
+`spark.als` learns latent factors in [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) via [alternating least squares](https://dl.acm.org/doi/10.1109/MC.2009.263).
 
 There are multiple options that can be configured in `spark.als`, including `rank`, `reg`, and `nonnegative`. For a complete list, refer to the help file.

README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ and Structured Streaming for stream processing.
 
 <https://spark.apache.org/>
 
-[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.7-hive-2.3)
+[![Jenkins Build](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-3.2/badge/icon)](https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-3.2)
 [![AppVeyor Build](https://img.shields.io/appveyor/ci/ApacheSoftwareFoundation/spark/master.svg?style=plastic&logo=appveyor)](https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark)
 [![PySpark Coverage](https://img.shields.io/badge/dynamic/xml.svg?label=pyspark%20coverage&url=https%3A%2F%2Fspark-test.github.io%2Fpyspark-coverage-site&query=%2Fhtml%2Fbody%2Fdiv%5B1%5D%2Fdiv%2Fh1%2Fspan&colorB=brightgreen&style=plastic)](https://spark-test.github.io/pyspark-coverage-site)
