[SPARKR][DOCS] update R API doc for subset/extract #16721

Closed · wants to merge 3 commits
13 changes: 12 additions & 1 deletion R/pkg/R/DataFrame.R
@@ -1831,6 +1831,8 @@ setMethod("[", signature(x = "SparkDataFrame"),
#' Return subsets of SparkDataFrame according to given conditions
#' @param x a SparkDataFrame.
#' @param i,subset (Optional) a logical expression to filter on rows.
+#' For extract operator [[ and replacement operator [[<-, the indexing parameter for
+#' a single Column.
#' @param j,select expression for the single Column or a list of columns to select from the SparkDataFrame.
#' @param drop if TRUE, a Column will be returned if the resulting dataset has only one column.
#' Otherwise, a SparkDataFrame will always be returned.
@@ -1841,6 +1843,7 @@ setMethod("[", signature(x = "SparkDataFrame"),
#' @export
#' @family SparkDataFrame functions
#' @aliases subset,SparkDataFrame-method
+#' @seealso \link{withColumn}
#' @rdname subset
#' @name subset
#' @family subsetting functions
@@ -1858,6 +1861,10 @@ setMethod("[", signature(x = "SparkDataFrame"),
#' subset(df, df$age %in% c(19, 30), 1:2)
#' subset(df, df$age %in% c(19), select = c(1,2))
#' subset(df, select = c(1,2))
+#' # Columns can be selected and set
+#' df[["age"]] <- 23
+#' df[[1]] <- df$age
+#' df[[2]] <- NULL # drop column
#' }
#' @note subset since 1.5.0
setMethod("subset", signature(x = "SparkDataFrame"),
@@ -1982,7 +1989,7 @@ setMethod("selectExpr",
#' @aliases withColumn,SparkDataFrame,character-method
#' @rdname withColumn
#' @name withColumn
-#' @seealso \link{rename} \link{mutate}
+#' @seealso \link{rename} \link{mutate} \link{subset}
#' @export
#' @examples
#'\dontrun{
@@ -1993,6 +2000,10 @@ setMethod("selectExpr",
#' # Replace an existing column
#' newDF2 <- withColumn(newDF, "newCol", newDF$col1)
#' newDF3 <- withColumn(newDF, "newCol", 42)
+#' # Use extract operator to set an existing or new column
+#' df[["age"]] <- 23
+#' df[[2]] <- df$col1
+#' df[[2]] <- NULL # drop column
#' }
#' @note withColumn since 1.4.0
setMethod("withColumn",
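A short sketch of the relationship called out by the new @seealso cross-reference: withColumn and the [[<- replacement operator achieve the same effect. Here df is an assumed, pre-existing SparkDataFrame with a numeric waiting column:

df2 <- withColumn(df, "waiting_secs", df$waiting * 60)  # add or replace a column via withColumn
df[["waiting_secs"]] <- df$waiting * 60                 # equivalent extract-operator form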
2 changes: 1 addition & 1 deletion R/pkg/R/mllib_classification.R
@@ -41,7 +41,7 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))

#' Logistic Regression Model
#'
-#' Fits an logistic regression model against a Spark DataFrame. It supports "binomial": Binary logistic regression
+#' Fits an logistic regression model against a SparkDataFrame. It supports "binomial": Binary logistic regression
#' with pivoting; "multinomial": Multinomial logistic (softmax) regression without pivoting, similar to glmnet.
#' Users can print, make predictions on the produced model and save the model to the input path.
#'
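As context for the corrected header, a hedged sketch of the workflow it describes; spark.logit is assumed to be the SparkR entry point, and the iris-based training data and output path are illustrative only:

training <- createDataFrame(iris)
model <- spark.logit(training, Species ~ Sepal_Length + Sepal_Width, regParam = 0.5)
summary(model)                              # print a summary of the fitted model
predictions <- predict(model, training)     # make predictions on new data
write.ml(model, "/tmp/logit_model")         # save the model (path is illustrative)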
6 changes: 3 additions & 3 deletions R/pkg/R/mllib_clustering.R
@@ -47,7 +47,7 @@ setClass("LDAModel", representation(jobj = "jobj"))

#' Bisecting K-Means Clustering Model
#'
-#' Fits a bisecting k-means clustering model against a Spark DataFrame.
+#' Fits a bisecting k-means clustering model against a SparkDataFrame.
#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
#'
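A brief, assumed usage sketch of the API behind this header; spark.bisectingKmeans is taken to be the corresponding SparkR entry point and the iris data is illustrative:

df <- createDataFrame(iris)
model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
summary(model)                              # cluster sizes and centers
fitted <- predict(model, df)                # cluster assignments for new data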
@@ -189,7 +189,7 @@ setMethod("write.ml", signature(object = "BisectingKMeansModel", path = "charact

#' Multivariate Gaussian Mixture Model (GMM)
#'
-#' Fits multivariate gaussian mixture model against a Spark DataFrame, similarly to R's
+#' Fits multivariate gaussian mixture model against a SparkDataFrame, similarly to R's
#' mvnormalmixEM(). Users can call \code{summary} to print a summary of the fitted model,
#' \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml}
#' to save/load fitted models.
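Likewise, an assumed sketch of fitting a Gaussian mixture; spark.gaussianMixture is taken to be the presumed counterpart to mvnormalmixEM(), and the data is illustrative:

df <- createDataFrame(iris)
model <- spark.gaussianMixture(df, ~ Sepal_Length + Sepal_Width, k = 2)
summary(model)                              # mixing weights, component means and covariances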
@@ -314,7 +314,7 @@ setMethod("write.ml", signature(object = "GaussianMixtureModel", path = "charact

#' K-Means Clustering Model
#'
-#' Fits a k-means clustering model against a Spark DataFrame, similarly to R's kmeans().
+#' Fits a k-means clustering model against a SparkDataFrame, similarly to R's kmeans().
#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
#'
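And an assumed sketch of the k-means analogue to R's kmeans(), with spark.kmeans as the presumed entry point and an illustrative save path:

df <- createDataFrame(iris)
model <- spark.kmeans(df, ~ Sepal_Length + Sepal_Width, k = 3, initMode = "random")
summary(model)
write.ml(model, "/tmp/kmeans_model")        # path is illustrative only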
4 changes: 2 additions & 2 deletions R/pkg/R/mllib_regression.R
@@ -41,7 +41,7 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))

#' Generalized Linear Models
#'
-#' Fits generalized linear model against a Spark DataFrame.
+#' Fits generalized linear model against a SparkDataFrame.
#' Users can call \code{summary} to print a summary of the fitted model, \code{predict} to make
#' predictions on new data, and \code{write.ml}/\code{read.ml} to save/load fitted models.
#'
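For the header above, an assumed sketch of the glm-style workflow, with spark.glm as the presumed SparkR entry point and iris as illustrative data:

df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Species, family = "gaussian")
summary(model)                              # coefficients, deviance, dispersion
predictions <- predict(model, df)           # predictions on new data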
@@ -259,7 +259,7 @@ setMethod("write.ml", signature(object = "GeneralizedLinearRegressionModel", pat

#' Isotonic Regression Model
#'
-#' Fits an Isotonic Regression model against a Spark DataFrame, similarly to R's isoreg().
+#' Fits an Isotonic Regression model against a SparkDataFrame, similarly to R's isoreg().
#' Users can print, make predictions on the produced model and save the model to the input path.
#'
#' @param data SparkDataFrame for training.
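Similarly, an assumed sketch for isotonic regression with spark.isoreg; the tiny training frame below is made up purely for illustration:

train <- createDataFrame(data.frame(label = c(1.0, 2.0, 3.5, 3.0), feature = c(1, 2, 3, 4)))
model <- spark.isoreg(train, label ~ feature)
summary(model)                              # boundaries and fitted predictions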
4 changes: 2 additions & 2 deletions R/pkg/vignettes/sparkr-vignettes.Rmd
@@ -923,9 +923,9 @@ The main method calls of actual computation happen in the Spark JVM of the drive

Two kinds of RPCs are supported in the SparkR JVM backend: method invocation and creating new objects. Method invocation can be done in two ways.

-* `sparkR.invokeJMethod` takes a reference to an existing Java object and a list of arguments to be passed on to the method.
+* `sparkR.callJMethod` takes a reference to an existing Java object and a list of arguments to be passed on to the method.

-* `sparkR.invokeJStatic` takes a class name for static method and a list of arguments to be passed on to the method.
+* `sparkR.callJStatic` takes a class name for static method and a list of arguments to be passed on to the method.

The arguments are serialized using our custom wire format which is then deserialized on the JVM side. We then use Java reflection to invoke the appropriate method.
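A hedged sketch of what these two calls look like from the R side; sparkR.newJObject is assumed here only as the way to obtain an object reference, and java.lang.Math / java.lang.Integer are illustrative targets:

sparkR.callJStatic("java.lang.Math", "min", 10L, 1L)   # static method, addressed by class name
obj <- sparkR.newJObject("java.lang.Integer", 1L)      # create an object on the JVM side
sparkR.callJMethod(obj, "toString")                    # instance method on the returned reference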
