From 20582717fb8714384932fefe07db783b65374a26 Mon Sep 17 00:00:00 2001 From: "Adam B. Smith" Date: Wed, 10 Apr 2024 13:15:36 -0500 Subject: [PATCH] Add `trainESM()` --- DESCRIPTION | 4 +- NAMESPACE | 1 + NEWS.md | 5 +- R/enmSdmX.r | 3 +- R/trainESM.r | 145 +++++++++++++++++++++++++++++ R/trainGlm.r | 4 +- man/enmSdmX.Rd | 3 +- man/examples/trainESM_examples.r | 81 +++++++++++++++++ man/trainESM.Rd | 151 +++++++++++++++++++++++++++++++ man/trainGlm.Rd | 8 +- 10 files changed, 394 insertions(+), 11 deletions(-) create mode 100644 R/trainESM.r create mode 100644 man/examples/trainESM_examples.r create mode 100644 man/trainESM.Rd diff --git a/DESCRIPTION b/DESCRIPTION index d0882e8..8a1a60f 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: enmSdmX Type: Package Title: Species Distribution Modeling and Ecological Niche Modeling -Version: 1.1.4 -Date: 2023-03-06 +Version: 1.1.5 +Date: 2024-04-10 Authors@R: c( person( diff --git a/NAMESPACE b/NAMESPACE index 264bc50..4f37fdd 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -41,6 +41,7 @@ export(squareCellRast) export(summaryByCrossValid) export(trainBRT) export(trainByCrossValid) +export(trainESM) export(trainGAM) export(trainGLM) export(trainMaxEnt) diff --git a/NEWS.md b/NEWS.md index 2aa584e..b7db9ac 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,8 @@ +# enmSdmX 1.1.5 2024-04-10 +- Added function `trainESM()` for ensembles of small models. + # enmSdmX 1.1.3 2023-03-06 -- `trainGLM()`, `trainNS()`, and `trainEnmSdm()` now have options to automatically center and scale predictors. +- `trainGLM()`, `trainNS()`, and `predictEnmSdm()` now have options to automatically center and scale predictors. # enmSdmX 1.1.3 2023-02-02 - Removed dependency on `dismo`, replaced where possible by `predicts`; copied `gbm.step()` and `predict()` method for MaxEnt to `enmSdmX` as a momentary fix; would love a professional solution! 
diff --git a/R/enmSdmX.r b/R/enmSdmX.r index a928173..032af24 100644 --- a/R/enmSdmX.r +++ b/R/enmSdmX.r @@ -22,7 +22,8 @@ #' @section Model calibration: #' \code{\link{trainByCrossValid}}: and \code{\link{summaryByCrossValid}}: Implement a \code{trainXYZ} function across calibration folds (which are distinct from evaluation folds). \cr #' \code{\link{trainBRT}}: Boosted regression trees (BRTs) \cr -#' \code{\link{trainGAM}}: Generalized additive models (GAMs) \cr +#' \code{\link{trainESM}}: Ensembles of small models (ESMs) \cr +#' \code{\link{trainGAM}}: Generalized additive models (GAMs) \cr #' \code{\link{trainGLM}}: Generalized linear models (GLMs) \cr #' \code{\link{trainMaxEnt}}: MaxEnt models #' \code{\link{trainMaxNet}}: MaxNet models diff --git a/R/trainESM.r b/R/trainESM.r new file mode 100644 index 0000000..f6491fe --- /dev/null +++ b/R/trainESM.r @@ -0,0 +1,145 @@ +#' Calibrate an ensemble of small models +#' +#' This function calibrates a set of "ensembles of small models" (ESM), which are designed for modeling species with few occurrence records. In the original formulation, each model contains just two covariates, combined additively. Models are calibrated using all possible pairwise combinations of covariates. By default, this function does the same, but can also include univariate models, models with two covariates plus their interaction term, and models with quadratic and corresponding linear terms. This function will \emph{only} train generalized linear models. Extending the types of algorithms is planned! +#' +#' @param data Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites. +#' @param resp Character or integer. Name or column index of response variable. Default is to use the first column in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. 
Default is to use the second and subsequent columns in \code{data} as predictors. +#' @param scale Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainESM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do its operations. +#' @param univariate,quadratic,interaction \code{TRUE} or \code{FALSE}: Whether or not to include univariate models, quadratic models, and/or models with 2-way interactions (default is \code{FALSE}). +#' @param interceptOnly If \code{TRUE}, include an intercept-only model (default is \code{FALSE}). +#' @param method Character: Name of function used to solve the GLM. For "normal" GLMs, this can be \code{'glm.fit'} (default), \code{'brglmFit'} (from the \pkg{brglm2} package), or another function. +#' @param w Weights. Any of: +#' \itemize{ +#' \item \code{TRUE}: Causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) +#' \item \code{FALSE}: Each datum is assigned a weight of 1. +#' \item A numeric vector of weights, one per row in \code{data}. +#' \item The name of the column in \code{data} that contains site weights. +#' } +#' @param family Character or function. Name of family for data error structure (see \code{\link[stats]{family}}). Default is to use the 'binomial' family. +#' @param ... Arguments to pass to \code{\link[stats]{glm}}. +#' @param verbose Logical. 
If \code{TRUE} then display progress. +#' +#' @return A list object with several named elements: +#' \itemize{ +#' \item \code{models}: A list with each ESM model. +#' \item \code{tuning}: A \code{data.frame} with one row per model, in the order in which they appear in \code{$models}. +#' } +#' +#' @references +#' Breiner, F.T., Guisan, A., Bergamini, A., and Nobis, M.P. 2015. Overcoming limitations of modelling rare species by using ensembles of small models. \emph{Methods in Ecology and Evolution} 6:1210-1218. \doi{10.1111/2041-210X.12403} +#' Lomba, A., L. Pellissier, C. Randin, J. Vicente, J. Honrado, and A. Guisan. 2010. Overcoming the rare species modelling paradox: A novel hierarchical framework applied to an Iberian endemic plant. \emph{Biological Conservation} 143:2647-2657. \doi{10.1016/j.biocon.2010.07.007} +#' +#' @seealso \code{\link[enmSdmX]{trainBRT}}, \code{\link[enmSdmX]{trainGAM}}, \code{\link[enmSdmX]{trainGLM}}, \code{\link[enmSdmX]{trainMaxEnt}}, \code{\link[enmSdmX]{trainMaxNet}}, \code{\link[enmSdmX]{trainNS}}, \code{\link[enmSdmX]{trainRF}}, \code{\link[enmSdmX]{trainByCrossValid}} +#' +#' @example man/examples/trainESM_examples.r +#' +#' @export +trainESM <- function( + data, + resp = names(data)[1], + preds = names(data)[2:ncol(data)], + univariate = FALSE, + quadratic = FALSE, + interaction = FALSE, + interceptOnly = FALSE, + method = 'glm.fit', + scale = NA, + w = TRUE, + family = stats::binomial(), + ..., + verbose = FALSE +) { + + # response and predictors + if (inherits(resp, c('integer', 'numeric'))) resp <- names(data)[resp] + if (inherits(preds, c('integer', 'numeric'))) preds <- names(data)[preds] + + # weights and scaling + w <- .calcWeights(w, data = data, resp = resp, family = family) + if (is.na(scale) || scale) { + scaleds <- .scalePredictors(scale, preds, data) + data <- scaleds$data + scales <- scaleds$scales + } + + # intercept-only + if (interceptOnly) { + tuning <- data.frame(intercept = 1, pred1 = NA, pred2 = NA) + } + + # 
bivariate + if (!exists('tuning', inherits = FALSE)) tuning <- data.frame() + for (i in 1:(length(preds) - 1)) { + for (j in (i + 1):length(preds)) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds[i], pred2 = preds[j])) + } + } + + # univariate + if (univariate) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds, pred2 = NA)) + } + + # interaction + if (interaction) { + + tuning$pred3 <- NA + for (i in 1:(length(preds) - 1)) { + for (j in (i + 1):length(preds)) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds[i], pred2 = preds[j], pred3 = paste0(preds[i], ':', preds[j]))) + } + } + + } + + # quadratic + if (quadratic) { + + if (!any(names(tuning) == 'pred3')) tuning$pred3 <- NA + tuning$pred4 <- NA + + if (univariate) tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds, pred2 = paste0('I(', preds, '^2)'), pred3 = NA, pred4 = NA)) + + for (i in 1:(length(preds) - 1)) { + for (j in (i + 1):length(preds)) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds[i], pred2 = preds[j], pred3 = paste0('I(', preds[i], '^2)'), pred4 = NA)) + } + } + for (i in 1:(length(preds) - 1)) { + for (j in (i + 1):length(preds)) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds[i], pred2 = preds[j], pred3 = paste0('I(', preds[j], '^2)'), pred4 = NA)) + } + } + for (i in 1:(length(preds) - 1)) { + for (j in (i + 1):length(preds)) { + tuning <- rbind(tuning, data.frame(intercept = 1, pred1 = preds[i], pred2 = preds[j], pred3 = paste0('I(', preds[i], '^2)'), pred4 = paste0('I(', preds[j], '^2)'))) + } + } + } + + models <- list() + tuning$model <- NA + for (i in 1:nrow(tuning)) { + + form <- paste0(resp, ' ~') + for (j in 1:(ncol(tuning) - 1)) { + if (j == 1) { + if (!is.na(tuning[i, j])) form <- paste(form, tuning[i, j]) + } else { + if (!is.na(tuning[i, j])) form <- paste0(form, ' + ', tuning[i, j]) + } + } + tuning$model[i] <- form + + if (verbose) omnibus::say(form) + + form <- 
stats::as.formula(form) + args <- list(formula = form, data = data, weights = w, method = method, family = family, ...) + models[[length(models) + 1]] <- do.call(stats::glm, args = args) + + } + + c(list(models = models, tuning = tuning), if (exists('scales', inherits = FALSE)) list(scales = scales)) + +} diff --git a/R/trainGlm.r b/R/trainGlm.r index 003ec51..a25882b 100644 --- a/R/trainGlm.r +++ b/R/trainGlm.r @@ -7,7 +7,7 @@ #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. #' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. -#' @param scale Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by dividing by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do it's operations. +#' @param scale Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". 
For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do its operations. #' @param family Name of family for data error structure (see \code{\link[stats]{family}}). Default is to use the 'binomial' family. #' @param construct Logical. If \code{TRUE} (default) then construct model from individual terms entered in order from lowest to highest AICc up to limits set by \code{presPerTermInitial} or \code{maxTerms} is met. If \code{FALSE} then the "full" model consists of all terms allowed by \code{quadratic} and \code{interaction}. #' @param select Logical. If \code{TRUE} (default) then calculate AICc for all possible subsets of models and return the model with the lowest AICc of these. This step if performed \emph{after} model construction (if any). @@ -51,8 +51,8 @@ trainGLM <- function( select = TRUE, quadratic = TRUE, interaction = TRUE, - method = 'glm.fit', interceptOnly = TRUE, + method = 'glm.fit', presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, diff --git a/man/enmSdmX.Rd b/man/enmSdmX.Rd index 5c789c6..d1a83f8 100644 --- a/man/enmSdmX.Rd +++ b/man/enmSdmX.Rd @@ -35,7 +35,8 @@ Create an issue on \href{https://github.com/adamlilith/enmSdmX/issues}{GitHub}. \code{\link{trainByCrossValid}}: and \code{\link{summaryByCrossValid}}: Implement a \code{trainXYZ} function across calibration folds (which are distinct from evaluation folds). 
\cr \code{\link{trainBRT}}: Boosted regression trees (BRTs) \cr - \code{\link{trainGAM}}: Generalized additive models (GAMs) \cr + \code{\link{trainESM}}: Ensembles of small models (ESMs) \cr + \code{\link{trainGAM}}: Generalized additive models (GAMs) \cr \code{\link{trainGLM}}: Generalized linear models (GLMs) \cr \code{\link{trainMaxEnt}}: MaxEnt models \cr \code{\link{trainMaxNet}}: MaxNet models diff --git a/man/examples/trainESM_examples.r b/man/examples/trainESM_examples.r new file mode 100644 index 0000000..f51c714 --- /dev/null +++ b/man/examples/trainESM_examples.r @@ -0,0 +1,81 @@ +# NB: The examples below show a very basic modeling workflow. They have been +# designed to work fast, not produce accurate, defensible models. They can +# take a few minutes to run. + +library(mgcv) +library(sf) +library(terra) +set.seed(123) + +### setup data +############## + +# environmental rasters +rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') +madClim <- rast(rastFile) + +# coordinate reference system +wgs84 <- getCRS('WGS84') + +# lemur occurrence data +data(lemurs) +occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] +occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) + +occs <- elimCellDuplicates(occs, madClim) + +occEnv <- extract(madClim, occs, ID = FALSE) +occEnv <- occEnv[complete.cases(occEnv), ] + +# create 10000 background sites (or as many as raster can support) +bgEnv <- terra::spatSample(madClim, 20000) +bgEnv <- bgEnv[complete.cases(bgEnv), ] +bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] + +# collate occurrences and background sites +presBg <- data.frame( + presBg = c( + rep(1, nrow(occEnv)), + rep(0, nrow(bgEnv)) + ) +) + +env <- rbind(occEnv, bgEnv) +env <- cbind(presBg, env) + +predictors <- c('bio1', 'bio12') + +### calibrate models +#################### + +# "traditional" ESMs with just 2 linear predictors +# just one model in this case because we have just 2 predictors +esm1 <- trainESM( + data = env, + resp = 'presBg', 
+ preds = predictors, + family = stats::binomial(), + scale = TRUE, + w = TRUE +) + +str(esm1, 1) +esm1$tuning + +# extended ESM with other kinds of terms +esm2 <- trainESM( + data = env, + resp = 'presBg', + preds = predictors, + univariate = TRUE, + quadratic = TRUE, + interaction = TRUE, + interceptOnly = TRUE, + family = stats::binomial(), + scale = TRUE, + w = TRUE, + verbose = TRUE +) + +str(esm2, 1) +esm2$tuning diff --git a/man/trainESM.Rd b/man/trainESM.Rd new file mode 100644 index 0000000..5da455d --- /dev/null +++ b/man/trainESM.Rd @@ -0,0 +1,151 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/trainESM.r +\name{trainESM} +\alias{trainESM} +\title{Calibrate an ensemble of small models} +\usage{ +trainESM( + data, + resp = names(data)[1], + preds = names(data)[2:ncol(data)], + univariate = FALSE, + quadratic = FALSE, + interaction = FALSE, + interceptOnly = FALSE, + method = "glm.fit", + scale = NA, + w = TRUE, + family = stats::binomial(), + ..., + verbose = FALSE +) +} +\arguments{ +\item{data}{Data frame or matrix. Response variable and environmental predictors (and no other fields) for presences and non-presence sites.} + +\item{resp}{Character or integer. Name or column index of response variable. Default is to use the first column in \code{data}.} + +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in \code{data} as predictors.} + +\item{univariate, quadratic, interaction}{\code{TRUE} or \code{FALSE}: Whether or not to include univariate models, quadratic models, and/or models with 2-way interactions (default is \code{FALSE}).} + +\item{interceptOnly}{If \code{TRUE}, include an intercept-only model (default is \code{FALSE}).} + +\item{method}{Character: Name of function used to solve the GLM. 
For "normal" GLMs, this can be \code{'glm.fit'} (default), \code{'brglmFit'} (from the \pkg{brglm2} package), or another function.} +\item{scale}{Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainESM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do its operations.} +\item{w}{Weights. Any of: +\itemize{ +\item \code{TRUE}: Causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) + \item \code{FALSE}: Each datum is assigned a weight of 1. + \item A numeric vector of weights, one per row in \code{data}. + \item The name of the column in \code{data} that contains site weights. +}} +\item{family}{Character or function. Name of family for data error structure (see \code{\link[stats]{family}}). Default is to use the 'binomial' family.} +\item{...}{Arguments to pass to \code{\link[stats]{glm}}.} +\item{verbose}{Logical. If \code{TRUE} then display progress.} } \value{ A list object with several named elements: \itemize{ \item \code{models}: A list with each ESM model. \item \code{tuning}: A \code{data.frame} with one row per model, in the order in which they appear in \code{$models}. } } \description{ This function calibrates a set of "ensembles of small models" (ESM), which are designed for modeling species with few occurrence records. 
In the original formulation, each model contains just two covariates, combined additively. Models are calibrated using all possible pairwise combinations of covariates. By default, this function does the same, but can also include univariate models, models with two covariates plus their interaction term, and models with quadratic and corresponding linear terms. This function will \emph{only} train generalized linear models. Extending the types of algorithms is planned! } \examples{ # NB: The examples below show a very basic modeling workflow. They have been # designed to work fast, not produce accurate, defensible models. They can # take a few minutes to run. + +library(mgcv) +library(sf) +library(terra) +set.seed(123) + +### setup data +############## + +# environmental rasters +rastFile <- system.file('extdata/madClim.tif', package='enmSdmX') +madClim <- rast(rastFile) + +# coordinate reference system +wgs84 <- getCRS('WGS84') + +# lemur occurrence data +data(lemurs) +occs <- lemurs[lemurs$species == 'Eulemur fulvus', ] +occs <- vect(occs, geom=c('longitude', 'latitude'), crs=wgs84) + +occs <- elimCellDuplicates(occs, madClim) + +occEnv <- extract(madClim, occs, ID = FALSE) +occEnv <- occEnv[complete.cases(occEnv), ] + +# create 10000 background sites (or as many as raster can support) +bgEnv <- terra::spatSample(madClim, 20000) +bgEnv <- bgEnv[complete.cases(bgEnv), ] +bgEnv <- bgEnv[1:min(10000, nrow(bgEnv)), ] + +# collate occurrences and background sites +presBg <- data.frame( + presBg = c( + rep(1, nrow(occEnv)), + rep(0, nrow(bgEnv)) + ) +) + +env <- rbind(occEnv, bgEnv) +env <- cbind(presBg, env) + +predictors <- c('bio1', 'bio12') + +### calibrate models +#################### + +# "traditional" ESMs with just 2 linear predictors +# just one model in this case because we have just 2 predictors +esm1 <- trainESM( + data = env, + resp = 'presBg', + preds = predictors, + family = stats::binomial(), + scale = TRUE, + w = TRUE +) + +str(esm1, 1) +esm1$tuning + +# extended 
ESM with other kinds of terms +esm2 <- trainESM( + data = env, + resp = 'presBg', + preds = predictors, + univariate = TRUE, + quadratic = TRUE, + interaction = TRUE, + interceptOnly = TRUE, + family = stats::binomial(), + scale = TRUE, + w = TRUE, + verbose = TRUE +) + +str(esm2, 1) +esm2$tuning +} +\references{ +Breiner, F.T., Guisan, A., Bergamini, A., and Nobis, M.P. 2015. Overcoming limitations of modelling rare species by using ensembles of small models. \emph{Methods in Ecology and Evolution} 6:1210-1218. \doi{10.1111/2041-210X.12403} +Lomba, A., L. Pellissier, C. Randin, J. Vicente, J. Honrado, and A. Guisan. 2010. Overcoming the rare species modelling paradox: A novel hierarchical framework applied to an Iberian endemic plant. \emph{Biological Conservation} 143:2647-2657. \doi{10.1016/j.biocon.2010.07.007} +} +\seealso{ +\code{\link[enmSdmX]{trainBRT}}, \code{\link[enmSdmX]{trainGAM}}, \code{\link[enmSdmX]{trainGLM}}, \code{\link[enmSdmX]{trainMaxEnt}}, \code{\link[enmSdmX]{trainMaxNet}}, \code{\link[enmSdmX]{trainNS}}, \code{\link[enmSdmX]{trainRF}}, \code{\link[enmSdmX]{trainByCrossValid}} +} diff --git a/man/trainGlm.Rd b/man/trainGlm.Rd index cf98f74..31cd176 100644 --- a/man/trainGlm.Rd +++ b/man/trainGlm.Rd @@ -13,8 +13,8 @@ trainGLM( select = TRUE, quadratic = TRUE, interaction = TRUE, - method = "glm.fit", interceptOnly = TRUE, + method = "glm.fit", presPerTermInitial = 10, presPerTermFinal = 10, maxTerms = 8, @@ -33,7 +33,7 @@ trainGLM( \item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} -\item{scale}{Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by dividing by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". 
For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do it's operations.} +\item{scale}{Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do its operations.} \item{construct}{Logical. If \code{TRUE} (default) then construct model from individual terms entered in order from lowest to highest AICc up to limits set by \code{presPerTermInitial} or \code{maxTerms} is met. If \code{FALSE} then the "full" model consists of all terms allowed by \code{quadratic} and \code{interaction}.} @@ -43,10 +43,10 @@ trainGLM( \item{interaction}{Logical. Used only if \code{construct} is \code{TRUE}. If \code{TRUE} (default) then include 2-way interaction terms (including interactions between factor predictors).} -\item{method}{Character: Name of function used to solve the GLM. 
For "normal" GLMs, this can be \code{'glm.fit'} (default), \code{'brglmFit'} (from the \pkg{brglm2} package), or another function.} - \item{interceptOnly}{If \code{TRUE} (default) and model selection is enabled, then include an intercept-only model.} +\item{method}{Character: Name of function used to solve the GLM. For "normal" GLMs, this can be \code{'glm.fit'} (default), \code{'brglmFit'} (from the \pkg{brglm2} package), or another function.} + \item{presPerTermInitial}{Positive integer. Minimum number of presences needed per model term for a term to be included in the model construction stage. Used only is \code{construct} is TRUE.} \item{presPerTermFinal}{Positive integer. Minimum number of presence sites per term in initial starting model. Used only if \code{select} is \code{TRUE}.}