GBM fails when doing quantile regression #309

Open
zachmayer opened this issue Nov 5, 2015 · 9 comments

Comments

@zachmayer
Collaborator

library(caret)
library(gbm)
data(iris)
X <- iris[,2:4]
Y <- iris[,1]

gbmFit1 <- train(
  X, Y,
  method = "gbm", verbose=FALSE,
  distribution = list(name="quantile",alpha=0.25),
  trControl = trainControl(method = "cv")
)

I think caret may not be handling the predictions from the quantile regression GBM properly, but I'm not sure.

@zachmayer
Collaborator Author

> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-apple-darwin14.5.0 (64-bit)
Running under: OS X 10.10.5 (Yosemite)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plyr_1.8.3      gbm_2.1.1       survival_2.38-3 caret_6.0-58    ggplot2_1.0.1   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.1        magrittr_1.5       MASS_7.3-44        munsell_0.4.2      colorspace_1.2-6   foreach_1.4.3      minqa_1.2.4       
 [8] stringr_1.0.0      car_2.1-0          tools_3.2.2        nnet_7.3-11        pbkrtest_0.4-2     grid_3.2.2         gtable_0.1.2      
[15] nlme_3.1-122       mgcv_1.8-8         quantreg_5.19      MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-10        digest_0.6.8      
[22] Matrix_1.2-2       nloptr_1.0.4       reshape2_1.4.1     codetools_0.2-14   stringi_1.0-1      compiler_3.2.2     scales_0.3.0      
[29] stats4_3.2.2       SparseM_1.7        proto_0.3-10    

@topepo
Owner

topepo commented Nov 6, 2015

I think this is fixed now. Please test. Also:

  • It only works for a single quantile value.
  • It calculates performance against that quantile. In your example (alpha=0.25), RMSE and the other metrics compare the observed values to the predicted 25th percentile.
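
Because RMSE against a 25th-percentile prediction is hard to interpret, one alternative is the pinball (quantile) loss, which is minimized by the true quantile. The following is only a sketch and is not what caret computes in this thread: `pinball_loss` and `quantile_summary` are names made up for illustration, and the `(data, lev, model)` signature with `obs`/`pred` columns is assumed from caret's `summaryFunction` convention.

```r
# Pinball (quantile) loss for a single quantile level alpha in (0, 1).
pinball_loss <- function(obs, pred, alpha) {
  stopifnot(length(obs) == length(pred), alpha > 0, alpha < 1)
  err <- obs - pred
  # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
  mean(pmax(alpha * err, (alpha - 1) * err))
}

# Hypothetical summary function in the shape caret's trainControl() expects.
# alpha is hard-coded to match the 0.25 used in the example in this issue.
quantile_summary <- function(data, lev = NULL, model = NULL) {
  c(Pinball = pinball_loss(data$obs, data$pred, alpha = 0.25))
}

# Small check on hand-made numbers.
toy <- data.frame(obs = c(1, 2, 3, 4), pred = rep(2, 4))
quantile_summary(toy)
```

If this were passed as `trainControl(summaryFunction = quantile_summary)`, `train()` would presumably also need `metric = "Pinball"` and `maximize = FALSE`.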

@zachmayer
Collaborator Author

Thanks! I'll check it out.

@topepo
Owner

topepo commented Nov 19, 2015

Did this work?

@zachmayer
Collaborator Author

I installed master with: devtools::install_github('topepo/caret/pkg/caret@master') and re-ran:

library(caret)
library(gbm)
data(iris)
X <- iris[,2:4]
Y <- iris[,1]

gbmFit1 <- train(
  X, Y,
  method = "gbm", verbose=FALSE,
  distribution = list(name="quantile",alpha=0.25),
  trControl = trainControl(method = "cv")
)

But I still got an error:

Error in { : 
  task 1 failed - "arguments imply differing number of rows: 3, 0"

I think the problem is that I'm providing distribution as a list, list(name="quantile",alpha=0.25), rather than as a character string, "quantile".

The same problem would affect the pairwise distribution, e.g. distribution=list(name="pairwise",group=iris$Species,metric='mrr')
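
For anyone trying to trace this, the wording of the error above is what base R's `data.frame()` emits when it is asked to combine a length-3 column with a length-0 column. The sketch below only reproduces the message in isolation; which objects inside caret end up with those lengths is not established in this thread, and the column names are placeholders.

```r
# Reproduce the message with base R only: one column of length 3,
# one of length 0 (e.g. an empty vector of results).
msg <- tryCatch(
  data.frame(param = 1:3, metric = numeric(0)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

So a reasonable place to look is any spot where resampling results that came back empty are bound to three values of something else.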

@zachmayer
Collaborator Author

Interesting. It works if you specify trainControl(method = 'none'), but fails if you specify trainControl(method = 'cv', number=5).
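
Since the failure only shows up under resampling, a possible stopgap is to run the folds by hand and collect out-of-fold predictions. This is a minimal sketch, not caret's resampling code: `manual_cv_predict`, `fit_fun`, and `predict_fun` are hypothetical names, the baseline "model" is just the training 25th percentile, and the gbm branch (with an arbitrary `n.trees = 100`) is an untested assumption that only runs if gbm is installed.

```r
# Manual k-fold cross-validation returning out-of-fold predictions.
# fit_fun(x, y) returns a model; predict_fun(model, newx) returns numerics.
manual_cv_predict <- function(x, y, fit_fun, predict_fun, k = 5) {
  folds <- sample(rep(seq_len(k), length.out = nrow(x)))
  oof <- rep(NA_real_, nrow(x))
  for (i in seq_len(k)) {
    held_out <- folds == i
    model <- fit_fun(x[!held_out, , drop = FALSE], y[!held_out])
    oof[held_out] <- predict_fun(model, x[held_out, , drop = FALSE])
  }
  oof
}

set.seed(1)
X <- iris[, 2:4]
Y <- iris[, 1]

# Base-R baseline: predict the training 25th percentile for every row.
oof_base <- manual_cv_predict(
  X, Y,
  fit_fun = function(x, y) quantile(y, 0.25, names = FALSE),
  predict_fun = function(model, newx) rep(model, nrow(newx))
)

# Assumed gbm plug-in, skipped when gbm is not available.
if (requireNamespace("gbm", quietly = TRUE)) {
  oof_gbm <- manual_cv_predict(
    X, Y,
    fit_fun = function(x, y) gbm::gbm.fit(
      x, y, distribution = list(name = "quantile", alpha = 0.25),
      n.trees = 100, verbose = FALSE
    ),
    predict_fun = function(model, newx) predict(model, newx, n.trees = 100)
  )
}
```

The out-of-fold vector lines up with `Y`, so it can be scored with whatever metric suits the quantile being modelled.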

@zachmayer
Collaborator Author

I tried all the GBM distributions, with interesting results:

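set.seed(1)
library(caret)
library(gbm)
library(survival)  # provides Surv(), used in the coxph call below
dat <- twoClassSim()
X <- dat[,1:15]                  # 15 simulated predictors
Y <- as.integer(dat[,16]) - 1    # class label recoded as 0/1

ctrl <- trainControl(method = 'cv', number=5)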

Working:

train(
  X, Y, method='gbm', distribution='gaussian', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Y, method='gbm', distribution='laplace', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Y, method='gbm', distribution='tdist', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Y, method='gbm', distribution='poisson', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, factor(Y), method='gbm', distribution='bernoulli', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, factor(Y), method='gbm', distribution='huberized', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, factor(Y), method='gbm', distribution='adaboost', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Y, method='gbm', distribution=list(name="tdist", df=8), verbose=FALSE,
  trControl=ctrl, tuneLength=1
)

Not working:

train(
  X, Y, method='gbm', distribution=list(name="quantile",alpha=0.25), verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Y, method='gbm', distribution=list(name="pairwise", group=1, metric='mrr'), verbose=FALSE,
  trControl=ctrl, tuneLength=1
)
train(
  X, Surv(Y), method='gbm', distribution='coxph', verbose=FALSE,
  trControl=ctrl, tuneLength=1
)

So the quantile, pairwise, and coxph (survival) distributions currently fail under train() with cross-validation.
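
A comparison like this is easier to repeat with a small harness that runs each call and records either "ok" or the error message. This is a sketch in base R; `try_each` is a made-up helper, and the two toy entries stand in for the `train()` calls above, each of which could be wrapped in a `function()`.

```r
# Run each zero-argument function; return "ok" or the error message.
try_each <- function(calls) {
  vapply(calls, function(f) {
    tryCatch({
      f()
      "ok"
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Toy stand-ins; replace with e.g.
#   gaussian = function() train(X, Y, method = 'gbm', distribution = 'gaussian', ...)
res <- try_each(list(
  works = function() sqrt(4),
  fails = function() stop("boom")
))
print(res)
```

The result is a named character vector, so the failing distributions and their messages can be read off in one place.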

@zachmayer
Collaborator Author

FYI, here's the gbm.fit code for all of the above models:

gbm.fit(X, Y, distribution='gaussian', verbose=FALSE)
gbm.fit(X, Y, distribution='laplace', verbose=FALSE)
gbm.fit(X, Y, distribution='tdist', verbose=FALSE)
gbm.fit(X, Y, distribution='poisson', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='bernoulli', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='huberized', verbose=FALSE)
gbm.fit(X, factor(Y), distribution='adaboost', verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="tdist", df=8), verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="quantile",alpha=0.25), verbose=FALSE)
gbm.fit(X, Y, distribution=list(name="pairwise", group=1, metric='mrr'), verbose=FALSE)
gbm.fit(X, Surv(Y), distribution='coxph', verbose=FALSE)

You can see that they all produce models when gbm.fit is called directly.

@scworland

scworland commented Jul 25, 2016

Was this ever resolved? I still get a similar error when using certain distributions with gbm through caret. This works:

devtools::install_github("gbm-developers/gbm")
fit1 <- gbm.fit(x,y,distribution="gamma")

but this returns an error:

library(caret)
fit2 <- train(x,y, method='gbm', distribution='gamma', trControl=ctrl, tuneLength=1)

task 1 failed - "arguments imply differing number of rows: 3, 0"

The last model will run fine if the distribution is changed to 'gaussian'.
