the missing value in xgbTree training with caret #573

Open
jianqin123 opened this issue Jan 12, 2017 · 4 comments

Comments

@jianqin123

I am trying to build a binary classifier with xgbTree and tune its parameters with caret. If I pass data containing missing values to the train function, I get the following error:

Error in na.fail.default(list(label = c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,  : 
 missing values in object 

If I pre-process the data with na.omit() or set na.action = na.pass in the train function, there is no error. This commit, https://github.com/topepo/caret/pull/512/commits/b68df5794c2742ba2b1767850a356c6e81b044fa, suggests that the error originates in the caret source code. How can I use that change to solve my problem? Neither na.omit() nor na.action = na.pass is what I intended.
Thanks in advance !

@topepo
Owner

topepo commented Jan 12, 2017

As of version 6.0-71:

A significant bug was fixed where the internals of how R creates a model matrix was ignoring na.action when the default was set to na.fail (issue #461). This means that train will now immediately fail if there are any missing data. To use imputation, use na.action = na.pass and the imputation method of your choice in the preProcess argument. Also, a warning is issued if the user asks for imputation but uses the formula method and excludes missing data in na.action

So you should use na.action = na.pass if xgboost can deal with the missing data.
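To make the release note above concrete, here is a minimal base-R sketch of how na.action changes what comes out of the model-frame step. The data frame `df` and its columns are made up for illustration, and nothing here calls caret or xgboost; it only shows the underlying `model.frame` behavior.

```r
# Toy data (hypothetical): 5 rows, two of which contain a missing value.
df <- data.frame(
  y  = factor(c("n", "p", "n", "p", "n")),
  x1 = c(1.0, NA, 3.0, 4.0, 5.0),
  x2 = c(0.5, 1.5, NA, 3.5, 4.5)
)

# na.pass keeps every row, including those with NA.
mf_pass <- model.frame(y ~ ., data = df, na.action = na.pass)

# na.omit silently drops the incomplete rows.
mf_omit <- model.frame(y ~ ., data = df, na.action = na.omit)

# na.fail stops with an error as soon as any value is missing.
fail_msg <- tryCatch(
  {
    model.frame(y ~ ., data = df, na.action = na.fail)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)

# Building the design matrix from the na.pass frame preserves the NA cells.
x_pass <- model.matrix(attr(mf_pass, "terms"), data = mf_pass)

print(nrow(mf_pass))
print(nrow(mf_omit))
print(fail_msg)
print(anyNA(x_pass))
```

With na.pass the missing cells reach the design matrix unchanged, so whether training succeeds then depends on whether the model itself accepts NA values.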

@jianqin123
Author

If I use na.action = na.pass, the output shows that the model was trained on the same number of samples as when the data is pre-processed with na.omit().

I set na.action = na.pass inside trainXgbModel, and 2 rows in train1[1:100,] have missing values:

> xgb1<-trainXgbModel(train1[1:100,],3,0.05,0.1,3:10,0.7,0.8,2.5)
> xgb1
eXtreme Gradient Boosting 

 **98 samples**
233 predictors
  2 classes: 'n', 'p'

After removing na.action = na.pass and applying na.omit() to the data instead:

> xgb1<-trainXgbModel(na.omit(train1[1:100,]),3,0.05,0.1,3:10,0.7,0.8,2.5)
> xgb1
eXtreme Gradient Boosting 

 **98 samples**
233 predictors
  2 classes: 'n', 'p' 

In both cases the sample count is 98, so I wonder whether setting na.action = na.pass is equivalent to calling na.omit(train1[1:100,]).
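One way to sanity-check the row counts outside of caret is sketched below in base R. The data frame `toy` is synthetic and only stands in for train1[1:100,]; the column names and the positions of the missing values are assumptions chosen for illustration.

```r
set.seed(1)

# Synthetic stand-in (hypothetical) for train1[1:100, ]:
# 100 rows, a two-class label and two numeric predictors.
toy <- data.frame(
  label = factor(sample(c("n", "p"), 100, replace = TRUE)),
  x1    = rnorm(100),
  x2    = rnorm(100)
)

# Inject missing values into exactly two rows.
toy$x1[7]  <- NA
toy$x2[42] <- NA

# Which rows are incomplete, and how many are there?
incomplete   <- which(!complete.cases(toy))
n_incomplete <- length(incomplete)

# Row counts after each na.action at the model-frame level.
n_after_omit <- nrow(na.omit(toy))
n_after_pass <- nrow(model.frame(label ~ ., data = toy, na.action = na.pass))

print(incomplete)
print(n_after_omit)
print(n_after_pass)
```

At the model-frame level na.pass and na.omit give different row counts here, so if train still reports the na.omit count under na.action = na.pass, the rows may be getting dropped at a later step.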

@topepo
Owner

topepo commented Jan 13, 2017

That's helpful but I'll need a reproducible example to figure out what is going on.

@strakaps

This thread was very useful, thanks! @topepo has there been any movement with this issue?
