Fixing prediction methods to be more standardized #167
Conversation
Catchup to upstream
…nto robustCaretList
@zachmayer This is a bit of a large PR, so just ping me with a comment when you have a chance to review it. Hopefully I can answer any questions you have, and I'm also willing to make whatever changes you think make sense.
I'd also recommend changing the Git Check settings to only pick up git checks on the "NOT_CRAN" version of
@zachmayer Just a ping to see about moving this forward. I think it'll solve some of the recently reported issues like #172.
This file has a couple of untested lines: https://coveralls.io/builds/3539104/source?filename=R%2FcaretEnsemble.R Could you add one or two more tests to hit the red lines? Meanwhile, I've gotta go figure out why our travis builds are failing.
args <- lapply(args, eval, parent.frame()) # convert from symbols to objects
if(exists("newdata", args)){
  # tmp <- args$newdata
  if(anyNA(args$newdata)){
I'm pretty sure this line means that we will never get to the "Missing data found..." keepNA argument below, but I can't be sure: the data could be complete, yet for some functions it could still produce an NA prediction. This seems pretty difficult to identify a test case for.
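To make the concern above concrete, here is a minimal sketch of this kind of guard. The function name and warning text are illustrative, not the PR's actual code; the point is that an anyNA check on newdata catches missing inputs, while NA predictions from complete data would slip past it.

```r
# Hypothetical guard, mirroring the anyNA(args$newdata) check discussed above.
# Note: a model can still return NA predictions even when this check passes,
# which is why the branch below it is hard to write a test for.
check_newdata <- function(newdata) {
  if (anyNA(newdata)) {
    warning("Missing data found in newdata; predictions may contain NAs")
    return(TRUE)
  }
  FALSE
}

df_missing  <- data.frame(x = c(1, NA, 3))
df_complete <- data.frame(x = c(1, 2, 3))
check_newdata(df_missing)   # warns and returns TRUE
check_newdata(df_complete)  # returns FALSE
```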
Fixed the build issue, wooooooo
Also, instead of merging master, you can also rebase off master and then force push. It's a little bit harder to do, but it makes for a slightly nicer commit history.
OK, I think I did the rebase, but I'm not sure. At any rate we're even with master again. I think what needs to happen is I need to properly deprecate
👍 from me. I'm re-testing right now, and will merge when all the tests pass
OK one last push to test my deprecation of
@zachmayer It builds, but the coverage check is a little wonky. I can't get those deprecated bits to be tested, but I don't want to junk them because legacy code may use them (I'm speaking selfishly here). If we merge this, I'm happy to figure out the merge conflicts it might cause with the other PR.
I edited the thresholds and re-ran
@zachmayer Looks like we're good to go. Thanks! After this gets merged in, I'll rebase and start tackling the two issues that cropped up related to prediction, #172 and #171. This closes #167.
Awesome! I'm away for the weekend but will merge asap so we can move on!
Fixing prediction methods to be more standardized
This PR addresses #165 and #158 by making sure that predict.caretEnsemble and predict.caretStack provide similar output and have similar inputs. There are also large changes to the unit tests to make sure that the tests now check the correct structure of the output from these two functions. predict.caretStack is particularly heavily modified here -- it can now produce prediction errors based on the model disagreement about each observation, mimicking the behavior of predict.caretEnsemble. It uses the variable importance of each model's predictions in the stacked model to define the weights. When these predictions are transformed before ensembling, no standard error is returned or calculated, but the variable importance can still be returned as a weights attribute on the output vector.
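The disagreement-based error described above can be sketched in plain R. This is not the package's actual implementation; the matrix of member-model predictions and the importance weights are made-up illustrations of the idea: a weighted ensemble prediction plus a weighted spread across models as the error estimate, with the importance weights attached as a "weights" attribute.

```r
# Hypothetical member-model predictions: rows = observations, cols = models.
preds <- cbind(glm = c(0.62, 0.10, 0.85),
               rf  = c(0.70, 0.05, 0.95))
# Assumed variable-importance weights from the stacked model, summing to 1.
imp <- c(glm = 0.4, rf = 0.6)

# Importance-weighted ensemble prediction for each observation.
wmean <- as.vector(preds %*% imp)

# Weighted standard deviation across models: a rough per-observation
# "prediction error" driven by how much the member models disagree.
werr <- sqrt(as.vector(((preds - wmean)^2) %*% imp))

# Expose the importance weights on the output, as the PR description suggests.
out <- wmean
attr(out, "weights") <- imp
```

Observations where the member models agree closely get a small werr; observations where they diverge get a large one, which is the behavior the PR describes for predict.caretStack.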