-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support early stopping of prediction in CLI #565
Conversation
Just some questions:
|
Hello, I cannot take a look now but I'll do it first thing tomorrow in the morning, thanks for the PR! |
Early stopping at prediction time for classification does the following:
Choosing appropriate values for the parameters:
Let me know if there's anything unclear, thanks for the CLI integration :) |
@cbecker |
Yes, it works for me. I am using the pure C++ API whenever I can, and avoid the C api. About the latter, one thing I'm afraid of is that the predict() parameters are growing, and if we need to change something in the early prediction code we need to change all the python and R api, because otherwise it will crash because of missing C function arguments. Right now, if we were to change something about early stopping, we just need to modify Therefore I think it is safer and cleaner the way it is now, but if you think it's advantageous to have it in a single function that works for me too. |
@cbecker |
I see, that makes sense. I don't know about the speed penalty though. We'd be doing this for every sample we want to classify, right? |
@cbecker |
@@ -175,8 +175,11 @@ class Booster { | |||
} else { | |||
is_raw_score = false; | |||
} | |||
auto param = ConfigBase::Str2Map(parameter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really concerned about this. Do you know how long Str2Map() takes? We'll be doing this for each instance we want to classify. IMO it would be worth benchmarking the after/before to see how this 'hurts' performance when only a few trees are being used (with early stopping disabled)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just saw your comment about one-time-init, I will have a second look now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't call the c_api one-time for one sample, right ?
The prediction is called one-time for many instances.
I agree this makes sense when many samples are classified in a single call, which I guess is the goal of this function. For a bit I was confused with the PredictRaw() function, thanks for the clarification. |
yes, it makes sense. |
* fix multi-threading. * fix name style. * support in CLI version. * remove warnings. * Not default parameters. * fix if...else... . * fix bug. * fix warning. * refine c_api. * fix R-package. * fix R's warning. * fix tests. * fix pep8 .
@cbecker
I push many changes.
I think the type of early_stoping_instance can be auto inferred by the objective function.
Another problem is, should users really need to create the early_stop_instance by themselves ? Passing parameters
round_period
andthreshold_margin
to the predict function are not enough ?