Description
When plotting an extended pdf (i.e. a pdf that also makes a prediction for the number of events) together with a RooFit dataset, users expect that the plotted pdf is normalized to the number of predicted events, in order to see what the actual prediction is and meaningfully compare to the data.
However, the pdf is automatically scaled to the observed number of events in the dataset, so there is no way to get visual feedback on whether the predicted number of events makes sense. Scaling to the observed count is useful for models that don't make any prediction on the total number of events, but misleading otherwise. See also the following forum post where this came up:
https://root-forum.cern.ch/t/roofit-background-component-not-staying-constant-despite-rooconstvar-setconstant-true/63650/2
This problem usually stays under the radar because post-fit event number predictions tend to match the observed number of events. When the shape is correlated with the normalization, however, they need not match. At best, the resulting small difference between the post-fit and observed numbers of events leaves you puzzled enough to open a forum post; at worst, the scaling to the observed number of events makes you believe the fit was good when it was not.
One can work around this by introducing additional scale factors (see the Normalization command argument for RooAbsReal::plotOn()), but this should not be necessary. It would be better if extended pdfs were normalized by default to the predicted number of events when plotting, not to the observed number of events.
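To make the workaround concrete, here is a minimal sketch of the scale factor it needs. The helper name relativeScaleFactor is hypothetical and not part of RooFit; it only computes the ratio that would turn a curve normalized to the observed count into one normalized to the predicted count.

```cpp
#include <cassert>

// Hypothetical helper (not part of RooFit): the relative factor that rescales
// a curve normalized to the observed event count to the predicted event count.
double relativeScaleFactor(double nExpected, double nObserved)
{
   // The ratio is only defined for a non-empty dataset.
   assert(nObserved > 0.0 && "the dataset must contain events");
   return nExpected / nObserved;
}
```

With the numbers from the reproducer, relativeScaleFactor(5000, 10000) is 0.5. Assuming the usual RooFit interface, that value would be passed as RooFit::Normalization(0.5, RooAbsReal::Relative) in the extPdf.plotOn(frame, ...) call. The RooAbsReal::RelativeExpected scale type may make the manual ratio unnecessary, but check the plotOn() documentation for your ROOT version before relying on it.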
This is a good first issue to help with RooFit development, since the required changes should be small. The challenge is mostly to figure out where the scaling to the number of events in the dataset happens, and then to add a code path for the case where the RooAbsPdf makes a prediction on the expected events (i.e. the return value of RooAbsPdf::expectedEvents() is non-zero).
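The decision described above could look roughly like the sketch below. It is not existing RooFit code: plotNormalization is a hypothetical free function, and its expectedEvents argument stands in for the return value of RooAbsPdf::expectedEvents(), taken to be zero when the pdf makes no prediction.

```cpp
#include <cassert>

// Hypothetical sketch of the proposed code path, not existing RooFit code.
// expectedEvents: stand-in for RooAbsPdf::expectedEvents(), zero if the pdf
//                 makes no prediction for the total number of events.
// observedEvents: number of events in the dataset already plotted on the frame.
double plotNormalization(double expectedEvents, double observedEvents)
{
   assert(observedEvents >= 0.0);
   // Extended pdf: normalize the curve to its own prediction.
   if (expectedEvents != 0.0) {
      return expectedEvents;
   }
   // Non-extended pdf: keep the current behaviour and follow the data.
   return observedEvents;
}
```

For the reproducer's values, plotNormalization(5000, 10000) returns 5000, while a pdf without a prediction, plotNormalization(0, 10000), keeps the observed 10000.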
Reproducer:
RooRealVar x{"x", "x", 0, 1};
RooRealVar n{"n", "n", 5000, 0, 20000};
x.setBins(1);
RooUniform pdf{"pdf", "pdf", x};
std::unique_ptr<RooAbsData> data{pdf.generateBinned(x, 10000)};
RooExtendPdf extPdf{"ext_pdf", "ext_pdf", pdf, n};
auto c1 = new TCanvas{"c1", "c1"};
auto frame = x.frame();
data->plotOn(frame);
extPdf.plotOn(frame);
// Should be the value of n
std::cout << frame->getCurve()->GetY()[1] << std::endl;
frame->Draw();
c1->SaveAs("plot.png");
The output is:
10000
But we expect the curve to represent the actual number of expected events, which is 5000 in this reproducer.
