Description
In LogisticRegression and PoissonRegression (which use the same L-BFGS base), we have a parameter IterationsToRemember that sets the number of past gradient updates to keep in the solver's history. While this terminology makes sense, it's not the terminology we usually encounter in the field.
In the literature, this is referred to as the "history size" (see e.g. Wikipedia).
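For anyone who wants the precise meaning: below is a minimal sketch of the standard two-loop recursion from Nocedal & Wright, where this number is exactly the cap on stored (s, y) correction pairs. This is generic textbook L-BFGS, not our actual implementation; all names are illustrative.

```csharp
using System.Collections.Generic;

static class LbfgsSketch
{
    // Minimal sketch of the standard L-BFGS two-loop recursion.
    // The "history size" is the maximum number of (s, y) correction pairs kept;
    // here, m is however many pairs are currently stored.
    public static double[] TwoLoopRecursion(
        IReadOnlyList<double[]> s,   // position differences x_{k+1} - x_k, oldest first
        IReadOnlyList<double[]> y,   // gradient differences g_{k+1} - g_k, oldest first
        double[] gradient)           // current gradient g_k
    {
        int m = s.Count;
        int n = gradient.Length;
        var q = (double[])gradient.Clone();
        var alpha = new double[m];
        var rho = new double[m];

        // First loop: newest pair to oldest.
        for (int i = m - 1; i >= 0; i--)
        {
            rho[i] = 1.0 / Dot(y[i], s[i]);
            alpha[i] = rho[i] * Dot(s[i], q);
            for (int j = 0; j < n; j++) q[j] -= alpha[i] * y[i][j];
        }

        // Scale by an initial inverse-Hessian estimate from the newest pair.
        double gamma = m > 0 ? Dot(s[m - 1], y[m - 1]) / Dot(y[m - 1], y[m - 1]) : 1.0;
        for (int j = 0; j < n; j++) q[j] *= gamma;

        // Second loop: oldest pair to newest.
        for (int i = 0; i < m; i++)
        {
            double beta = rho[i] * Dot(y[i], q);
            for (int j = 0; j < n; j++) q[j] += (alpha[i] - beta) * s[i][j];
        }
        return q; // ≈ H_k * g_k; the search direction is -q
    }

    static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }
}
```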
The various toolkits that expose an L-BFGS solver name it as follows:

- scikit-learn: doesn't expose it
- Spark: NumberOfCorrections
- TensorFlow: num_correction_pairs
- PyTorch: history_size
I would vote for HistorySize, with the docs explaining what it is.
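To make that concrete, here's a hypothetical sketch of how it could surface through the usual Options pattern; the trainer and options class names below are assumptions for illustration, not the current API:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Hypothetical shape after the rename: HistorySize instead of IterationsToRemember.
var options = new LbfgsLogisticRegressionBinaryTrainer.Options
{
    HistorySize = 20 // number of past updates L-BFGS keeps to approximate the Hessian
};

var trainer = mlContext.BinaryClassification.Trainers.LbfgsLogisticRegression(options);
```

The docs on that one field could then carry the literature definition, so users searching for "history size" land in the right place.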
What do you all think? Any big feelings around this?