Description
Consider this class, the settings object for TextLoader:
This descends from this class.
Why establish this class relationship? Because we want to distinguish the arguments that are "core", and so should be retained when we save the "header" of the text file, from those that might vary from iteration to iteration.
This class is used for this purpose in exactly two places.
machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
Lines 1193 to 1195 in faffd17
machinelearning/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs
Lines 1230 to 1236 in faffd17
Back when these classes were written, they were meant to support only a command line and GUI tool, so it was acceptable to use class relationships for this purpose: we did not expose this class to users via an API. Now that we do expose it through an API, this little "trick" is no longer acceptable and causes confusion. There are only three "special" non-core arguments to account for. Surely we can handle their presence through some mechanism other than this odd pollution of our type hierarchy (which is necessarily visible to users), and instead handle it in the code for saving/loading the header itself. (That is, the load/save code could just account for the three arguments directly, instead of working in this strange way through the command line processor.)
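To make the parenthetical above concrete, here is a minimal sketch of save code that accounts for the non-core arguments directly. Everything in it is an assumption made for illustration: `LoaderOptions`, `HeaderWriter`, and the field names are hypothetical stand-ins, not the actual TextLoader types, and the real header format is not reproduced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-in for a single, flat settings class (no base class).
public sealed class LoaderOptions
{
    // "Core" settings: they describe how the file is parsed.
    public string Separator = "tab";
    public bool HasHeader = false;

    // Illustrative non-core settings: they may vary from run to run.
    public bool UseThreads = true;
    public string HeaderFile = "";
    public long MaxRows = -1;
}

public static class HeaderWriter
{
    // The special case lives next to the save code, not in the type hierarchy.
    private static readonly HashSet<string> NonCore = new HashSet<string>
    {
        nameof(LoaderOptions.UseThreads),
        nameof(LoaderOptions.HeaderFile),
        nameof(LoaderOptions.MaxRows),
    };

    // Builds a "name=value" description of the core settings only.
    public static string Describe(LoaderOptions options)
    {
        var parts = typeof(LoaderOptions)
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Where(f => !NonCore.Contains(f.Name))
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .Select(f => f.Name + "=" + f.GetValue(options));
        return string.Join(" ", parts);
    }
}
```

With this shape, changing `MaxRows` or `UseThreads` leaves the described header untouched, which is the behavior the class relationship currently provides.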
The end result should be only one class, Arguments, containing everything that is now in these two classes. It is also essential that the arguments presently occurring only in the Arguments class be excluded from the header and header parsing code.
There are several ways we could imagine doing this.
- The most obvious is to just special-case this in the TextLoader code itself.
- Another possibility is to add another attribute to the command line processing code to flag those arguments that capture purely runtime, rather than behavioral, considerations. Indeed, this comes up in other contexts: components that benefit from GPU acceleration might naturally have the GPU device ID as a configuration parameter, but if we asked such a component to describe its configuration, we would want it to exclude that parameter, since it is not portable from one computer or platform to another. (Which is the purpose of the current arrangement.) Rather than special-casing, as suggested above, we could have another (internal!!) attribute to flag such arguments.
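The attribute-based option could look roughly like the sketch below. It is only an illustration under stated assumptions: `RuntimeOnlyAttribute`, `AcceleratedOptions`, and `SettingsDescriber` are hypothetical names, and the reflection walk stands in for whatever the command line processor actually does when it describes a component's settings.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute; internal so it never shows up in the public API.
[AttributeUsage(AttributeTargets.Field)]
internal sealed class RuntimeOnlyAttribute : Attribute
{
}

// Hypothetical settings for a component that can use GPU acceleration.
internal sealed class AcceleratedOptions
{
    // Behavioral settings: portable across machines.
    public int HiddenUnits = 128;
    public int BatchSize = 32;

    // Purely runtime setting: not portable, so it is flagged for exclusion.
    [RuntimeOnly]
    public int GpuDeviceId = 0;
}

internal static class SettingsDescriber
{
    // Describes every public field except those flagged as runtime-only.
    public static string Describe(object options)
    {
        var parts = options.GetType()
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Where(f => !f.IsDefined(typeof(RuntimeOnlyAttribute), false))
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .Select(f => f.Name + "=" + f.GetValue(options));
        return string.Join(" ", parts);
    }
}
```

The trade-off against special-casing is generality: the exclusion rule is declared once on the field and honored by any component, at the cost of one more concept in the argument-processing code.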
I give two options because I am not too particular as to how. I might favor the first until we gain more experience with the second scenario, so as to justify a more general solution. Though perhaps we have already reached that point, since I know the second scenario has already come up, though in situations less central and important than the text loader.
/cc @stephentoub