TextLoader Arguments/ArgumentsCore classes

Consider this class, the settings object for `TextLoader`:

https://github.com/dotnet/machinelearning/blob/faffd179c961f120ec4e6babb06bbfb2cca6a6ea/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs#L382

It derives from this base class, `ArgumentsCore`:

https://github.com/dotnet/machinelearning/blob/faffd179c961f120ec4e6babb06bbfb2cca6a6ea/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs#L337

Why establish this class relationship? Because we want to distinguish the "core" arguments, which should be retained when we save the "header" of the text file, from those that might vary from iteration to iteration.

The distinction is used for this purpose in exactly two places:

https://github.com/dotnet/machinelearning/blob/faffd179c961f120ec4e6babb06bbfb2cca6a6ea/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs#L1193-L1195

https://github.com/dotnet/machinelearning/blob/faffd179c961f120ec4e6babb06bbfb2cca6a6ea/src/Microsoft.ML.Data/DataLoadSave/Text/TextLoader.cs#L1230-L1236

Back when these classes were written and meant to support only a command line and GUI tool, it was acceptable to use class relationships for this purpose -- we did not expose this class to users via an API. Now that we do expose it through an API, this little "trick" is no longer acceptable and causes confusion. There are only three "special" non-core arguments to account for. Surely we can handle their presence through some mechanism other than this odd pollution of our type hierarchy (which is necessarily visible to users), and instead handle it in the code that saves and loads the header itself. (That is, the load/save code could account for the three arguments directly, instead of working in this strange way through the command line processor.)

The end result should be a single class, `Arguments`, containing everything that is now in these two classes. It is also essential that the arguments presently occurring *only* in the `Arguments` class be excluded from the header and the header parsing code.

There are several ways we could imagine doing this.

1. The most obvious is to just special case this code in the `TextLoader` code itself.

2. Another possibility is to add another attribute to the command line processing code itself, to mark those arguments that capture purely runtime, not behavioral, considerations. Indeed, this happens in other contexts: components that benefit from GPU acceleration might, naturally, have the GPU device ID as a configuration parameter, but if we were to ask such a component to describe its configuration, behaviorally we might want it to *exclude* that parameter, since it is not portable from one computer or platform to another. (Which is the purpose of the current arrangement.) Rather than special casing this, as suggested above, we could have another (internal!!) attribute to flag such arguments.
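
To make the two options concrete, here is a minimal sketch. Everything in it is hypothetical: `RuntimeOnlyAttribute`, `LoaderSettings`, `HeaderSettings`, and all field names are illustrative stand-ins, not existing ML.NET types or the actual `TextLoader` arguments, and the sketch bypasses the real command line processor entirely. It only shows how a single settings class could exclude runtime-only fields from a saved description, either through a hardcoded name list (option 1) or a marker attribute (option 2).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical marker attribute (an assumption, not an existing ML.NET type):
// flags settings that are runtime-only and must not be persisted in the header.
[AttributeUsage(AttributeTargets.Field)]
internal sealed class RuntimeOnlyAttribute : Attribute { }

// Hypothetical single settings class standing in for a merged
// Arguments/ArgumentsCore. Field names are illustrative only.
public sealed class LoaderSettings
{
    public string Separator = "tab";
    public bool HasHeader;

    [RuntimeOnly] public bool UseThreads = true;
    [RuntimeOnly] public string HeaderFile;
    [RuntimeOnly] public long? MaxRows;
}

public static class HeaderSettings
{
    // Option 1: special-case the excluded names directly in the loader code.
    private static readonly HashSet<string> ExcludedNames = new HashSet<string>
    {
        nameof(LoaderSettings.UseThreads),
        nameof(LoaderSettings.HeaderFile),
        nameof(LoaderSettings.MaxRows),
    };

    public static IEnumerable<FieldInfo> PersistedFieldsByName(Type settingsType) =>
        settingsType
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Where(f => !ExcludedNames.Contains(f.Name));

    // Option 2: keep every public field that lacks the marker attribute.
    public static IEnumerable<FieldInfo> PersistedFieldsByAttribute(Type settingsType) =>
        settingsType
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .Where(f => f.GetCustomAttribute<RuntimeOnlyAttribute>() == null);

    // Render "name=value" pairs for the fields that belong in the header.
    // Sorting by name keeps the output independent of reflection order.
    public static string Describe(object settings) =>
        string.Join(" ",
            PersistedFieldsByAttribute(settings.GetType())
                .OrderBy(f => f.Name, StringComparer.Ordinal)
                .Select(f => $"{f.Name}={f.GetValue(settings)}"));
}
```

With these placeholder definitions, `HeaderSettings.Describe(new LoaderSettings())` yields `HasHeader=False Separator=tab`, and both selection methods return the same two fields. The attribute version keeps the exclusion next to each field's declaration, whereas the name list keeps it local to the header code, which is the trade-off between the two options.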

I give two options because I am not too particular as to how. I might favor 1 until we gain enough experience with scenario 2 to justify a more general solution. Though perhaps we have already reached that point, since I know the scenario I described in 2 has already come up, albeit in situations less central and important than the text loader.

/cc @stephentoub 

	var argsNew = new Arguments();
	// Copy the non-core arguments to the new args (we already know that all the core arguments are default).
	var parsed = CmdParser.ParseArguments(host, CmdParser.GetSettings(host, args, new Arguments()), argsNew);
	ch.Assert(parsed);
	// Copy the core arguments to the new args.
	if (!CmdParser.ParseArguments(host, loader.GetSettingsString(), argsNew, typeof(ArgumentsCore), msg => ch.Error(msg)))
		goto LDone;


	// Verify that the current schema-defining arguments are default.
	// Get settings just for core arguments, not everything.
	string tmp = CmdParser.GetSettings(host, args, new ArgumentsCore());
