Problem Description
The Azure ML Python SDK documentation offers numerous options for passing data between pipeline steps, but the currently recommended one appears to be `azureml.data.OutputFileDatasetConfig`.
However, `azureml.data.OutputFileDatasetConfig` has a limitation: it is not accepted as a valid value for the `inputs` parameter of the step classes in `azureml.pipeline.steps`, e.g. `PythonScriptStep` and `HyperDriveStep`.
To use an `OutputFileDatasetConfig` as the input of a pipeline step, `as_input()` must be called on the object; no such call is needed when the same object is used as the output of a step.
This is extremely convoluted, and it strongly suggests that `OutputFileDatasetConfig` was originally designed only as an output of a pipeline step.
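The asymmetry described above can be sketched with a small mock. The classes below are hypothetical stand-ins (not the real Azure ML SDK, whose steps cannot run outside a workspace); they only mirror the shape of the API: the config object is accepted directly as a step *output*, but must be wrapped via `as_input()` before a step will accept it as an *input*.

```python
class MockDatasetConsumptionConfig:
    """Stand-in for the object returned by as_input()."""
    def __init__(self, source, name):
        self.source = source
        self.name = name

class MockOutputFileDatasetConfig:
    """Stand-in for azureml.data.OutputFileDatasetConfig."""
    def __init__(self, name):
        self.name = name

    def as_input(self, name=None):
        # Only after this call is the object a valid step *input*.
        return MockDatasetConsumptionConfig(self, name or self.name)

class MockPythonScriptStep:
    """Stand-in for azureml.pipeline.steps.PythonScriptStep."""
    def __init__(self, script_name, inputs=(), outputs=()):
        for item in inputs:
            # Mimics the SDK rejecting a bare OutputFileDatasetConfig as an input.
            if not isinstance(item, MockDatasetConsumptionConfig):
                raise TypeError("inputs must be declared with as_input()")
        self.script_name = script_name
        self.inputs = list(inputs)
        self.outputs = list(outputs)

prepared = MockOutputFileDatasetConfig(name="prepared_data")

# Producing step: the config object is accepted directly as an output.
prep_step = MockPythonScriptStep("prep.py", outputs=[prepared])

# Consuming step: the very same object must first be converted with as_input().
train_step = MockPythonScriptStep("train.py",
                                  inputs=[prepared.as_input("training_data")])
```

Passing `prepared` directly to `inputs` raises `TypeError` in this mock, just as the real SDK rejects a bare `OutputFileDatasetConfig` there.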
Proposed solution
1. The name of the class should be changed. `OutputFileDatasetConfig` suggests that it is meant only as an output, and that it is some kind of config file used by internal classes (which it clearly is not). If the intention is for it to also serve as the input to downstream pipeline steps, the name should reflect that.
2. Allow this class to be used in the `inputs` parameter of all classes in `azureml.pipeline.steps`. The `azureml.pipeline.core.PipelineData` class already allows the user to specify it as both the input and the output of a pipeline step, although it is not the recommended approach. `PipelineData` is also a much better name for a class that transfers data between pipeline steps.
3. Alternatively to point 2, remove the `inputs` and `outputs` parameters from all classes in `azureml.pipeline.steps` and enforce that inputs be declared with `as_input()` and outputs with `as_output()`.
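To make the request concrete, here is a sketch of what point 2 could look like. All names here are invented for illustration (`PipelineFileDataset`, `ScriptStep` are not real SDK classes): a single, neutrally named dataset object is accepted directly on both sides of a step, with no `as_input()` conversion.

```python
class PipelineFileDataset:
    """Hypothetical rename of OutputFileDatasetConfig: neutral with
    respect to input/output direction."""
    def __init__(self, name):
        self.name = name

class ScriptStep:
    """Mock step that accepts the same dataset object in both
    inputs and outputs, symmetrically."""
    def __init__(self, script_name, inputs=(), outputs=()):
        for obj in list(inputs) + list(outputs):
            if not isinstance(obj, PipelineFileDataset):
                raise TypeError("expected a PipelineFileDataset")
        self.script_name = script_name
        self.inputs = list(inputs)
        self.outputs = list(outputs)

data = PipelineFileDataset("prepared_data")

# The same object wires the two steps together, with no conversion call.
prep = ScriptStep("prep.py", outputs=[data])
train = ScriptStep("train.py", inputs=[data])
```

This mirrors how `PipelineData` is already usable today, but under a name that does not imply an output-only role.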