Problem Description
The Azure ML Python SDK documentation provides numerous options for passing data between pipeline steps, but the currently recommended option appears to be `azureml.data.OutputFileDatasetConfig`.
However, `azureml.data.OutputFileDatasetConfig` has a limitation: it is not accepted as a valid value for the `inputs` parameter of the classes in `azureml.pipeline.steps`, e.g. `PythonScriptStep` and `HyperDriveStep`.
To pass an `OutputFileDatasetConfig` as an input of a pipeline step, `as_input()` has to be called on the object, whereas no such call is needed when the same object is used as an output of a pipeline step.
This asymmetry is extremely convoluted, and it strongly suggests that `OutputFileDatasetConfig` was originally designed only as an output of a pipeline step.
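The asymmetry described above can be sketched as follows. This is an illustrative fragment only, assuming the `azureml-core` and `azureml-pipeline-steps` packages; the script names and the compute target name are placeholders, and the code needs a real Azure ML workspace to actually run.

```python
# Hedged sketch of the input/output asymmetry; script names and the
# compute target are hypothetical placeholders.
from azureml.core import Workspace
from azureml.data import OutputFileDatasetConfig
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # assumes a local config.json

prepared = OutputFileDatasetConfig(name="prepared_data")

# As an *output*, the object is passed directly...
prep_step = PythonScriptStep(
    script_name="prep.py",
    outputs=[prepared],
    compute_target="cpu-cluster",
)

# ...but as an *input*, as_input() must be called first; passing the
# object itself to `inputs` is rejected.
train_step = PythonScriptStep(
    script_name="train.py",
    inputs=[prepared.as_input(name="prepared_data")],
    compute_target="cpu-cluster",
)

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
```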
Proposed solution

1. The name of the class should be changed. `OutputFileDatasetConfig` suggests that it is meant only as an output, and that it is some kind of config file to be used by internal classes (which it clearly is not). If the intention is for it also to be used as the input to downstream pipeline steps, the name should reflect that.
2. Allow this class to be used in the `inputs` parameter of all classes in `azureml.pipeline.steps`. The `azureml.pipeline.core.PipelineData` class already allows the user to specify it as both the input and the output of a pipeline step, although it is not the recommended approach. `PipelineData` is also a much better name for a class that transfers data between pipeline steps.
3. Alternatively to point 2 above, remove the `inputs` and `outputs` parameters from all classes in `azureml.pipeline.steps` and enforce that inputs be declared with `as_input()` and outputs with `as_output()`.
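For comparison, the symmetric `PipelineData` usage mentioned in point 2 can be sketched as below. Again this is an illustrative fragment assuming the `azureml-pipeline-core` and `azureml-pipeline-steps` packages; the script names and compute target are placeholders, and `PipelineData` falls back to the workspace's default datastore when none is given.

```python
# Hedged sketch: PipelineData is passed to both `outputs` and `inputs`
# without any conversion call; names are hypothetical placeholders.
from azureml.pipeline.core import PipelineData
from azureml.pipeline.steps import PythonScriptStep

data = PipelineData("prepared_data")  # default datastore assumed

prep_step = PythonScriptStep(
    script_name="prep.py",
    outputs=[data],
    compute_target="cpu-cluster",
)

train_step = PythonScriptStep(
    script_name="train.py",
    inputs=[data],  # same object, no as_input() call required
    compute_target="cpu-cluster",
)
```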