Please rework the pipeline interactions with azureml.data.OutputFileDatasetConfig #23565

Closed

Problem Description

The Azure ML Python SDK documentation describes numerous options for passing data between the steps of a training pipeline, but the currently recommended option appears to be azureml.data.OutputFileDatasetConfig.

However, azureml.data.OutputFileDatasetConfig has a limitation: it is not accepted as a valid value for the inputs parameter of the classes in azureml.pipeline.steps, e.g. PythonScriptStep and HyperDriveStep.

To use an OutputFileDatasetConfig as the input of a pipeline step, as_input() has to be called on the object; no such call is needed when the same object is used as the output of a pipeline step.

This is extremely convoluted, and it strongly suggests that OutputFileDatasetConfig was originally designed only as the output of a pipeline step.
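The asymmetry described above can be made concrete with a minimal plain-Python model. The classes `OutputRef`, `InputBinding` and `Step` are hypothetical stand-ins written for illustration only; they are not the azureml SDK, and the real `PythonScriptStep` performs its own validation. The model assumes only the behaviour reported in this issue: the same object is accepted as an output directly, but must be wrapped with `as_input()` before it is accepted as an input.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class InputBinding:
    """Hypothetical stand-in for the object that as_input() returns."""
    source: "OutputRef"


@dataclass(frozen=True)
class OutputRef:
    """Hypothetical stand-in for an OutputFileDatasetConfig-like object."""
    name: str

    def as_input(self) -> InputBinding:
        # Wrapping is required before the object can be consumed downstream.
        return InputBinding(self)


@dataclass
class Step:
    """Hypothetical step mirroring the asymmetric contract described in the issue."""
    name: str
    inputs: List[object] = field(default_factory=list)
    outputs: List[object] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Outputs: the bare object is accepted as-is.
        for item in self.outputs:
            if not isinstance(item, OutputRef):
                raise TypeError(f"{self.name}: outputs must be OutputRef objects")
        # Inputs: the bare object is rejected; only the as_input() wrapper passes.
        for item in self.inputs:
            if not isinstance(item, InputBinding):
                raise TypeError(f"{self.name}: call as_input() before using {item!r} as an input")


prepped = OutputRef("prepped_data")
prep = Step("prep", outputs=[prepped])              # accepted without any call
train = Step("train", inputs=[prepped.as_input()])  # accepted only after as_input()

try:
    Step("train_bad", inputs=[prepped])             # bare object used as an input
    rejected = False
except TypeError:
    rejected = True
```

In this model the wrapper exists only on the input side, which is the inconsistency the issue objects to.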

Proposed solution

  1. The name of the class should be changed: OutputFileDatasetConfig suggests that it is meant only as an output and that it is some kind of config file used by internal classes (which it clearly is not). If the intention is for it to also serve as the input to downstream pipeline steps, the name should reflect that.
  2. Allow this class to be used in the inputs parameter of all classes in azureml.pipeline.steps. The azureml.pipeline.core.PipelineData class already lets the user specify it as both the input and the output of a pipeline step, but it is not the recommended approach. PipelineData is also a much better name for a class that transfers data between pipeline steps.
  3. As an alternative to point 2, please remove the inputs and outputs parameters from all classes in azureml.pipeline.steps and require that inputs be declared with as_input() and outputs with as_output().
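Proposal 2 can be sketched in plain Python, assuming a design in which one data handle is accepted unchanged by both the inputs and outputs parameters and step dependencies are derived from it. `PipelineDataRef`, `Step` and `upstream_steps` are hypothetical names for illustration only; they are not part of azureml.pipeline, and this is not how the SDK is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class PipelineDataRef:
    """Hypothetical data handle accepted in both inputs and outputs."""
    name: str


@dataclass
class Step:
    """Hypothetical step: no as_input()/as_output() wrapping is needed."""
    name: str
    inputs: List[PipelineDataRef] = field(default_factory=list)
    outputs: List[PipelineDataRef] = field(default_factory=list)


def upstream_steps(steps: List[Step]) -> Dict[str, List[str]]:
    """Map each step name to the names of the steps that produce its inputs."""
    producers = {ref: s.name for s in steps for ref in s.outputs}
    return {
        s.name: [producers[ref] for ref in s.inputs if ref in producers]
        for s in steps
    }


prepped = PipelineDataRef("prepped_data")
model = PipelineDataRef("model")
prep = Step("prep", outputs=[prepped])
train = Step("train", inputs=[prepped], outputs=[model])  # same object, no wrapper
evaluate = Step("evaluate", inputs=[prepped, model])

deps = upstream_steps([prep, train, evaluate])
# deps == {"prep": [], "train": ["prep"], "evaluate": ["prep", "train"]}
```

Because the same object appears in both lists, the producer/consumer relationship can be derived without any wrapper call, mirroring how the issue describes PipelineData being usable on both sides of a step.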
Labels: Client, ML-Pipelines, Service Attention, customer-reported, needs-team-attention, question