Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-pythonic names for container component's output breaks loading of graph components #4514

Closed
Udiknedormin opened this issue Sep 18, 2020 · 1 comment

Comments

@Udiknedormin
Copy link
Contributor

What steps did you take:

  1. Define yaml for container component having an output with non-pythonic name (e.g. out-1).
  2. Define yaml for graph component containing that container as its task and passing its output as the graph output.
  3. Compile inside of pipeline.

Could be illustrated by the following test to component_bridge_tests.py:

def test_nonpythonic_container_output_handled_by_graph(self):
    component_a = textwrap.dedent('''\
      inputs: []
      outputs:
        - {name: out1, type: str}
      implementation:
        graph:
          tasks:
            some-container:
              arguments: {}
              componentRef:
                spec:
                  outputs:
                  - {name: out-1, type: str}
                  implementation:
                    container:
                      image: busybox
                      command: [bash, -c, 'mkdir -p "$(dirname "$0")"; date > "$0"', {outputPath: out-1}]
          outputValues:
            out1:
              taskOutput:
                taskId: some-container
                outputName: out-1
    ''')
    component_b = textwrap.dedent('''\
        inputs:
        - {name: in1, type: str}
        implementation:
          container:
            image: busybox
            command: [echo, {inputValue: in1}]
    ''')
    task_factory_a = load_component_from_text(component_a)
    task_factory_b = load_component_from_text(component_b)
    a_task = task_factory_a()
    b_task = task_factory_b(in1=a_task.outputs['out1'])

What happened:

Compiler fails with KeyError on the non-pythonic output name.

What did you expect to happen:

Should compile just fine, just like it does for pythonic names (e.g. out1 or out_1).

Environment:

Environment-independent.

KFP SDK version: 1.0.0

Anything else you would like to add:

From _component_bridge.py (for current master's HEAD: d4e7398917):

# Previously, ContainerOp had strict requirements for the output names, so we had to
# convert all the names before passing them to the ContainerOp constructor.
# Outputs with non-pythonic names could not be accessed using their original names.
# Now ContainerOp supports any output names, so we're now using the original output names.
# However to support legacy pipelines, we're also adding output references with pythonic names.
# TODO: Add warning when people use the legacy output names.
output_names = [output_spec.name for output_spec in component_spec.outputs or []] # Stabilizing the ordering
output_name_to_python = generate_unique_name_conversion_table(output_names, _sanitize_python_function_name)
for output_name in output_names:
    pythonic_output_name = output_name_to_python[output_name]
    # Note: Some component outputs are currently missing from task.outputs (e.g. MLPipeline UI Metadata)
    if pythonic_output_name not in task.outputs and output_name in task.outputs:
        task.outputs[pythonic_output_name] = task.outputs[output_name]

Looks like the container component defines its outputs by both original and pythonic names.

In _components.py, however, iteration goes through all inputs and pythonic_output_name_to_original is used to map the names to their original versions:

task_outputs_with_original_names = {
  pythonic_output_name_to_original[pythonic_output_name]: output_value
  for pythonic_output_name, output_value in task_outputs_with_pythonic_names.items()
}

Combining the two: if the component bridge is used:

  1. Non-pythonic output names gets listed on the output list twice: by its original name, and by its pythonic name.
  2. pythonic_output_name_to_original gets a mapping pythonic-name --> original-name.
  3. Iteration on all outputs takes both variants.
  4. Pythonic-name variant passes successfully, original-name variant causes KeyError.

Suggested solution:
Make pythonizing the value optional, by adding a default value (di.get(key, default_value) instead of di[key]) equal to the value got. In case of pythonized names, it will still result in the original name. In case of original names, it won't change them. The doubled key will get reduced due to being inside dictionary-constructor.

task_outputs_with_original_names = {
  pythonic_output_name_to_original.get(pythonic_output_name, pythonic_output_name): output_value
  for pythonic_output_name, output_value in task_outputs_with_pythonic_names.items()
}

/kind bug

@Ark-kun
Copy link
Contributor

Ark-kun commented Sep 18, 2020

Thank you for discovering, researching and fixing this issue.

Bobgy pushed a commit that referenced this issue Oct 10, 2020
…ts. Fixes #4514. (#4515)

* add tests for pythonic and non-pythonic component outputs

* fix: graph for non-pythonic container output's names

Loading container component from component.yaml creates both
pythonic and original output names. Graph component iterated over
all outputs, using pythonic-to-output conversion on all. If some
of the names are not identical to their pythonic versions, they
rised KeyError on the lookup table.

This commit fixes this problem by using default value for the lookup.

* remove depythonification of outputs - not needed anymore
Jeffwan pushed a commit to Jeffwan/pipelines that referenced this issue Dec 9, 2020
…ts. Fixes kubeflow#4514. (kubeflow#4515)

* add tests for pythonic and non-pythonic component outputs

* fix: graph for non-pythonic container output's names

Loading container component from component.yaml creates both
pythonic and original output names. Graph component iterated over
all outputs, using pythonic-to-output conversion on all. If some
of the names are not identical to their pythonic versions, they
rised KeyError on the lookup table.

This commit fixes this problem by using default value for the lookup.

* remove depythonification of outputs - not needed anymore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants