Allow Python functions defined in a pipeline to be run as entrypoint for operations. #979

Closed
FaustineLi opened this issue Mar 18, 2019 · 5 comments
It would be great if lightweight components could be created in a pipeline .py file from functions defined inside that pipeline file. I am envisioning something like this:

def sum_num(num):
    total = 0
    for i in range(num):
        total += i
    return total

op1 = dsl.ContainerOp(
    name='sum',
    image='<some_base_image>',
    function_entrypoint=sum_num,
    arguments=[{'num': 5}],
)

For many tasks it would reduce the number of containers to build and maintain while speeding up the iteration loop. Currently there's support for a dockerless workflow in notebooks. Since our organization is trying to move code out of notebooks for production work, I would love to see it supported in other ways. I don't see this replacing large components, but the ability to author a component that does a short, simple thing seems really powerful.

@Ark-kun Ark-kun self-assigned this Mar 19, 2019
Ark-kun commented Mar 19, 2019

JFYI, Python supports putting everything in a function.
You can put functions, classes, import statements, etc. inside one main function:

def my_component_func(num1: int, num2: int) -> int:
    import numpy as np  # imports also go inside the function body
    def sum_num(num):
        total = 0
        for i in range(num):
            total += i
        return total
    return sum_num(num1) + sum_num(num2)
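One benefit of keeping everything inside the function: it is still ordinary Python, so you can test it locally before converting it into a component. A minimal check:

```python
def my_component_func(num1: int, num2: int) -> int:
    # Helper defined inside so the function stays self-contained.
    def sum_num(num):
        total = 0
        for i in range(num):
            total += i
        return total

    return sum_num(num1) + sum_num(num2)

# sum_num(n) equals 0 + 1 + ... + (n - 1) = n * (n - 1) // 2,
# so my_component_func(3, 4) is 3 + 6.
print(my_component_func(3, 4))  # 9
```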

Does that solve your problem?

BTW, you can probably put the my_component_func in a separate .py file and then just import it.

from kfp.components import func_to_container_op
from my_module import my_component_func

my_op = func_to_container_op(my_component_func)

JFYI, you can use my_op = func_to_container_op(my_component_func, output_component_file='component.yaml') to write the component to a file on disk that can be shared between different pipelines or people.
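The reason everything must live inside the one function is that only the function's source travels to the container, with no surrounding project code. A stdlib-only sketch of that idea (illustrative only, not kfp's actual implementation):

```python
import subprocess
import sys

# A self-contained function's source can be shipped as plain text and
# executed with nothing else alongside it, the way a lightweight-component
# builder embeds the function into a container command.
FUNC_SOURCE = '''
def my_component_func(num1: int, num2: int) -> int:
    def sum_num(num):
        return sum(range(num))
    return sum_num(num1) + sum_num(num2)

print(my_component_func(3, 4))
'''

result = subprocess.run(
    [sys.executable, "-c", FUNC_SOURCE],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # 9
```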

@FaustineLi
Awesome, I had no idea that func_to_container_op was a thing!

Ark-kun commented Mar 22, 2019

> It would be great if lightweight components could be created in a pipeline .py file from functions defined inside that pipeline file.

> Awesome, I had no idea that func_to_container_op was a thing!

Interesting. Did you get the term "lightweight components" from the documentation, or did you coin it yourself?

Lightweight Python components are shown in this sample notebook, which is referenced in the (scarce) documentation.

@gaoning777

Note that we also have a component decorator (def component(func):). Maybe use that instead of func_to_container_op, since we currently use func_to_container_op internally.
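A decorator of that shape can be sketched in plain Python. Everything below (the func_to_container_op_stub helper and the attached .op attribute) is a hypothetical illustration of the pattern, not kfp's actual API:

```python
import functools

def func_to_container_op_stub(func):
    # Stand-in for the real converter; here it just records metadata.
    return {"name": func.__name__, "entrypoint": func}

def component(func):
    """Hypothetical sketch: convert a Python function into a pipeline op
    while keeping the function callable locally."""
    op = func_to_container_op_stub(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)

    wrapper.op = op  # converted op attached for the pipeline compiler
    return wrapper

@component
def add(a: int, b: int) -> int:
    return a + b

print(add(2, 3))       # still callable locally: 5
print(add.op["name"])  # add
```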

vicaire commented Mar 26, 2019

Looks like this is resolved. Please reopen if this is not the case.

@vicaire vicaire closed this as completed Mar 26, 2019