SDK - Lightweight - Added support for file outputs #2221
Lightweight components now allow a function to mark outputs that it wants to produce by writing data to files rather than returning them as in-memory data objects. This is useful when the data is expected to be large.

Example 1 (writing a large amount of data to an output file at the provided path):

```python
from kfp.components import func_to_container_op, OutputPath

@func_to_container_op
def write_big_data(big_file_path: OutputPath(str)):
    # Open for writing; the default mode ('r') would make write() fail.
    with open(big_file_path, 'w') as big_file:
        for i in range(1000000):
            big_file.write('Hello world\n')
```

Example 2 (writing a large amount of data to the provided output file stream):

```python
from kfp.components import func_to_container_op, OutputTextFile

@func_to_container_op
def write_big_data(big_file: OutputTextFile(str)):
    # big_file is an already-open text stream supplied by the system.
    for i in range(1000000):
        big_file.write('Hello world\n')
```
Good job! General question: is it possible to use OutputPath/OutputTextFile/OutputBinaryFile with a return statement and type hints? Or could we merge OutputPath and InputPath into one class, say ArtifactPath, and use it in both the component producing it and the component consuming it? I vaguely feel it would be a more consistent experience. WDYT?
This is not possible since the input/output paths must be known to the system at compile time.
The function signature needs to tell the system which path parameters are inputs and which are outputs.
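To illustrate why the annotations matter, here is a minimal sketch of how a compiler could sort parameters into inputs and file outputs just by reading the signature, without running the function. The `InputPath` and `OutputPath` classes below are local stand-ins written for this example, and `classify_parameters` and `copy_data` are hypothetical names; none of this is the actual SDK implementation.

```python
import inspect


# Stand-in marker classes for illustration only (not the SDK's classes).
class InputPath:
    def __init__(self, type=None):
        self.type = type


class OutputPath:
    def __init__(self, type=None):
        self.type = type


def classify_parameters(func):
    """Split parameter names into inputs and file outputs by annotation."""
    inputs, outputs = [], []
    for name, param in inspect.signature(func).parameters.items():
        if isinstance(param.annotation, OutputPath):
            outputs.append(name)
        else:
            # Plain values and InputPath parameters are both inputs.
            inputs.append(name)
    return inputs, outputs


# Hypothetical component function; the body is irrelevant to classification.
def copy_data(source_path: InputPath(str), count: int, target_path: OutputPath(str)):
    pass


inputs, outputs = classify_parameters(copy_data)
```

With a single merged annotation class, the loop above would have no way to tell `source_path` from `target_path`, which is the point made in the reply.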
An example of a function using both file inputs and outputs:

    @func_to_container_op
    def write_big_data(input_file: InputTextFile(str), output_file: OutputTextFile(str)):
        while True:
            line = input_file.readline()
            # readline() returns '' at end of file, never None.
            if not line:
                break
            output_file.write('Hello ' + line)
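The streaming logic can be checked locally, without the decorator, by passing in-memory text streams. This is only a sketch: `prefix_lines` is a hypothetical name, and `io.StringIO` stands in for whatever stream objects the system would supply. Note that `readline()` on a text stream returns an empty string at end of file.

```python
import io


def prefix_lines(input_file, output_file):
    # Iterating a text stream yields one line at a time and stops at EOF.
    for line in input_file:
        output_file.write('Hello ' + line)


source = io.StringIO('world\nKubeflow\n')
target = io.StringIO()
prefix_lines(source, target)
result = target.getvalue()

# End-of-file behaviour of readline() on an exhausted stream.
eof_value = source.readline()
```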
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: Ark-kun
in example 2, i find it weird that we're using the passed location parameter as the object we're calling |
    class OutputPath:
        '''When creating component from function, OutputPath should be used as function parameter annotation to tell the system that the function wants to output data by writing it into a file with the given path instead of returning the data from the function.'''
        def __init__(self, type=None):
            self.type = type
type?
It's the input/output type.

Before:

    def func(data: list):

After:

    def func(data_file: InputFile(list)):
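As a local illustration of the same before/after shift, the sketch below computes a sum once from an in-memory list and once from a file holding that list. The function names are hypothetical, and the JSON serialization is an assumption made for this example; the thread does not say how the system stores a `list` on disk.

```python
import json
import os
import tempfile


def total_in_memory(data: list) -> int:
    # "Before": the list itself is passed as an argument.
    return sum(data)


def total_from_file(data_path: str) -> int:
    # "After": only a path is passed; the function reads the data itself.
    # Assumption: the list was serialized as JSON.
    with open(data_path) as data_file:
        return sum(json.load(data_file))


values = [1, 2, 3]
memory_total = total_in_memory(values)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.json')
    with open(path, 'w') as out_file:
        json.dump(values, out_file)
    file_total = total_from_file(path)
```

In both variants the declared type (`list`) describes the data, not the parameter that carries it.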
@Ark-kun Hello! In other words: you are showing how to pass a big binary file using
I appreciate any advice.
Hi @timofal I think the canonical way of approaching your use case is the following
@numerology I checked documentation. Docker image for |
Another way is perhaps to write a component yaml spec which refers to your docker image and use similar placeholders there. See examples in our first party components: