pipeline step failed with exit status code 2: failed to save outputs #750
I am getting exactly the same error.
So my issue was that I had specified the wrong path.
Hey, I fixed the pipeline and it now runs run_preprocess.py >> run_train.py. The problem was in the run_preprocess.py file: you have to write the output file to a local file path inside your Docker container.
Kubeflow Pipelines manages its workflow by reading the files a step declares in file_outputs. A small pitfall to keep in mind is that inside the code of the second container (run_train.py in this case), the variable receives the contents of that output file rather than its path.
I wonder whether this way of chaining output to input is intended to pass parameters or small values to the next container, or whether Kubeflow wants you to pass the entire preprocessed dataset.
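That chaining contract can be illustrated without Kubeflow at all (all names and the URI below are hypothetical): step one writes a small value to a local path it declared as an output, the orchestrator collects that file, and step two receives the file's contents, not the path, as an argument. A minimal sketch:

```python
import os
import tempfile

def run_preprocess(output_path):
    # Step 1: write a small value (e.g. a storage URI) to a local file.
    # In kfp this path would be listed in the step's file_outputs.
    with open(output_path, "w") as f:
        f.write("gs://my-bucket/preprocessed")  # hypothetical URI

def run_train(preprocess_output):
    # Step 2: the argument is the first file's *contents*, not a path on disk.
    return "training on " + preprocess_output

def orchestrate():
    # What the orchestrator does conceptually: read the declared output
    # file after step 1 exits and hand its contents to step 2.
    out = os.path.join(tempfile.mkdtemp(), "output.txt")
    run_preprocess(out)
    with open(out) as f:
        value = f.read()
    return run_train(value)
```

This also shows why the pattern suits small values: the whole file content travels through the pipeline metadata, so a large preprocessed dataset is better left on shared storage with only its location passed along.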
@DurivetMatthias Could you share your workflow YAML file?
@vincent-pli the code is at https://github.com/DurivetMatthias/examples/blob/add_pipelines/financial_time_series/tensorflow_model/ml_pipeline.py. Make sure that when the preprocessing script ends, there is a file at each path specified in file_outputs.
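One cheap safeguard (the path below is hypothetical) is to check for the promised file at the very end of the preprocessing script, so the container fails with a clear message instead of the cryptic "failed to save outputs" later:

```python
import os

# Hypothetical path; use whatever you declared in file_outputs.
output_path = "/tmp/output1.txt"

with open(output_path, "w") as f:
    f.write("done")

# Fail fast inside the container if the promised output is missing.
if not os.path.isfile(output_path):
    raise FileNotFoundError("declared output not written: " + output_path)
```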
@DurivetMatthias thanks, it helped.
Thanks @DurivetMatthias for the fix, closing the issue.
Hey,
It seems like kfp is expecting /mainctrfs/data to be a file instead of a directory. There's a bunch of code that I can't see from here, but I would advise you to log the file structure of your Docker container just before the step fails; you should then notice that at least one file isn't where you promised it would be (try os.system("ls -l")).
Hope you find the problem :)
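Along the lines of that advice, a small helper (hypothetical; pass whatever root you mount) that dumps the container's file tree to the pod logs:

```python
import os

def log_tree(root):
    # Print every file under root with its full path, so the pod logs
    # show exactly what exists when the step is about to finish.
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            print(path)
            found.append(path)
    return found
```

Calling log_tree("/data") as the last line of the preprocessing script makes a missing output file obvious in the logs.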
On Tue, 12 Nov 2019 at 11:34, RochanMehrotra <notifications@github.com> wrote:
@DurivetMatthias @Svendegroote91 I'm getting much the same error:
This step is in Error state with this message: failed to save outputs: read /mainctrfs/data: is a directory
My pipeline is:
import kfp

def generate_data(output_uri, output_uri_in_file,
                  volume,
                  step_name='generate_data',
                  mount_output_to='/data'):
    return kfp.dsl.ContainerOp(
        name=step_name,
        image='rochanmehrotra/testing_kf:generate_data',
        arguments=[
            '--output1-path', output_uri,
            '--output1-path-file', output_uri_in_file,
        ],
        command=['python3', '/component/src/data_generator.py'],
        file_outputs={
            'output_file': output_uri,
            'output_uri_in_file': output_uri_in_file,
            'xtrain': "/data/x_train.npy",
            'ytrain': "/data/y_train.npy",
            'xtest': "/data/x_test.npy",
            'ytest': "/data/y_test.npy"
        },
        pvolumes={mount_output_to: volume}
    )

def train(output_uri, output_uri_in_file,
          volume,
          step_name='train',
          mount_output_to='/data'):
    return kfp.dsl.ContainerOp(
        name=step_name,
        image='rochanmehrotra/testing_kf:train',
        arguments=[
            '--model-path', output_uri,
            '--output1-path-file', output_uri_in_file,
        ],
        command=['python3', '/component/src/train.py'],
        file_outputs={
            'output_file': output_uri,
            'output_uri_in_file': output_uri_in_file,
        },
        pvolumes={mount_output_to: volume}
    )

def evaluate(output_uri, output_uri_in_file,
             volume,
             step_name='evaluate',
             mount_output_to='/data'):
    return kfp.dsl.ContainerOp(
        name=step_name,
        image='rochanmehrotra/testing_kf:evaluate',
        arguments=[
            '--model-path', output_uri,
            '--output1-path-file', output_uri_in_file,
        ],
        command=['python3', '/component/src/evaluate.py'],
        file_outputs={
            'output_file': output_uri,
            'output_uri_in_file': output_uri_in_file,
        },
        pvolumes={mount_output_to: volume}
    )

@kfp.dsl.pipeline(name='mlp pipeline', description='')
def mlp_pipeline(
        rok_url,
        pvc_size='4Gi'):
    vop = kfp.dsl.VolumeOp(
        name='create-volume',
        resource_name='mlp_pipeline',
        annotations={"rok/origin": rok_url},
        size=pvc_size
    )
    component_1 = generate_data(
        output_uri='/data',
        output_uri_in_file='/data',
        volume=vop.volume
    )
    component_2 = train(
        # output_uri='/data',
        # output_uri_in_file='/data/output1_path_file',
        output_uri=component_1.outputs['output_file'],
        output_uri_in_file=component_1.outputs['output_uri_in_file'],
        volume=vop.volume
    ).after(component_1)
    component_3 = evaluate(
        output_uri='/data/model.h5',
        output_uri_in_file='/data/output1_path_file',
        volume=vop.volume
    ).after(component_2)

if __name__ == '__main__':
    import kfp.compiler as compiler
    compiler.Compiler().compile(mlp_pipeline, 'mlp_pipeline.tar.gz')
The code for data_generator.py in the rochanmehrotra/testing_kf:generate_data image:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout
import argparse
from pathlib import Path
import os

parser = argparse.ArgumentParser(description='My program description')
parser.add_argument('--output1-path', type=str, help='Path of the local file or GCS blob where the Output 1 data should be written.')
parser.add_argument('--output1-path-file', type=str, help='Path of the local file where the Output 1 URI data should be written.')
args = parser.parse_args()

dir_path = os.path.dirname(os.path.realpath(__file__))
print("dir_path=>", dir_path)

if not os.path.exists(args.output1_path):
    os.mkdir(args.output1_path)
if not os.path.exists(args.output1_path_file):
    os.mkdir(args.output1_path_file)

# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((100, 20))
y_test = np.random.randint(2, size=(100, 1))

np.save(args.output1_path + "/x_train.npy", x_train)
np.save(args.output1_path + "/y_train.npy", y_train)
np.save(args.output1_path + "/x_test.npy", x_test)
np.save(args.output1_path + "/y_test.npy", y_test)

paths = "{}/x_train.npy \n{}/y_train.npy \n{}/x_test.npy \n{}/y_test.npy".format(
    args.output1_path, args.output1_path, args.output1_path, args.output1_path)
file1 = open(args.output1_path + "/output1_path_file", "a")
file1.write(paths)
file1.close()

from os import walk
f = []
for (dirpath, dirnames, filenames) in walk(args.output1_path):
    print(dirpath)
    print(dirnames)
    f.extend(filenames)
print(f)
The output of generate_data is:
Using TensorFlow backend.
dir_path=> /component/src
/data
['lost+found']
/data/lost+found
[]
['x_test.npy', 'output1_path_file', 'x_train.npy', 'y_train.npy', 'y_test.npy']
but the step failed with this error:
This step is in Error state with this message: failed to save outputs: read /mainctrfs/data: is a directory
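For the pipeline above, the immediate cause is that file_outputs maps 'output_file' to '/data', which is the volume mount point, a directory. Every value in file_outputs has to name a file the container actually wrote. A sketch of what generate_data's mapping could look like instead (keys taken from the original; the directory-valued 'output_file' entry is dropped, and the remaining paths match the files the script writes):

```python
# Each entry must name a file, never a directory such as the /data mount
# point itself; a directory value is exactly what produces
# "failed to save outputs: read /mainctrfs/data: is a directory".
file_outputs = {
    'output_uri_in_file': '/data/output1_path_file',
    'xtrain': '/data/x_train.npy',
    'ytrain': '/data/y_train.npy',
    'xtest': '/data/x_test.npy',
    'ytest': '/data/y_test.npy',
}
```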
Hi,
I am trying to run a very basic Kubeflow pipeline with 2 components:
1/ preprocess
2/ train
However, when trying to run the pipeline, I get the message
This step is in Error state with this message: failed to save outputs: exit status 2
in the Pipeline UI. When I check the pod logs I get the following error; I attached the successful logs before the error.
My actual pipeline script is as follows:
The preprocess container is actually executed, as the files were stored on storage, but it looks like something is going wrong between the container communication and the orchestration.
As the error message is quite cryptic, can anyone help tell me where to look to fix this issue?
FYI: this is my run_preprocess.py: