
Running distributed text_classification fails with: proc param error:name:[fc_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed #9454

Closed
@typhoonzero

Background: running distributed VGG16 works fine, but the model at
https://github.com/typhoonzero/fluid_gpu_benchmark/blob/master/text_fluid.py fails with the following errors the first time a trainer tries to send variables:

E0328 11:59:35.739938   211 grpc_client.cc:189] proc param error:name:[fc_0.w_0@GRAD.block3.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.739984   208 grpc_client.cc:189] proc param error:name:[fc_0.b_0@GRAD.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740000   203 grpc_client.cc:189] proc param error:name:[sequence_conv_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740012   204 grpc_client.cc:189] proc param error:name:[embedding_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.27:30256] grpc error:Connect Failed
E0328 11:59:35.740399   202 grpc_client.cc:189] proc param error:name:[fc_1.b_0@GRAD.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740417   198 grpc_client.cc:189] proc param error:name:[sequence_conv_0.w_0@GRAD.block1.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740432   209 grpc_client.cc:189] proc param error:name:[embedding_0.w_0@GRAD.block1.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
E0328 11:59:35.740423   207 grpc_client.cc:189] proc param error:name:[fc_0.w_0@GRAD.block0.trainer_3] ep:[192.168.16.28:30256] grpc error:Connect Failed
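"Connect Failed" means the trainer could not open a connection to the pserver endpoints (192.168.16.27:30256 and 192.168.16.28:30256 in the log). One way to check whether the pservers are simply not listening yet when the first gradients are sent is to probe each endpoint before training starts. This is a hypothesis; the report does not establish the cause. The sketch below uses only the Python standard library, and `wait_for_endpoints` is a hypothetical helper, not part of Fluid.

```python
import socket
import time


def wait_for_endpoints(endpoints, timeout=60.0, interval=1.0):
    """Poll each "ip:port" endpoint until it accepts a TCP connection.

    Hypothetical helper: returns the list of endpoints that were still
    unreachable when `timeout` seconds elapsed (empty list = all reachable).
    """
    deadline = time.monotonic() + timeout
    pending = list(endpoints)
    while pending and time.monotonic() < deadline:
        still_down = []
        for ep in pending:
            host, port = ep.rsplit(":", 1)
            try:
                # A successful TCP handshake means something is listening.
                with socket.create_connection((host, int(port)), timeout=interval):
                    pass
            except OSError:
                still_down.append(ep)
        pending = still_down
        if pending:
            time.sleep(interval)
    return pending
```

For example, a trainer could call `wait_for_endpoints(["192.168.16.27:30256", "192.168.16.28:30256"])` before its first training step. If it returns a non-empty list, the pservers (or the network path to them) were not reachable, which would point away from the text_classification model itself.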
