
Benchmark of tensor serialization performance. #4610


Closed · wants to merge 3 commits

Conversation

@dzhwinter (Contributor) commented Oct 6, 2017

This PR benchmarks the speed of the protobuf / pure C format / mixed options. It will not be merged into the code base; it exists only so the correctness of the benchmark code can be reviewed.

@dzhwinter (Contributor, Author) commented Oct 10, 2017

Option 1: Use Protobuf to serialize the tensor.

Protobuf handles large chunks of data poorly. For example [1]:

[screenshot: example record from the book]

Its corresponding encoded binary is:

[screenshot: the corresponding Protocol Buffers encoded binary]

[1] Martin Kleppmann, Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.

As the second image shows, every element of a repeated field carries its own field tag. This hurts both speed and storage, which matters especially because we need to serialize gradients and parameters between trainers and pservers.
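To make the per-element overhead concrete, here is a rough sketch (not code from this PR, and assuming the field number is below 16 so the key fits in one byte) of how an unpacked repeated `float` field is laid out on the wire: each value carries its own field key in addition to its 4 data bytes.

```cpp
#include <string>
#include <vector>

// Unpacked repeated float field: wire type 5 (32-bit), one key per element.
std::string EncodeUnpackedFloats(int field_number, const std::vector<float>& xs) {
  std::string out;
  const char key = static_cast<char>((field_number << 3) | 5);
  for (float x : xs) {
    out.push_back(key);  // field key repeated for every element
    out.append(reinterpret_cast<const char*>(&x), sizeof(float));  // 4 raw bytes
  }
  return out;
}
```

For a 1000x1000 float tensor this is one million extra tag bytes, on top of the CPU time spent walking the repeated field element by element.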

Option 2: Write custom tensor serialization.

This is the option implemented in this PR. The user needs to write the encoding and decoding code by hand.
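A minimal sketch of what such hand-written encoding could look like, using a hypothetical `SimpleTensor` struct instead of the framework's real `Tensor` class; the actual code in this PR differs in detail.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical flat tensor used only for illustration.
struct SimpleTensor {
  std::vector<int64_t> dims;
  std::vector<float> data;  // contiguous row-major buffer
};

// Layout: [rank][dims...][raw float data], written back to back.
std::string Encode(const SimpleTensor& t) {
  std::string buf;
  int64_t rank = static_cast<int64_t>(t.dims.size());
  buf.append(reinterpret_cast<const char*>(&rank), sizeof(rank));
  buf.append(reinterpret_cast<const char*>(t.dims.data()), rank * sizeof(int64_t));
  buf.append(reinterpret_cast<const char*>(t.data.data()),
             t.data.size() * sizeof(float));
  return buf;
}

SimpleTensor Decode(const std::string& buf) {
  SimpleTensor t;
  const char* p = buf.data();
  int64_t rank;
  std::memcpy(&rank, p, sizeof(rank));
  p += sizeof(rank);
  t.dims.resize(rank);
  std::memcpy(t.dims.data(), p, rank * sizeof(int64_t));
  p += rank * sizeof(int64_t);
  int64_t numel = 1;
  for (auto d : t.dims) numel *= d;
  t.data.resize(numel);
  std::memcpy(t.data.data(), p, numel * sizeof(float));
  return t;
}
```

The upside is that the data block is a single `memcpy`; the downside is that every change to the layout (a new dtype field, extra metadata, and so on) has to be versioned by hand.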

Option 3: Use Protobuf only to serialize the tensor metadata, serialize the tensor memory block directly, and pack the two together.

This option still depends on Protobuf, but it keeps forward and backward compatibility.
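A minimal sketch of the packing step, assuming the metadata (for example dims and data type) has already been serialized into `meta_bytes` by protobuf, e.g. via `SerializeAsString()`. The `[meta size][meta bytes][raw tensor memory]` layout below is only an illustration of the idea, not necessarily the exact format to be adopted.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Pack: length-prefixed protobuf metadata followed by the raw tensor block.
std::string Pack(const std::string& meta_bytes, const void* tensor_data,
                 size_t num_bytes) {
  std::string out;
  uint64_t meta_size = meta_bytes.size();
  out.append(reinterpret_cast<const char*>(&meta_size), sizeof(meta_size));
  out.append(meta_bytes);                                        // proto-encoded metadata
  out.append(static_cast<const char*>(tensor_data), num_bytes);  // raw memory block
  return out;
}

// Unpack: split the buffer back into metadata bytes and raw tensor bytes;
// the metadata part is then parsed by protobuf, which keeps the compatibility story.
void Unpack(const std::string& in, std::string* meta_bytes,
            std::string* tensor_bytes) {
  uint64_t meta_size = 0;
  std::memcpy(&meta_size, in.data(), sizeof(meta_size));
  *meta_bytes = in.substr(sizeof(meta_size), meta_size);
  *tensor_bytes = in.substr(sizeof(meta_size) + meta_size);
}
```

Because only the small metadata message goes through protobuf, adding a field later stays backward and forward compatible, while the bulk data is still a single raw copy.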

Benchmark Result

Here are the benchmark results for the three options above. The time cost measures one round of tensor serialization plus deserialization. We tried tensors of different sizes (10x10, 100x100, 1000x1000) and averaged the time cost over 1000 runs. Smaller is better:

| option | time cost |
| --- | --- |
| option 1 | 0.075905 |
| option 2 | 0.00283767 |
| option 3 | 0.00294829 |

Option 3 offers the best trade-off between speed/efficiency and maintenance difficulty.
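For reference, the measurement loop behind numbers like these looks roughly like the sketch below (one round = serialize followed by deserialize, averaged over 1000 rounds). It is an illustration of the methodology, not the exact benchmark code in this PR.

```cpp
#include <chrono>

// Times `rounds` iterations of serialize + deserialize and returns the
// average seconds per round. Tensor/SerializeFn/DeserializeFn are whatever
// types the option under test provides.
template <typename Tensor, typename SerializeFn, typename DeserializeFn>
double AverageRoundTime(const Tensor& t, SerializeFn serialize,
                        DeserializeFn deserialize, int rounds = 1000) {
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < rounds; ++i) {
    auto bytes = serialize(t);           // encode
    auto restored = deserialize(bytes);  // decode
    (void)restored;
  }
  std::chrono::duration<double> elapsed =
      std::chrono::high_resolution_clock::now() - start;
  return elapsed.count() / rounds;
}
```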

@unship commented Nov 3, 2017

For primitive numeric types (types that use the varint, 32-bit, or 64-bit wire types), protobuf can use packed encoding: the field key is `(field_number << 3) | 2`, and the field value is a length followed by only the values, so the field key appears only once.

When training, tensors (where each word is encoded as a number) can use packed encoding. When inferring, do you really have such a large repeated string field to store in a protobuf message?
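To illustrate the packed layout described here: the field key `(field_number << 3) | 2` (wire type 2, length-delimited) is written once, then the byte length, then the raw values. In practice protoc-generated code produces this whenever the field is packed (the default in proto3, `[packed=true]` in proto2); the hand-rolled sketch below only shows the wire layout and assumes a little-endian machine.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard protobuf varint: 7 bits per byte, high bit marks continuation.
void AppendVarint(uint64_t v, std::string* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Packed repeated float: the field key appears only once for the whole field.
std::string EncodePackedFloats(int field_number, const std::vector<float>& xs) {
  std::string out;
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | 2, &out);  // key, once
  AppendVarint(xs.size() * sizeof(float), &out);                       // payload length
  out.append(reinterpret_cast<const char*>(xs.data()),
             xs.size() * sizeof(float));                               // raw 32-bit values
  return out;
}
```

With packing, the per-element tag from the earlier sketch disappears, so the size overhead over the raw buffer is only the single key plus the length varint.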
