
Benchmark of tensor serialization performance. #4610


Closed · wants to merge 3 commits

Conversation

@dzhwinter (Contributor) commented Oct 6, 2017

This PR benchmarks the speed of the protobuf / pure C format / mixed options. It will not be merged into the code base; it exists only so the correctness of the benchmark code can be reviewed.

@dzhwinter (Contributor, Author) commented Oct 10, 2017

Option 1: Use Protobuf to serialize the tensor.

Protobuf handles large chunks of data poorly. For example [1]:

[screenshot: example record from the book]

Its corresponding encoded binary is:

[screenshot: the corresponding Protocol Buffers encoded binary]

[1] Martin Kleppmann, Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems.

As the second image shows, every element of a repeated field carries its own field tag. This hurts both speed and storage, which matters especially because we need to serialize gradients and parameters between trainers and pservers.
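To make the per-element overhead concrete, here is a rough sketch (not code from this PR, and assuming the field number is below 16 so the key fits in one byte) of how an unpacked repeated `float` field is laid out on the wire: each value carries its own field key in addition to its 4 data bytes.

```cpp
#include <string>
#include <vector>

// Unpacked repeated float field: wire type 5 (32-bit), one key per element.
std::string EncodeUnpackedFloats(int field_number, const std::vector<float>& xs) {
  std::string out;
  const char key = static_cast<char>((field_number << 3) | 5);
  for (float x : xs) {
    out.push_back(key);  // field key repeated for every element
    out.append(reinterpret_cast<const char*>(&x), sizeof(float));  // 4 raw bytes
  }
  return out;
}
```

For a 1000x1000 float tensor this is one million extra tag bytes, on top of the CPU time spent walking the repeated field element by element.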

Option 2: Write custom tensor serialization.

This is the option implemented in this PR. The user needs to write the encoding and decoding code by hand.
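A minimal sketch of what such hand-written encoding could look like, using a hypothetical `SimpleTensor` struct instead of the framework's real `Tensor` class; the actual code in this PR differs in detail.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical flat tensor used only for illustration.
struct SimpleTensor {
  std::vector<int64_t> dims;
  std::vector<float> data;  // contiguous row-major buffer
};

// Layout: [rank][dims...][raw float data], written back to back.
std::string Encode(const SimpleTensor& t) {
  std::string buf;
  int64_t rank = static_cast<int64_t>(t.dims.size());
  buf.append(reinterpret_cast<const char*>(&rank), sizeof(rank));
  buf.append(reinterpret_cast<const char*>(t.dims.data()), rank * sizeof(int64_t));
  buf.append(reinterpret_cast<const char*>(t.data.data()),
             t.data.size() * sizeof(float));
  return buf;
}

SimpleTensor Decode(const std::string& buf) {
  SimpleTensor t;
  const char* p = buf.data();
  int64_t rank;
  std::memcpy(&rank, p, sizeof(rank));
  p += sizeof(rank);
  t.dims.resize(rank);
  std::memcpy(t.dims.data(), p, rank * sizeof(int64_t));
  p += rank * sizeof(int64_t);
  int64_t numel = 1;
  for (auto d : t.dims) numel *= d;
  t.data.resize(numel);
  std::memcpy(t.data.data(), p, numel * sizeof(float));
  return t;
}
```

The upside is that the data block is a single `memcpy`; the downside is that every change to the layout (a new dtype field, extra metadata, and so on) has to be versioned by hand.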

Option 3: Use Protobuf only to serialize the tensor metadata, serialize the tensor memory block directly, and pack the two together.

This option still depends on Protobuf, but it keeps forward and backward compatibility.
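A minimal sketch of the packing step, assuming the metadata (for example dims and data type) has already been serialized into `meta_bytes` by protobuf, e.g. via `SerializeAsString()`. The `[meta size][meta bytes][raw tensor memory]` layout below is only an illustration of the idea, not necessarily the exact format to be adopted.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Pack: length-prefixed protobuf metadata followed by the raw tensor block.
std::string Pack(const std::string& meta_bytes, const void* tensor_data,
                 size_t num_bytes) {
  std::string out;
  uint64_t meta_size = meta_bytes.size();
  out.append(reinterpret_cast<const char*>(&meta_size), sizeof(meta_size));
  out.append(meta_bytes);                                        // proto-encoded metadata
  out.append(static_cast<const char*>(tensor_data), num_bytes);  // raw memory block
  return out;
}

// Unpack: split the buffer back into metadata bytes and raw tensor bytes;
// the metadata part is then parsed by protobuf, which keeps the compatibility story.
void Unpack(const std::string& in, std::string* meta_bytes,
            std::string* tensor_bytes) {
  uint64_t meta_size = 0;
  std::memcpy(&meta_size, in.data(), sizeof(meta_size));
  *meta_bytes = in.substr(sizeof(meta_size), meta_size);
  *tensor_bytes = in.substr(sizeof(meta_size) + meta_size);
}
```

Because only the small metadata message goes through protobuf, adding a field later stays backward and forward compatible, while the bulk data is still a single raw copy.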

Benchmark Result

Here are the benchmark results for the three options above. The time cost measures one round of tensor serialization plus deserialization. We tried tensors of different sizes (10x10, 100x100, 1000x1000) and averaged the time cost over 1000 runs. Smaller is better:

| option | time cost |
| --- | --- |
| option 1 | 0.075905 |
| option 2 | 0.00283767 |
| option 3 | 0.00294829 |

Option 3 offers the best trade-off between speed/efficiency and maintenance difficulty.
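For reference, the measurement loop behind numbers like these looks roughly like the sketch below (one round = serialize followed by deserialize, averaged over 1000 rounds). It is an illustration of the methodology, not the exact benchmark code in this PR.

```cpp
#include <chrono>

// Times `rounds` iterations of serialize + deserialize and returns the
// average seconds per round. Tensor/SerializeFn/DeserializeFn are whatever
// types the option under test provides.
template <typename Tensor, typename SerializeFn, typename DeserializeFn>
double AverageRoundTime(const Tensor& t, SerializeFn serialize,
                        DeserializeFn deserialize, int rounds = 1000) {
  auto start = std::chrono::high_resolution_clock::now();
  for (int i = 0; i < rounds; ++i) {
    auto bytes = serialize(t);           // encode
    auto restored = deserialize(bytes);  // decode
    (void)restored;
  }
  std::chrono::duration<double> elapsed =
      std::chrono::high_resolution_clock::now() - start;
  return elapsed.count() / rounds;
}
```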

@unship commented Nov 3, 2017

For primitive numeric types (types that use the varint, 32-bit, or 64-bit wire types), protobuf can use packed encoding: the field key is `(field_number << 3) | 2`, and the field value is a length followed by only the values, so the field key appears only once.

When training, tensors (where each word is encoded as a number) can use packed encoding. When inferring, do you really have such a large repeated string field to store in a protobuf message?
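To illustrate the packed layout described here: the field key `(field_number << 3) | 2` (wire type 2, length-delimited) is written once, then the byte length, then the raw values. In practice protoc-generated code produces this whenever the field is packed (the default in proto3, `[packed=true]` in proto2); the hand-rolled sketch below only shows the wire layout and assumes a little-endian machine.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard protobuf varint: 7 bits per byte, high bit marks continuation.
void AppendVarint(uint64_t v, std::string* out) {
  while (v >= 0x80) {
    out->push_back(static_cast<char>((v & 0x7F) | 0x80));
    v >>= 7;
  }
  out->push_back(static_cast<char>(v));
}

// Packed repeated float: the field key appears only once for the whole field.
std::string EncodePackedFloats(int field_number, const std::vector<float>& xs) {
  std::string out;
  AppendVarint((static_cast<uint64_t>(field_number) << 3) | 2, &out);  // key, once
  AppendVarint(xs.size() * sizeof(float), &out);                       // payload length
  out.append(reinterpret_cast<const char*>(xs.data()),
             xs.size() * sizeof(float));                               // raw 32-bit values
  return out;
}
```

With packing, the per-element tag from the earlier sketch disappears, so the size overhead over the raw buffer is only the single key plus the length varint.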
