Protobuf to Arrow, using Rust
Take a protobuf:

```protobuf
message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}
```

And convert serialized messages directly to `pyarrow.RecordBatch`:

```python
from ptars import HandlerPool

# SearchRequest is the Python class generated by protoc from the message above.
messages = [
    SearchRequest(
        query="protobuf to arrow",
        page_number=0,
        result_per_page=10,
    ),
    SearchRequest(
        query="protobuf to arrow",
        page_number=1,
        result_per_page=10,
    ),
]
# Serialize each message to its binary wire format.
payloads = [message.SerializeToString() for message in messages]

pool = HandlerPool([SearchRequest.DESCRIPTOR.file])
handler = pool.get_for_message(SearchRequest.DESCRIPTOR)
record_batch = handler.list_to_record_batch(payloads)
```

| query | page_number | result_per_page |
|---|---|---|
| protobuf to arrow | 0 | 10 |
| protobuf to arrow | 1 | 10 |
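
Conceptually, the conversion pivots a list of row-shaped messages into one column per field, and the reverse conversion pivots the columns back into rows. The sketch below illustrates that pivot with plain Python dictionaries standing in for decoded messages. It is not how ptars is implemented (ptars works in Rust and produces Arrow arrays), and the helper names `rows_to_columns` and `columns_to_rows` are hypothetical, introduced only for illustration.

```python
def rows_to_columns(rows, field_names):
    """Pivot row-shaped records into one list per field (column)."""
    return {name: [row[name] for row in rows] for name in field_names}


def columns_to_rows(columns):
    """Inverse pivot: rebuild one record per row from equal-length columns."""
    names = list(columns)
    return [
        dict(zip(names, values))
        for values in zip(*(columns[name] for name in names))
    ]


# Plain dicts standing in for the two SearchRequest messages above.
rows = [
    {"query": "protobuf to arrow", "page_number": 0, "result_per_page": 10},
    {"query": "protobuf to arrow", "page_number": 1, "result_per_page": 10},
]

columns = rows_to_columns(rows, ["query", "page_number", "result_per_page"])
round_trip = columns_to_rows(columns)
```

Here `columns["page_number"]` holds one value per message, which mirrors the `page_number` column of the table above, and `round_trip` contains the same records as `rows`.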

You can also convert a `pyarrow.RecordBatch` back to serialized protobuf messages:

```python
import pyarrow as pa

# Each element of the array is one serialized SearchRequest.
array: pa.BinaryArray = handler.record_batch_to_array(record_batch)
messages_back: list[SearchRequest] = [
    SearchRequest.FromString(s.as_py()) for s in array
]
```

Ptars is a Rust implementation of protarrow, which is implemented in plain Python. It is:
- About 2.6 times faster when converting from proto to Arrow.
- About 3.2 times faster when converting from Arrow to proto.

```
---- benchmark 'to_arrow': 2 tests ----
Name (time in ms)         Mean
---------------------------------------
protarrow_to_arrow      9.4863 (2.63)
ptars_to_arrow          3.6009 (1.0)
---------------------------------------

---- benchmark 'to_proto': 2 tests -----
Name (time in ms)          Mean
----------------------------------------
protarrow_to_proto     20.8297 (3.20)
ptars_to_proto          6.5013 (1.0)
----------------------------------------
```
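
The speedup figures can be checked directly from the mean times in the benchmark output. This is a small arithmetic sketch; the dictionary layout and variable names are illustrative only, and the numbers are copied from the tables above.

```python
# Mean times in milliseconds, copied from the benchmark output above.
mean_ms = {
    "to_arrow": {"protarrow": 9.4863, "ptars": 3.6009},
    "to_proto": {"protarrow": 20.8297, "ptars": 6.5013},
}

# Speedup = protarrow mean time divided by ptars mean time.
speedups = {
    name: round(times["protarrow"] / times["ptars"], 2)
    for name, times in mean_ms.items()
}
print(speedups)  # {'to_arrow': 2.63, 'to_proto': 3.2}
```

These ratios match the relative values shown in parentheses in the benchmark output.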