Support various payload of baidu-std: json, proto-json and proto-text #2946

chenBright · 2025-04-09T08:57:42Z

What problem does this PR solve?

Issue Number: #2405 第四点。

Problem Summary:

What is changed and the side effects?

Changed:

Side effects:

Performance effects(性能影响):
Breaking backward compatibility(向后兼容性):

Check List:

Please make sure your changes are compilable(请确保你的更改可以通过编译).
When providing us with a new feature, it is best to add related tests(如果你向我们增加一个新的功能, 请添加相关测试).
Please follow Contributor Covenant Code of Conduct.(请遵循贡献者准则).

wwbmmm · 2025-04-11T01:54:02Z

有点复杂了。我觉得把compress和序列化方式耦合在一起不太好，组合太多了。
可以考虑先序列化成一个SerializedRequest，然后再传进Compress。
解压的时候先Decompress成一个SerializedRequest，然后再反序列化。
或者能不能把序列化/反序列化的工作交给用户自己去做呢？用户只要传递SerializedRequest/SerializedResponse和框架交互就可以了。

chenBright · 2025-04-11T10:03:29Z

有点复杂了。我觉得把compress和序列化方式耦合在一起不太好，组合太多了。可以考虑先序列化成一个SerializedRequest，然后再传进Compress。解压的时候先Decompress成一个SerializedRequest，然后再反序列化。或者能不能把序列化/反序列化的工作交给用户自己去做呢？用户只要传递SerializedRequest/SerializedResponse和框架交互就可以了。

gzip和zlib的compress和序列化方式耦合的原因应该是protobuf接口导致的，GzipInputStream/GzipOutputStream将解压/压缩过程作为输入/输出流的一部分了。这应该是出于性能考虑吧，我测了一下，耦合的性能比非耦合的性能高：

# 1kb
# 非耦合
I20250411 11:00:48.295392  2989 brpc_http_rpc_protocol_unittest.cpp:1884] Compress 100000 times, cost 444246ns/call
# 耦合
I20250411 11:00:57.442049  2989 brpc_http_rpc_protocol_unittest.cpp:1893] Compress 100000 times, cost 91466ns/call

# 16kb
# 非耦合
I20250411 11:49:16.398469 110097 brpc_http_rpc_protocol_unittest.cpp:1884] Compress 100000 times, cost 75561ns/call
# 耦合
I20250411 11:49:17.467223 110097 brpc_http_rpc_protocol_unittest.cpp:1893] Compress 100000 times, cost 10687ns/call

所以，还是有必要保留耦合的实现。而protobuf不支持的snappy的实现是非耦合的（题外话：可能也可以实现为耦合方式，提高性能）。

所以得同时支持这两种模式。

抽象一下，将compress和序列化方式两个过程尽量解耦：

// CompressCallback provides raw data for compression,
// and a buffer for storing compressed data.
class CompressCallback : public CompressBase {
public:
    // Converts the data into `output' for later compression.
    virtual bool Convert(google::protobuf::io::ZeroCopyOutputStream* output) = 0;
    // Returns the buffer for storing compressed data.
    virtual butil::IOBuf& Buffer() = 0;
};

// DecompressCallback provides raw data stored in a buffer for decompression,
// and handles the decompressed data.
class DecompressCallback : public CompressBase {
public:
    // Converts the decompressed `input'.
    virtual bool Convert(google::protobuf::io::ZeroCopyInputStream* intput) = 0;
    // Returns the buffer containing compressed data.
    virtual const butil::IOBuf& Buffer() = 0;
};

struct CompressHandler {
    // Compress data from CompressCallback::Convert() into CompressCallback::Buffer().
    bool (*Compress)(CompressCallback& callback);

    // Decompress data from DecompressCallback::Buffer() into DecompressCallback::Convert().
    bool (*Decompress)(DecompressCallback& callback);

    // Name of the compression algorithm, must be string constant.
    const char* name;
};

每种序列化方式的CompressCallback/DecompressCallback实现了序列化/反序列化逻辑，到时作为回调传给CompressHandler，每种CompressHandler根据自己的特点，选择时候使用耦合的实现，以及在合适的位置调用回调函数进行序列化/反序列化。

wwbmmm · 2025-04-11T10:25:23Z

gzip和zlib的compress和序列化方式耦合的原因应该是protobuf接口导致的，GzipInputStream/GzipOutputStream将解压/压缩过程作为输入/输出流的一部分了。这应该是出于性能考虑吧，我测了一下，耦合的性能比非耦合的性能高：

嗯，我的意思是，对pb格式，处理方式和当前一致，还是耦合方式，保持性能优势。对于其它格式，传入一个SerializedRequest类型的msg，做特殊处理，这样是否可以？

chenBright · 2025-04-11T10:31:16Z

对于其它格式，传入一个SerializedRequest类型的msg，做特殊处理，这样是否可以？

这种方式只适用于snappy。问题在于compress，不在于序列化，其他格式使用耦合方式性能也是更好的（应该是少了一次拷贝带来的性能提升）。

如果按这个实现，CompressHandler得提供两套接口：耦合方式和非耦合方式。

chenBright · 2025-04-11T10:34:47Z

我觉得将序列化过程作为回调传到compress里，会更合适一些。

wwbmmm · 2025-04-11T12:10:42Z

我觉得将序列化过程作为回调传到compress里，会更合适一些。

我理解是不是把compress和decompress的接口从传递protobuf msg改成传递ZeroCopyOutput/InputStream?

会不会有用户自己实现的compress/decompress格式，可能会造成不兼容

chenBright · 2025-04-11T16:19:48Z

我理解是不是把compress和decompress的接口从传递protobuf msg改成传递ZeroCopyOutput/InputStream?

这样的话，调用方就得将protobuf message转成ZeroCopyOutput/InputStream了吧。那么原来pb的实现也得改吧。

会不会有用户自己实现的compress/decompress格式，可能会造成不兼容

应该不会，增加压缩类型，需要修复bRPC代码，增加压缩类型和在协议处理中增加处理逻辑。除非用户自己注册了自定义压缩类型，用于用户逻辑。

或者看看这个方案如何：

参考RedisRequest/SerializedRequest等技巧，CompressCallback/DecompressCallback继承Message，就可作为参数传入CompressHandler接口。CompressHandler压缩/解压的时候：

如果是CompressCallback/DecompressCallback，则做特殊处理，进行特定格式的序列化/反序列化。
如果是pb格式，则执行原有逻辑。

这个方案的兼容性会更好，不用改CompressHandler的接口，也不会影响用户逻辑中使用CompressHandler的逻辑。

wwbmmm · 2025-04-12T03:11:11Z

参考RedisRequest/SerializedRequest等技巧，CompressCallback/DecompressCallback继承Message，就可作为参数传入CompressHandler接口。CompressHandler压缩/解压的时候：

如果是CompressCallback/DecompressCallback，则做特殊处理（调Convert进行序列化/反序列化）。

如果是pb格式，则执行原有逻辑。

这个方案的兼容性会更好，不用改CompressHandler的接口，也不会影响用户逻辑中使用CompressHandler的逻辑。

看起来好像可以

chenBright · 2025-04-13T06:05:43Z

参考RedisRequest/SerializedRequest等技巧，CompressCallback/DecompressCallback继承Message，就可作为参数传入CompressHandler接口。CompressHandler压缩/解压的时候：

如果是CompressCallback/DecompressCallback，则做特殊处理（调Convert进行序列化/反序列化）。

如果是pb格式，则执行原有逻辑。

这个方案的兼容性会更好，不用改CompressHandler的接口，也不会影响用户逻辑中使用CompressHandler的逻辑。

看起来好像可以

@wwbmmm 再看看

wwbmmm · 2025-04-13T12:52:28Z

LGTM

chenBright force-pushed the baidu-std-payload branch 4 times, most recently from 8e2425a to d88ea67 Compare April 10, 2025 13:33

Support various payload of baidu-std: json, proto-json and proto-text

30945c3

chenBright force-pushed the baidu-std-payload branch from d88ea67 to 30945c3 Compare April 11, 2025 01:32

chenBright force-pushed the baidu-std-payload branch from d54a292 to ab666d4 Compare April 11, 2025 10:25

chenBright force-pushed the baidu-std-payload branch 2 times, most recently from fb24ab3 to 519fdbd Compare April 12, 2025 18:55

Refactor implementation of compress

b9a2a73

chenBright force-pushed the baidu-std-payload branch from 519fdbd to b9a2a73 Compare April 12, 2025 19:02

chenBright merged commit a1877bc into apache:master Apr 16, 2025
21 checks passed

chenBright deleted the baidu-std-payload branch April 16, 2025 09:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support various payload of baidu-std: json, proto-json and proto-text #2946

Support various payload of baidu-std: json, proto-json and proto-text #2946

chenBright commented Apr 9, 2025 •

edited

Loading

wwbmmm commented Apr 11, 2025

chenBright commented Apr 11, 2025 •

edited

Loading

wwbmmm commented Apr 11, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading

wwbmmm commented Apr 11, 2025

chenBright commented Apr 11, 2025 •

edited

Loading

wwbmmm commented Apr 12, 2025

chenBright commented Apr 13, 2025

wwbmmm commented Apr 13, 2025

Support various payload of baidu-std: json, proto-json and proto-text #2946

Support various payload of baidu-std: json, proto-json and proto-text #2946

Conversation

chenBright commented Apr 9, 2025 • edited Loading

What problem does this PR solve?

What is changed and the side effects?

Check List:

wwbmmm commented Apr 11, 2025

chenBright commented Apr 11, 2025 • edited Loading

wwbmmm commented Apr 11, 2025 • edited Loading

chenBright commented Apr 11, 2025 • edited Loading

chenBright commented Apr 11, 2025 • edited Loading

wwbmmm commented Apr 11, 2025

chenBright commented Apr 11, 2025 • edited Loading

wwbmmm commented Apr 12, 2025

chenBright commented Apr 13, 2025

wwbmmm commented Apr 13, 2025

chenBright commented Apr 9, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading

wwbmmm commented Apr 11, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading

chenBright commented Apr 11, 2025 •

edited

Loading