Description
openedon Oct 3, 2024
Problem
When using pools, the ResetVT
method is able to reuse allocated memory of bytes. It looks like this with a minimal example:
syntax = "proto2";
package TEST;
option go_package = "./testpkg";
message Foo {
required bytes raw = 1;
}
func (m *Foo) ResetVT() {
if m != nil {
f0 := m.Raw[:0]
m.Reset()
m.Raw = f0
}
}
When the bytes are inside a oneof
field, the memory is no longer reused:
syntax = "proto2";
package TEST;
option go_package = "./testpkg";
message Foo {
oneof data {
bytes raw = 1;
bytes zlib_data = 2;
}
}
func (m *Foo) ResetVT() {
if m != nil {
m.Reset()
}
}
Use Case
This is a real problem for me when parsing open streetmap's PBF files, because they use this construct for large blobs of data that occur hundreds to thousands of times in a PBF file: https://github.com/openstreetmap/OSM-binary/blob/65e7e976f5c8e47f057a0d921639ea8e6309ef06/osmpbf/fileformat.proto#L38
Maybe it would have been better if they used a single bytes
field and a separate type
field, but I'm afraid the design is set in stone as the format is already widely used.
Suggested Goal
Unfortunately I don't understand the code well enough to provide a pull request, but this is how I would imagine the generated code to look like in order to fix the problem:
func (m *Foo) ResetVT() {
if m != nil {
var f0 isFoo_Data
switch t := m.Data.(type) {
case *Foo_Raw:
t.Raw = t.Raw[:0]
f0 = t
case *Foo_ZlibData:
t.ZlibData = t.ZlibData[:0]
f0 = t
}
m.Reset()
m.Data = f0
}
}
Inside the UnmarshalVT
method, the code would look something like this; As a first step I'm only reusing the memory, if the Data
field previously had the same type:
...
var v []byte
if m.Data == nil {
v = make([]byte, postIndex-iNdEx)
} else {
switch t := m.Data.(type) {
case *Foo_Raw:
if cap(t.Raw) < postIndex-iNdEx {
v = make([]byte, postIndex-iNdEx)
} else {
v = t.Raw[:postIndex-iNdEx]
}
default:
v = make([]byte, postIndex-iNdEx)
}
}
copy(v, dAtA[iNdEx:postIndex])
m.Data = &Foo_Raw{Raw: v}
...