-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal reproduction test case for misaligned vector of double #213
Conversation
Great reproduction case. First note that it could be completely random that one buffer works and the other doesn't if for example that aligment argument given the the create vector call was 4 instead of the expected 8. Based on you first dump: In the buffer without size prefix, the vector size field starts at offsets 0x000c (12 dec) and contains 4 zero bytes. Observations: this size field must be aligned to 4 bytes. Because the vector is a double, the byte that follows the size field must be aligned to 8 bytes. In the buffer without size field, the following address is 0x0010 (16 dec) which is a multiple of 8. The vectors are therefore correctly aligned in both cases, assuming I have made no mistake in the above. In the buffer without size prefix there is no zero padding after the header. That is coincidental because it just happened to not be necessary. The table starts at offset 0x0004 which is stored as the root table offset in the first 4 bytes of the table header at offset 0x0000. The table starts with the vtable offset which is the negative value "f4 ff ff ff" or 0xfffffff4, or -12 dec, meaning the vtable appears 12 bytes later. That is just how FlatBuffers work. And indeed we find the 6 byte long vtable at that location. In the buffer with size prefix, there are 4 bytes of zero padding at offset 0x0008 (8 dec). The table starts at offset 0x000c (12 dec) storing a negative -12 value to the vtable 12 bytes later where the 6 byte vtable is found. The header stores the buffer size at offset 0x0000 with the value 0x0000001a (26 dec) which is the length of the printed buffer excluding the 4 byte size prefix itself. (I do not recall if the size prefix should include itself or not).
EDIT: |
I made a mistake in my conclusion - see edit above. To further confirm that the buffer is valid, try access a vector with actual data in the buffer with size prefix and confirm the pointer addresses and content when reading the data with ordinary reader interface. Make sure the buffer is aligned including the size prefix. |
Thanks for your analysis @mikkelfj. I will see if I can confirm this by verifying the buffers with the C++ implementation. |
I don't think C++ verifies aligment. |
@mikkelfj ok :( But I went ahead and did it anyway and got strange results... The "empty" doublevec table verifies, i.e this passes: const unsigned char flatbuf2[] =
{
0x04, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x04, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier2((uint8_t *) flatbuf2, sizeof(flatbuf2));
TEST_EQ(VerifyDoubleVecBuffer(verifier2), true); The buffer with an empty double vector in it but still without size prefix does not pass: const unsigned char flatbuf3[] =
{
0x04, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x06, 0x00, 0x08, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier3((uint8_t *) flatbuf3, sizeof(flatbuf3));
TEST_EQ(VerifyDoubleVecBuffer(verifier3), true); |
I also tried producing buffers with the same content using C++ to compare. I know it doesn't have to be the same binary representation for the same content and I don't know enough about flatbuffers to understand it all yet but nevertheless:
Both the above verifies in C++ verifier.
Only the second verifies in the C++ verifier and it's two bytes longer.. and considering that the doublevec empty no size prefix is 12 bytes both produced with flatcc and C++ it does not seem to me C++ zero pads to 8 bytes? |
@mikkelfj while I'm far from sure about my findings above and not sure exactly what they mean it doesn't seem to confirm that the problem we see here with flatcc is restricted to the verifier. |
The flatbuf3 buffer in your example has a size prefix of the vector which is 4 (at byte offset 8 in the buffer). That is followed by 4 zero bytes. If this is a byte vector it would work, but it is a double vector it makes no sense. The size is not a multiple of 8. |
@mikkelfj flatbuf3 buffer was created with this flatcc code (test case in this PR), same as initially presented in this PR that you said was valid but that the verifier was to blame: DoubleVec_start_as_root(B);
DoubleVec_a_create(B, 0, 0);
DoubleVec_end_as_root(B); Are you saying we are back to conclude that there is indeed a bug with double vectors in flatcc? |
Correction: the size field is not 4. 4 is the offset to the vector. So it is correct. The following zero bytes is the length field. In flatbuf3 the alignment is correct because the size starts at offset 12 and the empty vector payload starts at offset 16 which is a multiple of 8. In flatbuf2 the offset to the vector to be "04 00 04 00" which is wrong. It should be "04 00 00 00" and it should be followed by 4 zeroes and then by a vtable. Someting is missing. |
In your comment starting with "I also tried producing buffers", I don't understand the first dumps. The second dumps: The flatcc ( C ) and flatc ( C++ ) buffer, the buffers have the same content, but the vtable is stored before the table in C++. The results in more padding required which explains the longer length. In both cases the offset the vector is 4 and points to a size field with zeroes that are a followed by an offset that is a multiple of 8. So both look correct. As to the first dumps, I'll need more time but it appears to have "04 00 04 00" similar to before. |
I think the first dumps are garbage. |
I don't have time trace this right now, but I doubt that you are listing the buffers correctly. Also, without proper format hex dumps as in your previous posts, it is very difficul to assess alignment. |
Hmm ok.. strange.. I tried to be careful with the dumps but I had to find something to do it in C++ that's why the format isn't exactly the same. One interesting fact though is that functionally this all appears to be working if I stay in flatcc land (correct roundtrip write/read and comparison) and avoiding the verifier - but causes alignment and possibly other memory issues when subjected to sanitizers. So whatever problem there is, it is likely systemic. |
https://github.com/dvidelabs/flatcc/blob/master/doc/binary-format.md might help Some of the dumps are missing vtables like 06 00 08 00 04 00. That, and the strange 04 00 04 00 sequence is why I doubt the dumps. |
There are two things that surprise me with
That sad in the C++ version: |
@mzaks The offsets to the vtables are soffset_t which is a signed 32-bit value. The value is subtracted from the location where the offset is stored. Since C++ generally stores vtables before the tables that use them, the soffsets are typically small positive numbers. FlatCC stores vtables clustered together last in the buffer, so it stores negative values at table start. |
First four bytes |
That is not the vtable. That part of the buffer is missing. The 04 00 04 00 is still a strange value. |
Ah, @mikkelfj I think you are correct, sorry. It is |
Your next analysis is correct from a cursory look. That is why I earlier said the second set of buffers appear correct, but the first do not make any sense. |
it makes sense if |
Yes it appears you are right. |
@bjornharrtell that means the buffer dumps are probably correct. In that case they should also verify since the null tables cannot be misaligned as they are not present. |
Well, I have double checked and I cannot get And then we have the size prefixed variants from the initial description: doublevec table with size (with empty vector): 20: doublevec table with size:
20: 00000000 1a 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff |................|
20: 00000010 04 00 00 00 00 00 00 00 06 00 08 00 04 00 |..............| doublevec table with size (without vector): 1: doublevec table with size:
1: 00000000 0c 00 00 00 04 00 00 00 fc ff ff ff 04 00 04 00 |................| The "doublevec table with size (with empty vector)" fails to verify: const unsigned char flatbuf4[] =
{
0x1a, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf4, 0xff, 0xff, 0xff,
0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x06, 0x00, 0x08, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier4((uint8_t *) flatbuf4, sizeof(flatbuf4));
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier4), true); The "doublevec table with size (without vector):" verifies with the C++ verifier as follows: const unsigned char flatbuf5[] =
{
0x0c, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0xff, 0x04, 0x00, 0x04, 0x00
};
flatbuffers::Verifier verifier5((uint8_t *) flatbuf5, sizeof(flatbuf5));
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier5), true); So, same C++ verifier results with size prefixed variants. |
I'd like to know more about why the C++ verifier fails. As to C++ verifier failing on "The buffer with an empty double vector in it but still without size prefix does not pass:"
I really cannot see what would be wrong with that buffer except that the verifier might be looking for an optional file identifier. Without more information it is hard to tell. As for those with size prefixes, perhaps we should focus on the one without first, but still, if you can get an error message, assertion or breakpoint clue ... |
Ohh If C++ verifies alignment, then you cannot verify a buffer in a const char array. It will not be aligned. You need a uint64_t array or use alignas(8) attribute. The same goes for flatcc but since that would be a different process, it might get lucky and find aligned data. |
BTW: please keep this PR more or less as is since I use it for testing. |
The "flatcc, doublevec empty with size prefix" passes C++ verification: // flatcc, doublevec empty with size prefix
uint8_t *flatbuf5 = hexstr_to_char(
"0c 00 00 00 04 00 00 00 fc ff ff ff 04 00 04 00", 16);
print_buf("flatcc, doublevec empty with size prefix", flatbuf5, 16);
flatbuffers::Verifier verifier5(flatbuf5, 16);
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier5), true); The "flatcc, doublevec 0 len a vector with size prefix" does not pass C++ verification: // flatcc, doublevec 0 len a vector with size prefix
uint8_t *flatbuf4 = hexstr_to_char(
"1a 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff"
"04 00 00 00 00 00 00 00 06 00 08 00 04 00", 30);
print_buf("flatcc, doublevec 0 len a vector with size prefix", flatbuf4, 30);
flatbuffers::Verifier verifier4(flatbuf4, 30);
//TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier4), true); // fails verification Attempt to manually add tail padding "flatcc, doublevec 0 len a vector with size prefix + manual two byte padding" still does not pass C++ verification: // flatcc, doublevec 0 len a vector with size prefix + manual two byte padding
uint8_t *flatbuf44 = hexstr_to_char(
"1a 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff"
"04 00 00 00 00 00 00 00 06 00 08 00 04 00 00 00", 32);
print_buf("flatcc, doublevec 0 len a vector with size prefix + manual two byte padding", flatbuf44, 32);
flatbuffers::Verifier verifier44(flatbuf44, 32);
//TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier44), true); // fails verification I've tried to debug the C++ verifier in the above case and believe it's this conditional that ends up false: |
Debugging session was correct, adjusting the size prefix to 1c (which make sense now.. ) makes is pass: // flatcc, doublevec 0 len a vector with size prefix + manual two byte padding + adjusted size prefix
uint8_t *flatbuf44 = hexstr_to_char(
"1c 00 00 00 08 00 00 00 00 00 00 00 f4 ff ff ff"
"04 00 00 00 00 00 00 00 06 00 08 00 04 00 00 00", 32);
print_buf("flatcc, doublevec 0 len a vector with size prefix + manual two byte padding", flatbuf44, 32);
flatbuffers::Verifier verifier44(flatbuf44, 32);
TEST_EQ(VerifySizePrefixedDoubleVecBuffer(verifier44), true); |
OK thanks, good to know. |
Just to be clear from what I hear the minimum alignment of a flatbuffer is 4 bytes or the size of the largest scalar stored and the full buffer size should always be divisible with the minimum alignment, so what you call tail padding is not optional it is mandatory in the flatbuffer specification (or reference implementation). |
The min alignment is 4 because of the uoffset type. The actual alignment will typically be the largest scalar actually stored in the buffer, so either 4 or 8. However, the alignment can be larger if a struct or vector with larger alignment is stored in the buffer. Flatcc currently only supports specifying struct alignment beyond its natural content based alignment. The alignment is always a power of 2. I have not heard of it being divisible by min alignment, but that kind of follows from the power of 2. |
Agreed. I was just trying to make it clear that flatcc makes buffers that to not adhere to the alignment requirement i.e a doublevec table with an empty vector of doubles without size prefix is produced to a 22 byte buffer but should be 8 byte aligned. This is already deduced by the above experiments I just wanted to get the conclusion down and confirmed as a bug. |
I wouldn't call it a bug. But FlatBuffer were never very precisely specified and it was an active design choice. C++ tail alignment cannot be exact unless it always pads up to the maximum possible alignment. For example some buffers might store a 256 byte aligned struct for use with graphics card, while other buffers of same type might not. The alignment does not affect the current buffer, but the following buffer. Hence it does not really make sense to tail pad based on current buffer content unless you alway opt for max. That sounded a bit excessive so flatcc chose to not do that. In hindsight flatcc should probably have padded to 4 or 8 bytes just to ensure that buffers can be stacked meaningfully, but since that is easy to do manually, there wasn't really a need for it. Also, size prefixed buffers did not exist early on. In reality one ought to prepad buffers up to the alignment requirement of the buffer being added to a stack of buffers. That will ensure proper alignment. So the more I think about it, the more I think it is best to just leave it alone. If you want to verify with C++, just align the buffer manually. Flatcc gives you the alignment requirement of the current buffer just before or after finalize so prepadding a stacked buffer is easy. |
What is a bug is that the flatcc verifier fails a valid size prefixed buffer. That needs to be fixed. |
Also, tail aligment needs a section in documentation to explain what I just stated in the above. |
Of course there is still the issue with adjusting the size prefix, but that cannot really be helped, except with a helper function to adjust an existing buffer based on the next buffers alignment requirement. |
It is pretty simple, this is a difference with how it is implemented in C++, which is the "gold standard" as far as how FlatBuffers should work, given any ambiguities in the text specification. C is doing something different from pretty much all other language implementations here, and fails the C++ verifier. So regardless of whether theoretically this would make sense to be allowed, this should be fixed. The eco-system does not benefit from one implementation doing things incompatibly from the others. |
@aardappel I tend to agree, but can you give the exact requirements the C++ implements? |
All such data items are self-aligned to both the start and the end of the buffer, thus the size of the buffer is a multiple of |
Thanks, that is essentially the alignment reported by the flatcc builder near completion. |
a8e7a8a
to
d7f50b2
Compare
I am starting to look into this again. To summarize: The issue is simple in principle: The fix is likely to add specific with_size variants of the verifier and feed it the original buffer start with size prefix. |
@aardappel I have now added code to pad flatcc buffers to the largest object alignment. There was actually alignment code already, but it requires that the user calls start_buffer with a block alignment argument, but it is not set by default. But I don't think this is quite right either. If a buffer sometimes contains large objects (e.g. double) and large aligned structs, and sometimes not, then buffers will have different alignments, and when stacking buffers with size prefix, the buffer following might have a larger object and be incorrectly aligned. The solution is to find the theoretical maximum alignment in the schema compiler. The flatcc builder should already be prepared to take such an alignment argument, but it needs to be generated first. But this could also be wasteful if some buffers have rarely used large alignments. |
I guess the crucial difference is that so far we have made no guarantee that if you put multiple FlatBuffers consecutively in memory that this guarantees their alignment, like structs would. We only guarantee alignment for a single FlatBuffer, storing multiple FlatBuffers is between you and your memory allocators, etc. We could make this "work by default" by aligning a FlatBuffer not to the largest stored item, but the largest declared item. I would be against such a change, given how:
So, an implementation should default to "largest stored item". That said, if an implementation has users that would like to "stack" FlatBuffers, an optional "largest declared item" mode could be allowed, since more padding is entirely compatible with the existing FlatBuffers design. But it should not be the default. If you made it the default, then users of said implementation would run into trouble the moment they accept data made by another language. The user would need to be very aware that the alignment they are requesting is not guaranteed across FlatBuffers. Though frankly, rather than going thru all that trouble, I'd instead recommend to have a way for the user to retrieve the largest declared alignment somehow, and use that to correctly "stack" their buffers, if needed. That is simpler for such a niche use case, and has the advantage that the actual buffers are not bloated unnecessarily when stored outside this stacking, and better yet, it would work with buffers originating from any implementation without those needing updates. @dbaileychess to see if he agrees with my reasoning. |
I agree to that. This is also largely how flatcc has worked, except it did not tailpad until recently. The user could always either self-align before stacking, or passing a block-align argument to the builder. |
Just for the record: As far as I can tell from the flatcc code, older version should actually tailpad, but ONLY up to the block align argument and only if different from the default 0 argument. Now the runtime largest block is taken into account when tailpadding. At any rate the largest runtime alignment has always been taking into account for prefix padding so the buffer starts correctly. |
I agree with Wouter. Any sort of "meta" flatbuffer work (stacking multiple flatbuffers) should be outside the core library concern. We can provide helper accessors to get the largest declared type if needed. |
The test case provided in this PR was useful in adding new _with_size verifiers. See end of #210 for details. Note that previously tail pad was added which likely helps passing C++ verifier, but that was not the core issue. |
I believe I have narrowed down the problem with #210 further. It appears to happen only for buffers with size.
I've made new doublevec_test with a very minimal schema and two test cases, one run without size prefix and one with.. the latter fails with "vector header out of range or unaligned".
ctest --verbose output:
The table definition is:
And the gen code is creating as root and an empty vector for a as it does not seem to matter if I fill it or not with something.
If I create two variants without setting the a vector at all both verifies and looks like this: