-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IPAddress Presto type #10596
Add IPAddress Presto type #10596
Conversation
✅ Deploy Preview for meta-velox canceled.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka Nice clean change. Thank you for putting the effort. Overall looks good. Some comments below. I still need to understand the cast logic, but it will be easier once you add documentation. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @mohsaka. Have some minor comments.
@mbasmanova Thank you for all of the review comments. I believe I got everything. If there is anything else or I misunderstood something please let me know. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka Thank you for adding the documentation. This is very helpful. One question.
type is BIGINT. The format that the address is stored in is defined as part of `(RFC 4291#section-2.5.5.2) <https://datatracker.ietf.org/doc/html/rfc4291.html#section-2.5.5.2>`_ | ||
As Velox is run on Little Endian systems and the standard is network byte(Big Endian) | ||
order, we reverse the bytes to allow for masking and other bit operations | ||
used in IPADDRESS/IPPREFIX related functions. This type can be used to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an orderable type? If so, does OrderBy produce correct results if it treats values of this type the same as values of BIGINT type? If not, we may have a problem similar to #10338
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mbasmanova
I believe that it is working somewhat correctly. To confirm I do have an E2E test that was added for my complete PR of IPADDRESS/IPPREFIX/functions.
prestodb/presto#23282
I believe that the reason for this is due to the 1 to 1 mapping of IPADDRESS to BIGINT and how they are stored.
For example the ipaddress
1.1.1.1
in hex is 00000000000000000000ffff01010101
and which has a binary value of 281470698586369
1.1.1.2
in hex is 00000000000000000000ffff01010102
which has a binary value of 281470698586370
which is one larger.
1.1.2.1
in hex is 00000000000000000000ffff01010201
which is 281470698586625
which is larger than both 1.1.1.1
and 1.1.1.2
as it should be.
Similar with IPV6.
That being said that does point out two issues. One that I don't think presto even takes care of.
- When we sort a mixture of IPV4 and IPV6 there will be a split between IPV6 and IPV4 addresses at
::FFFF:FFFF:FFFF
This one should exist in presto.
So the ordering will be for example
::FFFE:FFFF:FFFF
0.0.0.0
....
255.255.255.255
::1:0000:0000:0000
- The other problem being that any IPV6 IPADDRESS with the MSB set will be placed at the start instead of at the end. This one shouldn't exist in presto as its doing a compareUnsigned.
For example
FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF
::
::1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into this. I'll need more time to understand.
I see that compare in Presto for IPADDRESS is not the same as for BIGINT (or maybe they just look different). This suggest that IPADDRESS type is affected by #10338 and therefore we won't be able to use it (without causing correctness issues) until that issue is resolved or we find a way to block code paths that may be affected.
@Override
public int compareTo(Block leftBlock, int leftPosition, Block rightBlock, int rightPosition)
{
int compare = Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, 0)), reverseBytes(rightBlock.getLong(rightPosition, 0)));
if (compare != 0) {
return compare;
}
return Long.compareUnsigned(reverseBytes(leftBlock.getLong(leftPosition, SIZE_OF_LONG)), reverseBytes(rightBlock.getLong(rightPosition, SIZE_OF_LONG)));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem. Simplest fix would be to switch to varbinary but that is not very efficient. Other option would be to override the operator similar to how we override cast? Not sure if this is possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if this is possible
Not yet. @bikramSingh91 has been looking into this as part of #10338
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka One option might be to proceed to land this change and continue landing functions that use that type, but document that it is not "ready to be used" and add a plan validator to Presto to reject plans with that type when running native engine (we have that for TS_w_TS type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka We should add the plan validator with a config like it was done for timestamp with timezone in Presto coordiantor first before merging this change in velox.
Example:
prestodb/presto#23200
prestodb/presto#23374
It will allow us to test it, but disable for production or correctness critical use cases. If we merge this and worker start executing queries with IPAddress type unconditonally, we will face correctness problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amitkdutta Thank you for the pointers. Will work on adding the plan changes before merging in these changes.
folly::IPAddressV6 v6Addr(addrBytes); | ||
|
||
exec::StringWriter<false> result(flatResult, row); | ||
result.reserve(kIPAddressMaxStrLen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reserve capacity for all rows outside of the loop?
See getRawStringBufferWithSpace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed example
velox/velox/exec/RowContainer.cpp
Line 676 in c0f0a31
flatResult->resize(rows.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka Looks great % this type suffering from #10338
CC: @amitkdutta
const auto* ipaddresses = input.as<SimpleVector<int128_t>>(); | ||
folly::ByteArray16 addrBytes; | ||
|
||
flatResult->resize(rows.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not right. I'm out next week. Maybe @bikramSingh91 can help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka Thank you for contributing.
Let me get back to you by EOD today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bikramSingh91 Thank you. I went off of
velox/velox/exec/RowContainer.cpp
Line 676 in c0f0a31
flatResult->resize(rows.size()); |
as stated before. Which when I looked at it it seemed to me to be
rows.size() - number of elements
size_t totalBytes = flagBytes_ * rows.size() + fixedWidthRowSize * rows.size(); - Total number of bytes for all of the elements.
I guess I misunderstood something. Thank you for helping me out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mohsaka thanks for the update, I unfortunately did not get time to review this today and need to head out of work, I will try to get to it over the weekend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context: rows
here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.
(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable()
which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context.ensureWritable()
should already be called in castTo and castFrom. More follow up on
#10596 (comment)
const auto* ipaddresses = input.as<SimpleVector<int128_t>>(); | ||
folly::ByteArray16 addrBytes; | ||
|
||
flatResult->resize(rows.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
context: rows
here is a selectivity vector which can have inactive rows at the end. Therefore you can have cases where rows.size() is say 10 but has 6 inactive trailing rows. Therefore, if you use rows.size() then you would be allocating extra space than which is actually required as in expression eval (as eval here is only concerned with the active rows for efficiency, the code around expression eval ensures those rows are copied over to the final result or this vector is sufficiently resized). rows.end() is therefore more appropriate here which returns one past the last selected value.
(My recommendation) However, to make implementation simpler, we have a utility function context.ensureWritable()
which takes care of these lower level details for you. See example: https://github.com/facebookincubator/velox/blob/main/velox/expression/CastExpr.cpp#L167
folly::ByteArray16 addrBytes; | ||
|
||
flatResult->resize(rows.size()); | ||
flatResult->getRawStringBufferWithSpace(rows.size() * kIPAddressMaxStrLen); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you dont need this, since you are using exec::StringWriter
, it takes care of string buffer management for you.
In fact calling getRawStringBufferWithSpace() would be counter productive as this would try to allocate a single large buffer of this size, whereas the vector might already have a number of smaller buffers that can accomodate values that you will be writing next. Using exec::StringWriter
will abstract away all those lower level details for you so would be the preferred way to go.
Another way to do this (without using exec::StringWriter
) would be to use the FlatVector's API FlatVector<StringView>::set(vector_size_t idx, StringView value)
which also takes care of buffer management
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bikramSingh91 Thank you for the tips. The reason I tried to use the allocator was via @mbasmanova's suggestion of preallocating the strings before hand. See
#10596 (comment)
&
#10596 (comment)
At first I didn't have any reservation for the string space, then I reserved each individual row's string space. But then @mbasmanova suggested I allocate the entire string space before entering the row by row casting.
exec::EvalCtx& context, | ||
const SelectivityVector& rows, | ||
BaseVector& result) { | ||
auto* flatResult = result.as<FlatVector<int128_t>>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
call context.ensureWritable
before writing into the vector. here and other cast* implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really appreciate your advice but I believe that I already call this in the caller, see castTo
castFrom
functions.
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@xiaoxmeng This PR is not ready to merge. If possible could you remove the |
@xiaoxmeng merged this pull request in e092535. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Did you consider the compatibility with Presto |
I used UUID as the example for the IPADDRESS type. So I figured that we would already have compatibility between fixed width 128 type <-> HUGEINT. But it seems like I may be wrong. I did do E2E testing with Prestissimo, but I did not try creating tables in Presto and reading them in Prestissimo. Is that what you are talking about? Please confirm. Thanks! Also could you give UUID a try as well? There's another type TimestampWithTimezone but I think that may be fine, being an We need a Serializer change here if that is the case. Thanks @wraymo |
Split of #10538
In response to @mbasmanova :
#10538 (review)
Took a look and added the to and from varbinary casting
Contains CAST functions for IPAddress <-> Varchar and IPAddress <-> Varbinary utilizing the folly library.
In response to:
https://velox-lib.io/blog/optimize-try-more
I believe the folly library utilizes makeUnexpected already.
https://github.com/facebook/folly/blob/main/folly/IPAddress.cpp
I added one error for Varbinary conversion utilizing
context.setStatus
If I'm not understanding something correctly please let me know.
As for adding a limited fuzzer @aditi-pandit
#10538 (comment)
Is this to test the casting
IPAddress <-> Varchar and IPAddress <-> Varbinary
or something else.We already test the casting using the duckdb type, we also have the prestissimo test in the E2E test.
Fuzzer support will be added in after the types and functions have been added.