Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IPADDRESS and IPPREFIX with functions #10538

Closed

Conversation

mohsaka
Copy link
Contributor

@mohsaka mohsaka commented Jul 23, 2024

Issue #7591

E2E Tests at
prestodb/presto#23282

IPADDRESS/IPPREFIX

Types

IPADDRESS (HUGEINT)

All IPs are stored as IPV6. IPV4 is converted to IPV6 with a form of 
0x 0000 0000 0000 0000 0000 FFFF XXXX XXXX

IPPREFIX ({HUGEINT, TINYINT})

Stored as VarBinary. In Network Byte order. With a TINYINT prefix
at the end. 

Functions

NOTES:

We cannot apply IPV4 operations on an IPV4 converted IPV6 address.
Ex. We can apply a length 100 prefix on an IPV4 converted address. So we need
to always convert IPV6 to IPV4 if possible.

All error checking is done with the IPPREFIX and IPADDRESS types. We assume if they
are that type then they are valid.

cast(ipaddress as varchar) → varchar

memcpy int128_t from ipaddress type into byte array. 
If we are loading from a little endian system we need to flip
the bytes since it needs to be in network byte order.

We then check if it is a v4 mapped address so we can make
the correct ipV4/V6 object to print correctly.

All error handling done by the boost library.

cast(ipaddressString as IPADDRESS) → IPADDRESS

Just create an IPV6 object using folly. Conversion handled automatically.

cast(ipprefix as VARCHAR) → VARCHAR

Same as ipaddress, except we append on the prefix as a string.

cast(ipprefixString as IPADDRESS) → IPPREFIX

Create network with string. This will check validity.
If the user enters an IPV4 formatted IPV6 address it will be okay
with any IPV6 prefix length. We don't want this so we convert to
IPV4 and apply prefix length to mask to check for IPV4 validity. 

Copy over mask and return

ip_prefix(ipaddress, prefix_bits) → IPPREFIX

If ipaddress is a varchar, use same logic as cast to convert to
ip address and call with ipaddress type. 

Call mask on IPV4 or IPV6 then convert to IPV6 address. 

ip_subnet_min(ipprefix) → IPADDRESS

Return the canonical IP address already stored in the IP Prefix object.
It is the smallest IP we could have.

ip_subnet_max(ipprefix) → IPADDRESS

We want to set all non prefix bits to 1. For IPV4 thats max 32 bits, for IPV6
thats max 128. Subtract prefix from the max and do some bit manipulation
to flip that many bits.

EX. Prefix is 25 on an IPV4. So we want to flip the 7 LSBs. 
1 << 7 = (1,0,0,0,0,0,0,0)
(1,0,0,0,0,0,0,0) - 1 = (0,1,1,1,1,1,1,1);

We then OR this with IPPREFIX's canonical IP to set them all to 1.

Note, for ipprefix = 0 in IPV6 we should be able to overflow then subtract one
from a bit standpoint but it does not work. So added a special case. 

ip_subnet_range(ipprefix) → [IPADDRESS, IPADDRESS]

Just return ip_subnet_min and ip_subnet_max in an array. 

is_subnet_of(ipprefix, ipaddress) → BOOLEAN

We just need to make sure that the prefix of IPADDRESS is equal to the ip
of IPPREFIX when subject to the prefix length of IPPREFIX.
To do this we do a similar bit operation. However we flip the bits to get the prefix
side and AND them together to extract the prefix bits from the IP. 
We then compare to check if the prefix are equal. 

Note, for ipprefix = 0 in IPV6 we should be able to overflow then subtract one
from a bit standpoint but it does not work. So added a special case. 

is_subnet_of(ipprefix1, ipprefix2) → BOOLEAN

Just call is_subnet_of(ipprefix, ipaddress) however with an additional condition
to make sure that the prefix of ipprefix2 is longer or equal to that of ipprefix1.
If its shorter than there is no way for ipprefix1 to contain all of ipprefix2. 

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jul 23, 2024
Copy link

netlify bot commented Jul 23, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit e8dc270
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66a07f24b14e050008a4cf55

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mohsaka Thank you for adding these types and functions that use them. This PR is rather large. Would it be possible to break it up into smaller pieces? For example, one PR to add IPADDRESS type + operators (CAST, etc.), another to add IPPREFIX type, and a third PR to add functions.

Let's makes sure that all operators for these types are included. Specifically, make sure that all CASTs supported in Presto are supported in Velox as well.

  VELOX_UNSUPPORTED(
     "Cast from {} to IPAddress not yet supported",
     resultType->toString());

"not yet supported" in the error message above suggests that there might be CASTs supported in Presto, but not Velox. Let's make sure to include all supported casts.

For CASTs from strings, make sure these are implemented efficiently and do not generate exceptions unnecessarily. See https://velox-lib.io/blog/optimize-try-more

@mbasmanova
Copy link
Contributor

@mohsaka Would you check if these types are affected by #10338 ?

@mohsaka
Copy link
Contributor Author

mohsaka commented Jul 25, 2024

@mohsaka Would you check if these types are affected by #10338 ?

Thank you for the review.

Took a look. It shouldn't be affected by this as the binary to HUGEINT is a 1 to 1 mapping. We don't have multiple representatives of the same value.

If you were thinking of the case where we have an IPV4 and an IPV6 with the same value, we automatically assume IPV4 converted to IPV6. EX.

EXPECT_EQ(castIPAddress("::ffff:ffff:ffff"), "255.255.255.255");

We can see that IPV6 ::ffff:ffff:ffff Will be assumed to be IPV4 255.255.255.255 rather than IPV6 of value ::ffff:ffff:ffff.

Will also break up the PR. Thanks!

@aditi-pandit
Copy link
Collaborator

@mohsaka : Thanks for your PR. Splitting is a good idea.

Also, it would be great if we had a limited fuzzer support for validating function results. It is very hard to verify results without it.

Christian has a limited fuzzer for CAST expressions https://github.com/facebookincubator/velox/pull/10523/files#diff-324086e31fd4ee807d8a0e8234490cf68550a0501eb5e955ac499e21f71c320c. Would be great if we can do a similar one for IPAddress and IPPrefix as well.

@mohsaka mohsaka closed this Aug 7, 2024
facebook-github-bot pushed a commit that referenced this pull request Aug 13, 2024
Summary:
Split of #10538

In response to mbasmanova :
#10538 (review)

Took a look and added the to and from varbinary casting
Contains CAST functions for IPAddress <-> Varchar and IPAddress <-> Varbinary utilizing the folly library.

In response to:
https://velox-lib.io/blog/optimize-try-more

I believe the folly library utilizes makeUnexpected already.
https://github.com/facebook/folly/blob/main/folly/IPAddress.cpp
I added one error for Varbinary conversion utilizing `context.setStatus`
If I'm not understanding something correctly please let me know.

As for adding a limited fuzzer aditi-pandit
#10538 (comment)

Is this to test the casting `IPAddress <-> Varchar and IPAddress <-> Varbinary` or something else.
We already test the casting using the duckdb type, we also have the prestissimo test in the E2E test.
Fuzzer support will be added in after the types and functions have been added.

Pull Request resolved: #10596

Reviewed By: bikramSingh91

Differential Revision: D61163377

Pulled By: xiaoxmeng

fbshipit-source-id: 2c0fa2e8e6d250a2167a1ff82b2878548798a226
@mohsaka mohsaka mentioned this pull request Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants