Skip to content

getaddrinfo conversion #15555

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: master
Choose a base branch
from
Open

Conversation

NattyNarwhal
Copy link
Member

@NattyNarwhal NattyNarwhal commented Aug 23, 2024

Probably perilous. Tested only on my Mac and a Gentoo/ppc64le system so far. Concerns:

  • This adds some new functions out of convenience (php_network_getaddress, for common situations like resolving localhost - we only want just the one) and necessity (php_network_getaddresses_ex - exposes more of getaddrinfo like address family).
  • There are quite a few spots that are IPv4 only. Some of them are because the function is documented as returning IPv4 addresses (this sucks, but too late in cycle to fix?), some are because it's just kinda assumed (i.e. FastCGI)?
  • sockaddr_conv.c should probably be converted to bool returns, though I'm not sure if it's an ABI break (since it can be represented differently). It's a little ugly anyways...
  • socket_addrinfo_lookup calls getaddrinfo directly. It could call php_network_getaddresses_ex, though we'd need to expose service there too.
  • This doesn't drop php_network_gethostbyname and the associated detritus just yet, since it's an ABI break. We could do it before RC1... We can do ABI breaks before RC1, so I'm removing the remaining support.
    • As previously mentioned, big build system simplification.
  • Does HAVE_IPV6 or building without it even make sense anymore? We kinda assume inet_ntop for example, so assuming getaddrinfo also makes sense.
  • Userland function gethostbyaddr actually calls the modern gethostbyname equivalent already when possible. The fallback path to the legacy function can be removed with gethostbyname.

Fixes GH-15531

php_network_getaddresses is our wrapper function around getaddrinfo, the
modern replacement for gethostbyname/addr. This converts these userland
functions to use the newer API, which does have some benefits... that
these functions can't yet take advantage of. Other references to
gethostbyname/addr will be removed in future commits, to eventually
remove the PHP wrappers in the next ABI break.

Note that php_network_getaddresses does its own E_WARNINGs.

Note that these functions are documented as only supporting IPv4
addresses; we need to do another BC break to return IPv6 addresses.
Allows controlling the address family and flags to use.
Not ideal right now; it should really be using the PHP wrapper function,
and have bool conversion. But it does get rid of a use of gethostbyname.
This is curiously IPv4 only, so I kept the the same semantics.
A common idiom is only caring about the first IP address, for i.e.
resolving localhost -> 127.0.0.1/::1. Iterating through and freeing the
whole list in this case is tedious.

In this case, we provide a function that takes a pointer to a sockaddr
storage (so it can be stack allocated), as well as information like
address family (for i.e. only IPv4 addresses). This should hopefully be
only as boilerplate as gethostbyname was.
The underlying php_network_getaddresses will docref this for us, so
there's no point in taking a zend_string we don't have to do anything
with.
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only had a quick look at it, but thank you for taking care of this!

int address_count = php_network_getaddress(&resolved, hostname, 0, AF_INET, 0, NULL);
if (address_count == 0) {
/* don't need to docref here, getaddresses E_WARNINGs for us */
RETURN_STRINGL(hostname, hostname_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated optimization, but if we would use Z_PARAM_PATH_STR to get a zend_string we could just return a copy here.

We will no longer be using this function.
Callers should use php_network_getaddress* instead.
@NattyNarwhal
Copy link
Member Author

If breaking ABI is fine, then it should be possible to completely remove gethostbyname/addr in this PR. I'll see how much that removes.

Since this can never return -1 (0 on error/none, >0 for addresses)
Includes some global state as well.
Comment on lines 137 to 138
static void file_globals_dtor(php_file_globals *file_globals_p)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call to this should also be removed then.

@@ -7,7 +7,7 @@ curl
--SKIPIF--
<?php
$addr = "www.".uniqid().".".uniqid();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely we could try to come up with something that we know does not exist? Like a weird subdomain name of php.net

@NattyNarwhal
Copy link
Member Author

Windows failure is curious, because it doesn't seem to happen on on Unix?

@cmb69
Copy link
Member

cmb69 commented Aug 24, 2024

Windows failure is curious, because it doesn't seem to happen on on Unix?

On Windows the relevant part of the stack backtrace is

php8_debug.dll!php_network_getaddresses_ex(const char * host, int socktype, int family, int ai_flags, sockaddr * * * sal, _zend_string * * error_string) Line 204 (c:\php-sdk\phpdev\vs17\x64\php-src-15555\main\network.c:204)
php8_debug.dll!php_network_getaddress(php_sockaddr_storage * sockaddr, const char * host, int socktype, int family, int ai_flags, _zend_string * * error_string) Line 260 (c:\php-sdk\phpdev\vs17\x64\php-src-15555\main\network.c:260)
php8_debug.dll!zif_gethostbyname(_zend_execute_data * execute_data, _zval_struct * return_value) Line 236 (c:\php-sdk\phpdev\vs17\x64\php-src-15555\ext\standard\dns.c:236)

and ret = 11001 (WSAHOST_NOT_FOUND).

Old stack backtrace (i.e. without this patch):

php8_debug.dll!php_network_gethostbyname(const char * name) Line 1309 (c:\php-sdk\phpdev\vs17\x64\php-src\main\network.c:1309)
php8_debug.dll!php_gethostbyname(char * name) Line 301 (c:\php-sdk\phpdev\vs17\x64\php-src\ext\standard\dns.c:301)
php8_debug.dll!zif_gethostbyname(_zend_execute_data * execute_data, _zval_struct * return_value) Line 235 (c:\php-sdk\phpdev\vs17\x64\php-src\ext\standard\dns.c:235)

php_gethostbyname() returns the zend_string as expected, but the check for the return value doesn't cater to this (and no warning is raised earlier):

php-src/ext/standard/dns.c

Lines 235 to 240 in c5bce0d

if (!(ipaddr = php_gethostbyname(hostname))) {
php_error_docref(NULL, E_WARNING, "Host name to ip failed %s", hostname);
RETURN_STRINGL(hostname, hostname_len);
} else {
RETURN_STR(ipaddr);
}

I haven't checked the behavior on Linux, but would have expected that getaddrinfo() would have returned EAI_NONAME (or some other failure), but that apparently doesn't happen.

By the way, that test (gethostbyname_error002.phpt) should probably be changed to drop the is_string.

@cmb69
Copy link
Member

cmb69 commented Aug 24, 2024

Just checked on WSL2 (Debian 11), and var_dump(gethostbyname("1234567890")) outputs string(12) "73.150.2.210" (with and without this patch),

By the way, that test (gethostbyname_error002.phpt) should probably be changed to drop the is_string.

May actually have be a good idea to do this alreay. See https://3v4l.org/iP10j. And maybe the documentation ("Returns the IPv4 address or a string containing the unmodified hostname on failure.") and Windows implementation would need to be fixed accordingly.

@NattyNarwhal
Copy link
Member Author

Oh, it's trying to turn an integer into an IP address on Unix, and Windows doesn't do that? I agree this test doesn't make sense then - the error and non-error return types are the same, but what it actually returns seems different per platform. (And the behaviour seems identical between gethostbyname and getaddrinfo, so it's not a breaking change - not documented on the PHP side though.)

@cmb69
Copy link
Member

cmb69 commented Aug 24, 2024

Oh, it's trying to turn an integer into an IP address on Unix, and Windows doesn't do that?

It seems to me that the Unix behavior is non standard. From the spec:

If the nodename argument is not null, it can be a descriptive name or can be an address string. If the specified address family is AF_INET, [IP6] [Option Start] AF_INET6, [Option End] or AF_UNSPEC, valid descriptive names include host names. If the specified address family is AF_INET or AF_UNSPEC, address strings using Internet standard dot notation as specified in inet_ntop are valid.

Maybe change that test, and see what actually happens on CI. Perhaps there is different behavior even on POSIX platforms.

It seems on Unix, the behaviour is to convert integers into IP
addresses, whereas Windows tries to resolve it. Regardless, it's silly
to test what the type is, since it returns the unmodified hostname if it
can't be resolved, so it'll always be a string, error or not.
@cmb69
Copy link
Member

cmb69 commented Aug 24, 2024

Hm, all non Windows systems seem to behave identically; however, FreeBSD and Travis have not been run at all for whatever reason. No sure what to do now; documenting the behavior is a no-brainer (besides someone would have to do that), but should we adapt the Windows behavior? What exactly would need to be done? Just turning an integer to an IP address (basically long2ip((int) $addr))?

(Aside: apparently the MacOS builds could need some TLC.)

@NattyNarwhal
Copy link
Member Author

Looks like the same behaviour. I looked at NetBSD's implementation as well as musl's, and it seems they do similar things - call inet_aton behind the scenes for numeric IPs of the AF_INET persuasion, and that on Unices does the turning an integer into a v4 address behaviour. musl's inet_aton is simple and presumably how most do it. I'm guessing Windows might do the same if it gets passed to inet_aton, but it probably tries it as an address instead.

Annoyingly, I don't see any flags to control this behaviour.

@cmb69
Copy link
Member

cmb69 commented Aug 24, 2024

I'm guessing Windows might do the same if it gets passed to inet_aton, […]

To my knowledge, there is no inet_aton(). PHP 8.3 had two implementations (#13479), but neither would have properly catered to a numeric string. But we could re-use some external code or cook up our own, if we want to.

[…], but it probably tries it as an address instead.

Indeed.

@NattyNarwhal
Copy link
Member Author

I meant inet_ntoa, sorry. Unfortunately, it looks hard to override the platform behaviour to avoid converting pure integers into IPv4 addresses as there's no flags and the change happens at the resolver level, unless we decide we want to do that on all platforms, and do the same on Windows as well. Then it's just a matter of checking if something is purely numeric and convert to an IP instead of the system behaviour (and have to manually fill in the structure ourselves...).

@NattyNarwhal
Copy link
Member Author

Circling back to the hardcoded IPv4 usage and raw usage of getaddrinfo things I brought up: I think the right thing to do would probably be adding ai_protocol (for the sockets function, because it can use it for the hint) and service arguments to the new functions like php_network_getaddresses_ex. It's a lot of arguments though.

Alternatively, if we're OK with that sockets function using getaddrinfo directly, then the next best thing is to add the service only, the port is returned in the sockaddr, and can be consumed directly with less copies/additional structure writes that we'd have conditions for if-v4/v6 - and it'd make those functions handle v6 more easily, since they can just work with the returned sockaddr, which was the intent of getaddrinfo.

@cmb69
Copy link
Member

cmb69 commented Aug 25, 2024

Maybe @devnexen has some thoughts about this.

#ifdef HAVE_GETADDRINFO
struct sockaddr_in6 *sin6 = (struct sockaddr_in6*)sin;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be possible to factorise it further via macro to avoid this double assignation and the family conditions below, wdyt ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reconsider usages of gethostbyname_r()
4 participants