Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Issue 20544 - socket.remoteAddress throws out of memory error with unix domain socket peer #7383

Merged
merged 1 commit into from
Jan 30, 2020

Conversation

adamdruppe
Copy link
Contributor

No description provided.

@dlang-bot
Copy link
Contributor

dlang-bot commented Jan 29, 2020

Thanks for your pull request and interest in making D better, @adamdruppe! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
20544 normal socket.remoteAddress throws out of memory error with unix domain socket peer

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + phobos#7383"

Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, can you adda changelog entry or reference a bug report please.

@adamdruppe
Copy link
Contributor Author

I suspect the root cause is a bit deeper here but my test case revealed it buried in thousands of lines of code and I haven't fully reduced it yet.

My feeling right now is the constructor on line 2011 is actually setting the length wrong given the abstract linux socket (which starts with a nul byte, but then has other data after that).

I'll probably come back tomorrow and hopefully follow up with more detail and perhaps do an additional PR.

But I figured might as well open this since regardless of how the state got here, .ptr[0 .. size_t.max] is just obviously wrong regardless, causing crashes as well as violating that @trusted promise in the signature so this check remains correct (or at the least, not harmful) even if there are supplemental fixes.

Copy link
Member

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, the only situation in which this change would have an effect, is when the socket is constructed with an empty path. This is obviously not going to do anything useful, so I think the correct fix is to add an enforce to the constructor to check for that. (The logic in the constructor is already wrong for the case of empty paths, as the abstract UNIX socket check then references an out-of-bounds character.)

@CyberShadow
Copy link
Member

causing crashes as well as violating that @trusted promise in the signature

I would rather say that the problem is that if _nameLen == sockaddr_un.init.sun_path.offsetof, then the object is in an invalid state. The length of the path must be non-zero so that we can at least know if we're working with an abstract socket or not, and also, it must be non-zero, because either we're working with a non-abstract socket (in which case it is zero-terminated, i.e. length is at least 1 for the terminating NUL character), or an abstract socket (in which case it has a leading NUL character, so again length is at least 1).

@adamdruppe
Copy link
Contributor Author

So what specifically caused this in my code is a call to remoteAddress.toString() on an abstract unix socket server connection. The data in there is sockaddr_un(AF_UNIX, ['?' repeated 108 times]), which is what the protected constructor new UnixAddress() sets it to. It also sets _nameLen = sun.sizeof.

toString does appear to work with abstract names set with the other constructor, and it appears to to work with strings of the maximum length. So I believe this protected ctor is the problem. It should probably at least null terminate it lol. Maybe fill with zeroes instead of '?'.

BTW though in my code I just tested with a cast; cast(UnixAddress) ira ? "unix:" : ira.toString(); and moved on. that might be the best thing for my use anyway.

@CyberShadow
Copy link
Member

I think that constructor call should be immediately followed by a getpeername + .setNameLen, as in Socket.remoteAddress.

Could you please have a look at what exactly is going on, or at least, provide a minimal reproducible testcase?

@adamdruppe
Copy link
Contributor Author

yeah im just looking at the commit history to see why it is doing that ? thing right now anyway. git blame led here: https://github.com/dlang/phobos/pull/3300/files

which seems to be just changing from dynamic to static allocation so i don't think changing the ctor will hurt anything.

Here's a little program that reproduces (on Linux)

import std.socket;
import std.stdio;
import std.concurrency;

void client() {
        auto socket = new Socket(AddressFamily.UNIX, SocketType.STREAM);
        socket.connect(new UnixAddress("\0test"));
        ubyte[128] buffer;
        socket.receive(buffer[]);
        socket.close();
}

void main() {
        auto socket = new Socket(AddressFamily.UNIX, SocketType.STREAM);
        socket.bind(new UnixAddress("\0test"));
        socket.listen(1);
        spawn(&client);
        auto peer = socket.accept();

        scope(exit) {
                peer.send("");
                peer.close();
                socket.close();
        }

        writeln(peer.remoteAddress());
}

Note the abstract socket is not necessary, it works the same way with a regular name too.

@adamdruppe
Copy link
Contributor Author

so getpeername returns 2 with an unbound unix domain socket, which means the sun_path is entirely untouched as that means only the family tag (a ushort) is initiated.

Which means len == 0 in an entirely valid code path.... which is only relevant in one place - the path method.

So yeah this change is actually the correct solution as there is no deeper root cause. Name length is simply allowed to be zero.

refs:
https://www.ibm.com/support/knowledgecenter/en/SSB23S_1.1.0.15/gtpc2/cpp_getpeername.html
"A UNIX domain stream socket that is not bound to a path name by using the bind function is unnamed. "

@CyberShadow
Copy link
Member

OK. Agreed, except I think it would be better to explicitly return null if len is zero.

@adamdruppe
Copy link
Contributor Author

yea i'll make that change, force pushed up now.

Copy link
Member

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should reference a bugzilla issue.

std/socket.d Outdated
@@ -2024,6 +2024,8 @@ static if (is(sockaddr_un))
@property string path() @trusted const pure
{
auto len = _nameLen - sockaddr_un.init.sun_path.offsetof;
if (len == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No tabs.

Suggested change
if (len == 0)
if (len == 0)

std/socket.d Outdated
@@ -2024,6 +2024,8 @@ static if (is(sockaddr_un))
@property string path() @trusted const pure
{
auto len = _nameLen - sockaddr_un.init.sun_path.offsetof;
if (len == 0)
return null;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment would be good.

Suggested change
return null;
return null; // An empty path may be returned from getpeername

@CyberShadow
Copy link
Member

Here's a little program that reproduces (on Linux)

Can you add it as a unittest?

@adamdruppe
Copy link
Contributor Author

I just added one assert to the existing unittest to call the method since the setup is basically the same anyway.

also formally filed: https://issues.dlang.org/show_bug.cgi?id=20544

@adamdruppe
Copy link
Contributor Author

Hmmm the autotester failed on BSD. It is possible that system does different behavior.

It is also possible that Linux will change its behavior at some point in the future; I don't think the spec requires it to return null, just that it can.

but i could always make it version(linux) assert..... or loosen it a bit and just ensure it passes without throwing an exception.

@adamdruppe
Copy link
Contributor Author

yeah im just gonna call the method inside the unittest to ensure it doesn't throw. the comment will explain it "may".

@CyberShadow
Copy link
Member

I think assertNotThrown may be the appropriate way to test this, then.

Copy link
Member

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased and fixed the commit message for you.

@CyberShadow CyberShadow changed the title obviously broken if len == 0 Fix Issue 20544 - socket.remoteAddress throws out of memory error with unix domain socket peer Jan 29, 2020
@adamdruppe
Copy link
Contributor Author

cool

@adamdruppe
Copy link
Contributor Author

eeek dub down again and it seems to cause the buildkite fail. i think.

@thewilsonator
Copy link
Contributor

Thi fails on circle which is required.

@adamdruppe
Copy link
Contributor Author

Is there a way to tell it to try again?

I don't see any legitimate reason for this to fail.

@thewilsonator
Copy link
Contributor

Is there a way to tell it to try again?

Force push, unless you have permissions to retrigger builds.

@adamdruppe
Copy link
Contributor Author

yeah there we go it passed this time. I suspect the problems yesterday were actually just dub's 502 rather than something related here.

@adamdruppe
Copy link
Contributor Author

i think the force push disabled the auto-merge flag tho

@wilzbach wilzbach merged commit 53fe0c4 into dlang:master Jan 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants