Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe ref counted #8368

Merged
merged 12 commits into from
Oct 3, 2022
Merged

Safe ref counted #8368

merged 12 commits into from
Oct 3, 2022

Conversation

dukc
Copy link
Contributor

@dukc dukc commented Jan 26, 2022

This is off Atila's #8101 . I wanted to create the PR against his fork but GitHub didn't allow that.

I tried to reproduce the safety bug @Geod24 found, but could not. It appears that language updates have automatically corrected that. In fact I had to add an additional @trusted wrapper to the constructor in question so that the language would not prevent it's use with scope data totally. I added a test to verify that the safety bug stays gone.

I also had the destructor to check that dip1000 is enabled, otherwise it now stays @system.

@dukc dukc requested a review from MetaLang as a code owner January 26, 2022 20:43
@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @dukc! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8368"

@dukc dukc force-pushed the safe-ref-counted branch from 956d20f to 90d4336 Compare January 26, 2022 21:19
@Geod24
Copy link
Member

Geod24 commented Jan 27, 2022

Challenge accepted

import std.stdio;
import std.typecons;

struct Container
{
    ubyte[] data;
}

struct Blob
{
    ubyte[64] data;
}


void main ()
{
    auto ptr = getPtr();
    writeln(ptr);
    const save = ptr.dup;
    auto other = getPtr();
    assert(save == ptr);
}

ubyte[] getPtr ()
{
    Blob blob;
    RefCounted!Container rc;
    escape(rc.refCountedPayload(), blob);
    assert(rc.refCountedPayload().data !is null);
    return rc.refCountedPayload().data;
}

void escape (ref Container rc, ref Blob local)
{
    rc.data = local.data;
}

Compiles, runs, crashes.
That might be more of a compiler bug than a RefCounted bug though.

Command used:

$ ./dmd/generated/osx/release/64/dmd -preview=dip1000 -run breakdip1000.d

Commits used:

$ git -C dmd show HEAD
commit 71ba3fa46221daa0539b03ed7e41f454bd6ce326
Author: Iain Buclaw <ibuclaw@gdcproject.org>
Date:   Wed Jan 26 15:15:44 2022 +0100

    posix.mak: Invoke bash to run the bootstrap script (#13549)
    
$ git -C druntime show HEAD
commit 33511e263134530a5994a775b03a061ea3f1aa34
Author: Sönke Ludwig <sludwig@outerproduct.org>
Date:   Mon Jan 24 17:30:16 2022 +0100

    Fix GetICMProfile signature

    See https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-geticmprofilew
    
$ git -C phobos show HEAD
commit 90d4336ccd96a096c26c393fe1c081c8fd09433e (HEAD, dukc/safe-ref-counted)
Author: Ate Eskola <ajieskola@gmail.com>
Date:   Wed Jan 26 21:43:02 2022 +0200

    Addressed the concerns in reviews

@dukc
Copy link
Contributor Author

dukc commented Jan 27, 2022

Thanks! Yeah, as you suspected it's a compiler bug that can be abused without Phobos at all:

@safe unittest //this compiles
{
    void escape (ref ubyte[] arr, ref ubyte[64] local)
    {
        arr = local;
    }

    @safe ubyte[] getPtr ()
    {
        ubyte[64] blob;
        ubyte[] arr;
        escape(arr, blob);
        return arr; // Ahh, just like bad old C days!
    }
}

Gotta report, if I don't find it already reported.

@dukc
Copy link
Contributor Author

dukc commented Jan 27, 2022

Didn't find an existing issue, so submitted: https://issues.dlang.org/show_bug.cgi?id=22709

I have another possible vulnerability in mind, but It's late around here so I'll test it another day. I hope I won't forget what I'm having in mind.

@Geod24
Copy link
Member

Geod24 commented Jan 28, 2022

I also looked a bit deeper into the internals of RefCounted, and there are some issues that prevented me from exploiting some other features. For example, opAssign takes its argument by value, not ref. Given the use cases of RefCounted, I think that could lead to some very undesirable behavior, but I'll explore this once the low-hanging compiler bugs are fixed :)

@RazvanN7
Copy link
Collaborator

Issue 22709 has been fixed: dlang/dmd#13577

@dukc
Copy link
Contributor Author

dukc commented Jan 28, 2022

Tried to implement my plan to hack this implementation, along the lines of this:

@safe unittest
{
    @safe void assignCrap(RefCounted!(int*) countedRef)
    {
        int local;
        auto anotherRef = countedRef; // I was thinking that anotherRef has shorter lifetime than local...
        anotherRef.refCountedPayload() = &local; // ...so that this would compile
    }
}

The compiler does not fall for this, nor for some similar variations I tried. I THINK it is because the return attribute in refCountedPayload makes the reference to the payload to be of shorter lifetime than anotherRef, not the payload itself.

That does not mean that this is safe, just that my poor head is running out of cunning with this. Perhaps by using a struct payload and using some member function to play with the lifetimes of the variables it is still hackable.

@pbackus
Copy link
Contributor

pbackus commented Jan 29, 2022

This is the problematic case:

@safe unittest
{
    auto rc = RefCounted!int(123);
    (ref int n) {
        destroy(rc);
        int oops = n; // use-after-free if allowed
    }(rc.refCountedPayload);
}

As far as I know, in order to ensure this can't happen in @safe code, either refCountedPayload or the destructor must be @system. Currently, it looks like they are both @safe (though I haven't tested it myself). If so, that means this PR is unsound.

@dukc
Copy link
Contributor Author

dukc commented Jan 29, 2022

Good catch! That one compiles, and also a simpler variant of it (in my opinion) compiles:

@safe unittest
{
    auto rc = RefCounted!int(123);
    int* ptr = &rc.refCountedPayload();
    destroy(rc);
    int oops = *ptr;
}

I think the problem is allowing destruction of rc before end of the scope. The destructor is (as far as we have discovered) @safe if called at the end of the lifetime, but not otherwise. Meaning, this PR might be safe in @live but not otherwise :(.

Worse, I think this applies to about any pattern -dip1000 would enable:

@safe void abuse(Container)()
{ auto cont = Container([1,2,3]);
  scope ptr = &cont.front;
  destroy(cont);
  int oops = *ptr;
}

@pbackus
Copy link
Contributor

pbackus commented Jan 29, 2022

Worse, I think this applies to about any pattern -dip1000 would enable:

There is one pattern that can work here, which is a callback-based API:

auto ref apply(alias callback, T)(auto ref RefCounted!T this_)
{
    // Ensures a reference is held until after `callback` returns
    auto hold = this_;
    // Explicit `scope` prevents `callback` from escaping a reference
    scope ptr = () @trusted { return &this_.refCountedPayload(); }();
    return callback(*ptr);
}

// Usage
@safe unittest
{
    auto rc = RefCounted!int(123);
    rc.apply!((ref n) {
        destroy(rc);
        int ok = n; // ok - kept alive by `hold`
    });
}

The downsides of this approach are that (a) the syntax isn't as nice, and (b) it requires an extra ref count increment/decrement at runtime.

I think the best we can do, without a real ownership/borrowing system in the language (@live doesn't count), is to provide both a @safe callback-based API for users willing to accept the runtime overhead, and a @system API that gives direct access, and puts the responsibility on users not to create dangling references. In other words: make refCountedPayload @system and add a @safe apply.

@Geod24
Copy link
Member

Geod24 commented Jan 31, 2022

@pbackus : What if someone calls destroy twice ?

@pbackus
Copy link
Contributor

pbackus commented Jan 31, 2022

@Geod24 The destructor should reset _refCounted to RefCountedStore.init to ensure that it is idempotent even if destroy!false is used.

@dukc
Copy link
Contributor Author

dukc commented Feb 2, 2022

There is one pattern that can work here, which is a callback-based API:

Going to do it like that, when I find time for it.

Perhaps there is a way to get the refCountedPayload working in @safe. I'm thinking about making the destructor private, and defining an UDA at DRuntime, let's say noEarlyDestroy. destroy would be @system for types annotated with that attribute. What do you think, would that work? In any case, I'm not going to implement that in this PR.

@pbackus
Copy link
Contributor

pbackus commented Feb 2, 2022

I'm thinking about making the destructor private, and defining an UDA at DRuntime, let's say noEarlyDestroy. destroy would be @system for types annotated with that attribute. What do you think, would that work?

A similar idea has been suggested before: https://issues.dlang.org/show_bug.cgi?id=21981

Currently, it would not work, because @safe can bypass private using __traits(getMember). Fixing that issue requires breaking existing code, so it will require a deprecation cycle at the very least.

@dukc
Copy link
Contributor Author

dukc commented Feb 2, 2022

That feature of __traits(getMember) breaks @safety of pretty much everything anyway, so I think we're best of pretending it does not exist.

@pbackus
Copy link
Contributor

pbackus commented Feb 2, 2022

Well, it's entirely possible at this point that the issue with __traits(getMember) will never be fixed. That's one of the motivations behind @system variables (DIP 1035): they're a way to block access from @safe code that doesn't rely on private.

Until and unless we have 100% confirmation that private will be made inviolable in @safe code, we should not commit to adding any new language features that rely on private for memory safety.

@dukc
Copy link
Contributor Author

dukc commented Feb 4, 2022

Until and unless we have 100% confirmation that private will be made inviolable in @safe code, we should not commit to adding any new language features that rely on private for memory safety.

Yeah, since DIP1035 can fix most but not all problems of __traits(getmember), it is within possibility that W&A declare violating privacy via it is an official feature even in @safe, rather than a bug. I agree we need an official decision on that before trying that @noEarlyDestroy thing.

Perhaps a forum topic is in order. On the other hand if DIP1035 gets rejected (I would be disappointed) no discussion is needed IMO: that behaviour is unambiguously a bug.

@dukc dukc changed the title Safe ref counted draft:Safe ref counted Feb 4, 2022
@dukc dukc closed this Feb 4, 2022
@dukc dukc reopened this Feb 4, 2022
@dukc dukc changed the title draft:Safe ref counted Draft: Safe ref counted Feb 4, 2022
@dukc dukc marked this pull request as draft February 4, 2022 20:56
@dukc
Copy link
Contributor Author

dukc commented Feb 4, 2022

Ahh, finally found the "convert to draft" button

@dukc
Copy link
Contributor Author

dukc commented Feb 12, 2022

Mostly done in my local tree. Still trying to get automatic inference of @safe working, it will compile if apply is used in explicit @safe but for some reason if inference is on it stays @system.

I encountered one problem though: consistency. If apply is used on a null Nullable, it will simply do nothing. However if it's a null RefCounted, it will either auto-initialize it or trigger an assertion. Not very consistent. Should I think another name for apply on RefCounted, or alternatively make it no-op on null RefCounted?

Copy link
Contributor

@pbackus pbackus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor doc/formatting issues.

The removal of RefCounted's test suite seems premature.

Copy link
Collaborator

@RazvanN7 RazvanN7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the ugliness that is added by checking if the dip1000 switch is set or not. Why don't we just drop support for non-dip1000 phobos?

std/file.d Outdated
// Must be a template, because the destructor is unsafe or safe depending on
// whether `-preview=dip1000` is in use. Otherwise, linking errors would
// result.
struct DirIterator(bool useDIP1000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is a breaking change since DirIterator is part of the public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a documented symbol though, and BuildKite passing proves it doesn't break much. But yes it's possible someone still uses it.

Hmm, I wonder if this would avoid the breakage?

struct _DirIterator(bool useDIP1000) {/*...*/}
alias DirIterator = _DirIterator!dip1000Enabled;

Previously this led to a linking error. Alias to a template instance, the compiler thinks the linker can find the function precompiled in the Phobos binary, which it can't because it was compiled with DIP1000 resulting in different ABI. But on the other hand I tested without any template parameters - will using a bool parameter for DIP1000 change the fact? I quess not, but I have to test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tested this. For some reason, doing the alias trick works fine for DirIterator without causing linking errors for non-dip1000 client code, but not for dirEntries. With dirEntries I have to keep the useDIP1000 template parameter there. Luckily it's not so much breaking change there because functions do not require using !() for instantiation, except maybe if the user wants to take address of the function.

Copy link
Contributor Author

@dukc dukc Sep 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed the aforementioned change of DirIterator.

@dukc
Copy link
Contributor Author

dukc commented Sep 13, 2022

I don't like the ugliness that is added by checking if the dip1000 switch is set or not. Why don't we just drop support for non-dip1000 phobos?

The switch is intended for non-dip1000 client code, not for compiling the Phobos binary without dip1000. Or did you mean dropping support for such client code? Way too early. DIP1000 is not even fully enabled by yet, giving only deprecations on violations. Typical codebase being what it is, we'd force well over half of them to migrate right away.

I have migrated some code at Symmetry to dip1000 semantics. It's sometimes hard, unless you cop out with @system or @trusted abuse. A migration period is there for a reason.

@pbackus
Copy link
Contributor

pbackus commented Sep 18, 2022

generated/linux/release/64/betterctests/std_typecons.d(7075): Error: template instance `AliasSeq!(SafeRefCounted, RefCounted)` template `AliasSeq` is not defined
generated/linux/release/64/betterctests/std_typecons.d(7102): Error: template instance `AliasSeq!(SafeRefCounted, RefCounted)` template `AliasSeq` is not defined
generated/linux/release/64/betterctests/std_typecons.d(7126): Error: template instance `AliasSeq!(SafeRefCounted, RefCounted)` template `AliasSeq` is not defined
posix.mak:668: recipe for target 'std/typecons.betterc' failed

@dukc dukc force-pushed the safe-ref-counted branch 2 times, most recently from fbe4a5b to 947ecbc Compare September 21, 2022 21:27
dukc and others added 2 commits September 22, 2022 00:37
Co-authored-by: Paul Backus <snarwin@gmail.com>
Co-authored-by: Paul Backus <snarwin@gmail.com>
@dukc
Copy link
Contributor Author

dukc commented Sep 21, 2022

All comments addressed.

@RazvanN7 RazvanN7 added the Merge:72h no objection -> merge The PR will be merged if there are no objections raised. label Sep 26, 2022
@RazvanN7
Copy link
Collaborator

cc @Geod24 @atilaneves . Let's get this in!

@dukc
Copy link
Contributor Author

dukc commented Oct 3, 2022

Hello?

@pbackus
Copy link
Contributor

pbackus commented Oct 3, 2022

Looks like Dlang Bot is down, so the "72h -> merge" label doesn't do anything. Since it's been a full week, I think we can assume this is good to go.

@pbackus pbackus merged commit aaa0b70 into dlang:master Oct 3, 2022
@dukc
Copy link
Contributor Author

dukc commented Oct 4, 2022

🚀🙌

Geod24 added a commit to Geod24/phobos that referenced this pull request Oct 13, 2022
Those fixes were done in PR dlang#8509 but PR dlang#8368 accidentally reverted them.
thewilsonator pushed a commit that referenced this pull request Oct 13, 2022
Those fixes were done in PR #8509 but PR #8368 accidentally reverted them.
Geod24 added a commit to Geod24/phobos that referenced this pull request Nov 17, 2022
Those fixes were done in PR dlang#8509 but PR dlang#8368 accidentally reverted them.
Geod24 added a commit to Geod24/phobos that referenced this pull request Nov 30, 2022
Those fixes were done in PR dlang#8509 but PR dlang#8368 accidentally reverted them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Merge:72h no objection -> merge The PR will be merged if there are no objections raised.
Projects
None yet
Development

Successfully merging this pull request may close these issues.