Skip to content

Disposable is leaky by default #159

Open

Description

I'm a bit uneasy with the fact that it's allowed to create a Disposable without registering its dispose method to some kind of disposal scope, either with using on the current scope, or add to some DisposableStack.

In fact, the disposal registration is opt-in, not opt-out. This gives a Disposable object leaky behavior by default. It's like still doing malloc() and free(), but added opt-in support for scope based RAII. This is especially concerning given most of the resource management in JavaScript to date has been implicit by garbage collection. A Disposable object can leak even after it getting garbage collected.

function createHandle(): Disposable & Handle {
    const { write, close } = nativeHandle();
    return { write, [Symbol.dispose](): { close() } };
}

function main() {
    const handle = createHandle();
    handle.write(data);
    // ops you forget to dispose handle
}

main();
// now it's leaking quietly
// any because handle is garbage collected you can't even find it in heap dump

It's worse than leaking - it's leaking quietly, and it can disappear from heap dump because the dispose method can be garbage collected. Diagnosing a leak like this would be a nightmare.

I know, before the proposal people have already been doing explicit resource management in JavaScript. But that's because it's the only way to do it. I would expect a language level improvement on resource management to do better than just providing a syntactic sugar without addressing the unsafe nature of explicit resource management.

I'm not saying I have a solution to the problem. But I'd like to at least start a conversation on possibilities.

My proposal have improved for several iterations thanks to in-depth discussion. To help people to quickly follow up with the most recent proposal of mine, please check this mega post.

Maybe it's better to not exposing the dispose method on any returned value. A dispose method must be registered to some DisposeScope. While decoupling DisposeScope from block scope, making it similar to Autorelease Pool Blocks in Objective-C. So we don't need explicit transfer of ownership when a function is using some Disposable resource but wants to delegate its disposal to a parent DisposeScope. Also a DisposeScope can be stored and re-opened to support use cases like class.

So it would look something like this:

interface DisposeScope {
    defer<T>((value: T) => Promise<void>|void): void;
}

interface DisposableFunction<T, Args extends any[]> {
    (scope: DisposeScope, ...Args): T;
}

interface DisposableClass<T, Args extends any[]> {
    new (scope: DisposeScope, ...Args): T;
}

function Allocate(scope: DisposeScope, length: number): Array {
    const arr = new Array(length);
    scope.defer(() => arr.splice(0, arr.length));
    return arr;
}

function Bar() {
    const bar = using Allocate(10); // using always bind to the nearest dispose scope above the call stack
}

async dispose { // create a DisposeScope at this stack level
    const foo = using Allocate(5);
    {
        Bar();
    }
} // await bar dispose(); awiat foo dispose();

const baz = using Allocate(5); // error: no enclosing DisposeScope


// For class

class Resource {
    constructor(scope: DisposeScope) {
        this.#scope = scope;
    }
    async run(): Promise<void> {
        async dispose(this.#scope) { // reopen a DisposeScope
            this.#arr = using Allocate(5);
        } // disposal is being deferred to the owner of this.#scope
    }
}

async dispose { // create a DisposeScope at this stack level
    const resource = using new Resource();
} // await resource.#arr dispose();

Enclosing some code inside a DisposeScope means you want the scope to collect all the resource dispose methods, and call them when the DisposeScope is closed, and to make sure all dispose methods complete before executing the statements after the DisposeScope block.

You can even take it further to not pass DisposeScope as an argument. Instead making it an environment slot on the execution context. A defer keyword can be added to register a disposal method to the enclosing DisposeScope. However storing and re-opening a scope would require some changes to be supported.

So it can even look something like this:

function Allocate(length: number): Array {
    const arr = new Array(length);
    defer arr.splice(0, arr.length); // the expression is deferred for evaluation in the enclosing DisposeScope
    async defer { // a block expression can be used for defer
        await cleanup(); // await can be used after async defer
    }; // async defer can be used in a sync function as long as the enclosing DisposeScope is async dispose
    return arr;
}

function Bar() {
    const bar = Allocate(10); // function call always inherits the nearest dispose scope above the call stack
}

async dispose { // create a DisposeScope at this stack level
    const foo = Allocate(5);
    {
        Bar();
    }
} // await bar dispose(); awiat foo dispose();

const baz = Allocate(5); // error: no enclosing DisposeScope


// For class

function EnclosingDisposeScope(): unique symbol {
    // [native function]
    // returns a special symbol uniquely identifies the current enclosing DisposeScope
}

class Task {
    constructor() {
        this.#scope = EnclosingDisposeScope();
    }
    run(): void {
        dispose(this.#scope) { // reopen a DisposeScope identified by the symbol
            this.#arr = Allocate(5);
        } // disposal is being deferred to the scope identified by this.#scope
    }
}

async dispose { // create a DisposeScope at this stack level
    const task = new Task();
    dispose { // a new DisposeScope at top of the stack
        task.run();
    }
} // await task.#arr dispose();

In this way there's no need to define a Disposable type, and no need to use a DisposableStack. Even the using keyword can be omitted. A function created disposable resource no longer has to return a Disposable type. Every function just return whatever value they produce, and defer statement can be inserted at any location to dispose resource at the enclosing DisposeScope.

In this way, a disposal cannot be scheduled with defer if there's no enclosing DisposeScope. And the only way to register disposal operation is to use defer, and the dispose method is never exposed to the outside.

This would provide a safer default behavior when using Disposable resource. If you somehow "forgot" to create an new DisposeScope, resource created directly or indirectly from your function won't leak silently - it either fails to defer to an enclosing DisposeScope, resulting an error being thrown, or it's guarenteed to be disposed when the enclosing DisposeScope closed. If a resource is living longer than you expected, it's very likely being deferred to some long running DisposeScope, and you can easily find both the DisposeScope and the dispose method it referecnes in a heap dump. Diagnosing resource retaintion problems would be easier this way.

Extending the lifetime of a DisposeScope

Resource lifetime and execution context (functions, try-catch blocks etc) lifetime are loosely related. Resource references can be cloned, assigned, transferred, returned, down and up the call stack. You simply cannot confine a resouce in a fixed block of code enclosed by a pair of curly braces. Execution context on the other hand, is more ordered - it's always in a Last-In-First-Out stack. There are special behaviors like async/await, macrotasks, generator functions but if you generalize execution context enough, these can also follow the same rule.

A curly brace block syntax for a resource DisposeScope made it easy to implicitly pass the scope down the execution stack, but it doesn't capture the dynamic nature of resource passing between different execution contexts, and in many cases, outside the curly brace block the DisposeScope can enclose. When that happen, we will have use-after-free problems:

function createHandle(): Handle {
    const { write, close } = nativeHandle();
    defer close();
    return { write };
}
dispose {
    const handle = createHandle();
    doSomething().then(() => {
        handle.write(data); // error: handle already disposed
    }); // this returned synchronously while the callback is scheduled on an async context
}

Note that DisposableStack also has this problem:

function createHandle(): Disposable & Handle {
    const { write, close } = nativeHandle();
    return { write, [Symbol.dispose](): { close() } };
}
function main() {
    using handle = createHandle();
    doSomething().then(() => {
        handle.write(data); // error: handle already disposed
    }); // this returned synchronously while the callback is scheduled on an async context
}

However, if we decouple the DisposeScope from execution block scope, we can define a different semantic for it. Just look at the following code and it would be very intuitive if it can just work:

function createHandle(): Handle {
    const { write, close } = nativeHandle();
    defer close();
    return { write };
}
dispose { // spoiler: this should be an async dispose
    const scope = EnclosingDisposeScope();
    const handle = createHandle();
    doSomething().then(() => {
        dispose(scope) { // re-open a captured scope on an async context
            handle.write(data);
        }
    });
}

An immediate observation tell us, we can make it work if we add the following rule for a DisposeScope disposal: Only dispose a DisposeScope when all the references to that scope were gone (unreachable / garbage collected).

We can manually analyze the bahavior by denoting reference counts of the scope.

dispose { // open the scope, refcount++: 1
    const scope = EnclosingDisposeScope(); // refcount++: 2
    const handle = createHandle();
    doSomething().then(/* ... */); // captures scope, refcount++: 3
    // const scope gone, refcount--: 2
} // close the scope, refcount--: 1

// on the async execution context:
() => {
    dispose(scope) { // re-open the scope, refcount++: 2
        handle.write(data);
    } // close the scope, refcount--: 1
} // captured scope gone, refcount--: 0, now we can dispose scope

We can see that after reaching the end of the dispose block that created the DisposeScope, there is still reference to the scope. So the dispose doesn't happen. It's until the scheduled async context finished, and the reference count of the scope dropped to 0, we can finally dispose the scope.

This however, still has a problem. The DisposeScope we created here is an synchronous scope. But the refcount of the scope only reach to 0 on a different async execution context. This means we should use an async dispose scope instead, even the deferred function is synchronous:

function createHandle(): Handle {
    const { write, close } = nativeHandle();
    defer close();
    return { write };
}
async dispose {
    const scope = EnclosingDisposeScope();
    const handle = createHandle();
    doSomething().then(() => {
        async dispose(scope) { // re-open the captured scope on an async context
            handle.write(data);
        }
    });
}

We can write self managed resources:

function createHandle(): Handle {
    const scope = EnclosingDisposeScope();
    const { write, close } = nativeHandle();
    defer close();
    return { write, scope }; // extending the scope's lifetime to the lifetime of the returned object
}
async dispose {
    const handle = createHandle();
    doSomething().then(() => {
        // no need to capture the scope explicitly, handle will hold a reference to the scope already
        handle.write(data);
    });
}

// For class

class Task {
    constructor() {
        this.#scope = EnclosingDisposeScope(); // extending the scope's lifetime to the lifetime of the returned object
    }
    run(): void {
        dispose(this.#scope) {
            this.#arr = Allocate(5);
        }
    }
}
async dispose {
    const task = new Task();
    task.run();
}

We can even keep improving this by integrating async context tracking mechanism like Node.js AsyncLocalStorage / AsyncResource or the AsyncContext proposal. So the DisposeScope itself is an AsyncResource.

There are however some performance implications if we binds the lifetime of an async DisposeScope to its references. It means the dispose can only happen after a GC cycle. But personally I think the benifits are worth the tradeoff. We can achieve almost implicit usage of managed resource this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions