@kpreisser kpreisser commented Oct 8, 2022

Hi, this is a PR to use DynamicMethod (System.Reflection.Emit) for generating code to call callbacks defined with Function.FromCallback() and Linker.DefineFunction(), instead of using reflection. This improves performance (although not as much as I imagined), and avoids a number of heap allocations (e.g. allocating the arguments array, and boxing the arguments and return values), thereby contributing to #113.

Additionally, I fixed the handling of nested ValueTuples, which previously only worked for one level of nesting. For example, returning a ValueTuple<..., ValueTuple<..., ValueTuple<int>>> (= 15 values) should now work correctly. Issues #158 and #159 should also be fixed with this change.
I also changed the static Function.Finalizer and Value.Finalizer delegates to a method with an UnmanagedCallersOnlyAttribute and use it as function pointer (delegate* unmanaged<IntPtr, void>), since it is a static method, which can also improve performance.
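
To illustrate the nested ValueTuple case mentioned above, here is a minimal sketch of how the element types of a nested tuple type could be flattened in order. The `TupleFlattenSketch` class and its `Flatten` helper are hypothetical names for illustration, not the PR's actual code, and the sketch assumes the input really is a ValueTuple type.

```csharp
using System;
using System.Collections.Generic;

static class TupleFlattenSketch
{
    // Hypothetical helper (not from the PR): yields the element types of a
    // possibly nested ValueTuple type, in order. Assumes a ValueTuple type.
    public static IEnumerable<Type> Flatten(Type tupleType)
    {
        Type[] args = tupleType.GetGenericArguments();
        for (int i = 0; i < args.Length; i++)
        {
            if (i == 7)
            {
                // The eighth type argument (TRest) is itself a ValueTuple
                // holding the remaining items, so recurse into it.
                foreach (Type inner in Flatten(args[i]))
                    yield return inner;
            }
            else
            {
                yield return args[i];
            }
        }
    }

    static void Main()
    {
        // 7 items + 7 items + 1 item, nested two levels deep.
        Type type = typeof(ValueTuple<int, int, int, int, int, int, int,
            ValueTuple<int, int, int, int, int, int, int, ValueTuple<int>>>);

        Console.WriteLine(new List<Type>(Flatten(type)).Count); // prints 15
    }
}
```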

Edit: I updated the PR so that when the current .NET runtime/platform interprets or doesn't support dynamic code (e.g. on Unity when using the IL2CPP backend, where it might be possible to use precompiled wasm modules that don't require a JIT), reflection is still used as a fallback to call the callback.

For example, when using a Func<Caller, int, long, string, ValueTuple<int, float, double, long, object, Function, int, ValueTuple<int>>>, the generated code will be equivalent to the following:

static unsafe void InvokeCallback(Delegate callback, Caller caller, Value* args, int nargs, Value* results, int nresults)
{
    var dele = (Func<Caller, int, long, string, ValueTuple<int, float, double, long, object, Function, int, ValueTuple<int>>>)callback;

    ValueTuple<int, float, double, long, object, Function, int, ValueTuple<int>> result = dele(
        caller,
        Int32ValueBoxConverter.Instance.Unbox(caller, args[0].ToValueBox()),
        Int64ValueBoxConverter.Instance.Unbox(caller, args[1].ToValueBox()),
        GenericValueBoxConverter<string>.Instance.Unbox(caller, args[2].ToValueBox()));

    results[0] = Value.FromValueBox(Int32ValueBoxConverter.Instance.Box(result.Item1));
    results[1] = Value.FromValueBox(Float32ValueBoxConverter.Instance.Box(result.Item2));
    results[2] = Value.FromValueBox(Float64ValueBoxConverter.Instance.Box(result.Item3));
    results[3] = Value.FromValueBox(Int64ValueBoxConverter.Instance.Box(result.Item4));
    results[4] = Value.FromValueBox(GenericValueBoxConverter<object>.Instance.Box(result.Item5));
    results[5] = Value.FromValueBox(FuncRefValueBoxConverter.Instance.Box(result.Item6));
    results[6] = Value.FromValueBox(Int32ValueBoxConverter.Instance.Box(result.Item7));
    results[7] = Value.FromValueBox(Int32ValueBoxConverter.Instance.Box(result.Rest.Item1));
}
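
For readers unfamiliar with System.Reflection.Emit, the following is a much smaller sketch of the same technique: it emits IL that casts a Delegate to Action<int> and invokes it directly, so the argument is never boxed. The `DynamicInvokeSketch` class and the `InvokeInt32` delegate are assumptions made up for this example. The PR's real signature works on Value* arguments and, judging by the CreateDelegate call quoted later in this thread, binds the callback as the delegate target instead of passing it explicitly as done here.

```csharp
using System;
using System.Reflection.Emit;

static class DynamicInvokeSketch
{
    // Hypothetical delegate shape for illustration only.
    public delegate void InvokeInt32(Delegate callback, int arg);

    public static InvokeInt32 Create()
    {
        var method = new DynamicMethod(
            "InvokeCallback",
            typeof(void),
            new[] { typeof(Delegate), typeof(int) },
            typeof(DynamicInvokeSketch).Module,
            true); // skipVisibility

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                         // load the callback
        il.Emit(OpCodes.Castclass, typeof(Action<int>));  // cast to the concrete delegate type
        il.Emit(OpCodes.Ldarg_1);                         // load the int argument (no boxing)
        il.Emit(OpCodes.Callvirt, typeof(Action<int>).GetMethod("Invoke")!);
        il.Emit(OpCodes.Ret);

        // The non-generic CreateDelegate overload is used here.
        return (InvokeInt32)method.CreateDelegate(typeof(InvokeInt32));
    }

    static void Main()
    {
        int sum = 0;
        Action<int> callback = x => sum += x;

        InvokeInt32 invoke = Create();
        invoke(callback, 5);
        invoke(callback, 7);

        Console.WriteLine(sum); // prints 12
    }
}
```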

The largest performance boost occurs when defining a function with a single parameter. I tested with the following code on .NET 7.0.0-rc.1 on Windows 10 Version 21H2 x64, using an Action<int>:

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);

    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
(module 
    (func $hello (import """" ""hello"") (param i32))
    (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
            local.get $0
            i32.const 2000000
            i32.lt_s
            if
                local.get $0
                call $hello

                local.get $0
                i32.const 1
                i32.add
                local.set $0
                br $for-loop|0
            end
        end
    )
)
");

    using var linker = new Linker(engine);
    using var store = new Store(engine);

    int calls = 0;
    linker.Define(
        "",
        "hello",
        Function.FromCallback(store, (int x) =>
        {
            calls++;
        })
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();

        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }

Before the change, the times are listed as follows (when compiling for Release):

Elapsed: 00:00:00.3099829
Elapsed: 00:00:00.4299516
Elapsed: 00:00:00.4231544
Elapsed: 00:00:00.4376052
Elapsed: 00:00:00.4250763

After the change:

Elapsed: 00:00:00.1626791
Elapsed: 00:00:00.1181175
Elapsed: 00:00:00.1178859
Elapsed: 00:00:00.1169129
Elapsed: 00:00:00.1146859

However, when using more than one argument, the time with reflection suddenly decreases, and the performance gain is much smaller. For example, using an Action<int, float, long>:

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);

    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
(module 
    (func $hello (import """" ""hello"") (param i32 f32 i64))
    (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
            local.get $0
            i32.const 2000000
            i32.lt_s
            if
                local.get $0
                f32.const 123.456
                i64.const 1234567890
                call $hello

                local.get $0
                i32.const 1
                i32.add
                local.set $0
                br $for-loop|0
            end
        end
    )
)
");

    using var linker = new Linker(engine);
    using var store = new Store(engine);

    int calls = 0;
    linker.Define(
        "",
        "hello",
        Function.FromCallback(store, (int x, float y, long z) =>
        {
            calls++;
        })
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();

        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }

Before the change:

Elapsed: 00:00:00.3110580
Elapsed: 00:00:00.2320351
Elapsed: 00:00:00.2336036
Elapsed: 00:00:00.2323470
Elapsed: 00:00:00.2347170

After the change:

Elapsed: 00:00:00.2313183
Elapsed: 00:00:00.2069259
Elapsed: 00:00:00.2080663
Elapsed: 00:00:00.2107904
Elapsed: 00:00:00.2098015

Testing with a Func<int, float, long, ValueTuple<int, int, long>>:

    using var config = new Config();
    config.WithOptimizationLevel(OptimizationLevel.Speed);

    using var engine = new Engine(config);
    using var module = Module.FromText(
        engine,
        "hello",
        @"
(module 
    (func $hello (import """" ""hello"") (param i32 f32 i64) (result i32 i32 i64))
    (func (export ""run"")
        (local $0 i32)
        loop $for-loop|0
            local.get $0
            i32.const 2000000
            i32.lt_s
            if
                local.get $0
                f32.const 123.456
                i64.const 1234567890
                call $hello
                drop
                drop
                drop

                local.get $0
                i32.const 1
                i32.add
                local.set $0
                br $for-loop|0
            end
        end
    )
)
");

    using var linker = new Linker(engine);
    using var store = new Store(engine);

    int calls = 0;
    linker.Define(
        "",
        "hello",
        Function.FromCallback(store, (int x, float y, long z) =>
        {
            calls++;
            return (1, 2, 3L);
        })
    );

    var instance = linker.Instantiate(store, module);
    var run = instance.GetAction("run")!;

    var sw = new Stopwatch();
    for (int i = 0; i < 5; i++)
    {
        sw.Restart();
        run();
        sw.Stop();

        Console.WriteLine("Elapsed: " + sw.Elapsed);
    }

Before:

Elapsed: 00:00:00.6005550
Elapsed: 00:00:00.5286508
Elapsed: 00:00:00.5350325
Elapsed: 00:00:00.5246926
Elapsed: 00:00:00.5371668

After:

Elapsed: 00:00:00.4315517
Elapsed: 00:00:00.3945320
Elapsed: 00:00:00.3934935
Elapsed: 00:00:00.4064949
Elapsed: 00:00:00.3909994

Note: Currently the delegate types that can be used to define a callback are limited to Action<...> and Func<...>. I think this restriction could be lifted in the future, to allow delegates of any type. (This would require a change in Function.GetFunctionType(), so that it doesn't iterate through the generic type arguments, but instead iterates through the parameter types of the delegate's Invoke method.)
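
As a rough sketch of that idea (not code from the PR), the parameter and return types of an arbitrary delegate type can be read from its Invoke method. The `DelegateSignatureSketch` class, the `Describe` helper and the `CustomCallback` delegate are hypothetical names used only for this example; mapping the resulting types to Wasm value kinds is left out.

```csharp
using System;
using System.Linq;
using System.Reflection;

static class DelegateSignatureSketch
{
    // A custom delegate type that is neither Action<...> nor Func<...>.
    public delegate long CustomCallback(int a, float b);

    // Hypothetical helper: reads the signature from the delegate's Invoke
    // method instead of from generic type arguments.
    public static (Type[] Parameters, Type Result) Describe(Type delegateType)
    {
        MethodInfo invoke = delegateType.GetMethod("Invoke")
            ?? throw new ArgumentException("Not a delegate type.", nameof(delegateType));

        Type[] parameters = invoke.GetParameters()
            .Select(p => p.ParameterType)
            .ToArray();

        return (parameters, invoke.ReturnType);
    }

    static void Main()
    {
        var (parameters, result) = Describe(typeof(CustomCallback));
        Console.WriteLine(
            string.Join(", ", parameters.Select(p => p.Name)) + " -> " + result.Name);
        // prints "Int32, Single -> Int64"
    }
}
```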


Another approach could be to use a source generator, as noted in the previous comments in Function.InvokeCallback(). I chose the DynamicMethod approach as it is agnostic to the consumer's language (e.g. C#, F#, VB.NET etc.), and also works if the assembly that defines the delegates has already been compiled (e.g. in a plugin system).

A difference is that with the DynamicMethod approach, code (IL, and then native code by the JIT when called) is generated when defining the methods; however, I think that is negligible because this usually happens when there is already other code generation going on (by Wasmtime itself).
However, the .NET runtime/platform must support compiling dynamic code. If dynamic code isn't supported, or is supported but would be interpreted (which would probably cause a performance slowdown), we need to fall back to using reflection to call the callback.
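
One way such a check could look is sketched below, using the RuntimeFeature properties from System.Runtime.CompilerServices. Whether the PR uses exactly these properties is an assumption; `CallbackStrategySketch`, `Strategy` and `Choose` are hypothetical names for illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

static class CallbackStrategySketch
{
    public enum Strategy { DynamicMethod, Reflection }

    // Use generated code only when dynamic code is both supported and
    // compiled (not interpreted); otherwise fall back to reflection.
    public static Strategy Choose() =>
        RuntimeFeature.IsDynamicCodeSupported && RuntimeFeature.IsDynamicCodeCompiled
            ? Strategy.DynamicMethod
            : Strategy.Reflection;

    static void Main()
    {
        // The result depends on the runtime the program is executed on.
        Console.WriteLine(Choose());
    }
}
```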

What do you think?

Thanks!

…stead of using reflection to call a registered callback.

This can improve performance and reduces heap allocations (contributing to bytecodealliance#113).

This also solves issues bytecodealliance#158 and bytecodealliance#159.
The exception message now occurs from the cast '(T?)value.ExternRefObject' in GenericValueBoxConverter<T>.Unbox().
…rt compilation of dynamic code.

TODO: Run the FunctionTests and ExternRefTests separately for this scenario.

martindevans commented Oct 10, 2022

I'll test this out in Unity tomorrow and will report back if there are any issues there. Thanks for keeping Unity in mind when developing this :)

@martindevans

@kpreisser Unfortunately I encountered several issues in Unity :(

delegate* unmanaged<IntPtr, void> finalizer

"The target runtime doesn't support extensible or runtime-environment default calling conventions."

All of the function pointers have this same error. It looks like explicitly specifying a calling convention (e.g. delegate* unmanaged[Cdecl]<IntPtr, void> finalizer) works around this. I'm not familiar enough with this area to know which calling convention should be used or if it's practical to specify everywhere, but I hope this is an easy fix?

dynamicMethod.CreateDelegate<InvokeCallbackDelegate>(callback)

There's no generic version of CreateDelegate available in Unity. Using dynamicMethod.CreateDelegate(typeof(InvokeCallbackDelegate), callback); instead is an obvious workaround.

[UnmanagedCallersOnly]

This attribute simply doesn't exist. Without it code like &Finalize cannot work. I think this is the most serious issue, there's no easy workaround that I'm aware of.

kpreisser commented Oct 10, 2022

Hi @martindevans,
thanks for testing this, and reporting back!

Ok, I think I will revert the change to Finalizer (delegate*); it actually isn't required for the main changes in this PR. (It is just a tiny performance optimization that could be looked at separately.)

The other necessary change with the non-generic CreateDelegate should be fine.

Out of interest (as it's been a while since I tried out Unity), what configuration do you use in Unity (e.g. which scripting backend)? Which .NET version does Unity currently support?
UnmanagedCallersOnlyAttribute and MethodInfo.CreateDelegate<T>() were introduced with .NET 5.0; if that's not present in the BCL used by Unity, it seems that they only support an older .NET version. (In that case, it might make sense to look at adding support for older target frameworks for wasmtime-dotnet).

Thanks!

martindevans commented Oct 10, 2022

I just did an experiment with a slightly different way to do this (not using IL generation). Instead of using Linker.Define I used linker.DefineFunction (so now we've got some type information) and then modified the callback to look like this: https://gist.github.com/martindevans/5152cbc078472e678defbdc005238678. This basically takes advantage of the extra type information to do away with the need for reflecting anything. By my benchmark (https://gist.github.com/martindevans/54431bf8f2d421b4c4e0eb4fbfe512f6) this is about the same speed as the IL generated solution (which is about what I'd expect, since it's basically doing the same thing). The downside is that this only works when the generic DefineFunction is used instead of the non-generic Define.

Since it looks like there might be more work done on this, would it be possible to split out your bugfix changes (for #158 and #159) into a separate PR? That way Peter Huene can review those while we iterate on the rest.

> what configuration do you use in Unity

Basically all of them - I'm developing an asset for the asset store (making WASM easy to use in Unity for safer modding and easier non-C# dependencies) so of course I want to support as wide an array of use cases as possible (currently targeting Unity 2021 as the minimum version, but I might bump that up to 2022 if necessary). At the moment Unity roughly corresponds to .NET Standard 2.1, I believe.

> it might make sense to look at adding support for older target frameworks

That's a good point. I just did a very quick test and the project does seem to build just fine with <TargetFramework>netstandard2.1</TargetFramework><LangVersion>10</LangVersion>. Perhaps that should be the main target instead of .NET 5?

kpreisser commented Oct 10, 2022

> I just did an experiment with a slightly different way to do this (not using IL generation). Instead of using Linker.Define I used linker.DefineFunction (so now we've got some type information) and then modified the callback to look like this: https://gist.github.com/martindevans/5152cbc078472e678defbdc005238678. basically taking advantage of the extra type information to do away with the need for reflecting anything.

Thanks! Yes, that's a good idea, I also just thought about this. That way we would have different generic overloads of Linker.DefineFunction for the different Action<...> and Func<...> types, which I think could also be made to work for returning ValueTuple. For example, we could have overloads like these, which then use ValueBox.Converter<T> for each parameter type and return type:

void DefineFunction(string module, string name, Action callback);
void DefineFunction<T>(string module, string name, Action<T> callback);
void DefineFunction<T1, T2>(string module, string name, Action<T1, T2> callback);
void DefineFunction<T1, T2, T3>(string module, string name, Action<T1, T2, T3> callback);
void DefineFunction<TResult>(string module, string name, Func<TResult> callback);
void DefineFunction<T, TResult>(string module, string name, Func<T, TResult> callback);
void DefineFunction<T1, T2, TResult>(string module, string name, Func<T1, T2, TResult> callback);
void DefineFunction<T1, T2, T3, TResult>(string module, string name, Func<T1, T2, T3, TResult> callback);
void DefineFunction<TResult1, TResult2>(string module, string name, Func<ValueTuple<TResult1, TResult2>> callback);
void DefineFunction<T, TResult1, TResult2>(string module, string name, Func<T, ValueTuple<TResult1, TResult2>> callback);
void DefineFunction<T1, T2, TResult1, TResult2>(string module, string name, Func<T1, T2, ValueTuple<TResult1, TResult2>> callback);
void DefineFunction<T1, T2, T3, TResult1, TResult2>(string module, string name, Func<T1, T2, T3, ValueTuple<TResult1, TResult2>> callback);
// etc, for example we could support combinations for up to 16 parameters and up to 4 result values

This would also have optimal performance and avoid the allocations, without the need to dynamically generate code. However, this would probably need a way to auto-generate the code for these methods (e.g. in a partial class file) when compiling Wasmtime.csproj, as there would be too many to maintain manually.

For other delegate types that don't fit in this pattern, we could then still fall back to reflection if needed.

kpreisser commented Oct 10, 2022

I fiddled a bit with T4 text templates to generate overloads of Linker.DefineFunction() for combinations of possible parameter counts, result counts, and hasCaller flags. This seems to actually work with the C# compiler resolving the correct overload:

[screenshot]

The generated overload for the above example would look like this:

public void DefineFunction<T1, T2, T3, TResult1, TResult2, TResult3>(string module, string name, Func<Caller, T1, T2, T3, ValueTuple<TResult1, TResult2, TResult3>> callback)
{
    // ...

    var parameterKinds = new List<ValueKind>();
    var resultKinds = new List<ValueKind>();

    using var funcType = Function.GetFunctionType(callback.GetType(), parameterKinds, resultKinds, out var hasCaller);

    // ...

    var convT1 = ValueBox.Converter<T1>();
    var convT2 = ValueBox.Converter<T2>();
    var convT3 = ValueBox.Converter<T3>();
    var convTResult1 = ValueBox.Converter<TResult1>();
    var convTResult2 = ValueBox.Converter<TResult2>();
    var convTResult3 = ValueBox.Converter<TResult3>();

    unsafe
    {
        Function.Native.WasmtimeFuncCallback func = (env, callerPtr, args, nargs, results, nresults) =>
        {
            using var caller = new Caller(callerPtr);

            try
            {
                var result = callback(
                    caller,
                    convT1.Unbox(caller, args[0].ToValueBox()),
                    convT2.Unbox(caller, args[1].ToValueBox()),
                    convT3.Unbox(caller, args[2].ToValueBox()));

                results[0] = Value.FromValueBox(convTResult1.Box(result.Item1));
                results[1] = Value.FromValueBox(convTResult2.Box(result.Item2));
                results[2] = Value.FromValueBox(convTResult3.Box(result.Item3));                

                return IntPtr.Zero;
            }
            catch (Exception ex)
            {
                var bytes = Encoding.UTF8.GetBytes(ex.Message);

                fixed (byte* ptr = bytes)
                {
                    return Function.Native.wasmtime_trap_new(ptr, (UIntPtr)bytes.Length);
                }
            }
        };

        // ...
    }
}

This would allow efficiently invoking the callback as long as one of the generic overloads is called (i.e. the delegate is known at compile-time to be Func<...>/Action<...>), without the need to use dynamic code generation. For delegate types not covered by the overloads (or if the delegate type is unknown at compile-time), we still would use reflection to call it.

@kpreisser

Closing in favor of #163.

@kpreisser kpreisser closed this Oct 11, 2022
@kpreisser kpreisser deleted the callbackWithDynamicMethod branch October 11, 2022 17:58