
[DllImportGenerator] Update GeneratedDllImportAttribute handling of character set / encoding #61326

Closed
@elinor-fung

Description


For p/invokes, the character set is encoded into the metadata for the method. As a result, adding a new character set, such as UTF-8, is complex and far-reaching. The current experience (where ANSI means UTF-8 on Unix) is odd and confusing. We should use the p/invoke source generator to improve this experience.

We’d like to:

  • Avoid proliferating the pattern of ‘use CharSet.Ansi on Unix to get UTF-8’
  • Allow specifying the character set to use for all parameters of a method
    • Instead of needing MarshalAs on each parameter
  • Avoid adding to the CharSet enumeration
    • We don’t want inconsistent support, and we don’t want to implement new support in every place that currently uses it
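For comparison, here is roughly how the two patterns listed above are written today with the built-in DllImport. "lib" is the placeholder library name used in the examples below; the class and method names are hypothetical.

```csharp
using System.Runtime.InteropServices;

internal static class BuiltInStringMarshalling
{
    // Pattern to avoid: CharSet.Ansi means the system ANSI code page on
    // Windows but UTF-8 on Unix, so one declaration hides a platform split.
    [DllImport("lib", CharSet = CharSet.Ansi)]
    internal static extern int MethodAnsi(string s);

    // Explicit UTF-8 today: the MarshalAs attribute has to be repeated on
    // every string parameter of the method.
    [DllImport("lib")]
    internal static extern int MethodUtf8(
        [MarshalAs(UnmanagedType.LPUTF8Str)] string first,
        [MarshalAs(UnmanagedType.LPUTF8Str)] string second);
}
```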

Our current thinking is to:

Example:

// UTF-8 - equivalent to explicitly specifying [MarshalAs(UnmanagedType.LPUTF8Str)] on string parameters
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf8StringMarshalling))]
public static partial int MethodUtf8(string s);

// UTF-16 - equivalent to CharSet.Unicode behaviour in built-in
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf16StringMarshalling))]
public static partial int MethodUtf16(string s);

// Error - invalid encoding
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(int))]
public static partial int MethodInvalid(string s);

// User-defined marshalling
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(MyCustomMarshal.Wtf8String))]
public static partial int MethodCustom(string s);

Where:

// .NET can provide:
namespace System.Runtime.InteropServices.Encoding
{
    // UTF-16 with endianness based on the current platform
    struct Utf16StringMarshalling { ... }

    // UTF-8
    struct Utf8StringMarshalling { ... }

    // ANSI
    [SupportedOSPlatform("windows")]
    struct AnsiStringMarshalling { ... }

    ...
}

// User can define:
namespace MyCustomMarshal
{
    struct Wtf8String { ... }
}
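To make the difference between the two built-in options concrete, the helper below marshals a string to native memory as UTF-8 (the layout LPUTF8Str produces) or UTF-16 (the layout LPWStr / CharSet.Unicode produces) using the existing Marshal.StringToCoTaskMemUTF8 and Marshal.StringToCoTaskMemUni APIs, then copies the bytes back out. The class and method names are hypothetical, and the proposed marshalling structs might allocate or pin differently; only the byte layout is being illustrated.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

internal static class NativeStringBytes
{
    // Copy a string into native memory as UTF-8 or UTF-16, then read the
    // native buffer (including the null terminator) back into a byte array.
    internal static byte[] Inspect(string s, bool utf8)
    {
        IntPtr native = utf8
            ? Marshal.StringToCoTaskMemUTF8(s)
            : Marshal.StringToCoTaskMemUni(s);
        try
        {
            int length = utf8
                ? Encoding.UTF8.GetByteCount(s) + 1   // one-byte terminator
                : (s.Length + 1) * sizeof(char);      // two-byte terminator
            byte[] bytes = new byte[length];
            Marshal.Copy(native, bytes, 0, length);
            return bytes;
        }
        finally
        {
            // Both allocation helpers use CoTaskMemAlloc, so free accordingly.
            Marshal.FreeCoTaskMem(native);
        }
    }
}
```

For "\u00e9" the UTF-8 buffer is C3 A9 00, while the UTF-16 buffer is four bytes whose order follows the platform's endianness, which is the behaviour the Utf16StringMarshalling comment describes.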
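The proposal leaves the members of these structs as { ... }, so the exact shape the generator would require is not specified here. As a sketch of the user-defined case, assuming a shape with a constructor that takes the managed string, a ToNativeValue method, a static FromNativeValue method and a FreeNative method (all hypothetical names), a Latin-1 marshaller could look like this. It relies on Encoding.Latin1, which requires .NET 5 or later.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

namespace MyCustomMarshal
{
    // Hypothetical user-defined string marshaller for ISO-8859-1 (Latin-1).
    // Member names are illustrative only; the proposal does not define them.
    public struct Latin1StringMarshallingSketch
    {
        private IntPtr _native;

        // Encode the managed string into a null-terminated Latin-1 buffer
        // allocated with CoTaskMemAlloc.
        public Latin1StringMarshallingSketch(string managed)
        {
            if (managed == null)
            {
                _native = IntPtr.Zero;
                return;
            }

            byte[] bytes = Encoding.Latin1.GetBytes(managed);
            _native = Marshal.AllocCoTaskMem(bytes.Length + 1);
            Marshal.Copy(bytes, 0, _native, bytes.Length);
            Marshal.WriteByte(_native, bytes.Length, 0); // null terminator
        }

        // The pointer that would be passed to the native function.
        public IntPtr ToNativeValue() => _native;

        // Decode a null-terminated Latin-1 buffer back into a managed string.
        public static string FromNativeValue(IntPtr native)
        {
            if (native == IntPtr.Zero)
                return null;

            int length = 0;
            while (Marshal.ReadByte(native, length) != 0)
                length++;

            byte[] bytes = new byte[length];
            Marshal.Copy(native, bytes, 0, length);
            return Encoding.Latin1.GetString(bytes);
        }

        // Release the native buffer once the call has returned.
        public void FreeNative()
        {
            Marshal.FreeCoTaskMem(_native);
            _native = IntPtr.Zero;
        }
    }
}
```

Encoding.Latin1 substitutes a replacement character for anything outside Latin-1, so a real marshaller would need to decide whether that is acceptable or should be an error.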

Other considerations:

  • Naming: Unicode vs Utf16
    • .NET has usually used the (Windows-centric) term Unicode to refer to UTF-16. Naming the struct ...Utf16StringMarshalling would be correct and in line with our cross-platform focus, but UnicodeStringMarshalling would be more consistent with existing APIs.
  • Auto (UTF-8 on Unix, UTF-16 on Windows):
    • We expect usage to be low. If necessary, users can define separate p/invokes and call the desired one conditionally (for example, using the OperatingSystem APIs).
  • Defaults:
    • The source generator requires specifying marshalling information for string/char.
      • This makes the intent explicit and removes hidden assumptions, but can make declarations more verbose.
    • The source generator does not check / reconcile higher level settings like DefaultCharSetAttribute.
  • ExactSpelling: on Windows, CharSet is used to probe for an entry point name with an A or W suffix; on Unix it has no meaning.
    • The source generator could require exact spelling for entry point names.
      • This would be in the spirit of not propagating some of the Windows-centric aspects of DllImport.
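The suggested replacement for Auto could look like this with today's built-in DllImport, since the proposed MarshalStringsUsing property is not something you can compile against. The class name, the private method names and the Method export are hypothetical; "lib" is the placeholder library name from the examples above.

```csharp
using System;
using System.Runtime.InteropServices;

internal static class AutoReplacement
{
    // The same native export declared twice, once per string encoding.
    // ExactSpelling = true stops the runtime from probing for MethodA/MethodW.
    [DllImport("lib", EntryPoint = "Method", ExactSpelling = true)]
    private static extern int MethodUtf16([MarshalAs(UnmanagedType.LPWStr)] string s);

    [DllImport("lib", EntryPoint = "Method", ExactSpelling = true)]
    private static extern int MethodUtf8([MarshalAs(UnmanagedType.LPUTF8Str)] string s);

    // What CharSet.Auto decided implicitly, made explicit at the call site.
    internal static int Method(string s) =>
        OperatingSystem.IsWindows() ? MethodUtf16(s) : MethodUtf8(s);
}
```

The choice of encoding is now visible in the calling code rather than hidden in the p/invoke metadata.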
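To see what requiring exact spelling would remove, here is a small, hypothetical helper that models the documented Windows lookup order when ExactSpelling is false: for CharSet.Ansi the unmangled name is tried before the A-suffixed one, and for CharSet.Unicode the W-suffixed name is tried first. It is a simplified model for illustration, not the runtime's implementation.

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

internal static class EntryPointProbing
{
    // Simplified model of the Windows entry point lookup order for DllImport.
    // With ExactSpelling = true, only the given name is tried.
    internal static IReadOnlyList<string> Candidates(string name, CharSet charSet, bool exactSpelling)
    {
        if (exactSpelling)
            return new[] { name };

        // Unicode (and Auto, which means Unicode on Windows): W-suffixed name first.
        if (charSet == CharSet.Unicode || charSet == CharSet.Auto)
            return new[] { name + "W", name };

        // Ansi (and the obsolete None, which behaves like Ansi): unmangled name first.
        return new[] { name, name + "A" };
    }
}
```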

@AaronRobinsonMSFT @jkoritzinsky @jkotas @stephentoub
