For p/invokes, the character set is encoded into the metadata for the method. As a result, adding a new encoding, such as UTF-8, is complex and far-reaching. The current experience (ANSI means UTF-8 on Unix) is odd and confusing. The p/invoke source generator should be used to improve this experience.
We’d like to:
- Avoid proliferating the pattern of ‘use `CharSet.Ansi` on Unix to get UTF-8’
- Allow specifying the character set to use for all parameters of a method
  - Instead of needing `MarshalAs` on each parameter
- Avoid adding to the `CharSet` enumeration
  - Don’t want inconsistent support and don’t want to implement new support in all the places that currently use it
Our current thinking is to:
- Remove `CharSet` field
- Add `MarshalStringsUsing` field - `Type`
  - Should be a type that could be used with `MarshalUsing`/`NativeMarshalling` attributes for custom marshalling of strings
  - New APIs can be added for common marshallers (which can use things like `System.Text.Encoding` under the hood)
Example:
// UTF-8 - equivalent to explicitly specifying [MarshalAs(UnmanagedType.LPUTF8Str)] on string parameters
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf8StringMarshalling))]
public static partial int Method(string s);
// UTF-16 - equivalent to CharSet.Unicode behaviour in built-in
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(System.Runtime.InteropServices.Encoding.Utf16StringMarshalling))]
public static partial int Method(string s);
// Error - invalid encoding
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(int))]
public static partial int Method(string s);
// User-defined marshalling
[GeneratedDllImport("lib", MarshalStringsUsing = typeof(MyCustomMarshal.Wtf8String))]
public static partial int Method(string s);
Where:
// .NET can provide:
namespace System.Runtime.InteropServices.Encoding
{
    // UTF-16 with endianness based on the current platform
    struct Utf16StringMarshalling { ... }

    // UTF-8
    struct Utf8StringMarshalling { ... }

    // ANSI
    [SupportedOSPlatform("windows")]
    struct AnsiStringMarshalling { ... }

    ...
}

// User can define:
namespace MyCustomMarshal
{
    struct Wtf8String { ... }
}
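To make "can use things like `System.Text.Encoding` under the hood" concrete, here is a minimal sketch of the work a UTF-8 string marshaller might do. The type name `Utf8StringMarshallingSketch` and its members are hypothetical and chosen only for illustration; this is not the shape proposed for `Utf8StringMarshalling`, which is left open above.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical sketch: the name and member shape are assumptions,
// not the API proposed in this issue.
public static class Utf8StringMarshallingSketch
{
    // Allocates a null-terminated UTF-8 copy of the managed string.
    public static IntPtr ToNative(string? managed)
    {
        if (managed is null)
            return IntPtr.Zero;

        byte[] bytes = Encoding.UTF8.GetBytes(managed);
        IntPtr native = Marshal.AllocCoTaskMem(bytes.Length + 1);
        Marshal.Copy(bytes, 0, native, bytes.Length);
        Marshal.WriteByte(native, bytes.Length, 0); // null terminator
        return native;
    }

    // Reads a null-terminated UTF-8 string back into a managed string.
    public static string? ToManaged(IntPtr native)
        => Marshal.PtrToStringUTF8(native);

    // Releases memory allocated by ToNative.
    public static void Free(IntPtr native)
        => Marshal.FreeCoTaskMem(native);
}
```

A generated stub would presumably call something like `ToNative` before the native call and `Free` after it. The existing `Marshal.StringToCoTaskMemUTF8` helper already performs the same allocation and copy; the sketch spells it out only to show where an encoding plugs in.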
Other considerations:
- Naming: `Unicode` vs `Utf16`
  - .NET has usually used the (Windows-centric) term Unicode to refer to UTF-16. Naming the struct `...Utf16StringMarshalling` would be correct and in line with our cross-platform focus, but `UnicodeStringMarshalling` would be more consistent with existing APIs.
- Auto (UTF-8 on Unix, UTF-16 on Windows):
  - We expect usage to be low. If necessary, users can define different p/invokes and call the desired one conditionally (for example, using the `OperatingSystem` APIs)
- Defaults:
  - The source generator requires specifying marshalling information for string/char.
    - Requires the intention to be made clear and removes hidden assumptions, but can make declarations more verbose
  - The source generator does not check / reconcile higher level settings like `DefaultCharSetAttribute`.
- `ExactSpelling`: uses `CharSet` to probe for entry point on Windows, doesn’t mean anything on Unix
  - The source generator could require exact spelling for entry point names
    - Would be in the spirit of avoiding propagating some of the Windows-centric aspects of DllImport
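As a rough illustration of the "Auto" and `ExactSpelling` points above, this sketch declares one p/invoke per platform and picks between them with `OperatingSystem.IsWindows()`. It uses the built-in `DllImport`, since the generator attribute is still being designed here. The class name and the choice of native functions (`lstrlenW` from `kernel32`, `strlen` from `libc`) are assumptions for illustration only.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: the names and the chosen native functions are
// assumptions used only to show the conditional-call pattern.
public static class NativeStringLength
{
    // Windows: UTF-16, with the entry point named exactly so that no
    // CharSet-based probing for an A/W suffix is needed.
    [DllImport("kernel32", EntryPoint = "lstrlenW", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int LengthUtf16(string s);

    // Unix: UTF-8, stated explicitly instead of relying on CharSet.Ansi.
    [DllImport("libc", EntryPoint = "strlen")]
    private static extern nuint LengthUtf8([MarshalAs(UnmanagedType.LPUTF8Str)] string s);

    // Calls whichever declaration matches the current platform.
    public static int Length(string s)
        => OperatingSystem.IsWindows()
            ? LengthUtf16(s)
            : checked((int)LengthUtf8(s));
}
```

The two paths count different units (UTF-16 code units on Windows, UTF-8 bytes on Unix), so they only agree for ASCII input. That kind of platform difference is what an explicit per-declaration encoding makes visible.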