Skip to content

Optimize retrieval of simple name from Assembly.FullName, fix up faulty methods #9739

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 22, 2025

Conversation

h3xds1nz
Copy link
Member

@h3xds1nz h3xds1nz commented Sep 7, 2024

Description

Optimizes retrieval of simple name from Assembly.FullName and removes all the weird ways used throughout to get it, this optimizes both startup and runtime performance.

  • In Baml2006Reader and XamlSchemaContext the original method did not properly parse assembly names with properly escaped commas, a weird edge-case but still a valid one. This was there due to CAS anyways.
  • Since we're in all cases as far as I know working with RuntimeAssembly, using assembly.GetName().Name is always the slowest, followed by new AssemblyName(assembly.FullName).Name.
  • The original intention was to merely unify this behaviour, however given how slow it is and since there was already custom parsing done in Xaml/Baml, I've decided to roll our own.
  • The two methods were moved from SafeSecurityHelper to its own ReflectionUtils static class.
  • Since the input is trusted (FullName is already after AssemblyName parser sanization, and with RuntimeAssembly (even dynamic assemblies from AssemblyBuilder), we can just look for a separator, there should be no concerns with custom parsing.
  • PBT on net472 fallbacks to new AssemblyName(assembly.FullName).Name.

Benchmarks for RuntimeAssembly

Method assembly Mean [ns] Error [ns] StdDev Gen0 Code Size Allocated
GetName_Name DataF(...)=null [75] 762.18 ns 4.407 ns 4.123 ns 0.0439 1,768 B 736 B
AssemblyName DataF(...)=null [75] 382.63 ns 4.307 ns 4.028 ns 0.0291 3,140 B 488 B
PR_EDIT DataF(...)=null [75] 10.25 ns 0.045 ns 0.038 ns - 1,629 B -
GetName_Name Syste(...)7798e [89] 516.84 ns 4.869 ns 4.555 ns 0.0362 1,761 B 616 B
AssemblyName Syste(...)7798e [89] 387.02 ns 3.930 ns 3.676 ns 0.0310 3,115 B 520 B
PR_EDIT Syste(...)7798e [89] 10.75 ns 0.051 ns 0.045 ns - 1,629 B -
GetName_Name Bench(...)cefc4 [84] 739.83 ns 7.599 ns 6.736 ns 0.0486 1,761 B 816 B
AssemblyName Bench(...)cefc4 [84] 375.59 ns 4.439 ns 4.153 ns 0.0300 3,140 B 504 B
PR_EDIT Bench(...)cefc4 [84] 10.76 ns 0.078 ns 0.073 ns - 1,629 B -

Benchmark code

Benchmark code
public static IEnumerable<Assembly> AssembliesToTest
{
    get
    {
        // DataFormatsBenchmark, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null
        yield return AppDomain.CurrentDomain.GetAssemblies()[1];
        // System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
        yield return AppDomain.CurrentDomain.GetAssemblies()[0];
        // BenchmarkDotNet, Version=0.13.12.0, Culture=neutral, PublicKeyToken=aa0ca2f9092cefc4
        yield return AppDomain.CurrentDomain.GetAssemblies()[2];
    }
}

[Benchmark]
[ArgumentsSource(nameof(AssembliesToTest))]
public string AssemblyName(Assembly assembly)
{
    AssemblyName name = new AssemblyName(assembly.FullName);
    string partialName = name.Name;
    return (partialName != null) ? partialName : string.Empty;
}

[Benchmark]
[ArgumentsSource(nameof(AssembliesToTest))]
public string GetName_Name(Assembly assembly)
{
    return assembly.GetName().Name ?? string.Empty;
}

[Benchmark]
[ArgumentsSource(nameof(AssembliesToTest))]
public ReadOnlySpan<char> PR_EDIT(Assembly assembly)
{
    ReadOnlySpan<char> fullName = assembly.FullName;
    if (fullName.IsEmpty)
        return ReadOnlySpan<char>.Empty;

    ReadOnlySpan<char> nameSlice = fullName;
    // Skip any escaped commas in the name if present
    int escapedComma = fullName.LastIndexOf("\\,", StringComparison.Ordinal);
    if (escapedComma != -1)
        nameSlice = nameSlice.Slice(escapedComma + 2);

    // Find the real ending of the name section
    int commaIndex = nameSlice.IndexOf(',');
    if (commaIndex != -1)
        fullName = fullName.Slice(0, fullName.Length - nameSlice.Length + commaIndex);

    // Check if we need to unescape, this is very rare case so we can just do it the dirty way
    if (escapedComma != -1 || fullName.Contains('\\'))
        UnescapeDirty(ref fullName);

    // Since having "," or "=" in the assembly name is very rare, we don't want to inline
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void UnescapeDirty(ref ReadOnlySpan<char> dirtyName)
    {
        StringBuilder sb = new(dirtyName.Length);
        sb.Append(dirtyName);
        sb.Replace("\\", null);
        dirtyName = sb.ToString();
    }

    return fullName;
}

Customer Impact

Improved startup/runtime performance, decreased allocations.

Regression

No.

Testing

Local build, sample app run, assert tests.

// Please put "allows ref struct" here, those ToString() hurt
// System.Private.CoreLib, Version=9.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e
// Expected "System.Private.CoreLib"
ReadOnlySpan<char> x1 = GetAssemblyPartialName(AppDomain.CurrentDomain.GetAssemblies()[0]);
Assert.AreEqual(x1.ToString(), AppDomain.CurrentDomain.GetAssemblies()[0].GetName().Name);

// Some additional tests to test consistency of the parser
AssemblyBuilder asm1 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("1"), AssemblyBuilderAccess.Run);
AssemblyBuilder asm2 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("\\,1\\="), AssemblyBuilderAccess.Run);
AssemblyBuilder asm3 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("\\,thisis\\,alsovalid.name"), AssemblyBuilderAccess.Run);
AssemblyBuilder asm4 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("DataFormat\\,\\, Benchmark\\, Version\\=, Culture=neutral, PublicKeyToken=null"), AssemblyBuilderAccess.Run);
AssemblyBuilder asm5 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("DataFormat\\,\\, Benchmark\\, Version\\=, Culture=neutral, PublicKeyToken=null"), AssemblyBuilderAccess.Run);
AssemblyBuilder asm6 = AssemblyBuilder.DefineDynamicAssembly(new AssemblyName("\\,\\=+thisis\\,alsovalid.name"), AssemblyBuilderAccess.Run);
x1 = GetAssemblyPartialName(asm1);
Assert.AreEqual(x1.ToString(), asm1.GetName().Name);
x1 = GetAssemblyPartialName(asm2);
Assert.AreEqual(x1.ToString(), asm2.GetName().Name);
x1 = GetAssemblyPartialName(asm3);
Assert.AreEqual(x1.ToString(), asm3.GetName().Name);
x1 = GetAssemblyPartialName(asm4);
Assert.AreEqual(x1.ToString(), asm4.GetName().Name);
x1 = GetAssemblyPartialName(asm5);
Assert.AreEqual(x1.ToString(), asm5.GetName().Name);
x1 = GetAssemblyPartialName(asm6);
Assert.AreEqual(x1.ToString(), asm6.GetName().Name);

Risk

In my mind it is low, all places I've tested are passing RuntimeAssembly, so there should be no surprises.

Microsoft Reviewers: Open in CodeFlow

@h3xds1nz h3xds1nz requested review from a team as code owners September 7, 2024 23:51
@dotnet-policy-service dotnet-policy-service bot added PR metadata: Label to tag PRs, to facilitate with triage Community Contribution A label for all community Contributions labels Sep 7, 2024
@h3xds1nz h3xds1nz force-pushed the assemblyname-speedup branch from f4173b4 to 072404b Compare October 17, 2024 10:52
@h3xds1nz
Copy link
Member Author

Fix merge conflicts after #9744

himgoyalmicro
himgoyalmicro previously approved these changes Jan 15, 2025
@himgoyalmicro
Copy link
Contributor

@h3xds1nz, this PR can be taken in after the conflict resolution.

@himgoyalmicro
Copy link
Contributor

Also, would it be possible to add some tests for this change as well?

@h3xds1nz
Copy link
Member Author

@himgoyalmicro Awesome, now just the other one and I can continue the journey <3

Merge conflicts should be solved in the last commit, was kind of few wrong auto-merges of usings.

Regarding tests, yeah, I'll get a PR in somewhere this week.

@ThomasGoulet73
Copy link
Contributor

Sorry for being a bit late but wouldn't it be better to use the new AssemblyNameInfo instead ? It should be faster than the existing code using AssemblyName and it might be slower than the custom parser that you implemented but it would be better than using our own parser that might contain bugs and that the WPF team would need to maintain.

@h3xds1nz
Copy link
Member Author

@ThomasGoulet73 It uses the same parser impl basically except it doesn't throw because it has TryParse. I've considered it originally.

As I've mentioned, I wouldn't use custom impl if it wasn't already parsed by the runtime parser. It is not a science and it is well documented what you need to escape. Plus WPF was already doing custom slicing and it was doing it wrong.

@ThomasGoulet73
Copy link
Contributor

I mean I wouldn't call it a "science" but I wouldn't call it straightforward either. The thing is your implementation does not do exactly the same thing as GetName/AssemblyName. Assembly names can contain chars such as '\t', '\n', '\r' or '\\'. You can see in your own test that if you use new AssemblyName("Test\n\Test") the result differs between your parser and GetName/AssemblyName/AssemblyNameInfo.

It most probably never happen in practice but I just wanted to make it known that it wasn't a 1-to-1 replacement.

Also FYI I wrote a quick fuzzer to test it out:

Code
char[] supportedChars = ['a', 'b', 'c', '\\', ',', '=', '\'', '"', 't', 'r', 'n'];

for (int x = 0; x < 10_000; x++)
{
    int len = Random.Shared.Next(1, 30);

    char[] chars = new char[len];

    for (int y = 0; y < chars.Length; y++)
    {
        int i = Random.Shared.Next(0, supportedChars.Length);

        chars[y] = supportedChars[i];
    }

    var str = new string(chars);

    if (AssemblyNameInfo.TryParse(str, out AssemblyNameInfo ani))
    {
        var name = GetAssemblyPartialName(ani.FullName);

        if (name.ToString() != ani.Name)
        {
            File.AppendAllLines("Test1.txt", [$"{ani.FullName}: {name.ToString()} != {ani.Name}"]);
        }
    }
}

It uses the same parser impl basically except it doesn't throw because it has TryParse.

That's true. It's not much faster but it allocates 1 less instance (AssemblyName itself), can be used with span and won't instantiate a CultureInfo if Culture != neutral.

@h3xds1nz
Copy link
Member Author

I have indeed missed that part and I would probably find out during writing the extended tests but I'm happy you've pointed it out.

I have updated the fallback mechanism to use AssemblyNameInfo so I don't have to re-implement the unescape mechanism, this way we will safely get the benefit of both speed and correctness and those who use weird assembly names will keep paying the price.

@h3xds1nz
Copy link
Member Author

@himgoyalmicro This is now ready, last commit addresses a great concern from Thomas I've indeed missed and didn't test for.

Copy link
Contributor

@himgoyalmicro himgoyalmicro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@himgoyalmicro himgoyalmicro merged commit 93ee7fb into dotnet:main Jan 22, 2025
8 checks passed
@himgoyalmicro
Copy link
Contributor

himgoyalmicro commented Jan 22, 2025

We are deeply grateful for your continuous contribution, @h3xds1nz 😄. Just a small request can you please also update #9822?

@h3xds1nz
Copy link
Member Author

@himgoyalmicro That is so sweet, thank you! Yeah I was planning to rebase that one. Glad we're thinking the same haha

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Community Contribution A label for all community Contributions PR metadata: Label to tag PRs, to facilitate with triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants