Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Simplify JitDisasm matching behavior and support namespaces/generics in release #74430

Merged
merged 27 commits into from
Sep 4, 2022

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Aug 23, 2022

This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). It also starts printing method instantiations for full method names
and makes references to types consistent in generic instantiations and the signature.

In addition it starts supporting generic instantiations in release too.

To do this, most of the type printing is moved to the JIT, which also aligns the output
between crossgen2 and the VM (there were subtle differences here, like spaces between generic type arguments).
More importantly, we (for the most part) stop relying on JIT-EE methods that are documented to only be for debug purposes.

The new behavior of the matching is the following:

  • The matching behavior is always string based.
  • The JitDisasm string is a space-separated list of patterns. Patterns can arbitrarily
    contain both '*' (match any characters) and '?' (match any 1 character).
  • The string matched against depends on characters in the pattern:
    • If the pattern contains a ':' character, the string matched against is prefixed by the class name and a colon
    • If the pattern contains a '(' character, the string matched against is suffixed by the signature
    • If the class name (part before colon) contains a '[', the class contains its generic instantiation
    • If the method name (part between colon and '(') contains a '[', the method contains its generic instantiation

For example, consider

namespace MyNamespace
{
    public class C<T1, T2>
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void M<T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
        {
        }
    }
}

new C<sbyte, string>().M<int, object>(default, default, default, default); // compilation 1
new C<int, int>().M<int, int>(default, default, default, default); // compilation 2

The full strings are:
Before:

MyNamespace.C`2[SByte,__Canon][System.SByte,System.__Canon]:M(byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[Int32,Int32][System.Int32,System.Int32]:M(int,int,int,int)

Notice no method instantiation and the double class instantiation, which seems like an EE bug. Also two different names are used for sbyte: System.SByte and byte.

After the change the strings are:

MyNamespace.C`2[byte,System.__Canon]:M[int,System.__Canon](byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[int,int]:M[int,int](int,int,int,int)

The following strings will match both compilations:

M
*C`2:M
*C`2[*]:M[*](*)
MyNamespace.C`2:M

The following will match only the first one:

M[int,*Canon]
MyNamespace.C`2[byte,*]:M
M(*Canon)

There is one significant change in behavior here, which is that I have removed the special case that allows matching class names without namespaces. In particular, today Console:WriteLine would match all overloads of System.Console.WriteLine, while after this change it will not match. However, with generalized wild cards the replacement is simple in *Console:WriteLine.

This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). The new behavior is the following:

* The matching behavior is always string based.
* The JitDisasm string is a space-separated list of patterns. Patterns
  can be quoted if they need to include spaces, and can arbitrarily
  contain both * (match any characters) and ? (match any 1 character).
* If the pattern contains a : character, the string matched against is
  prefixed by "ClassName:".
* If the pattern contains a ( character then the string matched against
  ends with its signature in the usual format.
* Type names do not include namespaces.
* The signature matched against does not contain return types or
  "is-instance" specifiers.

For example, *:WriteLine is equivalent to WriteLine and will match all
methods named WriteLine. Console:WriteLine will match all overloads of
WriteLine defined in a class called Console. Console:WriteLine(String)
and WriteLine(String) will match only the string overload.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 23, 2022
@ghost ghost assigned jakobbotsch Aug 23, 2022
@ghost
Copy link

ghost commented Aug 23, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). The new behavior is the following:

  • The matching behavior is always string based.
  • The JitDisasm string is a space-separated list of patterns. Patterns
    can be quoted if they need to include spaces, and can arbitrarily
    contain both * (match any characters) and ? (match any 1 character).
  • If the pattern contains a : character, the string matched against is
    prefixed by "ClassName:".
  • If the pattern contains a ( character then the string matched against
    ends with its signature in the usual format.
  • Type names do not include namespaces (might change this, the EE side is not giving us the namespace in release builds)
  • The signature matched against does not contain return types or
    "is-instance" specifiers.

For example, *:WriteLine is equivalent to WriteLine and will match all
methods named WriteLine. Console:WriteLine will match all overloads of
WriteLine defined in a class called Console. Console:WriteLine(String)
and WriteLine(String) will match only the string overload. Overall, the string
you get at the top of JitDisasm and in JitDisasmSummary is exactly the
string we are matching against now (minus the return type suffix).

Author: jakobbotsch
Assignees: jakobbotsch
Labels:

area-CodeGen-coreclr

Milestone: -

@jakobbotsch
Copy link
Member Author

@dotnet/jit-contrib, any concerns about this change in behavior of the matching?
Also cc @stephentoub

int numArgs = sigInfo != nullptr ? sigInfo->numArgs : -1;
// names[hasClassName][hasSignature]
char* names[2][2] = {};
Compiler* comp = JitTls::GetCompiler();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we worried about this? It should be a single call in release builds and only when JitDisasm is non-empty.
We need it to allocate memory in rare cases where the string we are matching against is longer than 1024 characters.

Co-authored-by: Aaron Robinson <arobins@microsoft.com>
@BruceForstall
Copy link
Member

any concerns about this change in behavior of the matching?

Can you give examples of the change in behavior? It's not clear to me from the description, with the exception of supporting matching arguments, which is a great improvement.

  • If the pattern contains a : character, the string matched against is prefixed by "ClassName:".

What about the ":return_type" and ":this" suffixes we currently use?

@jakobbotsch
Copy link
Member Author

jakobbotsch commented Aug 24, 2022

Can you give examples of the change in behavior? It's not clear to me from the description, with the exception of supporting matching arguments, which is a great improvement.

To be honest it is not clear to me what the full extent of the existing parser's behavior is. One difference is that it allows ClassName::MethodName (two colons) -- we now only allow one colon. As you say, another one is in the matching of parameters. A final one is the treatment of wildcards, which I never really understood.

At least the way I normally have used method sets (ClassName:MethodName and MethodName, the occassional *) the behavior should not have changed, but I'm not sure if other people on the team are using some more advanced features.

What about the ":return_type" and ":this" suffixes we currently use?

It is not included as part of the signature that the matching works against.

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Aug 24, 2022

Do those changes allow the filter to be used with generic types and methods?

@jakobbotsch
Copy link
Member Author

Do those changes allow the filter to be used with generic types and methods?

Not in release builds. It probably needs some work on the JIT-EE interface so it won't happen as part of this change.

Like signatures, format the instantiations in the JIT so we get
consistent type names in the entire string.
While there as a comment that this is expensive, in my measuring of
crossgenning SPC I could not see any performance difference.
if (method.Instantiation.Length != 0)
{
sig->sigInst.methInst = GetJitInstantiation(method.Instantiation);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have moved this out of the method.IsIntrinsic path to unconditionally fill it, since it is used by JIT for JitDisasm now and making assumptions on the exact scenarios this is used by the JIT is a bit icky.

If we are worried about performance (I did not see any measurable difference in crossgen2 times of SPC) then I think we should remove CORINFO_SIG_INFO::sigInst and put this on the same plan as ICorStaticInfo::getTypeInstantiationArgument. But I think we can wait and see the crossgen2 TP graphs in PowerBI.

Comment on lines 6463 to +6490
void MethodContext::recGetTypeInstantiationArgument(CORINFO_CLASS_HANDLE cls,
CORINFO_CLASS_HANDLE result,
unsigned index)
unsigned index,
CORINFO_CLASS_HANDLE result)
{
if (GetTypeInstantiationArgument == nullptr)
GetTypeInstantiationArgument = new LightWeightMap<DWORDLONG, DWORDLONG>();
GetTypeInstantiationArgument = new LightWeightMap<DLD, DWORDLONG>();

DWORDLONG key = CastHandle(cls);
DLD key;
ZeroMemory(&key, sizeof(key));
key.A = CastHandle(cls);
key.B = index;
DWORDLONG value = CastHandle(result);
GetTypeInstantiationArgument->Add(key, value);
DEBUG_REC(dmpGetTypeInstantiationArgument(key, value));
}
void MethodContext::dmpGetTypeInstantiationArgument(DWORDLONG key, DWORDLONG value)
void MethodContext::dmpGetTypeInstantiationArgument(DLD key, DWORDLONG value)
{
printf("GetTypeInstantiationArgument key - classNonNull-%llu, value NonNull-%llu", key, value);
printf("GetTypeInstantiationArgument key - classNonNull-%llu, index-%u, value NonNull-%llu", key.A, key.B, value);
GetTypeInstantiationArgument->Unlock();
}
CORINFO_CLASS_HANDLE MethodContext::repGetTypeInstantiationArgument(CORINFO_CLASS_HANDLE cls, unsigned index)
{
CORINFO_CLASS_HANDLE result = nullptr;

DWORDLONG key = CastHandle(cls);
DLD key;
ZeroMemory(&key, sizeof(key));
key.A = CastHandle(cls);
key.B = index;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BruceForstall annoyingly, I suppose this bug fix means I should bump the JIT-EE GUID in this change even though there are no changes to the JIT-EE interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True; any time the SPMI binary format changes, we also need a JIT-EE GUID change.

@jakobbotsch
Copy link
Member Author

@dotnet/jit-contrib This should be ready now. I have updated the original text of the PR, so please read it and give me your thoughts on that. This behavior change probably being the largest thing:

There is one significant change in behavior here, which is that I have removed the special case that allows matching class names without namespaces. In particular, today Console:WriteLine would match all overloads of System.Console.WriteLine, while after this change it will not match. However, with generalized wild cards the replacement is simple in *Console:WriteLine.

It would be nice to get a review on this today so I can hopefully merge this on the weekend and avoid disruption due to the JIT-EE GUID change.

CONFIG_INTEGER(JitDumpTier0, W("JitDumpTier0"), 1) // Dump tier0 requests
CONFIG_INTEGER(JitDumpAtOSROffset, W("JitDumpAtOSROffset"), -1) // Only dump OSR requests for this offset
CONFIG_INTEGER(JitDumpInlinePhases, W("JitDumpInlinePhases"), 1) // Dump inline compiler phases
CONFIG_INTEGER(JitDisasmSpilled, W("JitDisasmSpilled"), 0) // Display native code when any register spilling occurs
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was shoehorned into JitDisasm by specifying "SPILLED" before. I have introduced a separate variable for it instead.

@jakobbotsch jakobbotsch marked this pull request as ready for review September 2, 2022 20:26
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great!

I only pointed out a couple nits, but feel free to ignore.

src/coreclr/inc/corinfo.h Outdated Show resolved Hide resolved
docs/design/coreclr/jit/viewing-jit-dumps.md Outdated Show resolved Hide resolved
Co-authored-by: Bruce Forstall <brucefo@microsoft.com>
@jakobbotsch
Copy link
Member Author

superpmi failures are expected due to the JIT-EE GUID change.

@jakobbotsch jakobbotsch merged commit 243cf9f into dotnet:main Sep 4, 2022
@jakobbotsch jakobbotsch deleted the jit-method-name-string-match branch September 4, 2022 15:58
@jakobbotsch jakobbotsch changed the title JIT: Simplify JitDisasm matching behavior JIT: Simplify JitDisasm matching behavior and support namespaces/generics in release Sep 7, 2022
@jakobbotsch
Copy link
Member Author

/backport to release/7.0

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2022

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3014516413

@github-actions
Copy link
Contributor

github-actions bot commented Sep 8, 2022

@jakobbotsch backporting to release/7.0 failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Remove SPILLED JitDisasm in favor of JitDisasmSpilled
Applying: JIT: Simplify MethodSet matching behavior
error: sha1 information is lacking or useless (src/coreclr/jit/compiler.cpp).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 JIT: Simplify MethodSet matching behavior
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

jakobbotsch added a commit to jakobbotsch/runtime that referenced this pull request Sep 8, 2022
This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). It also starts printing method instantiations for full method names
and makes references to types consistent in generic instantiations and the signature.
In addition it starts supporting generic instantiations in release too.
To do this, most of the type printing is moved to the JIT, which also aligns the output
between crossgen2 and the VM (there were subtle differences here, like spaces between generic type arguments).
More importantly, we (for the most part) stop relying on JIT-EE methods that are documented to only be for debug purposes.

The new behavior of the matching is the following:

* The matching behavior is always string based.
* The JitDisasm string is a space-separated list of patterns. Patterns can arbitrarily
   contain both '*' (match any characters) and '?' (match any 1 character).
* The string matched against depends on characters in the pattern:
    + If the pattern contains a ':' character, the string matched against is prefixed by the class name and a colon
    + If the pattern contains a '(' character, the string matched against is suffixed by the signature
    + If the class name (part before colon) contains a '[', the class contains its generic instantiation
    + If the method name (part between colon and '(') contains a '[', the method contains its generic instantiation

For example, consider

```
namespace MyNamespace
{
    public class C<T1, T2>
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void M<T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
        {
        }
    }
}

new C<sbyte, string>().M<int, object>(default, default, default, default); // compilation 1
new C<int, int>().M<int, int>(default, default, default, default); // compilation 2
```
The full strings are:
Before the change:

```
MyNamespace.C`2[SByte,__Canon][System.SByte,System.__Canon]:M(byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[Int32,Int32][System.Int32,System.Int32]:M(int,int,int,int)
```
Notice no method instantiation and the double class instantiation, which seems like an EE bug. Also two different names are used for sbyte: System.SByte and byte.

After the change the strings are:

```
MyNamespace.C`2[byte,System.__Canon]:M[int,System.__Canon](byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[int,int]:M[int,int](int,int,int,int)
```

The following strings will match both compilations:

```
M
*C`2:M
*C`2[*]:M[*](*)
MyNamespace.C`2:M
```

The following will match only the first one:

```
M[int,*Canon]
MyNamespace.C`2[byte,*]:M
M(*Canon)
```

There is one significant change in behavior here, which is that I have removed the special case that allows matching class names without namespaces. In particular, today Console:WriteLine would match all overloads of System.Console.WriteLine, while after this change it will not match. However, with generalized wild cards the replacement is simple in *Console:WriteLine.
carlossanlop pushed a commit that referenced this pull request Sep 8, 2022
…amespaces/generics in release (#75260)

* JIT: Simplify JitDisasm matching behavior  (#74430)

This changes how the JIT matches method names and signatures for method
sets (e.g. JitDisasm). It also starts printing method instantiations for full method names
and makes references to types consistent in generic instantiations and the signature.
In addition it starts supporting generic instantiations in release too.
To do this, most of the type printing is moved to the JIT, which also aligns the output
between crossgen2 and the VM (there were subtle differences here, like spaces between generic type arguments).
More importantly, we (for the most part) stop relying on JIT-EE methods that are documented to only be for debug purposes.

The new behavior of the matching is the following:

* The matching behavior is always string based.
* The JitDisasm string is a space-separated list of patterns. Patterns can arbitrarily
   contain both '*' (match any characters) and '?' (match any 1 character).
* The string matched against depends on characters in the pattern:
    + If the pattern contains a ':' character, the string matched against is prefixed by the class name and a colon
    + If the pattern contains a '(' character, the string matched against is suffixed by the signature
    + If the class name (part before colon) contains a '[', the class contains its generic instantiation
    + If the method name (part between colon and '(') contains a '[', the method contains its generic instantiation

For example, consider

```
namespace MyNamespace
{
    public class C<T1, T2>
    {
        [MethodImpl(MethodImplOptions.NoInlining)]
        public void M<T3, T4>(T1 arg1, T2 arg2, T3 arg3, T4 arg4)
        {
        }
    }
}

new C<sbyte, string>().M<int, object>(default, default, default, default); // compilation 1
new C<int, int>().M<int, int>(default, default, default, default); // compilation 2
```
The full strings are:
Before the change:

```
MyNamespace.C`2[SByte,__Canon][System.SByte,System.__Canon]:M(byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[Int32,Int32][System.Int32,System.Int32]:M(int,int,int,int)
```
Notice no method instantiation and the double class instantiation, which seems like an EE bug. Also two different names are used for sbyte: System.SByte and byte.

After the change the strings are:

```
MyNamespace.C`2[byte,System.__Canon]:M[int,System.__Canon](byte,System.__Canon,int,System.__Canon)
MyNamespace.C`2[int,int]:M[int,int](int,int,int,int)
```

The following strings will match both compilations:

```
M
*C`2:M
*C`2[*]:M[*](*)
MyNamespace.C`2:M
```

The following will match only the first one:

```
M[int,*Canon]
MyNamespace.C`2[byte,*]:M
M(*Canon)
```

There is one significant change in behavior here, which is that I have removed the special case that allows matching class names without namespaces. In particular, today Console:WriteLine would match all overloads of System.Console.WriteLine, while after this change it will not match. However, with generalized wild cards the replacement is simple in *Console:WriteLine.

* Update JIT-EE GUID

Avoid using the same JIT-EE GUID as the main branch.
@ghost ghost locked as resolved and limited conversation to collaborators Oct 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants