Skip to content

Fix crossgen2 ArgIterator for SystemV Amd64 Unix ABI #499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Dec 5, 2019

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Dec 4, 2019

The ArgIterator was missing proper handling of the SystemV amd64 struct classification.
There were also several minor issues.
The state with this fix should match the native version.

To successfully compile framework with SuperILC, I had to disable System.Runtime.Serialization.Formatters for now, as we hit an assert in JIT when trying to compile it.

The ArgIterator was missing proper handling of the SystemV struct classification.
There were also several minor issues.
@janvorli janvorli added this to the 5.0 milestone Dec 4, 2019
@janvorli janvorli requested a review from trylek December 4, 2019 01:08
@janvorli janvorli self-assigned this Dec 4, 2019
private bool getSystemVAmd64PassStructInRegisterDescriptor(CORINFO_CLASS_STRUCT_* structHnd, SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR* structPassInRegDescPtr)
{
TypeDesc typeDesc = HandleToObject(structHnd);

return _systemVStructClassificator.getSystemVAmd64PassStructInRegisterDescriptor(typeDesc, structPassInRegDescPtr);
SystemVStructClassificator.GetSystemVAmd64PassStructInRegisterDescriptor(typeDesc, out *structPassInRegDescPtr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this become a problem when PR #495 that enables multi-threaded compilation is merged?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one way to reword this is to ask whether the below classification dictionary should become a ConcurrentDictionary or whether we rather need to put locks around its population.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also add the classification to the TypeDesc instead of having a separate cache. We could then set a reference to the classification structure using compare exchange (if we allocated it separately) or assign the passedInRegisters as the last member and use a proper memory barrier / volatile access to ensure it is observed as executed as the last by concurrent threads of execution.
That would actually be similar to the runtime where we have the classification stored in the EEClass.

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I'm not super familiar with the Linux x64 calling convention specifics but I know this has been a pre-existing deficiency of the Crossgen2 compiler - thanks for fixing this!

{
private Dictionary<TypeDesc, SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR> _classificationCache = new Dictionary<TypeDesc, SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR>();
private static Dictionary<TypeDesc, SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR> s_classificationCache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't make statics that refer to type system entities (or NodeCache nodes) in the compiler. Doing so limits reusability of the JitInterface. We already have use cases where one can have multiple type system contexts in the same process: in those cases, static cache is just another word for "memory leak". I would prefer this stays an instance field (it also avoids the problems with races Sergiy pointed out).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means that in multi-threaded compilation, we will need to run the classification for the same type upto as many times as there are compilation threads and we would have multiple copies of the classification structure. And I would also need one more classifier for the ArgIterator that doesn't seem to be bound to any JitInterface.
So the idea of adding the classification to the TypeDesc and removing the cache seems to be much better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a compromise. We try to keep TypeDesc as small as possible because there's a lot of them.

SystemV classification is only needed on Unix and only for types that are used as method arguments. Is it worth burning sizeof(SYSTEMV_AMD64_CORINFO_STRUCT_REG_PASSING_DESCRIPTOR) (aligned) bytes on each TypeDesc (or a pointer to the struct, whichever is smaller) to avoid recomputing the same information ~9 times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, for the ArgIterator, it is worse, as we create an instance for every function call to generate GCRefMap. So we would repeat the classification over and over and also possibly create a lot of memory garbage unless we can somehow get the current thread's CorInfoImpl (or propagate the current CorInfoImpl down to the place where we create the ArgIterator instance). I'm not sure how reasonable that would be.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we create an instance for every function call

We don't have means to pass an extra argument with the classificator to gcrefmap computation?

I'm okay adding this to TypeDesc if the math checks out. But since we only need this information for

  • value types,
  • used as arguments,
  • and on Unix only

my gut feel is that side cache is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback, I'll look into passing the classifier to the gcrefmap computation. And I am actually also running an experiment at the moment to check what would happen to the build time of all the pri1 tests and frame work assemblies together if there is no caching of that stuff at all. That might also be an option if it turns out to be causing only a minor difference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, disabling the caching completely results in virtually zero difference in compilation time of all the framework assemblies and pri1 tests. In fact, the run without the caching was slightly faster, which I assume is just noise.

With the cache:
Total build time: 5962387 msecs
Framework time:   133722 msecs
Compilation time: 4017379 msecs
Execution time:   1811279 msecs

Without the cache:
Total build time: 5944179 msecs
Framework time:   133107 msecs
Compilation time: 3981134 msecs
Execution time:   1829931 msecs

So I am going to remove the cache.

@jashook
Copy link
Contributor

jashook commented Dec 4, 2019

@janvorli I am confused, where are we currently using ArgIterator on unix. This seems like a problem as we should be missing support in many places across the runtime/jit.

@janvorli
Copy link
Member Author

janvorli commented Dec 4, 2019

@jashook the ArgIterator in crossgen2 is used only to generate GCRefMap for arguments of calls. We already had support for the classification on the jit interface before.

Running without the cache showed no difference in the total build time of
all the coreclr pri1 tests and all the runtime assemblies, so I am removing
the cache.
@trylek
Copy link
Member

trylek commented Dec 4, 2019

@jashook - please also remember that Crossgen2 still hasn't shipped, this is part of the work to get it to preview quality. CoreCLR native code in the legacy Crossgen[1] naturally has this arg iterator logic already.

@janvorli janvorli merged commit 21dc0db into dotnet:master Dec 5, 2019
@janvorli janvorli deleted the amd64unix-argiterator-fixes branch December 5, 2019 10:40
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants