Skip to content

Expand host_runtime_contract to get assembly names and resolve their paths #100503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 16 commits into from

Conversation

steveisok
Copy link
Member

@steveisok steveisok commented Apr 1, 2024

This change continues down the path of making the host_runtime_contract structure the way the runtime can query information about the host.

  • get_assemblies: function that gets the list of assemblies (name+extension) known by the host.
  • destroy_assemblies: function that cleans up the what was returned by get_assemblies when you're done with it
  • resolve_assembly_to_path: function that when given an assembly name (no extension) will return the full path known by the host.
  • entry_assembly: string containing the full path to the designated entry assembly
     
    Contributes to Allow modification of entry assembly and probing after startup hooks are run #99883

Copy link
Contributor

Tagging subscribers to this area: @vitek-karas, @agocke, @VSadov
See info in area-owners.md if you want to be subscribed.

@steveisok
Copy link
Member Author

@elinor-fung a couple of items I'm wondering about:

  • Should the strings be utf-16 instead of utf-8 starting in the host? I see one of the roles of HostInformation is to transform strings, but I'm starting to wonder if utf-16 makes more sense because the runtime can just use the contract.
  • What are the rules around memory between the host and the runtime?

char** dirs;
};

struct probing_path_properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this rather be a callback?

A performance flaw of the current system is that the host has to push paths to the runtime upfront. It introduces a lot of unnecessary copying and parsing. If we are redoing how this works, we should address this performance flaw. Pushing a list of full paths and base-names maybe a bit more efficient than the current system, but it still leaves a lot on the table.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial thought was no callback, since the runtime requires all the paths on startup and would immediately have to call it. However, avoiding the extra copying would definitely be better than avoiding the extra call back to the host and would answer the question of the memory between host and runtime (for the properties, it was intentionally all allocated by the runtime).

Copy link
Member

@jkotas jkotas Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime does not fundamentally require all paths on startup. All that the runtime does with the paths during startup is to store them in a hashtable with simple name as the key.

Maintaining this hashtable on the host side would open a number of opportunities for memory optimizations. For example, we do not need to materialize all full paths eagerly. Instead, we can remember a few common prefixes and materialize the full path from the common prefix and suffix on demand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this hashtable

This hashtable needs to be case-insensitive (assembly binding is case-insensitive). Case-insensitive hashing and comparisons are tricky. The host does not seem to have the infrastructure for it.

So maybe the best design is to push the simple assembly names from the host eagerly, let the runtime maintain the hash table, but then use a callback to resolve the simple name into a full path. It would avoid potential mismatches in case-insensitive behavior for non-ascii file names, etc.

struct host_runtime_contract
{
size_t size;

// Context for the contract. Pass to functions taking a contract context.
void* context;

char* entry_assembly;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should any new things be added to the end of this structure so that it grows in backward compatible way?

});

// Add to the TPA.
tpa_list << tmp;
}
}

tpa_items->assembly_count = static_cast<uint32_t>(name_path_map.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a number of apps out there that host coreclr.dll directly using TRUSTED_PLATFORM_ASSEMBLIES and other key/value pairs. We had the direct coreclr.dll hosting documented at learning.microsoft.com for a while. There are scenarios where hosting via host policy is too inflexible.

Are we going to break the existing coreclr.dll hosting model based on TRUSTED_PLATFORM_ASSEMBLIES and other key/value pairs or are we keeping it around?

If we are keeping it around, we may want to keep corerun intact so that we get some test coverage for it. corerun is our non-shipping test-only host and it is fine for it to leave some perf on table.

If we are going to break it, it will be a significant breaking change that will require a lot of work to push through.

Copy link
Member

@elinor-fung elinor-fung Apr 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to break the existing coreclr.dll hosting model based on TRUSTED_PLATFORM_ASSEMBLIES and other key/value pairs or are we keeping it around?

We do not plan to break it. The runtime should continue to respect the key/value pairs (but hostpolicy / default hosting won't use it) and getting the properties (AppContext or hostfxr) should continue to work (but on demand).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not plan to break it.

Ok, we should think about having some test coverage. corerun test host would be a good place for it - keeping it intact should be the easiest option.

@jkotas
Copy link
Member

jkotas commented Apr 2, 2024

Contributes to #99883

As we have discussed before, it would be useful to agree on the design of the public APIs that we are going to introduce to address the scenario first.

If this change is going to fix some of the performance issues of the current hosting scheme, that sounds good. I am not sure how it helps with #99883.

});

// Add to the TPA.
tpa_list << tmp;
}
}

tpa_items->assembly_count = static_cast<uint32_t>(name_path_map.size());
tpa_items->assembly_filepaths = new char*[tpa_items->assembly_count];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, there is a lot of apps and libraries that depend on the existing documented key-value pairs in appcontext:

Are we going to break these or are these key/value pairs going to continue to work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is not to break anything. I gather from @elinor-fung that the host_runtime_contract usage is intended to be gradual.

char** dirs;
};

struct probing_path_properties
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming nit: I would not use "probing". One of the goals of the .NET Core redesign of hosting was to avoid spurious probing for files on disk. The current implementation is a bit half-way-there, but realistically that's more of a bug than feature. TPA is already on the no-probing plan. And native and resource files are actually also no-probing inside hostpolicy, we just condense that information into the list of paths when calling into CoreCLR. Now I don't know if this is the time to change it (it can be a breaking change for some apps), but at least in terms of naming I would like to be clear that "We don't want to probe in the runtime" (note that hostpolicy does some amount of probing, but it does it so that the runtime doesn't have to).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I anticipated having to rename things :-).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to add here that we have an existing structure called probe_paths_t. We have to make sure that whatever we end up naming our probing paths stuff is clear enough to be differentiated from probe_paths_t. To not make the code more complicated to understand :)


struct probing_path_properties
{
trusted_platform_assemblies trusted_platform_assemblies;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another naming nit: If we are redoing how this works, we should drop "trusted" and "platform" from the names of new things.

This name is left-over from Silverlight. There is nothing trusted or platform about assemblies in this list. If we go with the callback, a good name can be resolve_assembly_to_path - the callback would be native equivalent of existing managed API https://learn.microsoft.com/dotnet/api/system.runtime.loader.assemblydependencyresolver.resolveassemblytopath .

output->append(item.second.resolved_path);
output->push_back(PATH_SEPARATOR);
}
// Now that we have the complete list, store it in the host_runtime_contract and as a ; delimted string.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to store TPA list as an array instead of a string with delimiter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, we want to either stick to the string or use both to keep backwards compatibility.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming I understand the feedback correctly, both in the runtime. What we'll angle to do is make the host by default use the callback method as opposed to passing in TRUSTED_PLATFORM_ASSEMBLIES

Copy link
Member

@jkotas jkotas Apr 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the new model, the long TRUSTED_PLATFORM_ASSEMBLIES string with all paths concatenated should be only created on demand when the user code asks for it; and we should make sure that it does not happen in any of our mainline scenarios.

…m assembly names and the ability to resolve their paths
Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@steveisok steveisok reopened this May 17, 2024
@steveisok steveisok changed the title Expand host_runtime_contract to include the entry assembly and probing paths Expand host_runtime_contract to get assembly names and resolve their paths Jun 12, 2024
Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants