Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Package shading draft #242

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from
Draft

Conversation

mhutch
Copy link
Member

@mhutch mhutch commented Aug 23, 2021

Ready for review!

FYI the diagrams don't show up correctly in the GitHub PR "files changed" but you can see them if you view the document in the branch. (edit: GitHub now supports Mermaid natively so I converted the diagrams to use that, but earlier commits still have PNGs with embedded source)

Behavior only, intro and scenarios to be added
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved
accepted/2021/experimental-package-shading.md Outdated Show resolved Hide resolved

### Package consumers

Shading is superficially invisible to consumers of a package that has shaded dependencies. They may notice the shaded assets in the build logs or output directory, and their app size may increase, but they do not need to be aware that shading exists. From the consumer's perspective, a shaded dependency is no different than any other private asset.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will shaded packages appear with dotnet list package and its derivates?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I don't think shading needs to affect those commands, but I hadn't considered it.

AFAICT the use cases for that are finding package updates and vulnerable versions, and showing transitive package references. I don't think whether a package reference is shaded is relevant to either of those, and I can't think of a scenario where listing shading information at the CLI would be useful.


Renaming a package's assets so that multiple renamed versions can be used at runtime is not something that can be performed safely. Some assets may inherently have singleton behavior, and assets may embed the original name in ways that cannot automatically be detected and updated, for example when using reflection to load an assembly by name.

The shading tools will detect known unsafe patterns and warn when assets cannot be renamed safely, for example calls to [`AssemblyLoadContext.Load`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.loader.assemblyloadcontext.load?view=net-5.0) with values that cannot be determined statically. However, the shading tools will not be able to detect all problematic cases. Authors of packages with shaded dependencies are expected to test their package thoroughly and verify that it works correctly with shaded dependencies.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that NuGet should do the static analysis ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overlaps with the analysis the IL linker does to figure out Type.GetType(string) usages so in theory those could be rewritten to use the shaded identity.


### Impact on package consumers

When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any approximate percentage of NU1605, NU1107 restore errors from customers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NU1605 and NU1107 together are about 2/3 of install errors in VS.


### As a package author, when should I shade a dependency?

> TODO
Copy link

@dominoFire dominoFire Sep 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about multitargeting scenarios ? Would you shade for each TFM ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the shading will affect all assemblies in the package.


Retargeting a dependency involves finding references in the any references in the package and its assets so that they refer to the shaded version of that dependency. For example, an assembly reference in an assembly asset's metadata table would be updated to reference the shaded version.

The shaded name is a mangled name specific to the shading context and is designed such that the shaded package's assets do not collide at runtime with assets from other shaded copies and from the original unshaded package. The mangling is an implementation detail, and may change in future versions of the .NET SDK. Developers should not depend on the mangling format or specific mangled IDs. However, for a given version of the SDK, the mangled ID is deterministic to allow for deterministic builds.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this might be problematic for cases where the assembly name is used in an external format, e.g. the linker's XML descriptor: https://github.com/mono/linker/blob/7793a2367790bfbd8418e725e8b9751a11278cfb/docs/data-formats.md#xml-examples

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for sure. We can build in support for some things (XAML files, linker descriptors, etc.) but supporting third party formats would necessitate an extensibility system like the Maven shade plugin's resource transformers.


Renaming a package's assets so that multiple renamed versions can be used at runtime is not something that can be performed safely. Some assets may inherently have singleton behavior, and assets may embed the original name in ways that cannot automatically be detected and updated, for example when using reflection to load an assembly by name.

The shading tools will detect known unsafe patterns and warn when assets cannot be renamed safely, for example calls to [`AssemblyLoadContext.Load`](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.loader.assemblyloadcontext.load?view=net-5.0) with values that cannot be determined statically. However, the shading tools will not be able to detect all problematic cases. Authors of packages with shaded dependencies are expected to test their package thoroughly and verify that it works correctly with shaded dependencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overlaps with the analysis the IL linker does to figure out Type.GetType(string) usages so in theory those could be rewritten to use the shaded identity.

@mhutch mhutch force-pushed the experimental-package-shading branch from 37011c6 to 61744f3 Compare October 6, 2021 00:49
@mhutch mhutch force-pushed the experimental-package-shading branch from 61744f3 to 3be5566 Compare October 6, 2021 00:51

## Background

### Dependency unification
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate at least a short note that this is distinct from the unification done by ResolveAssemblyReferences at build time, after restore. It will however influence the work that needs to be done there.


### Impact on package consumers

When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted more context on these:

Suggested change
When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.
When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand ([`NU1605` "detected package downgrade"](https://docs.microsoft.com/nuget/reference/errors-and-warnings/nu1605), [`NU1107` "version conflict detected"](https://docs.microsoft.com/nuget/reference/errors-and-warnings/nu1107)). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.


Suppose that `Foo v2.0` has major API changes since `Foo v1.0` and has both added new APIs and removed old APIs. The author of the project has referenced `Foo v2.0` directly, because they want to use its cool new APIs, but they already depend on `Bar`, which internally uses the old `Foo v1.0` APIs that were removed in `Foo v2.0`. The author of the package `Bar` has not migrated to `Foo v2.0` because they depend on an advanced option from `Foo v1.0` that is not exposed in the new `Foo v2.0` APIs.

If NuGet has sufficient information to determine that these dependencies are not compatible, then it will not unify them, and the project will not restore. On the other hand, if it *does* unify `Foo v1.0` and `Foo v2.0`, and the project uses `Bar` APIs that depend on the Foo APIs that were removed in `Foo v2.0`, then the project will have will get run-time errors such as `MissingMethodException` and `TypeLoadException`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a definition for "sufficient information" here? Is it semver?


The package author can be confident that when their library is used in an app, it will use the bundled copy of the shaded dependencies, and will not be affected by any other versions of those dependencies used elsewhere in the app.

> **NOTE**: If the shaded package is also a transitive reference, i.e a dependency of another of the project's package references, that package reference must be shaded as well. The project also should not expose any references to types from the shaded dependency, as consumers will be unable to resolve them. These caveats are explained in more detail later in this document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the "public API can't expose types from the dependency" caveat earlier but I don't know if there's a great place to put it. Maybe after the producer-side shading intro diagram?


The package author can be confident that when their library is used in an app, it will use the bundled copy of the shaded dependencies, and will not be affected by any other versions of those dependencies used elsewhere in the app.

> **NOTE**: If the shaded package is also a transitive reference, i.e a dependency of another of the project's package references, that package reference must be shaded as well. The project also should not expose any references to types from the shaded dependency, as consumers will be unable to resolve them. These caveats are explained in more detail later in this document.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you're thinking with the transitive thing? It sounds like it imposes requirements on the referenced packages.

Suggested change
> **NOTE**: If the shaded package is also a transitive reference, i.e a dependency of another of the project's package references, that package reference must be shaded as well. The project also should not expose any references to types from the shaded dependency, as consumers will be unable to resolve them. These caveats are explained in more detail later in this document.
> **NOTE**: If the shaded package is also a transitive reference, i.e a dependency of another of the project's package references, that package must shade it as well. The project also should not expose any references to types from the shaded dependency, as consumers will be unable to resolve them. These caveats are explained in more detail later in this document.


### How does this compare to Maven shading?

The Maven shade plugin shades the dependencies specified by [explicit includes](https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html), or shades all dependencies except those explicitly excluded. The include and exclude declarations permit glob patterns but do not take transitivity into account. There's a an option (`promoteTransitiveDependencies`) to promote transitive dependencies of shaded dependencies to direct dependencies, without which they are omitted. This NuGet proposal supports only explicit shading of direct dependencies, to avoid the size increase caused by unnecessary/accidental shading. It always promotes transitive dependencies as this is required for correct behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Maven shade plugin shades the dependencies specified by [explicit includes](https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html), or shades all dependencies except those explicitly excluded. The include and exclude declarations permit glob patterns but do not take transitivity into account. There's a an option (`promoteTransitiveDependencies`) to promote transitive dependencies of shaded dependencies to direct dependencies, without which they are omitted. This NuGet proposal supports only explicit shading of direct dependencies, to avoid the size increase caused by unnecessary/accidental shading. It always promotes transitive dependencies as this is required for correct behavior.
The Maven shade plugin shades the dependencies specified by [explicit includes](https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html), or shades all dependencies except those explicitly excluded. The include and exclude declarations permit glob patterns but do not take transitivity into account. There's an option (`promoteTransitiveDependencies`) to promote transitive dependencies of shaded dependencies to direct dependencies, without which they are omitted. This NuGet proposal supports only explicit shading of direct dependencies, to avoid the size increase caused by unnecessary/accidental shading. It always promotes transitive dependencies as this is required for correct behavior.


### How does this compare to Maven shading?

The Maven shade plugin shades the dependencies specified by [explicit includes](https://maven.apache.org/plugins/maven-shade-plugin/examples/includes-excludes.html), or shades all dependencies except those explicitly excluded. The include and exclude declarations permit glob patterns but do not take transitivity into account. There's a an option (`promoteTransitiveDependencies`) to promote transitive dependencies of shaded dependencies to direct dependencies, without which they are omitted. This NuGet proposal supports only explicit shading of direct dependencies, to avoid the size increase caused by unnecessary/accidental shading. It always promotes transitive dependencies as this is required for correct behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This NuGet proposal supports only explicit shading of direct dependencies

While this is true, since it's exposed as an item in MSBuild one could easily get the inverted Maven behavior with an ItemDefinitionGroup


### Why only allow shading private references?

We have made an explicit scoping decision to disallow shading of public references in the initial version of this feature. Shading of public references leads to scenarios where a project has references to multiple types with the same fully qualified names, differing only by assembly name. While this could be handled using [`extern alias`](https://docs.microsoft.com/en-us/dotnet/csharp/language-reference/keywords/extern-alias), it massively complicates the experience for consumers of the library, and would need better support from tooling before being made mainstream.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 100% on board with this (if it's contentious and you need any backup)


This proposal is also largely about how shading is specified in the project file, how it fits into the restore/build/pack pipeline, and how it affects the dependency graph. ILMerge/il-repack do not by themselves answer any of those.

### How does this compare to Maven shading?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other other-ecosystem system I was thinking of while reading this was Rust's Cargo. It looks like it (effectively) always shades each distinct version and provides tooling to help audit this happening: https://doc.rust-lang.org/cargo/reference/resolver.html#version-incompatibility-hazards

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I ran across that while researching this and see it as a validation of the consumer side mangling approach. It's definitely a very similar model to where I would see this going long term.

I'm not certain that such a strict interpretation of semver would work for reasoning about compatibility of NuGet packages in general, and I would like to see ways to force things to unify (e.g. to reduce size of a wasm app, when you know you're using a compatible subset of two incompatible versions) and to force things to separate (e,g. when you know there was an accidental breaking change in a minor version). But that's a problem for the future :)


### As a package author, when should I shade a dependency?

You should only shade a dependency if you are reasonably certain that it is likely to conflict with other incompatible versions of that dependency in apps that consume your package. Producer-side shading comes at a cost to the consuming application: the download/install/startup/memory cost of your shaded dependencies will no longer be shared with other copies of those dependencies, even if they are identical. By shading a dependency, you are also taking on responsibility for its bugs and security issues, as your consumers can no longer update it directly.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like some motivating examples with considerations walked through.

For example, your library has configuration expressed in some format (JSON, YAML), and internally uses a library to parse that format

  1. How do you assess the likelihood of breakage vs. the benefits of unification?
  2. For instance, Newtonsoft.Json has a high version number, indicating "many breaking changes have been made"--but I bet you'd almost always be safe for a simple config json parser taking any version, and applications would be understandably upset if they depended on 8 copies of it from various references
  3. When would you choose to add shading rather than update to a newer version of a package, if a new version of that package breaks your library?

As GitHub now natively supports inline Mermaid diagrams in
Markdown files, get rid of the workaround where prior to
commit Mermaid diagrams were converted to PNGs with source
embedded in metadata

**The major downside to producer-side shading** is that the shaded assemblies are included in the app even when they are identical or completely compatible with other shaded or non-shaded copies of the same dependency. This forced redundancy is a problem for scenarios where app size and memory use is highly important such as mobile apps and client-side web apps, **so producer side shading is not intended to ever become non-experimental or recommended for general use**.

The preferred long-term solution is consumer-side shading, where shading is performed holistically for an app, where it shade packages only when necessary, and deduplicate compatible and identical shaded assemblies.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The preferred long-term solution is consumer-side shading, where shading is performed holistically for an app, where it shade packages only when necessary, and deduplicate compatible and identical shaded assemblies.
The preferred long-term solution is consumer-side shading, where shading is performed holistically for an app, where it shades packages only when necessary, and deduplicates compatible and identical shaded assemblies.


The preferred long-term solution is consumer-side shading, where shading is performed holistically for an app, where it shade packages only when necessary, and deduplicate compatible and identical shaded assemblies.

Producer-side shading uses same building blocks as consumer-side shading, but is substantially smaller in scope. It also has the advantage that consumers of a producer-side shading package do not need tooling that understands shading.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Producer-side shading uses same building blocks as consumer-side shading, but is substantially smaller in scope. It also has the advantage that consumers of a producer-side shading package do not need tooling that understands shading.
Producer-side shading uses the same building blocks as consumer-side shading, but is substantially smaller in scope. It also has the advantage that people who reference a package with producer-side shading do not need tooling that understands shading.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to read this sentence several times to understand, I think because it contains the word "consumer" when talking about produce-side shading.

> **NOTE** Unification is independent for each project. All the direct and transitive dependencies for each project must be unified in the context of that project, but the result of unification is specific to that set of dependencies. Another project in the same solution with different dependencies and/or dependency versions may unify to different versions of the same dependencies.

NuGet performs unification based on the dependency versions defined in the referencing packages or projects. Dependency versions may be exact, an explicit range, or a simple version that implicitly means "equal to or greater than". In the above example, the `v1.0` dependency means `>= v1.0`, so it is compatible with `v2.0`, and NuGet can unify them:

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we need a whole section talking about the difference between package versions and assembly versions. Some packages don't rev their assembly versions at all and its worth noting that the file system only supports a single file in each directory with a unique name which reflects how NuGet unifies the dependency graph. I don't want people to think that its bad design, its just how it is.


### Impact on package consumers

When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.
When different versions of a dependency cannot be unified, NuGet restore fails with errors that can be difficult to resolve (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.

Its very subjective to say the errors are difficult to understand. In most case the error explicitly states that A depends on B1 and C depends on B2 so you're getting downgraded or whatever. The difficulty in my opinion is usually in resolving the conflicts. That's also subjective I suppose


When different versions of a dependency cannot be unified, NuGet restore fails with errors that are often difficult to understand (`NU1605`, `NU1107`). These errors currently make up a majority of NuGet restore errors. In some cases unification is possible but not automatic, in which case a developer may opt in by adding a direct reference to the dependency, but this is not straightforward.

Even when dependencies are unified, the result may not be correct. The versions of the dependencies may make them appear to be compatible when they are not. This can result in compiler errors, runtime errors, and behavioral differences.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Even when dependencies are unified, the result may not be correct. The versions of the dependencies may make them appear to be compatible when they are not. This can result in compiler errors, runtime errors, and behavioral differences.
Even when dependencies are unified, the result could still result in compiler errors, runtime errors, or behavioral differences. Since restore and build succeeded, the versions of the dependencies may make them appear to be compatible when they might not. This is because only the current source is checked not the code paths that dependent assemblies execute because these dependencies were compiled and tested against different versions potentially.


Producer-side package shading is an experimental feature that allows a NuGet package author to "shade" a dependency: embed a renamed copy of it in their package. This ensures that consumers of the package get the exact same version that the package author intended, regardless of any other direct or indirect references to that dependency.

**The major downside to producer-side shading** is that the shaded assemblies are included in the app even when they are identical or completely compatible with other shaded or non-shaded copies of the same dependency. This forced redundancy is a problem for scenarios where app size and memory use is highly important such as mobile apps and client-side web apps, **so producer side shading is not intended to ever become non-experimental or recommended for general use**.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a note in here that in some cases, it's impossible for a package author to know ahead of time that their package could cause conflicts. Sometimes it's the combination of random packages that cause conflicts, so it doesn't really make sense for the producer to have to handle it.

* **Package consumers may not be able to install or update** a reference to a package if it depends on a package that their project already depends on but the versions are different and cannot be unified.
* **Unification issues may appear to be bugs in the package itself** if unification causes one of the package's dependencies to have a different version than it specified and it is not compatible with that version.

Some package authors go out of their way to avoid dependencies so that they will not cause unification problems for their consumers.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've certainly had to face this before, sometimes taking a dependency on something just isn't worth it


Some package authors go out of their way to avoid dependencies so that they will not cause unification problems for their consumers.

## Shading
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put a note in about doing just consumer-side assembly shading only instead of shading an entire package?


This is **explicitly disallowed** at this time for the same reason as the previous example: the package project's direct and transitive references to `Bar` must unify. The only way to do this is by shading `Foo` so that its references to `Bar` can be updated to target the shaded version, and we require the developer do this explicitly so that all shading operations are clear and intentional.

## FAQ
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the right place for this but we should address the fact that re-writing an assembly is going to strip its authenticode signature and change its public key token. This could be a problem for high security infrastructures.

@foxyseta
Copy link

foxyseta commented May 2, 2023

Sorry for necroposting. Is anyone still working on this?

@SimonCropp
Copy link

@foxyseta here is my effort at an implementation https://github.com/getsentry/dotnet-assembly-alias

@foxyseta
Copy link

foxyseta commented May 2, 2023

Thank you! This definitely sounds like a good tool to use while waiting for shading (or similar functionalities) to be natively integrated into NuGet. Hopefully, I'll have time to take a deeper look into it and drop by.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants