Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 17 additions & 1 deletion docs/features/string-literals-data-section.md
Original file line number Diff line number Diff line change
Expand Up @@ -54,14 +54,15 @@ The utf8 string literal encoding emit strategy emits `ldsfld` of a field in a ge

For every unique string literal, a unique internal static class is generated which:
- has name composed of `<S>` followed by a hex-encoded XXH128 hash of the string
(collisions [should not happen][xxh128] with XXH128 and so they aren't currently detected or reported, the behavior in that case is undefined),
(collisions [should not happen][xxh128] with XXH128, but if there are string literals which would result in the same XXH128 hash, a compile-time error is reported),
- is nested in the `<PrivateImplementationDetails>` type to avoid polluting the global namespace
and to avoid having to enforce name uniqueness across modules,
- has one internal static readonly `string` field which is initialized in a static constructor of the class,
- is marked `beforefieldinit` so the static constructor can be called eagerly if deemed better by the runtime for some reason.

There is also an internal static readonly `.data` field generated into `<PrivateImplementationDetails>` containing the actual bytes,
similar to [u8 string literals][u8-literals] and [constant array initializers][constant-array-init].
This field uses hex-encoded SHA-256 hash for its name and collisions are currently not reported by the compiler.
These other scenarios might also reuse the data field, e.g., the following statements could all reuse the same data field:

```cs
Expand Down Expand Up @@ -241,6 +242,9 @@ This fixup phase already exists in the compiler in `MetadataWriter.WriteInstruct
It is called from `SerializeMethodBodies` which precedes `PopulateSystemTables` call,
hence synthesizing the utf8 string classes in the former should be possible and they would be emitted in the latter.

Alternatively, we could collect string literals during binding, then before emit sort them by length and content (for determinism)
to find the ones that are over the threshold and should be emitted with this new strategy.

### Statistics

The compiler could emit an info diagnostic with useful statistics for customers to determine what threshold to set.
Expand Down Expand Up @@ -302,6 +306,18 @@ The compiler should report a diagnostic when the feature is enabled together wit
`[assembly: System.Runtime.CompilerServices.CompilationRelaxations(0)]`, i.e., string interning enabled,
because that is incompatible with the feature.

### Avoiding hash collisions

Instead of XXH128 for the type names and SHA-256 for the data field names, we could use index-based names.
- The compiler could assign names lazily based on metadata tokens which are deterministic.
If building on the current approach, that might require some refactoring,
because internal data structures in the compiler might not be ready for lazy names like that.
But it would be easier if combined with the first strategy suggested for [automatic threshold](#automatic-threshold) above,
where we would not synthesize the types until very late in the emit phase (during fixup of the metadata tokens).
- We could build on the second strategy suggested for [automatic threshold](#automatic-threshold) where we would collect string literals during binding
(and perhaps also constant arrays and u8 strings if we want to extend this support to them as well),
then before emit we would sort them by length and content and assign indices to them to be then used for the synthesized names.

<!-- links -->
[u8-literals]: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/proposals/csharp-11.0/utf8-string-literals
[constant-array-init]: https://github.com/dotnet/roslyn/pull/24621
Expand Down
3 changes: 3 additions & 0 deletions src/Compilers/CSharp/Portable/CSharpResources.resx
Original file line number Diff line number Diff line change
Expand Up @@ -8047,4 +8047,7 @@ To remove the warning, you can use /reference instead (set the Embed Interop Typ
<data name="WRN_InterceptsLocationAttributeUnsupportedSignature_Title" xml:space="preserve">
<value>'InterceptsLocationAttribute(string, int, int)' is not supported. Move to 'InterceptableLocation'-based generation of these attributes instead. (https://github.com/dotnet/roslyn/issues/72133)</value>
</data>
<data name="ERR_DataSectionStringLiteralHashCollision" xml:space="preserve">
<value>Cannot emit this string literal into the data section because it has XXHash128 collision with another string literal: {0}</value>
</data>
</root>
2 changes: 2 additions & 0 deletions src/Compilers/CSharp/Portable/Errors/ErrorCode.cs
Original file line number Diff line number Diff line change
Expand Up @@ -2360,6 +2360,8 @@ internal enum ErrorCode
ERR_ImplicitlyTypedParamsParameter = 9272,
ERR_VariableDeclarationNamedField = 9273,

ERR_DataSectionStringLiteralHashCollision = 9274,

// Note: you will need to do the following after adding errors:
// 1) Update ErrorFacts.IsBuildOnlyDiagnostic (src/Compilers/CSharp/Portable/Errors/ErrorFacts.cs)

Expand Down
4 changes: 3 additions & 1 deletion src/Compilers/CSharp/Portable/Errors/ErrorFacts.cs
Original file line number Diff line number Diff line change
Expand Up @@ -637,7 +637,9 @@ or ErrorCode.ERR_InterceptableMethodMustBeOrdinary
or ErrorCode.ERR_PossibleAsyncIteratorWithoutYield
or ErrorCode.ERR_PossibleAsyncIteratorWithoutYieldOrAwait
or ErrorCode.ERR_RefLocalAcrossAwait
// Update src\EditorFeatures\CSharp\LanguageServer\CSharpLspBuildOnlyDiagnostics.cs
or ErrorCode.ERR_DataSectionStringLiteralHashCollision
// Update src\Features\CSharp\Portable\Diagnostics\LanguageServer\CSharpLspBuildOnlyDiagnostics.cs
// and TestIsBuildOnlyDiagnostic in src\Compilers\CSharp\Test\Syntax\Diagnostics\DiagnosticTest.cs
// whenever new values are added here.
=> true,

Expand Down
1 change: 1 addition & 0 deletions src/Compilers/CSharp/Portable/Errors/MessageProvider.cs
Original file line number Diff line number Diff line change
Expand Up @@ -250,6 +250,7 @@ public override void ReportDuplicateMetadataReferenceWeak(DiagnosticBag diagnost
public override int ERR_EncUpdateFailedMissingSymbol => (int)ErrorCode.ERR_EncUpdateFailedMissingSymbol;
public override int ERR_InvalidDebugInfo => (int)ErrorCode.ERR_InvalidDebugInfo;
public override int ERR_FunctionPointerTypesInAttributeNotSupported => (int)ErrorCode.ERR_FunctionPointerTypesInAttributeNotSupported;
public override int ERR_DataSectionStringLiteralHashCollision => (int)ErrorCode.ERR_DataSectionStringLiteralHashCollision;

// Generators:
public override int WRN_GeneratorFailedDuringInitialization => (int)ErrorCode.WRN_GeneratorFailedDuringInitialization;
Expand Down
5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.cs.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.de.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.es.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.fr.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.it.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ja.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ko.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.pl.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.pt-BR.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.ru.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 5 additions & 0 deletions src/Compilers/CSharp/Portable/xlf/CSharpResources.tr.xlf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 21 additions & 0 deletions src/Compilers/CSharp/Test/Emit/Emit/EmitMetadataTests.cs
Original file line number Diff line number Diff line change
Expand Up @@ -3175,6 +3175,27 @@ .maxstack 1
""");
}

[Fact]
public void DataSectionStringLiterals_HashCollision()
{
var emitOptions = new EmitOptions
{
// Take only the first byte of each string as its hash to simulate collisions.
TestOnly_DataToHexViaXxHash128 = static (data) => data[0].ToString(),
};
var source = """
System.Console.Write("a");
System.Console.Write("b");
System.Console.Write("aa");
""";
CreateCompilation(source,
parseOptions: TestOptions.Regular.WithFeature("experimental-data-section-string-literals", "0"))
.VerifyEmitDiagnostics(emitOptions,
// (3,22): error CS9274: Cannot emit this string literal into the data section because it has XXHash128 collision with another string literal: a
// System.Console.Write("aa");
Diagnostic(ErrorCode.ERR_DataSectionStringLiteralHashCollision, @"""aa""").WithArguments("a").WithLocation(3, 22));
}

[Fact]
public void DataSectionStringLiterals_SynthesizedTypes()
{
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2991,6 +2991,7 @@ public void TestIsBuildOnlyDiagnostic()
case ErrorCode.ERR_PossibleAsyncIteratorWithoutYield:
case ErrorCode.ERR_PossibleAsyncIteratorWithoutYieldOrAwait:
case ErrorCode.ERR_RefLocalAcrossAwait:
case ErrorCode.ERR_DataSectionStringLiteralHashCollision:
Assert.True(isBuildOnly, $"Check failed for ErrorCode.{errorCode}");
break;

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -92,7 +92,8 @@ public void TestFieldsForEqualsAndGetHashCode()
nameof(EmitOptions.IncludePrivateMembers),
nameof(EmitOptions.InstrumentationKinds),
nameof(EmitOptions.DefaultSourceFileEncoding),
nameof(EmitOptions.FallbackSourceFileEncoding));
nameof(EmitOptions.FallbackSourceFileEncoding),
nameof(EmitOptions.TestOnly_DataToHexViaXxHash128));
}

[Fact]
Expand Down
Loading