-
-
Notifications
You must be signed in to change notification settings - Fork 753
Add string interning option #1285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This isn't on by default, and while this first trivial implementation allocates every string and passes it through `string.Intern(string)`, many optimizations could be made to it either in this library directly or by apps through overridding the virtual method. Closes MessagePack-CSharp#634
57a513d to
e422d78
Compare
neuecc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing me to Microsoft.NET.StringTools.
I now understand the concept and implementation of WeakInternedString.
Very interesting and inspiring!
Implementing this into the serializer will get a huge positive impact.
(Certainly, I'm curious the additional dictionary lookup/weakreference lookup cost)
In any case, I am very positive that this implementation will be included.
//
It would be preferable if it were not included in the options, but it is going to be difficult to use.
If the default is on, it's fine, but it's not good to turn it off (it's difficult to have a Resolver with a standard string formatter).
Add new reference, if you are going to include it in the standard, I would be concerned about Unity's support for it, but since it is a simple lib, you may as well include it.
src/MessagePack.UnityClient/Assets/Scripts/MessagePack/Resolvers/StandardResolver.cs
Outdated
Show resolved
Hide resolved
This gets us a few things: 1. Intern strings without wastefully allocating them first in the common fast paths. 1. Switch to a string collection that retains only weak references.
I'll measure it and report back.
I'm pushing an update that should fix Unity support. We already have a dependency on System.Memory and one other. This adds exactly one more (which depends on the same two that we already depended on). |
Well, I can remove all changes from the Options class and if our default interning behavior isn't what people want, they can override our string resolver. Is that good? |
Also revert the Options changes. If the app needs to customize string interning, they can do so by changing the resolver to their own formatter.
This caused a test failure on `net472`.
|
Changes confirmed. |
|
The perf impact is significant:
|
|
It looks to me like this may be inescapable perf cost if you want interning. So the question is, should we have it off by default? Should activating it be by applying a resolver that will pick this as a string formatter, or should this be a
|
|
Oh, sure, it can be hard to make it the default. |
|
I think the easiest opt-in would be based on a |
|
I'd love to have this, but with option to opt-in particular property. Maybe we should define an attribute to opt-in for string interning. |
|
@Shatl That's an interesting idea. I'll look into it. |
It is still available via the `StringInterningFormatter`, which can be turned on by default by aggregating a custom resolver, or by specifying this on a `[Key]` or `[DataMember]` member: ```cs [MessagePackFormatter(typeof(StringInterningFormatter))] ```
|
The latest iteration has it off by default, and documents a few ways by which users can turn it on. |
neuecc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By being optional, there is no problem in provide it.
However, I think that Resolver should also be provided as a counterpart to Formatter.
That is, a StringInterningResolver.
| before any of the standard ones, like this: | ||
|
|
||
| ```cs | ||
| var options = MessagePackSerializerOptions.Standard.WithResolver( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By being optional, there is no problem in provide it.
However, I think that Resolver should also be provided as a counterpart to Formatter.
That is, a StringInterningResolver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this resolver only point to the one formatter, such that it would still need to be combined with other resolvers for every case?
I'm going to complete this PR for now since we can always add a resolver in a subsequent PR based on your answer.
This isn't on by default, and while this first trivial implementation allocates every string and passes it through
string.Intern(string), many optimizations could be made to it either in this library directly or by apps through overridding the virtual method.Points to decide on:
MessagePackSerializationOptionsunchanged?stringallocation by creating our ownDictionary<ReadOnlyMemory<byte>, string>interned string collection? I'm thinking "yes" but we may want to micro-benchmark to make sure it's really a net win.StandardResolvernow includes astringresolver, such that if any app adds a string resolver, it used to work if their resolver worked after theStandardResolverbut now must test before it?string, even when the option to intern hasn't been turned on?Closes #634