-
Notifications
You must be signed in to change notification settings - Fork 831
Description
This story began half a decade ago with an adventurous implementation, and received a more reasonable, albeit more limited, implementation a mere two years ago yet as the tides of time go neither of these is (trivially) merge-able into current master. So rather than continuing to hit myself in the head with a hammer making an updated version, I'd rather put this out to the people who actually pull the strings of Pull Requests in order to determine if this effort will yield any fruit (or just a completely mashed head).
So, why's this a problem anyway? Here's a little use of structural comparison within a dictionary setting with NodaTime's Instant as the key...
module Program
(*
<PackageReference Include="BenchmarkDotNet" Version="0.12.1" />
<PackageReference Include="NodaTime" Version="3.0.0" />
*)
open System
open System.Collections.Generic
open BenchmarkDotNet.Attributes
open BenchmarkDotNet.Running
open NodaTime
[<MemoryDiagnoser>]
type ComparisonComparison () =
let mutable instants : array<Instant> = Unchecked.defaultof<_> // some bunch of instants
let checkContains (dict:IDictionary<Instant,_>) =
for i = 0 to instants.Length-1 do
if i % 2 = 0 <> (dict.ContainsKey instants.[i]) then
failwith "unexpected"
let mutable dictOperators = Unchecked.defaultof<IDictionary<Instant, obj>>
let mutable dictEqualityComparerDefault = Unchecked.defaultof<IDictionary<Instant, obj>>
let mutable dictHashIdentityStructural = Unchecked.defaultof<IDictionary<Instant, obj>>
let mutable dictHashIdentityNonStructural = Unchecked.defaultof<IDictionary<Instant, obj>>
[<Params (1, 100, 10000)>]
member val public DictSize = 0 with get, set
[<GlobalSetup>]
member this.SetupData() =
instants <-
let rng = Random 42
let range lower upper = rng.Next (lower, upper+1)
Array.init (this.DictSize*2) (fun _ ->
Instant.FromUtc (range 1900 2100, range 1 12, range 1 28, range 0 23, range 0 59, range 0 59))
let tmpDictHashIdentityStructural = Dictionary HashIdentity.Structural
let tmpDictHashIdentityNonStructural = Dictionary HashIdentity.NonStructural
let tmpDictEqualityComparerDefault = Dictionary EqualityComparer.Default
dictOperators <-
instants
|> Seq.mapi (fun idx instant -> (idx, instant))
|> Seq.filter (fun (idx,_) -> idx % 2 = 0)
|> Seq.map (fun (_,instant) ->
tmpDictEqualityComparerDefault.[instant] <- null
tmpDictHashIdentityStructural.[instant] <- null
tmpDictHashIdentityNonStructural.[instant] <- null
instant, null)
|> dict
dictEqualityComparerDefault <- tmpDictEqualityComparerDefault
dictHashIdentityStructural <- tmpDictHashIdentityStructural
dictHashIdentityNonStructural <- tmpDictHashIdentityNonStructural
[<Benchmark (Baseline=true)>]
member _.Operators () = checkContains dictOperators
[<Benchmark>]
member _.EqualityComparerDefault () = checkContains dictEqualityComparerDefault
[<Benchmark>]
member _.HashIdentityStructural () = checkContains dictHashIdentityStructural
[<Benchmark>]
member _.HashIdentityNonStructural () = checkContains dictHashIdentityNonStructural
[<EntryPoint>]
let Main args =
let switch = BenchmarkSwitcher [| typeof<ComparisonComparison> |]
switch.Run args |> ignore
0| Method | DictSize | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| Operators | 1 | 1,412.23 ns | 5.008 ns | 4.439 ns | 1.00 | 0.0420 | - | - | 176 B |
| EqualityComparerDefault | 1 | 37.43 ns | 0.271 ns | 0.240 ns | 0.03 | - | - | - | - |
| HashIdentityStructural | 1 | 1,362.13 ns | 8.848 ns | 8.276 ns | 0.96 | 0.0305 | - | - | 128 B |
| HashIdentityNonStructural | 1 | 45.30 ns | 0.159 ns | 0.141 ns | 0.03 | - | - | - | - |
| Operators | 100 | 139,269.09 ns | 477.875 ns | 423.623 ns | 1.00 | 4.1504 | - | - | 17600 B |
| EqualityComparerDefault | 100 | 4,995.58 ns | 2.543 ns | 1.985 ns | 0.04 | - | - | - | - |
| HashIdentityStructural | 100 | 134,801.94 ns | 411.938 ns | 343.987 ns | 0.97 | 2.9297 | - | - | 12800 B |
| HashIdentityNonStructural | 100 | 5,993.03 ns | 4.839 ns | 4.290 ns | 0.04 | - | - | - | - |
| Operators | 10000 | 14,260,777.88 ns | 33,679.950 ns | 28,124.284 ns | 1.00 | 406.2500 | - | - | 1760021 B |
| EqualityComparerDefault | 10000 | 596,535.22 ns | 429.688 ns | 401.930 ns | 0.04 | - | - | - | 1 B |
| HashIdentityStructural | 10000 | 13,830,610.78 ns | 67,093.408 ns | 62,759.215 ns | 0.97 | 296.8750 | - | - | 1280232 B |
| HashIdentityNonStructural | 10000 | 708,332.98 ns | 643.601 ns | 502.482 ns | 0.05 | - | - | - | 1 B |
So I'm not sure if you notice there the slight difference in performance...
* ?? Huh what ?? *
That not a slight difference, that's a difference of ~20-25x slower using Structural comparison.
So why is this a problem and what can we do about it (well besides going back in a time machine and slurp in those PRs when they were ready...)
Well first of all we need to know why the decision was made to do this, why it doesn't cause pain with built in types, and then decide what we can do about it.
In original .net land, by default, reference types were equal only if ReferenceEquals is true and value types were only equal if, in a bitwise comparison, they were true (which meant if they had embedded reference types they were only considered equal if those ReferenceEquals were equal), but this didn't provide the type of experience that fsharp wanted to provide. Containers (array, ResizeArray, Dictionary, etc) were just seen as ordinary reference types.
In f-sharp land, by default, records and discriminated unions (of both the reference and value type flavours) implement a series of methods/interfaces so that their compare ever element, and if the elements are all equal, then the object is declared equal. Containers (array, list, map, etc.) also implemented these rules so they too could do comparisons.
To facilitate the f-sharp rules, new interfaces IStructuralComparable and IStructuralEquatable were introduced. The first thing here to recognise is that these interface were not generic, which meant that in order to use the comparisons for value types boxing must occur. But they had to be non-generic, because the one comparer was passed through chains of objects of different types to do the comparison. Rock and hard place land.
There was one more rule that f-sharp comparisons had that wasn't part of "normal" .net equality, and this was for floating point numbers NaN <> NaN.
So yeah that is basically the state of play. So what did I do about it? In my first attempt I created who comparison objects using lots of runtime magic to try to create efficient comparison objects that implemented all of the fsharp rules and was then dynamically created using Type.MakeGenericType. At the time this came across two obstacles. The first was that whilst the old 64-bit JIT did lots of runtime optimizations to make this very efficient, the new Ryu JIT didn't (fair enough, I really was bending, albeit not breaking, the rules). The second was that, at the time, AOT compilation was gaining traction and MakeGenericType was an obstacle to that path.
My second attempt recognised that the same magic that I was performing is basically performed by (Equality)?Comparer.Default (and AOT catered for that) that in most cases the fsharp compilers implementation of IStructural(Comparable|Equatable) was the same as it's implementation of I(IComparable|Equatable)<>. And so with a bit of RTTI I could filter out problematic cases (i.e. those that used floating point numbers due to the NaN case) and then just return the default comparer in most cases. In fact this is basically what the standard f-sharp library is doing, as it just contains a list of known types (i.e intrinsic types) where it returns an efficient, non-boxing comparer. But they have no mechanism to make is available for non-built in types.
So the crux of my implementation relies on the following recursive type check:
let canUseDefaultEqualityComparer er (rootType:Type) =
let processed = System.Collections.Generic.HashSet ()
let rec checkType idx (types:Type[]) =
if idx = types.Length then true
else
let ty = get types idx
if not (processed.Add ty) then
checkType (idx+1) types
else
let isValidGenericType ifNotType fullname =
if not (ty.IsGenericType && ty.GetGenericTypeDefinition().FullName.Equals fullname)
then ifNotType
else checkType 0 (ty.GetGenericArguments ())
let isTypeAndGenericArgumentsOK fullname = isValidGenericType false fullname
let isNotTypeOrIsTypeAndGenericArgumentsOK fullname = isValidGenericType true fullname
// avoid any types that need special handling in GenericEqualityObj
// GenericEqualityObj handles string as a special cases, but internally routes to same equality
ty.IsSealed // covers enum and value types
// ref types need to be sealed as derived class might implement IStructuralEquatable
&& not (isArray ty)
&& (er || (not (ty.Equals typeof<float>)))
&& (er || (not (ty.Equals typeof<float32>)))
&& isNotTypeOrIsTypeAndGenericArgumentsOK "System.Nullable`1"
&& not (isStructuralEquatable ty
// we accept ValueTuple even though it supports IStructuralEquatable
// if all generic arguements pass check
&& not ( isTypeAndGenericArgumentsOK "System.ValueTuple`1"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`2"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`3"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`4"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`5"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`6"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`7"
|| isTypeAndGenericArgumentsOK "System.ValueTuple`8"
|| isTypeAndGenericArgumentsOK "Microsoft.FSharp.Collections.FSharpList`1"
|| isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpOption`1"
|| isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpValueOption`1"
|| isTypeAndGenericArgumentsOK "Microsoft.FSharp.Core.FSharpResult`2"
)
)
&& checkType (idx+1) types
checkType 0 [|rootType|]So yeah, that's about it really. The rest of the PR was just tying this in so that it was efficiently used (i.e. only called once for any type) and attaching the various operators etc.
Anyway, this has come way, way, way too late to save my work code-base (where I have just ended up never using Structural Equality basically, not using default implementations of data structures, etc. sigh... maybe I'm the only one who cares about efficiency... why bother eh when you can just spin up another x number of cloud machines, he says, shaking his head...) but _if _ people are going to take this as a PR seriously then I'll update it and submit a new request...
But I ain't getting any younger!
(So yeah, if the answer is no, just close this Issue...)
Metadata
Metadata
Assignees
Labels
Type
Projects
Status