Skip to content
This repository has been archived by the owner on Dec 5, 2022. It is now read-only.

Using HashValue as table key is unreliable #13

Closed
WCByrne opened this issue Mar 6, 2018 · 12 comments
Closed

Using HashValue as table key is unreliable #13

WCByrne opened this issue Mar 6, 2018 · 12 comments

Comments

@WCByrne
Copy link

WCByrne commented Mar 6, 2018

I've been testing a variation of your Heckel implementation in a macOS CollectionView project and found an issue. Using hashValue as the key in the table is not reliable. I've made this same mistake myself in early versions of the data structures used in that collection view implementation. My basic solution for now is to replace with [T.Element:TableEntry] and the issue hasn't come up again.

From the Apple Docs

A hash value, provided by a type’s hashValue property, is an integer that is the same for any two instances that compare equally. That is, for two instances a and b of the same type, if a == b, then a.hashValue == b.hashValue. The reverse is not true: Two instances with equal hash values are not necessarily equal to each other.

@onmyway133
Copy link
Owner

@WCByrne Hi, thanks for the suggestion. I will check if I can make the changes. Besides, do you have a test that fails because of this?

@WCByrne
Copy link
Author

WCByrne commented Mar 7, 2018

I do not. I was running into this issue while running a demo app (in the Collection View project) which uses Core Data. When I tried to reproduce the issue in a test using mock objects (not NSManagedObject) but making the same changes to the arrays, it succeeded. I finally noticed that the edits produced from the diff were different given the same source and target arrays. I'm not sure what NSManagedObject uses for it's hashValue but it is apparently less unique that you might expect.

As I mentioned, I have made the same assumption and used hashValue for keys in some of my data structures and often ran into this unexpected behavior.

@andrebraga
Copy link

andrebraga commented Mar 12, 2018

It's more likely that the usage of NSManagedObjects hashes is unreliable because if I remember correctly they're derived from the objectID. If you obtain permanent IDs prior to diffing then it's likely that the algorithm is sound.

Otherwise a weak to weak map table might work. Haven't really looked into the implementation.

@onmyway133
Copy link
Owner

@WCByrne Hi, sorry for late response. I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

@WCByrne
Copy link
Author

WCByrne commented Jun 12, 2018

👍 hashValue is unique most of the time and will probably work for most applications but is not "correct" as explained by the description of hash value from the docs quoted above.

@ktraunmueller
Copy link

Hashable is not sufficient as a type constraint on elements in order to compute a diff between two collections. Two different objects with different properties may have the same hash value, as pointed out above.

In order to compute insertions, deletions, and moves, you will need at least an Equatable type constraint on the element type.

And if you want to detect (in-place) updates, you will also need some concept of identity. Maybe an "Identifiable" protocol, that uses some form of id to answer the question "are these two objects the same individual? (possibly with different properties)".

@onmyway133
Copy link
Owner

@ktraunmueller Hi, you're right. Maybe this issue may interest you #16

@ktraunmueller
Copy link

Thanks for the quick response. So, DeepDiff does not generally give correct results, because the requirements imposed on the element type are too weak, right?

@onmyway133
Copy link
Owner

@ktraunmueller It is based on Hashable and Equatable, so I think it works quite good. Do you have a test that fails?

@ktraunmueller
Copy link

Sorry, I forgot that a Hashable is an Equatable... 🤦‍♂️

@onmyway133
Copy link
Owner

@ktraunmueller No worried, I learned that the hard way too, you can read my story here onmyway133/blog#122 😅

@ahti
Copy link

ahti commented Jun 24, 2018

I've tried using the whole Hashable object as the key, but I don't see if it makes any difference. Dictionary will use hashValue of Hashable object as the key, so it performs the same as when we use just hashValue. I will change if there's any issues/falling tests

This is incorrect. You can verify easily by throwing this code in a playground:

struct Lol: Hashable {
    let str: String

    var hashValue: Int {
        return 0
    }
}

var d: [Lol: String] = [:]

d[Lol(str: "hello")] = "asdf"
d[Lol(str: "bye")] = "fdsa"

assert(d.count == 2)

Equal hash values do not mean equal objects, and DeepDiff should work with diffs containing things with colliding hash values, which from what I read here (haven't looked at the code itself) is not the case right now.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants