Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix SortedSet.IsSubsetOf (and friends) 102118 #102249

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

lilinus
Copy link
Contributor

@lilinus lilinus commented May 15, 2024

Fix #102118

The required array size for the bit helper BitHelper in CheckUniqueAndUnfoundElements (with this change calculated with GetInternalIndexOfBitHelperLength) grows quite rapidly.
So I added a threshold where the algorithm for GetInternalIndexOfBitHelperLength will return -1, where the implementation in CheckUniqueAndUnfoundElements switches to a fallback algorithm, which does not have same extreme memory complexity.

The fallback algorithm tries to find the node, and add it to a HashSet to remember if it has been seen before.

The algorithm used in TreeSet was not affectd by this bug, so I have not changed the implementation in this PR.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 15, 2024
@lilinus lilinus changed the title Fix 102118 Fix SortedSet.InternalIndexOf 102118 May 15, 2024
// | 254 | 14 | 32767
// | 510 | 16 | 131071
// | 1022 | 18 | 524287
private const int InternalIndexOfCountThreshold = 254;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took this value becuase it seemed "reasonable" to me according to numbers above in comments.

@lilinus
Copy link
Contributor Author

lilinus commented May 16, 2024

Anyone has an idea of how to test the case when Count is above InternalIndexOfCountThreshold without using big collections?

Would it be viable to have InternalIndexOfCountThreshold be a mutable instance field we set in unit tests, if DEBUG is defined?

@danmoseley
Copy link
Member

Can you use the big collection but conditionalize the test on 64 bit?

@lilinus lilinus marked this pull request as ready for review May 16, 2024 20:41
@sglienke
Copy link

sglienke commented Oct 7, 2024

I recently came across this bug and was thinking about this solution.

Unless I am missing something we can use the fact that up to the height/2 of the tree, we have a complete tree and only fall back to a different way of calculating the index of nodes in levels after that.

@lilinus
Copy link
Contributor Author

lilinus commented Oct 9, 2024

I recently came across this bug and was thinking about this solution.

Unless I am missing something we can use the fact that up to the height/2 of the tree, we have a complete tree and only fall back to a different way of calculating the index of nodes in levels after that.

Hmm something like that could be used, but as soon as we are not implying a position based on a full tree, we still need to traverse a big part of the tree to simply find the number of nodes that are "left" (with a common parent node) of a specific node.

@sglienke
Copy link

sglienke commented Oct 9, 2024

That is true, and it may not even be possible - I just worry that the current implementation might degrade performance too much.

And just a small nitpick: we might confuse the terms "complete tree" and "full tree" - the following is a full but not a complete tree:

            0
      1           2
  3       4   
        5   6        

@lilinus
Copy link
Contributor Author

lilinus commented Oct 9, 2024

That is true, and it may not even be possible - I just worry that the current implementation might degrade performance too much.

Yeah it's not an optimal algorithm, the pro is the much smaller memory footprint. Also it is already used for InternalIndexOf in
TreeSubSet.

I'm planning doing a follow-up PR to speed it up a bit, reduce allocations etc. But did not want to mess too much with any existing code since this is only a bug-fix.

Any suggestion for a better algorithm for the case when the count is above the limit is welcome.

And just a small nitpick: we might confuse the terms "complete tree" and "full tree"

Gotcha, will make sure the terms are used correctly.

@lilinus
Copy link
Contributor Author

lilinus commented Oct 13, 2024

I have changed the fallback implementation now (and updated the description accordingly). PTAL @sglienke

@lilinus lilinus changed the title Fix SortedSet.InternalIndexOf 102118 Fix SortedSet.IsSubsetOf (and friends) 102118 Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.Collections community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SortedSet.IsSubsetOf not working as expected
4 participants