Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify VirtualHasher input iterator #14434

Open
artemananiev opened this issue Jul 26, 2024 · 0 comments
Open

Simplify VirtualHasher input iterator #14434

artemananiev opened this issue Jul 26, 2024 · 0 comments
Assignees
Labels
Platform Data Structures Platform Virtual Map Platform Tickets pertaining to the platform Tech Debt Reduced Issues which reduce technical debt.

Comments

@artemananiev
Copy link
Member

VirtualHasher accepts a stream of dirty leaves (virtual leaf records) as its input. The stream must be sorted in virtual node path order, from the first leaf path to the last leaf path. Current hashing algorithm relies on this order.

More often than not, the first leaf path and the last leaf path are on different ranks. In this case, last leaf parent is immediately to the left of the first leaf, and a few more paths close to the end of the range are to the left, too. It means, if a tree is traversed left to right, then at some point some leaves with higher paths may be processed before leaves with lower paths. There is a mechanism to work it around in VirtualHasher, but it makes hashing code slightly more complicated than it could be.

This ticket is to make a change to order in which dirty leaves stream is sorted. For every leaf, its position in the stream should be according to the path at the first leaf rank - either the leaf itself, or its parent. That is, assume the leaf range is [N, 2N], while leaves N to M are on the first leaf rank, and leaves M+1 to 2N are on the last leaf rank. Then the stream should be [M+1, 2N] [N, M]

There are two scenarios where VirtualHasher is used, and both should be updated to use the order above. First, hashing a single copy from virtual pipeline. Dirty leaves stream is provided by VirtualNodeCache. Second, hashing during reconnect, and in this case the stream is provided by a virtual view.

@artemananiev artemananiev added Tech Debt Reduced Issues which reduce technical debt. Platform Virtual Map Platform Data Structures Platform Tickets pertaining to the platform labels Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Platform Data Structures Platform Virtual Map Platform Tickets pertaining to the platform Tech Debt Reduced Issues which reduce technical debt.
Projects
None yet
Development

No branches or pull requests

2 participants