Conversation
Signed-off-by: tison <wander4096@gmail.com>
We may not want to implement the merge function the way the Java/C++ implementations do for PairTable, and instead find another way to do the two-way merge. In Rust, a mutable reference to a value cannot coexist with a live immutable reference to the same value, and that is exactly how PairTable::merge is used in practice:
    PairTable.merge(srcPairArr, 0, srcNumPairs,
                    allPairs, srcNumPairs, numPairsFromArray,
                    allPairs, 0); // note the overlapping subarray trick
The real effect here is to perform a two-way merge of allPairs[srcNumPairs..numPairsFromArray] and srcPairArr. There should be a more idiomatic way to do this in Rust.
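One possible shape for such a merge, as a rough sketch rather than what this PR does: take the source pairs as a shared slice and the destination as a single mutable slice, and read the tail and write the front through that one mutable borrow. The name `merge_into_front` and the layout assumption (the already-sorted tail of `all` starts at index `src.len()`, and both inputs are sorted ascending) are mine and are not taken from the Java, C++, or Rust code.

```rust
/// Hypothetical helper: merges the sorted slice `src` with the sorted tail
/// `all[src.len()..]` and writes the merged result over `all[0..]`.
///
/// Only one mutable borrow of `all` is needed, so the borrow checker accepts it.
/// The write index never overtakes the read index into the tail, so no unread
/// element is overwritten.
fn merge_into_front(src: &[u32], all: &mut [u32]) {
    let n_src = src.len();
    let total = all.len();
    assert!(n_src <= total, "destination must have room for both inputs");

    let mut a = 0; // next unread element of `src`
    let mut b = n_src; // next unread element in the tail of `all`
    let mut c = 0; // next write position in `all`; invariant: c == a + (b - n_src) <= b

    while c < total {
        // Take from `src` if the tail is exhausted or `src` has the smaller (or equal) head.
        if a < n_src && (b >= total || src[a] <= all[b]) {
            all[c] = src[a];
            a += 1;
        } else {
            all[c] = all[b];
            b += 1;
        }
        c += 1;
    }
}
```

Called as `merge_into_front(&src_pairs, &mut all_pairs)`, this has the same effect as the overlapping-subarray call above, under the stated layout assumption.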
This PR is now ready for review. It is mainly ported from the datasketches-cpp implementation, so I'm tagging @AlexanderSaydakov as a potential reviewer. Union and serde (compression) will follow. The current state is a reviewable and mergeable minimal feature set.
// specific language governing permissions and limitations
// under the License.

#![allow(dead_code)]
To be removed when Union and Serde get implemented.
dbg_macro = "deny"

too_many_arguments = "allow"
needless_range_loop = "allow"
This lint gives false positives in places where iterating over an index is more expressive.
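To illustrate the kind of loop meant here, a small sketch follows. `column_sums` is a made-up example, not code from this PR, and whether Clippy flags this exact loop is an assumption on my side. It also shows the narrower alternative of allowing the lint per function instead of for the whole workspace.

```rust
/// Hypothetical example: sums each column of a fixed-size matrix.
/// The indices `row` and `col` carry meaning of their own, so rewriting the
/// loops with iterators would arguably hide the access pattern.
#[allow(clippy::needless_range_loop)] // scoped allow instead of a workspace-wide one
fn column_sums(matrix: &[[u32; 4]; 3]) -> [u32; 4] {
    let mut sums = [0u32; 4];
    for col in 0..4 {
        for row in 0..3 {
            sums[col] += matrix[row][col];
        }
    }
    sums
}
```

The workspace-level `needless_range_loop = "allow"` keeps the code free of per-function attributes, at the cost of also silencing the lint where it would be helpful.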
fn knuth_shell_sort3(a: &mut [u32]) {
    let len = a.len();

    // Start at 1 (as the C++ loop does): starting at 0 leaves h == 0 when
    // len < 9, so the sorting loop below would never run for short slices.
    let mut h = 1;
    while h < len / 9 {
        h = 3 * h + 1;
    }

    while h > 0 {
        for i in h..len {
            let v = a[i];
            let mut j = i;
            while j >= h && v < a[j - h] {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
        h /= 3;
    }
}
Java uses the standard Arrays.sort here. We could use [T]::sort (stable) or [T]::sort_unstable as well, but this follows what the C++ implementation does.
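For comparison, here is a minimal sketch of both options side by side. The shell sort is a standalone copy of the function under review with the gap started at 1, so that slices shorter than nine elements are sorted too. `std_sort` is a hypothetical wrapper name of mine. Since the elements are plain `u32` values, stability makes no observable difference, so `sort_unstable` would be the natural standard-library choice.

```rust
/// Shell sort using Knuth's 3h + 1 gap sequence (1, 4, 13, 40, ...).
fn knuth_shell_sort3(a: &mut [u32]) {
    let len = a.len();

    // Find the first gap in the sequence that is at least len / 9.
    let mut h = 1;
    while h < len / 9 {
        h = 3 * h + 1;
    }

    // Gapped insertion sort for each gap, shrinking down to 1.
    while h > 0 {
        for i in h..len {
            let v = a[i];
            let mut j = i;
            while j >= h && v < a[j - h] {
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = v;
        }
        h /= 3;
    }
}

/// Hypothetical alternative: defer to the standard library.
fn std_sort(a: &mut [u32]) {
    a.sort_unstable();
}
```

Both functions sort in place and should produce identical results on any input. Which one is faster for the pair arrays in question would need to be measured.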
I'm going to merge this patch soon and continue with the serde (compression) part. However, this patch was ported manually, so I'd like more eyes on the concrete code to avoid mistakes like #63. Also, it takes about 3 seconds to accumulate 100M distinct values on my local dev machine with the release profile. Much of that time is spent on hashing. I hope we can establish a baseline and improve the performance a bit.
I'm going to merge this now to keep one commit maintainable. Review after merge is welcome and desired :D
This refers to #37.
I plan to implement the following steps: