Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster ALP encode #924

Merged
merged 17 commits into from
Sep 25, 2024
Prev Previous commit
Next Next commit
fix fill bug
  • Loading branch information
lwwmanning committed Sep 25, 2024
commit 954cbbf3199fc950b3b6631f5d3ea8f8d51d75ef
28 changes: 14 additions & 14 deletions encodings/alp/src/alp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,18 @@ fn encode_chunk_unchecked<T: ALPFloat>(
let chunk_patch_count = chunk_patch_count; // immutable hereafter
assert_eq!(encoded_output.len(), num_prev_encoded + chunk.len());

// find the first successfully encoded value (i.e., not patched)
// this is our fill value for missing values
if fill_value.is_none() && (num_prev_encoded + chunk_patch_count < encoded_output.len()) {
assert_eq!(num_prev_encoded, num_prev_patches);
for i in num_prev_encoded..encoded_output.len() {
if i >= patch_indices.len() || patch_indices[i] != i as u64 {
*fill_value = Some(encoded_output[i]);
break;
}
}
}

// if there are no patches, we are done
if chunk_patch_count == 0 {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to handle the edge case of 2 chunks where chunk 0 is all patches, chunk 1 has 0 patches... which won't fill

return;
Expand All @@ -197,27 +209,15 @@ fn encode_chunk_unchecked<T: ALPFloat>(
let decoded = T::decode_single(encoded_output[i], exp);
// write() is only safe to call more than once because the values are primitive (i.e., Drop is a no-op)
patch_indices_mut[chunk_patch_index].write(i as u64);
patch_values_mut[chunk_patch_index].write(chunk[i]);
chunk_patch_index += (decoded != chunk[i]) as usize;
patch_values_mut[chunk_patch_index].write(chunk[i - num_prev_encoded]);
chunk_patch_index += (decoded != chunk[i - num_prev_encoded]) as usize;
}
assert_eq!(chunk_patch_index, chunk_patch_count);
unsafe {
patch_indices.set_len(num_prev_patches + chunk_patch_count);
patch_values.set_len(num_prev_patches + chunk_patch_count);
}

// find the first successfully encoded value (i.e., not patched)
// this is our fill value for missing values
if fill_value.is_none() {
assert_eq!(num_prev_encoded, num_prev_patches);
for i in num_prev_encoded..encoded_output.len() {
if i >= patch_indices.len() || patch_indices[i] != i as u64 {
*fill_value = Some(encoded_output[i]);
break;
}
}
}

// replace the patched values in the encoded array with the fill value
// for better downstream compression
if let Some(fill_value) = fill_value {
Expand Down