feat(gpu): improve full propagation in sum and sub #1763
base: main
Conversation
Force-pushed from 6b37fe7 to 44cb537
Hey @guillermo-oyarzun! Thanks a lot for this PR 🙏 Here comes a first review. I don't know the details of the implementation, so it's hard for me to go through all the logic; maybe if you walk me through it, it would help.
    uint32_t gpu_count,
    int8_t **mem_ptr_void);

void scratch_cuda_integer_overflowing_sub_kb_64_inplace(
Are you sure there aren't some cuda_integer_radix_overflowing_sub calls still left in some places?
I removed some unnecessary headers that were still there
int_radix_lut(cudaStream_t const *streams, uint32_t const *gpu_indexes,
              uint32_t gpu_count, int_radix_params params, uint32_t num_luts,
              uint32_t num_radix_blocks, uint32_t lut_count,
              bool allocate_gpu_memory) {
It's a bit strange to have both num_luts and lut_count; how about we rename lut_count to num_many_lut? (That renaming would have to be applied everywhere lut_count is used.)
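As a sketch only, here is what the constructor could look like with the suggested rename applied (num_many_lut is just the proposed name, not the current API):

```cpp
// Sketch of the proposed rename lut_count -> num_many_lut; all other
// parameters are copied from the declaration quoted above.
int_radix_lut(cudaStream_t const *streams, uint32_t const *gpu_indexes,
              uint32_t gpu_count, int_radix_params params, uint32_t num_luts,
              uint32_t num_radix_blocks, uint32_t num_many_lut,
              bool allocate_gpu_memory);
```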
multi_gpu_alloc_lwe_async(streams, gpu_indexes, active_gpu_count,
                          lwe_after_ks_vec, num_radix_blocks,
                          params.small_lwe_dimension + 1);
multi_gpu_alloc_many_lwe_async(streams, gpu_indexes, active_gpu_count,
Maybe this function could be renamed: multi_gpu_alloc_lwe_many_lut_output_async?
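For illustration, the allocation call above under the suggested name (the remaining arguments are truncated in the excerpt, so they are left elided here):

```cpp
// Rename sketch only: the arguments after active_gpu_count are not shown in
// the excerpt above, so they are kept as a placeholder comment.
multi_gpu_alloc_lwe_many_lut_output_async(streams, gpu_indexes, active_gpu_count,
                                          /* ...remaining arguments unchanged... */);
```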
};

// needed for the division to update the lut indexes
void update_lut_indexes(cudaStream_t const *streams,
I'm not sure having the function written this way is the most readable option. Maybe it would be better to write the whole logic for the index update in the division itself instead?
I moved most of the logic to the division now
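As a rough sketch of what moving that logic looks like, assuming hypothetical names (the lut_indexes member and the cuda_memcpy_async_to_gpu helper are assumptions about the backend, not the PR's actual code):

```cpp
// Hypothetical sketch: the division host code picks its own index table and
// writes it into the LUT structure directly, instead of calling a generic
// update_lut_indexes() member on int_radix_lut.
template <typename Torus>
void set_division_lut_indexes(cudaStream_t stream, uint32_t gpu_index,
                              int_radix_lut<Torus> *luts,
                              Torus const *h_new_indexes,
                              uint32_t num_radix_blocks) {
  // Copy the host-side index table onto the device buffer read by the
  // LUT-application kernel (member/helper names are assumptions).
  cuda_memcpy_async_to_gpu(luts->lut_indexes, (void *)h_new_indexes,
                           num_radix_blocks * sizeof(Torus), stream, gpu_index);
}
```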
luts_array_second_step->release(streams, gpu_indexes, gpu_count);

if (use_sequential_algorithm_to_resolver_group_carries) {
  seq_group_prop_mem->release(streams, gpu_indexes, gpu_count);
delete is missing
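Based only on the lines quoted above, the fix would look roughly like this:

```cpp
// Release the device-side buffers, then also free the host-side object;
// the `delete` line is the part flagged as missing.
if (use_sequential_algorithm_to_resolver_group_carries) {
  seq_group_prop_mem->release(streams, gpu_indexes, gpu_count);
  delete seq_group_prop_mem;
}
```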
cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array, int_radix_params params,
int_shifted_blocks_and_states_memory<Torus> *mem, void *const *bsks,
Torus *const *ksks, uint32_t num_blocks, uint32_t lut_stride,
Maybe num_blocks -> num_radix_blocks
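Applied to the parameters quoted above, the suggestion would read as follows (the tail of the parameter list is truncated in the excerpt and left as-is):

```cpp
// Rename sketch only: num_blocks -> num_radix_blocks; everything else is
// copied from the excerpt, which cuts off after lut_stride.
cudaStream_t const *streams, uint32_t const *gpu_indexes,
uint32_t gpu_count, Torus *lwe_array, int_radix_params params,
int_shifted_blocks_and_states_memory<Torus> *mem, void *const *bsks,
Torus *const *ksks, uint32_t num_radix_blocks, uint32_t lut_stride,
```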
tfhe/src/core_crypto/gpu/mod.rs (outdated)
    message_modulus,
);
}
I don't think we need to add this to core crypto, do we?
@@ -227,6 +285,18 @@ impl CudaServerKey {
        streams.synchronize();
    }

    pub fn unchecked_add_assign_with_packing<T: CudaIntegerRadixCiphertext>(
Do we need to have this entry point on the Rust side?
It is something needed for the signed overflowing add/sub, but it's not actually tested in this PR. I could remove it here and just include it in the other PR we will have.
///
/// - `streams` __must__ be synchronized to guarantee computation has finished, and inputs must
///   not be dropped until streams is synchronized
pub(crate) unsafe fn new_propagate_single_carry_assign_async<T>(
Couldn't we name this one propagate_single_carry_assign_async and remove the old one?
where
    T: CudaIntegerRadixCiphertext,
{
    self.propagate_fast_single_carry_assign_async(ct, streams, input_carry, requested_flag)
Couldn't we keep only the fast version on the Rust side, and remove the old one?
Force-pushed from 09770dd to 935588f
Force-pushed from 935588f to 9af4fde
closes: please link all relevant issues
PR content/description
Check-list: