Commit 5963edf

doxygen fixes (#138)
* doxygen fixes
* fix local memory size requirement documentation
1 parent 14892b8 commit 5963edf

7 files changed: +53 -83 lines changed

src/portfft/committed_descriptor_impl.hpp

Lines changed: 0 additions & 28 deletions
@@ -138,34 +138,6 @@ detail::layout get_layout(const Descriptor& desc, direction dir) {
   return detail::layout::UNPACKED;
 }
 
-/*
-Compute functions in the `committed_descriptor_impl` call `dispatch_kernel` and `dispatch_kernel_helper`. These two
-functions ensure the kernel is run with a supported subgroup size. Next `dispatch_kernel_helper` calls `run_kernel`. The
-`run_kernel` member function picks appropriate implementation and calls the static `run_kernel of that implementation`.
-The implementation specific `run_kernel` handles differences between forward and backward computations, casts the memory
-(USM or buffers) from complex to scalars and launches the kernel. Each function described in this doc has only one
-templated overload that handles both directions of transforms and buffer and USM memory.
-
-Device functions make no assumptions on the size of a work group or the number of workgroups in a kernel. These numbers
-can be tuned for each device.
-
-Implementation-specific `run_kernel` function make the size of the FFT that is handled by the individual workitems
-compile time constant. The one for subgroup implementation also calls `cross_sg_dispatcher` that makes the
-cross-subgroup factor of FFT size compile time constant. They do that by using a switch on the FFT size for one
-workitem, before calling `workitem_impl`, `subgroup_impl` or `workgroup_impl` . The `_impl` functions take the FFT size
-for one workitem as a template parameter. Only the calls that are determined to fit into available registers (depending
-on the value of PORTFFT_TARGET_REGS_PER_WI macro) are actually instantiated.
-
-The `_impl` functions iterate over the batch of problems, loading data for each first in
-local memory then from there into private one. This is done in these two steps to avoid non-coalesced global memory
-accesses. `workitem_impl` loads one problem per workitem, `subgroup_impl` loads one problem per subgroup and
-`workgroup_impl` loads one problem per workgroup. After doing computations by the calls to `wi_dft` for workitem,
-`sg_dft` for subgroup and `wg_dft` for workgroup, the data is written out, going through local memory again.
-
-The computational parts of the implementations are further documented in files with their implementations
-`workitem.hpp`, `subgroup.hpp` and `workgroup.hpp`.
-*/
-
 /**
  * A committed descriptor that contains everything that is needed to run FFT.
  *
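
The comment removed in this hunk describes turning a runtime FFT size into a compile-time constant with a switch, and instantiating only the sizes that fit into registers. A minimal sketch of that pattern follows. It is not portFFT's code: `target_regs_per_wi`, `fits_in_registers`, `impl_sketch`, `call_if_fits` and `dispatch_sketch` are hypothetical names, and the register criterion (two scalars per complex element) is an assumption made only for illustration.

```cpp
// Hypothetical stand-in for the PORTFFT_TARGET_REGS_PER_WI macro.
constexpr int target_regs_per_wi = 64;

// Assumed criterion: a size-N complex FFT needs 2 * N scalar registers.
constexpr bool fits_in_registers(int fft_size) { return 2 * fft_size <= target_regs_per_wi; }

// Stand-in for an `_impl` function that takes the FFT size as a template parameter.
template <int FftSize>
int impl_sketch() {
  // Fails to compile if a size that does not fit is ever instantiated.
  static_assert(fits_in_registers(FftSize), "instantiated only for sizes that fit");
  return FftSize;
}

// The `if constexpr` lives inside a template, so the discarded branch is never instantiated.
template <int FftSize>
int call_if_fits() {
  if constexpr (fits_in_registers(FftSize)) {
    return impl_sketch<FftSize>();
  } else {
    return -1;  // size not supported by this sketch
  }
}

// The switch maps the runtime size onto a compile-time constant.
inline int dispatch_sketch(int fft_size) {
  switch (fft_size) {
    case 2:  return call_if_fits<2>();
    case 4:  return call_if_fits<4>();
    case 8:  return call_if_fits<8>();
    case 16: return call_if_fits<16>();
    case 32: return call_if_fits<32>();
    case 64: return call_if_fits<64>();
    default: return -1;
  }
}
```

With the assumed limit of 64 registers, `dispatch_sketch(32)` reaches `impl_sketch<32>`, while `dispatch_sketch(64)` returns `-1` without `impl_sketch<64>` ever being instantiated.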

src/portfft/common/global.hpp

Lines changed: 28 additions & 28 deletions
@@ -45,7 +45,7 @@ namespace detail {
 /**
  * Gets the precomputed inclusive scan of the factors at a particular index.
  *
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
  * @param num_factors Number of factors
  * @param level_num factor number
  * @return Outer batch product
@@ -72,9 +72,9 @@ PORTFFT_INLINE inline IdxGlobal get_outer_batch_product(const IdxGlobal* inclusi
  * required m-dimensional loop into the single loop (dispatch level), and this function calculates the offset.
  * Precomputed inclusive scans are used to further reduce the number of calculations required.
  *
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
  * @param num_factors Number of factors
  * @param iter_value Current iterator value of the flattened n-dimensional loop
  * @param outer_batch_product Inclusive Scan of factors at position level_num-1
@@ -122,14 +122,14 @@ PORTFFT_INLINE inline IdxGlobal get_outer_batch_offset(const IdxGlobal* factors,
  * @param output output pointer
  * @param input_imag input pointer for imaginary data
  * @param output_imag output pointer for imaginary data
- * @param implementation_twiddles global twiddles pointer containing twiddles for the sub implementation
+ * @param implementation_twiddles pointer to global memory containing twiddles for the sub implementation
  * @param store_modifier store modifier data
- * @param input_loc local memory for storing the input
- * @param twiddles_loc local memory for storing the twiddles for sub-implementation
- * @param store_modifier_loc local memory for store modifier data
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
+ * @param input_loc pointer to local memory for storing the input
+ * @param twiddles_loc pointer to local memory for storing the twiddles for sub-implementation
+ * @param store_modifier_loc pointer to local memory for store modifier data
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
  * @param batch_size Batch size for the corresponding input
  * @param global_data global data
  * @param kh kernel handler
@@ -187,10 +187,10 @@ PORTFFT_INLINE void dispatch_level(const Scalar* input, Scalar* output, const Sc
  * @param loc_for_store_modifier local memory for store modifier data
  * @param multipliers_between_factors twiddles to be multiplied between factors
  * @param impl_twiddles twiddles required for sub implementation
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
- * @param n_transforms batch size corresposding to the factor
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
+ * @param n_transforms batch size corresponding to the factor
  * @param input_batch_offset offset for the input pointer
  * @param launch_params launch configuration, the global and local range with which the kernel will get launched
  * @param cgh associated command group handler
@@ -246,10 +246,10 @@ void launch_kernel(sycl::accessor<const Scalar, 1, sycl::access::mode::read>& in
  * @param loc_for_store_modifier local memory for store modifier data
  * @param multipliers_between_factors twiddles to be multiplied between factors
  * @param impl_twiddles twiddles required for sub implementation
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
- * @param n_transforms batch size corresposding to the factor
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
+ * @param n_transforms batch size corresponding to the factor
  * @param input_batch_offset offset for the input pointer
  * @param launch_params launch configuration, the global and local range with which the kernel will get launched
  * @param cgh associated command group handler
@@ -297,9 +297,9 @@ void launch_kernel(const Scalar* input, Scalar* output, const Scalar* input_imag
  * @param input input pointer
  * @param output output accessor
  * @param loc 2D local memory
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
  * @param output_offset offset to output pointer
  * @param ldb leading dimension of the output
  * @param lda leading dimension of the input
@@ -357,9 +357,9 @@ static void dispatch_transpose_kernel_impl(const Scalar* input,
  * @param input input pointer
  * @param output output pointer
  * @param loc 2D local memory
- * @param factors global memory pointer containing factors of the input
- * @param inner_batches global memory pointer containing the inner batch for each factor
- * @param inclusive_scan global memory pointer containing the inclusive scan of the factors
+ * @param factors pointer to global memory containing factors of the input
+ * @param inner_batches pointer to global memory containing the inner batch for each factor
+ * @param inclusive_scan pointer to global memory containing the inclusive scan of the factors
  * @param output_offset offset to output pointer
  * @param ldb leading dimension of the output
  * @param lda leading dimension of the input
@@ -418,7 +418,7 @@ static void dispatch_transpose_kernel_impl(const Scalar* input, Scalar* output,
  * @param kd_struct kernel data struct
  * @param input input pointer
  * @param output output usm/buffer
- * @param factors_triple global memory pointer containing factors, inner batches corresponding per factor, and the
+ * @param factors_triple pointer to global memory containing factors, inner batches corresponding per factor, and the
  * inclusive scan of the factors
  * @param committed_size committed size of the FFT
  * @param num_batches_in_l2 number of batches in l2
@@ -481,8 +481,8 @@ sycl::event transpose_level(const typename committed_descriptor_impl<Scalar, Dom
  * @param output output pointer
  * @param input_imag input usm/buffer for imaginary data
  * @param output_imag output pointer for imaginary data
- * @param twiddles_ptr global pointer containing the input
- * @param factors_triple global memory pointer containing factors, inner batches corresponding per factor, and the
+ * @param twiddles_ptr pointer to global memory containing the input
+ * @param factors_triple pointer to global memory containing factors, inner batches corresponding per factor, and the
  * inclusive scan of the factors
  * @param intermediate_twiddle_offset offset value to the global pointer for twiddles in between factors
  * @param subimpl_twiddle_offset offset value to to the global pointer for obtaining the twiddles required for sub
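
Several parameters in this file refer to a precomputed inclusive scan of the factors. As a rough illustration of what such a scan could contain, the host-side sketch below computes running products of a factor list with `std::inclusive_scan`. Treating the scan as multiplicative is an assumption, and `inclusive_product_scan` is a hypothetical helper that does not appear in the diff.

```cpp
#include <cstdint>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: element i of the result is factors[0] * ... * factors[i].
// Assumes the "inclusive scan of the factors" is a running product.
std::vector<std::int64_t> inclusive_product_scan(const std::vector<std::int64_t>& factors) {
  std::vector<std::int64_t> scan(factors.size());
  std::inclusive_scan(factors.begin(), factors.end(), scan.begin(), std::multiplies<>{});
  return scan;
}
```

For factors `{2, 3, 5}` this yields `{2, 6, 30}`; precomputing such products once lets a kernel look up the batch product at a given level instead of recomputing it on every iteration.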

src/portfft/common/workgroup.hpp

Lines changed: 4 additions & 4 deletions
@@ -60,15 +60,15 @@ namespace detail {
  * @tparam SubgroupSize Size of the subgroup
  * @tparam LocalT The type of the local view
  * @tparam T Scalar type
- * @param loc local accessor containing the input
+ * @param loc View of the local memory containing the input
  * @param loc_twiddles Pointer to twiddles to be used by sub group FFTs
  * @param wg_twiddles Pointer to precalculated twiddles which are to be used before second set of FFTs
  * @param scaling_factor Scalar factor with which the result is to be scaled
  * @param max_num_batches_in_local_mem Number of batches local memory is allocated for
  * @param batch_num_in_local Id of the local memory batch to work on
  * @param load_modifier_data Pointer to the load modifier data in global Memory
  * @param store_modifier_data Pointer to the store modifier data in global Memory
- * @param batch_num_in_kernel Absosulte batch from which batches loaded in local memory will be computed
+ * @param batch_num_in_kernel Absolute batch from which batches loaded in local memory will be computed
  * @param dft_size Size of each DFT to calculate
  * @param stride_within_dft Stride between elements of each DFT - also the number of the DFTs in the inner dimension
  * @param ndfts_in_outer_dimension Number of DFTs in outer dimension
@@ -300,13 +300,13 @@ __attribute__((always_inline)) inline void dimension_dft(
  * @tparam LocalT Local memory view type
  * @tparam T Scalar type
  *
- * @param loc A view of a local accessor containing input
+ * @param loc View of the local memory containing the input
  * @param loc_twiddles Pointer to twiddles to be used by sub group FFTs
  * @param wg_twiddles Pointer to precalculated twiddles which are to be used before second set of FFTs
  * @param scaling_factor Scalar factor with which the result is to be scaled
  * @param max_num_batches_in_local_mem Number of batches local memory is allocated for
  * @param batch_num_in_local Id of the local memory batch to work on
- * @param batch_num_in_kernel Absosulte batch from which batches loaded in local memory will be computed
+ * @param batch_num_in_kernel Absolute batch from which batches loaded in local memory will be computed
  * @param load_modifier_data Pointer to the load modifier data in global Memory
  * @param store_modifier_data Pointer to the store modifier data in global Memory
  * @param fft_size Problem Size

src/portfft/dispatcher/global_dispatcher.hpp

Lines changed: 4 additions & 4 deletions
@@ -68,10 +68,10 @@ inline std::pair<IdxGlobal, IdxGlobal> get_launch_params(IdxGlobal fft_size, Idx
 
 /**
  * Transposes A into B, for complex inputs only
- * @param a Input pointer a
- * @param b Input pointer b
- * @param lda leading dimension A
- * @param ldb leading Dimension B
+ * @param a Input pointer
+ * @param b Output pointer
+ * @param lda leading dimension of `a`
+ * @param ldb leading dimension of `b`
  * @param num_elements Total number of complex values in the matrix
  */
 template <typename T>

src/portfft/dispatcher/subgroup_dispatcher.hpp

Lines changed: 7 additions & 8 deletions
@@ -64,19 +64,18 @@ IdxGlobal get_global_size_subgroup(IdxGlobal n_transforms, Idx factor_sg, Idx su
  * @tparam LayoutOut Output Layout
  * @tparam SubgroupSize size of the subgroup
  * @tparam T type of the scalar used for computations
- * @param input accessor or pointer to global memory containing input data. If complex storage (from
+ * @param input pointer to global memory containing input data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param output accessor or pointer to global memory for output data. If complex storage (from
+ * @param output pointer to global memory for output data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param input accessor or pointer to global memory containing imaginary part of the input data if complex storage
+ * @param input pointer to global memory containing imaginary part of the input data if complex storage
  * (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param output accessor or pointer to global memory containing imaginary part of the input data if complex storage
+ * @param output pointer to global memory containing imaginary part of the input data if complex storage
  * (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param loc local accessor. Must have enough space for 2*FactorWI*FactorSG*SubgroupSize
+ * @param loc pointer to local memory. Size requirement is determined by `num_scalars_in_local_mem_struct`.
+ * @param loc_twiddles pointer to local memory for twiddle factors. Must have enough space for `2 * FactorWI * FactorSG`
  * values
- * @param loc_twiddles local accessor for twiddle factors. Must have enough space for 2*FactorWI*FactorSG
- * values
- * @param n_transforms number of FT transforms to do in one call
+ * @param n_transforms number of FFT transforms to do in one call
  * @param global_data global data for the kernel
  * @param kh kernel handler associated with the kernel launch
  * @param twiddles pointer containing twiddles

src/portfft/dispatcher/workgroup_dispatcher.hpp

Lines changed: 5 additions & 5 deletions
@@ -81,15 +81,15 @@ IdxGlobal get_global_size_workgroup(IdxGlobal n_transforms, Idx subgroup_size, I
  * @tparam SubgroupSize size of the subgroup
  * @tparam T Scalar type
  *
- * @param input accessor or pointer to global memory containing input data. If complex storage (from
+ * @param input pointer to global memory containing input data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param output accessor or pointer to global memory for output data. If complex storage (from
+ * @param output pointer to global memory for output data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param input_imag accessor or pointer to global memory containing imaginary part of the input data if complex storage
+ * @param input_imag pointer to global memory containing imaginary part of the input data if complex storage
  * (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param output_imag accessor or pointer to global memory containing imaginary part of the input data if complex
+ * @param output_imag pointer to global memory containing imaginary part of the input data if complex
  * storage (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param loc Pointer to local memory
+ * @param loc Pointer to local memory. Size requirement is determined by `num_scalars_in_local_mem_struct`.
  * @param loc_twiddles pointer to local allocation for subgroup level twiddles
  * @param n_transforms number of fft batches
  * @param global_data global data for the kernel

src/portfft/dispatcher/workitem_dispatcher.hpp

Lines changed: 5 additions & 6 deletions
@@ -81,16 +81,15 @@ PORTFFT_INLINE void apply_modifier(Idx num_elements, PrivT priv, const T* modifi
  * @tparam LayoutOut Output Layout
  * @tparam SubgroupSize size of the subgroup
  * @tparam T type of the scalar used for computations
- * @param input accessor or pointer to global memory containing input data. If complex storage (from
+ * @param input pointer to global memory containing input data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param output accessor or pointer to global memory for output data. If complex storage (from
+ * @param output pointer to global memory for output data. If complex storage (from
  * `SpecConstComplexStorage`) is split, this is just the real part of data.
- * @param input accessor or pointer to global memory containing imaginary part of the input data if complex storage
+ * @param input pointer to global memory containing imaginary part of the input data if complex storage
  * (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param output accessor or pointer to global memory containing imaginary part of the input data if complex storage
+ * @param output pointer to global memory containing imaginary part of the input data if complex storage
  * (from `SpecConstComplexStorage`) is split. Otherwise unused.
- * @param loc local memory pointer. Must have enough space for 2*fft_size*SubgroupSize
- * values
+ * @param loc local memory pointer. Size requirement is determined by `num_scalars_in_local_mem_struct`.
  * @param n_transforms number of FT transforms to do in one call
  * @param global_data global data for the kernel
  * @param kh kernel handler associated with the kernel launch
