
Simplify distance/detail to make it easier to dispatch to different kernel implementations #1142

Merged Mar 10, 2023 · 67 commits
Changes from 1 commit
8d3e8a0
contractions: Concentrate tile index calculations
ahendriksen Sep 2, 2022
cb7baab
pairwise_distance_base: Remove all ldgXY(0) calls
ahendriksen Sep 2, 2022
066bf3b
pairwise_distance_base: Move all logic into run loop
ahendriksen Sep 2, 2022
a15d5fc
pairwise_distance_base: Fix typo
ahendriksen Oct 5, 2022
71c6da6
Remove deprecated header
ahendriksen Jan 11, 2023
4bbedf6
Replace lambdas by raft::void_op
ahendriksen Jan 12, 2023
c3d1f6e
Use an operator for L1 distance
ahendriksen Jan 12, 2023
3e3478b
Add launch function
ahendriksen Jan 12, 2023
264a9d2
l1: Replace run-time -> compile-time dispatch
ahendriksen Jan 13, 2023
b232057
pairwise matrix: move files into subdirectories
ahendriksen Jan 13, 2023
06f6ffa
pairwise matrix: Untangle dispatching and kernel template parameters
ahendriksen Jan 13, 2023
2f41faa
l2 unexp: Use pairwise matrix dispatch
ahendriksen Jan 13, 2023
7938614
l2 exp: Use pairwise matrix dispatch
ahendriksen Jan 13, 2023
7afe6cc
Add template for distance operator
ahendriksen Jan 13, 2023
5fe3292
Reenable cutlass-based kernels for CUDA 12.0
ahendriksen Jan 13, 2023
c623332
pairwise matrix l2: Add support for CUTLASS kernels
ahendriksen Jan 13, 2023
27511fc
Canberra: use dispatching mechanism
ahendriksen Jan 13, 2023
58ce6f8
Chebyshev: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
d397c17
Correlation: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
7005a4f
Hamming: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
7831deb
Hellinger: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
4dc72ce
Jensen-Shannon: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
b0d36c1
remove old hamming code
ahendriksen Jan 13, 2023
e95a65b
KL divergence: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
f1c105b
Minkowski: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
ac66e3f
Russel-Rao: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
a89896a
Cosine: use pairwise matrix dispatch
ahendriksen Jan 13, 2023
16b2acd
Fix include for l1 op
ahendriksen Jan 13, 2023
1326e34
kl_divergence: Use raft::log instead of raft::myLog
ahendriksen Feb 10, 2023
0169b26
distance_op: Add expensive_inner_loop marker
ahendriksen Feb 10, 2023
52e95e1
Update copyright notices
ahendriksen Feb 10, 2023
28cd57b
Reusable dispatch mechanism
ahendriksen Feb 10, 2023
c44aece
Dispatch mechanism using switch statement
ahendriksen Feb 10, 2023
7c3bd76
Remove one ".template" from kernel_sm60
ahendriksen Feb 10, 2023
d62eeb7
Dispatch on veclen instead of byte_alignment
ahendriksen Feb 10, 2023
5c3dcaf
Use many template parameters again
ahendriksen Feb 20, 2023
2613e8a
Remove duplicate DistanceType enum definition
ahendriksen Feb 20, 2023
62ed53a
Remove pairwiseDistanceMatKernel
ahendriksen Feb 20, 2023
c334ba3
Remove distance::detail::pairwise_distance_impl
ahendriksen Feb 20, 2023
8e43238
distance_ops: Include cuda_utils.cuh
ahendriksen Feb 21, 2023
e176351
Replace DistanceImpl with method overloads
ahendriksen Feb 21, 2023
6ddd14f
Remove impl files and move doc strings
ahendriksen Feb 21, 2023
34ccddc
Update readme
ahendriksen Feb 21, 2023
b27cdca
Merge branch 'rapids/branch-23.04' into wip-refactor-distance
ahendriksen Feb 21, 2023
6a12ded
Reenable device code generation
ahendriksen Feb 21, 2023
486393e
Readd overload of raft::distance::detail::distance
ahendriksen Feb 21, 2023
ca29e2d
Fix style
ahendriksen Feb 21, 2023
28c95a1
Fix 11.8 compilation error
ahendriksen Feb 22, 2023
a5592b9
Rename minkowski -> lp_unexp
ahendriksen Feb 22, 2023
265ba07
Rename Chebyshev -> l_inf
ahendriksen Feb 22, 2023
7ccb8a7
Rename euc -> l2
ahendriksen Feb 22, 2023
874d014
Update copyright headers
ahendriksen Feb 22, 2023
757fb44
Remove misleading note about workspace nullptr
ahendriksen Feb 22, 2023
d6e9261
Remove notes file
ahendriksen Feb 22, 2023
885bda6
Put template on struct instead of methods
ahendriksen Feb 22, 2023
cd38ec6
Fix style
ahendriksen Feb 22, 2023
e7a8e89
Merge branch 'branch-23.04' into wip-refactor-distance
cjnolet Feb 22, 2023
6467221
Update cpp/include/raft/distance/detail/distance_ops/canberra.cuh
ahendriksen Mar 6, 2023
a83461e
Update cpp/include/raft/distance/detail/distance.cuh
ahendriksen Mar 6, 2023
393edf3
Add note about alignment in case of byte input
ahendriksen Mar 6, 2023
48a0c21
Fix
ahendriksen Mar 7, 2023
393c546
Merge branch 'branch-23.04' into wip-refactor-distance
cjnolet Mar 8, 2023
c9cbf2c
Merge branch 'branch-23.04' into wip-refactor-distance
cjnolet Mar 9, 2023
569b2c2
Add issue for TODO
ahendriksen Mar 9, 2023
46fd8e5
Merge PR 1319 into wip-refactor-distance
ahendriksen Mar 9, 2023
d089e80
Merge branch 'branch-23.04' into wip-refactor-distance
cjnolet Mar 9, 2023
6d17ac4
Merge branch 'branch-23.04' into wip-refactor-distance
cjnolet Mar 10, 2023
pairwise matrix: move files into subdirectories
ahendriksen committed Feb 8, 2023
commit b23205707497252512d8007d6c434c096977b89e
cpp/include/raft/distance/detail/distance_ops/l1.cuh

@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2018-2023, NVIDIA CORPORATION.
+ * Copyright (c) 2023, NVIDIA CORPORATION.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -15,37 +15,37 @@
  */

 #pragma once
-#include <raft/distance/detail/pairwise_distance_base.cuh>

-namespace raft::distance::detail {
+namespace raft::distance::detail::ops {

 // Describes the computation the l1 distance
 struct l1_distance_op {
-  // Whether norms of data should be loaded.
+  // Do not load norms of data, the computation of L1 distance does not use them.
   static constexpr bool use_norms = false;

   // Size of shared memory. This is normally decided by the kernel policy, but
   // some ops such as correlation_distance_op use more.
   template <typename Policy>
-  constexpr size_t shared_mem_size() {
+  constexpr size_t shared_mem_size()
+  {
     return Policy::SmemSize;
   }

   template <typename AccT, typename DataT>
-  DI void core(AccT & acc, DataT & x, DataT & y) const {
+  DI void core(AccT& acc, DataT& x, DataT& y) const
+  {
     acc += raft::abs(x - y);
   };

   template <typename Policy, typename AccT, typename DataT, typename IdxT>
   DI void epilog(AccT acc[Policy::AccRowsPerTh][Policy::AccColsPerTh],
-                 DataT * regxn,
-                 DataT * regyn,
+                 DataT* regxn,
+                 DataT* regyn,
                  IdxT gridStrideX,
-                 IdxT gridStrideY) const {
+                 IdxT gridStrideY) const
+  {
     return;
   };
 };

-}  // namespace raft::distance::detail
+}  // namespace raft::distance::detail::ops
134 changes: 4 additions & 130 deletions cpp/include/raft/distance/detail/l1.cuh

@@ -15,139 +15,13 @@
  */

 #pragma once
-#include <raft/distance/detail/distance_operators.cuh>
-#include <raft/distance/detail/pairwise_distance_base.cuh>
-#include <raft/distance/detail/pairwise_distance_op.cuh>
+#include "distance_ops/l1.cuh"
+#include "pairwise_matrix/dispatch.cuh"

 namespace raft {
 namespace distance {
 namespace detail {

-template <typename PCT>
-static void distance_matrix_launch(typename PCT::opT distance_op,
-                                   typename PCT::FinOpT fin_op,
-                                   const typename PCT::DataT* x,
-                                   const typename PCT::DataT* y,
-                                   const typename PCT::DataT* _xn,
-                                   const typename PCT::DataT* _yn,
-                                   typename PCT::IdxT m,
-                                   typename PCT::IdxT n,
-                                   typename PCT::IdxT k,
-                                   typename PCT::IdxT lda,
-                                   typename PCT::IdxT ldb,
-                                   typename PCT::IdxT ldd,
-                                   typename PCT::OutT* dOutput,
-                                   cudaStream_t stream)
-{
-  using Policy = typename PCT::PolicyT;
-
-  dim3 blk(Policy::Nthreads);
-  size_t smem_size = distance_op.template shared_mem_size<Policy>();
-  dim3 grid = launchConfigGenerator<Policy>(m, n, smem_size, pairwiseDistanceOpKernel<PCT>);
-
-  pairwiseDistanceOpKernel<PCT><<<grid, blk, smem_size, stream>>>(
-    x, y, _xn, _yn, m, n, k, lda, ldb, ldd, dOutput, distance_op, fin_op);
-
-  RAFT_CUDA_TRY(cudaGetLastError());
-}
-
-// Determine the largest number of elements that can be loaded in one
-// instruction without causing misalignment errors.
-template <typename DataT>
-int max_aligned_load(const DataT* x, const DataT* y, int ldx, int ldy)
-{
-  auto base_x = reinterpret_cast<uintptr_t>(x);
-  auto base_y = reinterpret_cast<uintptr_t>(y);
-  size_t stride_X = sizeof(DataT) * ldx;  // stride in bytes
-  size_t stride_Y = sizeof(DataT) * ldy;  // stride in bytes
-
-  bool base_16B_aligned = base_x % 16 == 0 && base_y % 16 == 0;
-  bool base_8B_aligned  = base_x % 8 == 0 && base_y % 8 == 0;
-
-  bool stride_16B_aligned = stride_X % 16 == 0 && stride_Y % 16 == 0;
-  bool stride_8B_aligned  = stride_X % 8 == 0 && stride_Y % 8 == 0;
-
-  if (16 % sizeof(DataT) == 0 && base_16B_aligned && stride_16B_aligned) {
-    return 16 / sizeof(DataT);
-  } else if (8 % sizeof(DataT) == 0 && base_8B_aligned && stride_8B_aligned) {
-    return 8 / sizeof(DataT);
-  } else {
-    return 1;
-  }
-}
-
-template <typename opT,
-          typename DataT,
-          typename AccT,
-          typename OutT,
-          typename FinOpT,
-          typename IdxT = int>
-void distance_matrix_dispatch(opT distance_op,
-                              int m_,
-                              int n_,
-                              int k_,
-                              const DataT* x_,
-                              const DataT* y_,
-                              OutT* out,
-                              FinOpT fin_op,
-                              cudaStream_t stream,
-                              bool is_row_major)
-{
-  // Determine leading dimensions and possibly flip order of passing x and y if
-  // column_major.
-  //
-  // ldx, ldy, and ld_out are the leading dimensions of x, y, and out
-  const DataT* x;
-  const DataT* y;
-  int ldx, ldy, ld_out;
-  int m, n, k;
-  if (is_row_major) {
-    // Pass x, y, m, n, k in order
-    x = x_, y = y_;
-    m = m_, n = n_, k = k_;
-    ldx = k_, ldy = k_, ld_out = n_;
-  } else {
-    // Flip x, y, and m, n, k.
-    x = y_, y = x_;
-    m = n_, n = m_, k = k_;
-    ldx = n_, ldy = m_, ld_out = m_;
-  }
-
-  int vectorized_load_num_elem = max_aligned_load(x, y, ldx, ldy);
-
-  // We dispatch based on
-  // - vectorized_load_num_elem
-  // - is_row_major
-
-  // Create run-time parameter struct that does the dispatching
-  using PRT = params_RT<DataT, AccT, OutT, IdxT, decltype(distance_op), FinOpT>;
-  PRT run_time_params{vectorized_load_num_elem, is_row_major};
-
-  // Turn run-time parameters into compile-time parameters.
-  bool dispatch_success = run_time_params.dispatch_with_compile_time_params(
-    // We pass a lambda that receives the compile-time parameters and can use these
-    // to call the correct kernel.
-    [&](auto compile_time_params) {
-      // compile_time_params is an empty struct that we can convert back to a type
-      // using decltype.
-      return distance_matrix_launch<decltype(compile_time_params)>(
-        distance_op,
-        fin_op,
-        x,
-        y,
-        nullptr,
-        nullptr,  // TODO: use _xn, _yn for non-l1 distances
-        m,
-        n,
-        k,
-        ldx,
-        ldy,
-        ld_out,
-        out,
-        stream);
-    });
-}
-
 template <typename DataT, typename AccT, typename OutT, typename FinOpT, typename IdxT = int>
 void l1Impl(int m,
             int n,
@@ -159,9 +33,9 @@ void l1Impl(int m,
             cudaStream_t stream,
             bool is_row_major)
 {
-  l1_distance_op distance_op{};
+  ops::l1_distance_op distance_op{};

-  distance_matrix_dispatch<l1_distance_op, DataT, AccT, OutT, FinOpT, IdxT>(
+  distance_matrix_dispatch<ops::l1_distance_op, DataT, AccT, OutT, FinOpT, IdxT>(
     distance_op, m, n, k, x, y, out, fin_op, stream, is_row_major);
 }
