Provide a few optimizations to the `matchindex` C++ function: #695

stefvanbuuren · 2025-02-27T11:09:41Z

Changes

The matchindex C++ function has the following changes:

Use d[x] instead of d(x) for NumericVector
Use std::clamp() instead of manual if conditions to restrict k
Remove redundant std::vector<double> ysort(n1)
Use std::distance() instead of unnecessary iterator assignments
Use seq_len(k) directly instead of creating an extra IntegerVector kv(k)
Avoid redundant copies of sampled indices
Replace unnecessary lambda [yshuf] capture with direct reference
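To illustrate three of the idioms listed above (std::clamp(), std::distance(), and capturing by reference in a lambda), here is a minimal standalone sketch. It is not the code from this PR: plain std::vector stands in for the Rcpp types, and the helper names clamp_k, lower_position, and order_by_value are hypothetical. It assumes C++17 (for std::clamp) and that k is restricted to the range [1, n] with n >= 1.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical helper: restrict k to [1, n] in one call instead of two
// manual if conditions. Assumes n >= 1 (std::clamp requires lo <= hi).
int clamp_k(int k, int n) {
    return std::clamp(k, 1, n);
}

// Hypothetical helper: zero-based position of the first element in a
// sorted vector that is not less than `value`. std::distance turns the
// iterator returned by std::lower_bound into an offset directly.
std::ptrdiff_t lower_position(const std::vector<double>& sorted, double value) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), value);
    return std::distance(sorted.begin(), it);
}

// Hypothetical helper: indices 0..n-1 ordered by the values they refer to.
// Capturing `values` by reference avoids copying the vector into the lambda,
// which a by-value capture such as [values] would do.
std::vector<int> order_by_value(const std::vector<double>& values) {
    std::vector<int> idx(values.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::sort(idx.begin(), idx.end(),
              [&values](int a, int b) { return values[a] < values[b]; });
    return idx;
}
```

None of these idioms change results. They only remove temporaries and branches, which is consistent with the small size of the measured speed-up.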

Correctness

The result is exactly the same as in the original:

> # Inputs need not be sorted
> d <- c(-5, 5, 0, 10, 12)
> t <- c(-6, -4, 0, 2, 4, -2, 6)
> 
> # Index (in vector d) of closest match
> set.seed(1)
> idx <- matchindex(d, t)
> idx
[1] 5 2 2 1 4 3 1
> 
> # Compare with optimized version
> set.seed(1)
> idx <- matchindex_optimized(d, t)
> idx
[1] 5 2 2 1 4 3 1

Speed-up

library(Rcpp)
library(microbenchmark)

# Load original and optimized versions of matchindex()
sourceCpp("original_matchindex.cpp")
sourceCpp("optimized_matchindex.cpp")

# Generate test data
set.seed(42)
n_d <- 100000  # Number of donor cases
n_t <- 10000   # Number of target cases

d <- runif(n_d, -10, 10)  # Random donor values
t <- runif(n_t, -10, 10)  # Random target values
k <- 5  # Number of nearest neighbors to sample

benchmark_results <- microbenchmark(
  original = matchindex(d, t, k),
  optimized = matchindex_optimized(d, t, k),
  times = 10
)

summary(benchmark_results)

Result:

       expr  min   lq mean median   uq  max neval cld
1  original 14.6 14.7 15.7   14.8 15.6 19.2    10  a 
2 optimized 13.9 14.1 14.5   14.6 15.0 15.0    10   b

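In typical use cases, the changes give a modest speed-up. In the benchmark above, the optimized version is about 8% faster on mean time (14.5 vs 15.7) and about 1% faster on median time (14.6 vs 14.8).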

