rnndescent parameters for search included into the controls_ann() function
BERENZ committed May 8, 2024
1 parent 4632aee commit f097077
Showing 4 changed files with 11 additions and 13 deletions.
5 changes: 4 additions & 1 deletion R/controls.R
@@ -36,7 +36,10 @@ controls_ann <- function(
 weight_by_degree = FALSE,
 prune_reverse = FALSE,
 progress = "bar",
-obs = "R"),
+obs = "R",
+##
+max_search_fraction = 1,
+epsilon = 0.1),
 hnsw = list(M = 25,
 ef_c = 200,
 ef_s = 200,
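The two new search parameters, `max_search_fraction` and `epsilon`, live in the `nnd` sub-list of the controls object. A minimal sketch, assuming a hand-built list that mimics only the fields touched by this commit (it is a hypothetical stand-in, not the full output of `controls_ann()`), of overriding one nested field the way the vignette does, and of why replacing the whole `nnd` list differs from merging into it with base R's `utils::modifyList()`:

```r
# Hypothetical minimal stand-in for part of the list returned by
# controls_ann(); only the fields touched by this commit are included.
ann_control_pars <- list(
  k_search = 30,
  nnd = list(
    obs = "R",
    max_search_fraction = 1,  # search parameter added by this commit
    epsilon = 0.1             # search parameter added by this commit
  )
)

# Override a single nested field, as the vignette does;
# the other nnd entries keep their values.
ann_control_pars$nnd$epsilon <- 0.2

# Assigning a brand-new nnd list drops every entry not listed ...
replaced <- ann_control_pars
replaced$nnd <- list(epsilon = 0.2)

# ... whereas utils::modifyList() merges nested lists recursively.
merged <- utils::modifyList(ann_control_pars, list(nnd = list(epsilon = 0.3)))

str(merged$nnd)
```

Whether `controls_ann()` itself merges a user-supplied `nnd` list with its defaults is not shown in this diff; editing the returned object field by field, as the vignette does, avoids depending on that.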
6 changes: 3 additions & 3 deletions R/method_nnd.R
@@ -67,12 +67,12 @@ method_nnd <- function(x,
 l_1nn <- rnndescent::rnnd_query(index = l_ind,
 query = y,
 k = if (nrow(x) < control$k_search) nrow(x) else control$k_search,
-epsilon = 0.1,
-max_search_fraction = 1,
+epsilon = control$nnd$epsilon,
+max_search_fraction = control$nnd$max_search_fraction,
 init = NULL,
 verbose = verbose,
 n_threads = n_threads,
-obs = "R")
+obs = control$nnd$obs)
 
 # if (!is.null(path)) {
 # if (grepl("(/|\\\\)$", path)) {
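After this change, `method_nnd()` no longer hard-codes `epsilon`, `max_search_fraction` and `obs`; it reads them from `control$nnd`. A sketch of that wiring, assuming a hypothetical helper `nnd_query_args()` (not part of the package) that only assembles the argument list, so it runs without rnndescent installed. It also shows that the `k` expression in the diff is the same as `min(nrow(x), control$k_search)`:

```r
# Hypothetical helper (not part of the package): collects the rnnd_query()
# search arguments from a control list, mirroring method_nnd() after this commit.
nnd_query_args <- function(control, n_x, verbose = FALSE, n_threads = 1) {
  list(
    # never request more neighbours than there are indexed rows; same as
    # `if (n_x < control$k_search) n_x else control$k_search`
    k = min(n_x, control$k_search),
    epsilon = control$nnd$epsilon,
    max_search_fraction = control$nnd$max_search_fraction,
    init = NULL,  # list() keeps a NULL element, so "init" is still present
    verbose = verbose,
    n_threads = n_threads,
    obs = control$nnd$obs
  )
}

control <- list(
  k_search = 30,
  nnd = list(obs = "R", max_search_fraction = 1, epsilon = 0.2)
)

# Only 20 rows are indexed, so k is capped at 20 instead of 30.
query_args <- nnd_query_args(control, n_x = 20)
str(query_args)

# Not run here: with an index l_ind and query matrix y, the list could be
# spliced into the real call:
# do.call(rnndescent::rnnd_query, c(list(index = l_ind, query = y), query_args))
```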
3 changes: 2 additions & 1 deletion man/controls_ann.Rd

Some generated files are not rendered by default.

10 changes: 2 additions & 8 deletions vignettes/v2-reclin.Rmd
@@ -157,14 +157,14 @@ It seems that the default parameters of the NND method result in an FNR of `r sp
 ```{r}
 set.seed(2024)
 ann_control_pars <- controls_ann()
-ann_control_pars$k_search <- 60
+ann_control_pars$nnd$epsilon <- 0.2
 result3 <- blocking(x = census$txt, y = cis$txt, verbose = 1,
 true_blocks = matches[, .(x, y, block)], n_threads = 8,
 control_ann = ann_control_pars)
 ```
 
-Changing the `k_search` parameter from 30 to 60 decreased the FDR to `r sprintf("%.1f",result3$metrics["fnr"]*100)`%.
+Changing the `epsilon` search parameter from 0.1 to 0.2 decreased the FDR to `r sprintf("%.1f",result3$metrics["fnr"]*100)`%.
 
 ```{r}
 result3
@@ -184,12 +184,6 @@ It seems that the HNSW algorithm performed better with `r sprintf("%.2f",result4
 result4
 ```
 
-However, this comes at a cost, especially in terms of computation:
-
-1. the HNSW does not handle sparse matrices, so a sparse matrix of tokens must be converted to dense or provided line by line.
-2. The HNSW algorithm is slower than NND.
-
-Computation times are: 16 seconds for NND and about 60 for HNSW (on M2 MacBook AIR). We can improve the time by changing the parameters `M` and `ef_s` in the `controls_ann()` function (e.g. setting `M=16` and `ef_s=15` leads to about 16 seconds with 1\% FNR).
 
 ## Compare results
 
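The reworded vignette sentence speaks of the FDR, while its inline code reports `result3$metrics["fnr"]`. For reference, a minimal sketch of the standard definitions of the false negative rate and the false discovery rate; the counts are invented for illustration (they are not vignette results), and how the blocking package counts pairs internally is not shown here:

```r
# Invented counts for illustration only (not results from the vignette).
tp <- 90  # true matching pairs that the blocking kept
fn <- 10  # true matching pairs that the blocking missed
fp <- 5   # kept pairs that are not true matches

fnr <- fn / (fn + tp)  # false negative rate: share of true matches missed
fdr <- fp / (fp + tp)  # false discovery rate: share of kept pairs that are wrong

# Percentages, formatted like the vignette's sprintf("%.1f", ... * 100)
sprintf("%.1f", c(fnr, fdr) * 100)
```

The two rates have different denominators, so lowering one does not imply anything about the other.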
