vignette about asymptotic complexity of line search #5

tdhock · 2023-03-29T23:30:33Z

After going through a few iterations of the first for loop in https://github.com/tdhock/aum/blob/main/vignettes/line-search.Rmd
I executed this code:

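  ## The first N.seq covers all subtrain examples; the second line overrides it with a smaller range (100 to 1600).
  N.seq <- as.integer(10^seq(2,log10(max(diff.list$subtrain$example)),l=10))
  N.seq <- as.integer(10^seq(log10(100),log10(1600),l=10))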
  atime.list <- atime::atime(
    N=N.seq,
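    setup={
      ## N*(N-1)/2 is the number of pairs among N lines, used here as the iteration cap.
      maxIterations <- N*(N-1)/2
      X.subtrain <- X.keep[index.list$subtrain,]
      ## Keep only the first N examples (example ids in diff.list start at 0).
      X.sub <- X.subtrain[1:N,]
      diff.sub <- diff.list$subtrain[example %in% seq(0,N-1)]
    },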
    result=TRUE,
    seconds.limit=1,
    times=1,
    aum_line_search={
      nb.weight.search <- aum::aum_line_search(
        diff.sub,
        maxIterations=maxIterations,
        feature.mat=X.sub,
        weight.vec=weight.vec)
      list(
        total.iterations=nrow(nb.weight.search$line_search_result),
        iterations.to.min=nb.weight.search$line_search_result[,which.min(aum)])
    })
  atime.list$measurements[, total.iterations := sapply(
    result, function(L)L$total.iterations)]
  atime.list$measurements[, iterations.to.min := sapply(
    result, function(L)L$iterations.to.min)]

  best.list <- atime::references_best(
    atime.list, unit.col.vec=c("total.iterations","iterations.to.min"))
  best.ref <- best.list$ref[each.sign.rank==1]
  library(ggplot2)
  gg <- ggplot()+
    theme_bw()+
    facet_grid(unit ~ ., scales="free")+
    geom_line(aes(
      N, empirical, color=expr.name),
      data=best.list$meas)+
    scale_x_log10()+
    scale_y_log10("median line, min/max band")
  gg.show <- gg+
    directlabels::geom_dl(aes(
      N, empirical, color=expr.name, label=expr.name),
      method="right.polygons",
      data=best.list$meas)+
    theme(legend.position="none")+
    coord_cartesian(xlim=c(min(N.seq),max(best.list$meas$N)*5))
  gg.ref <- gg.show+
    geom_line(aes(
      N, reference, group=paste(fun.name, expr.name)),
      color="grey",
      data=best.ref)
  gg.ref+
    directlabels::geom_dl(aes(
      N, reference,
      label.group=paste(fun.name, expr.name),
      label=fun.name),
      data=best.ref,
      color="grey",
      method="left.polygons")

and I got this plot

which suggests that the total number of line search iterations is quadratic in N, and so is the number of iterations needed to reach the min.
@phase It would be nice to have a vignette that explores this more systematically:

Do this asymptotic analysis for every step of gradient descent on the full data?
Do gradient descent on different data sizes, and keep track of these metrics at each step?
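
One way to check a "quadratic" claim like this without the plot is to estimate the slope on the log-log scale, where a slope near 2 indicates quadratic growth. The sketch below uses simulated iteration counts (hypothetical data, not output of `aum::aum_line_search`), with the same `N` range as above:

```r
## Simulated iteration counts that grow roughly quadratically in N.
## These numbers are made up for illustration, not measured.
N <- as.integer(10^seq(log10(100), log10(1600), length.out=10))
set.seed(1)
iterations <- N*(N-1)/2 * exp(rnorm(length(N), sd=0.05))
## The slope of log(iterations) versus log(N) estimates the polynomial degree.
fit <- lm(log10(iterations) ~ log10(N))
slope <- unname(coef(fit)[2])
slope
```

With real measurements, `iterations` would be replaced by a column such as `total.iterations` or `iterations.to.min` from the timing results.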

tdhock · 2023-04-11T16:38:46Z

I did an analysis of all the neuroblastoma-data using the new code in #6 (keep doing more line search iterations until AUM increases) and I observed that the number of iterations that takes is quadratic in the number of input breakpoints/lines. So probably too slow for a vignette on CRAN, closing.

source: https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.R
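
For reference, the "keep going until AUM increases" rule can be sketched on a plain vector of AUM values, one per line search iteration. The helper name `iterations_until_increase` and the toy values are hypothetical; the actual implementation is the code in #6, which may differ:

```r
## Hypothetical helper: index of the last iteration before the
## objective first increases (or the last index if it never increases).
iterations_until_increase <- function(aum.vec){
  increase.index <- which(diff(aum.vec) > 0)
  if(length(increase.index) == 0) length(aum.vec) else increase.index[1]
}
## Toy AUM values: first increase happens between iterations 3 and 4.
toy.aum <- c(5, 4, 3.5, 3.6, 2, 1)
stop.iteration <- iterations_until_increase(toy.aum)
## The stopping point can differ from the global min over all iterations.
global.min.iteration <- which.min(toy.aum)
c(stop=stop.iteration, global.min=global.min.iteration)
```

In this toy example the rule stops at the first local min, whereas `which.min` over all iterations finds a later, lower value.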

tdhock · 2023-04-19T21:25:36Z

previous plot was "keep doing more iterations of line search while subtrain aum is decreasing."
what would the plot look like if we did validation aum instead of subtrain?

tdhock · 2023-04-19T21:27:22Z

Data for "keep doing more iterations of line search while subtrain aum is decreasing" are here: https://github.com/tdhock/max-generalized-auc/blob/master/figure-line-search-complexity.csv
Can we add the total number of iterations of approx/constant line search (rather than the keep-going line search)? It should be linear (smaller slope).

tdhock · 2023-05-04T18:12:05Z

actually, even the approx line search (exactL, linear number of iterations of exact line search algorithm) does a quadratic number of iterations, same as min.aum (keep doing more iterations while subtrain aum is decreasing), see below:

To explain the result above, we can examine the number of steps of gradient descent, which is larger for exactL and smaller for exactQ (quadratic number of iterations, full exact line search algorithm), and smaller for min.aum, see below:

The overall timings (including overhead of R memory allocation etc) are shown below, and suggest that the min.aum method is slightly faster, but all three methods are about the same.

source code: https://github.com/tdhock/max-generalized-auc/blob/9574892ed8204771cef360d06756a5aacecd5e99/figure-line-search-complexity-compare.R
Also, max validation AUC is about the same between methods, see below.

There is a slight increase of AUC for min.aum/exactQ over exactL.

tdhock · 2023-05-05T18:55:35Z

On this data set, init=zero gets larger valid AUC than init=IRCV. And for IRCV we see that maxIterations=min.aum is consistently better than grid search.

phase · 2023-05-07T02:28:36Z

Here are some graphs from my tests. I think I've reproduced exactL taking a large number of steps of gradient descent.

This makes me wonder if doing the full quadratic amount of iterations and then checking a few grid points would improve hybrid.

tdhock · 2023-05-08T16:23:15Z

Thanks for sharing, those results look consistent.
"This makes me wonder if doing the full quadratic amount of iterations and then checking a few grid points would improve hybrid." -> checking a few grid points would not help quadratic because the quadratic already checks all possible step sizes.
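
A toy illustration of that point, assuming the objective is piecewise linear in the step size between breakpoints. The breakpoints, values, and `toy.objective` below are made up for this sketch and are not taken from aum:

```r
set.seed(1)
## Hypothetical breakpoints (step sizes where the slope changes)
## and the objective value at each breakpoint.
breakpoints <- sort(runif(20, 0, 10))
values <- runif(20)
## Linear interpolation between breakpoints; rule=2 holds the end values
## constant outside the range.
toy.objective <- approxfun(breakpoints, values, rule=2)
## A few grid points over the same range of step sizes.
grid <- seq(min(breakpoints), max(breakpoints), length.out=7)
min.at.breakpoints <- min(values)
min.on.grid <- min(toy.objective(grid))
## Between two breakpoints the value is a weighted average of the two
## end values, so no grid point can be lower than the best breakpoint.
min.at.breakpoints <= min.on.grid
```

Under that assumption, a search that visits every breakpoint already attains the minimum, so extra grid points cannot improve on it.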

phase · 2023-05-08T20:32:11Z

oh yes - not sure what I was thinking!

I ran some more tests with different hybrid variants and got similar results.

tdhock mentioned this issue Mar 29, 2023

add example of references_best(unit.col.vec) tdhock/atime#6

Closed

tdhock closed this as completed Apr 11, 2023

tdhock reopened this Apr 19, 2023
