Jonathan Hillman and Toby Dylan Hocking, Optimizing ROC Curves with a Sort-Based Surrogate Loss Function for Binary Classification and Changepoint Detection, arXiv:2107.01285
This paper proposes a new surrogate loss function, the Area Under Min of FP and FN (AUM). Minimizing AUM was shown to result in maximizing the Area Under the ROC Curve (AUC). The reference C++ implementation of Algorithm 1 (AUM and directional derivative computation) is in the R package aum. A PyTorch implementation of AUM is in figure-aum-neural-networks-data.py.
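For concreteness, here is a minimal from-scratch R sketch of the AUM for binary classification (labels in {0,1}, real-valued scores). The helper name aum_binary is hypothetical, not the aum package API; the package's Algorithm 1 additionally computes the directional derivatives used for gradient descent.

```r
# Hypothetical helper (not the aum package API): AUM for binary classification.
# FP(c) and FN(c) are step functions of the threshold c that change only at the
# sorted scores, so AUM is a sum of min(FP,FN) values times interval lengths.
aum_binary <- function(pred, label) {
  ord <- order(pred)  # O(N log N) sort, the dominant cost
  p <- pred[ord]
  y <- label[ord]
  k <- seq_len(length(p) - 1)
  fn <- cumsum(y == 1)[k]                # positives with score <= p[k]
  fp <- sum(y == 0) - cumsum(y == 0)[k]  # negatives with score >  p[k]
  sum(pmin(fp, fn) * diff(p))
}
aum_binary(pred = c(2, -1, 0.5, 1), label = c(0, 1, 0, 1))  # 3, imperfect ranking
```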
Before running some of the replication scripts below, you will need to clone the data repositories: https://github.com/tdhock/feature-learning-benchmark and https://github.com/tdhock/neuroblastoma-data.
- Figure 1: Example changepoint data with non-monotonic FP. Left PNG (data), Right PNG (error), R code.
- Figure 2: Synthetic example with AUC greater than one. Left PNG (AUM), Right PNG (AUC), R code.
- Figure 3: Real data example with AUC greater than one. Left PNG (error functions), Right PNG (AUM/AUC functions), R code.
- Figure 4: optimizing AUM maximizes AUC. Left PNG, R code (iterations); Right PNG, R code (ROC curves).
- Figure 5: Test AUM/AUC comparison across multiple seeds and datasets. Top PNG, Bottom PNG, R code.
- Figure 6: Binary classification test AUC. Left PNG (comparing AUM variants), Right PNG (AUM compared to baselines), R code.
- Figure 7: PNG (Test AUC of neural network for image classification), R code.
- Figure 8: speed comparison. PNG, R code.
Interactive version: http://ml.nau.edu/viz/2021-11-12-aum-convexity/
Source: figure-aum-convexity-interactive.R
- Aligning two error curves (peak detection).
- 8 error curves with 2 different alignments.
- Some ROC curves with AUC>1.
figure-auc-improved-interactive.R
Title: Efficient line search for optimizing Area Under the ROC Curve in gradient descent
Abstract: Receiver Operating Characteristic (ROC) curves are useful for evaluation in binary classification and changepoint detection, but difficult to use for learning since the Area Under the Curve (AUC) is piecewise constant (gradient zero almost everywhere). Recently the Area Under Min (AUM) of false positive and false negative rates has been proposed as a differentiable surrogate for AUC. In this paper we study the piecewise linear/constant nature of the AUM/AUC, and propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of gradient descent (line search), when optimizing a linear model. Remarkably, our proposed line search algorithm has the same log-linear asymptotic time complexity as gradient descent with constant step size, but it computes a complete representation of the AUM/AUC as a function of step size. In our empirical study of binary classification problems, we verify that our proposed algorithm is fast and exact; in changepoint detection problems we show that the proposed algorithm is just as accurate as grid search, but faster.
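To make the line-search idea concrete, here is a hedged stand-in in R, assuming the aum_binary() helper sketched near the top of this page: it picks the best step size on a fixed grid, whereas the proposed algorithm follows the exact piecewise linear AUM as a function of step size.

```r
# Grid-based stand-in for the exact line search (names here are hypothetical).
# X: feature matrix, y: binary labels, weight: current linear model parameters,
# direction: descent direction (e.g. negative gradient of AUM).
grid_line_search <- function(X, y, weight, direction,
                             step.grid = 10^seq(-4, 1, by = 0.5)) {
  aum.grid <- sapply(step.grid, function(step) {
    aum_binary(as.numeric(X %*% (weight + step * direction)), y)
  })
  step.grid[which.min(aum.grid)]  # best grid point, not the exact minimizer
}
```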
Slides PDF
- https://github.com/tdhock/max-generalized-auc/blob/master/HOCKING-slides-toronto.pdf
- https://github.com/tdhock/max-generalized-auc/blob/master/HOCKING-slides-sherbrooke.pdf
See also https://github.com/tdhock/two-new-algos-sci-ml
data_Classif_batchtools_best_valid.R makes the figure below.
data_Classif.R makes data_Classif.RData
data_Classif_figure.R makes the figure below.
figure-aum-convexity-new.R makes the figure below.
figure-line-search-example-binary.R makes the figure below.
figure-more-than-one-new.R makes new figures for the AUM line search paper:
figure-line-search-example.R makes a figure/demo for the new paper/slides.
figure-more-than-one.R was modified to make a heat map version of the supplementary material figure, and a rate version of AUM:
JSM Slides “Efficient line search optimization of penalty functions in supervised changepoint detection,” PDF, LaTeX source.
figure-line-search-interactive.R makes a new interactive figure showing the line search.
figure-line-search-complexity-compare.R studies the asymptotic complexity of different versions of the line search.
figure-line-search-complexity.R is a more systematic version of this experiment, computing how many iterations it takes to reach a min-AUM step size (tdhock/aum#5).
Result CSV: figure-line-search-complexity.csv; figure below.
figure-line-grid-search-interactive.R compares grid search to line search with a linear number of iterations (equal to the number of input diffs/breakpoints B).
figure-auc-improved.R makes the figure below, which shows one way to measure the irregularity of ROC curves: the difference between the AUC and the AUC of the monotonic version of the curve.
HOCKING-slides-prescott.tex makes HOCKING-slides-prescott.pdf
with new figure code figure-aum-grad-speed-binary.R, which makes the figure below.
figure-compare-hinge-loss.R makes the figure below.
New image classification experiment: figure-aum-neural-networks-data.py, adapted from the torch AUM code at https://tdhock.github.io/blog/2022/aum-learning/
figure-aum-neural-networks.R makes the figure below.
Additional figures in figure-more-than-one.R
The figure below, from the aum package accuracy comparison vignette, suggests that experiments on sonar data could provide convincing evidence of superior accuracy.
figure-sonar-comparisons-data.R makes figure-sonar-comparisons.csv
figure-sonar-comparisons.R reads that and makes the figure below.
figure-aum-convexity-interactive.R makes an interactive figure.
Interactive versions:
- 2 Feb 2023, bigger text size http://ml.nau.edu/viz/2021-11-12-aum-convexity/
- 7 Nov 2021, continuity in pred.diff interaction http://bl.ocks.org/tdhock/raw/e3f56fa419a6638f943884a3abe1dc0b
- 6 Nov 2021, no continuity in pred.diff interaction http://bl.ocks.org/tdhock/raw/de3979318d5255dd6e21ff907e2f3fb4
HOCKING-slides.tex makes HOCKING-slides.pdf for the ML lab / Math colloquium.
figure-aum-grad-speed-binary-cpp-data.R makes binary classification timing data, figure-aum-grad-speed-binary-cpp-data.csv
figure-aum-grad-speed-binary-cpp.R makes the figure below.
figure-aum-grad-speed.R was updated to make the figure below.
figure-unbalanced-grad-desc.R was updated to make a new figure (useful for slides, probably).
Updated figure-aum-convexity.R to make new figures.
Updated figure-aum-grad-speed.R to make a new figure.
figure-aum-grad-speed-binary.R makes the figures below.
The figure above shows time differences between sorted (linear) and unsorted (log-linear) predictions; a timing sketch follows below.
The figure below shows differences between algos (AUM comparable to logistic, whether or not predictions are sorted).
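A hedged sketch of that sorted/unsorted timing difference, reusing the aum_binary() helper from the top of this page: pre-sorting lets us skip the order() call, leaving only linear-time cumsum work.

```r
# aum_binary_sorted() is a hypothetical variant that assumes pred is already
# increasing, so only the O(N) cumsum/pmin/diff work remains.
aum_binary_sorted <- function(pred, label) {
  k <- seq_len(length(pred) - 1)
  fn <- cumsum(label == 1)[k]
  fp <- sum(label == 0) - cumsum(label == 0)[k]
  sum(pmin(fp, fn) * diff(pred))
}
N <- 1e6
pred <- rnorm(N)
y <- rbinom(N, 1, 0.5)
o <- order(pred)
system.time(aum_binary(pred, y))               # log-linear: includes the sort
system.time(aum_binary_sorted(pred[o], y[o]))  # linear: sort done beforehand
```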
figure-aum-grad-speed-data.R makes figure-aum-grad-speed-data.csv
figure-aum-grad-speed.R reads that and makes the figure below.
figure-unbalanced-grad-desc-data.R makes figure-unbalanced-grad-desc-data.rds
figure-unbalanced-grad-desc.R reads that and makes the figures below.
The figure above shows that the AUM variant which uses total number of errors (count) is more accurate than the AUM variant which uses the normalized error (rate).
The figure above shows that the AUM is at least as accurate as squared.hinge.all.pairs, whereas logistic.weighted is less accurate.
figure-logistic-weights.R makes the figure below.
This figure shows that cv.glmnet does fine with 5% positive labels, but stops learning when we get down to 1% positive labels. This suggests that we should try 1% for comparing aum.rate and aum.count.
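A minimal sketch of that imbalance experiment on simulated data (the real experiment uses the benchmark data sets, so the data generation here is an assumption); cv.glmnet is from the glmnet package.

```r
library(glmnet)
set.seed(1)
N <- 5000
P <- 20
prop.pos <- 0.01  # try 0.05 for the 5% setting discussed above
X <- matrix(rnorm(N * P), N, P)
score <- X %*% rnorm(P)
# Threshold the latent score so about 1% of labels are positive.
y <- as.integer(score > quantile(score, 1 - prop.pos))
fit <- cv.glmnet(X, y, family = "binomial")
pred.prob <- predict(fit, X, type = "response", s = "lambda.min")
```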
figure-DNA-Sonar-subtrain-valid-data.R makes figure-DNA-Sonar-subtrain-valid-data.csv.gz
figure-DNA-Sonar-subtrain-valid.R analyzes those data.
figure-binary-test-auc-data.R makes figure-binary-test-auc-data.rds
figure-binary-test-auc.R makes the figure below.
figure-test-fold-monotonic.R makes the figure below.
```
> meta.dt[, .(data.name, test.fold, features, n.train, mean.breaks)]
                   data.name test.fold features n.train mean.breaks
1:           ATAC_JV_adipose         4       29     341    6.665689
2: H3K27ac-H3K4me3_TDHAM_BP          2       26    1865    4.145845
3:        H3K4me3_XJ_immune          2       28     216    5.902778
4:        H3K4me3_XJ_immune          4       28     216    6.134259
5:                systematic         1      117    3322    1.010235
> (meta.stats <- meta.tall[, .(
+   min=min(value),
+   max=max(value)
+ ), by=variable])
      variable        min         max
1:    features  26.000000  117.000000
2:     n.train 216.000000 3322.000000
3: mean.breaks   1.010235    6.665689
```
figure-aum-train-both.R makes the figure below.
figure-aum-train-data.R makes figure-aum-train-data.rds
figure-aum-train.R makes the figure below.
figure-aum-optimized-data.R makes figure-aum-optimized-data.rds
figure-aum-optimized.R reads those data and makes the figure below, which shows N=54 predicted values with min error, then predicted values optimized via AUM gradient descent.
- TODO: do the same with a linear model, train error/AUC.
- TODO: AUM figures?
figure-binary-class.R makes a figure showing what the FP/FN curves look like for binary classification; a sketch of these curves follows.
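A small sketch of those FP/FN curves (fp_fn_curves is a hypothetical helper, using the same conventions as aum_binary() above): both are step functions of the prediction threshold.

```r
fp_fn_curves <- function(pred, label) {
  ord <- order(pred)
  p <- pred[ord]
  y <- label[ord]
  data.frame(
    threshold = p,
    fn = cumsum(y == 1),                # positives with score <= threshold
    fp = sum(y == 0) - cumsum(y == 0))  # negatives with score >  threshold
}
curves <- fp_fn_curves(rnorm(20), rbinom(20, 1, 0.5))
matplot(curves$threshold, curves[, c("fp", "fn")],
        type = "s", xlab = "threshold", ylab = "errors")
```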
figure-aum-convexity.R makes the figure below.
figure-fn-not-monotonic.R makes the figure below.
figure-more-than-one.R makes the figure below.
figure-linear-model-test-analyze.R makes the figure below.
Some R scripts for interactive experimentation with the gradient descent algorithm for learning a linear model that minimizes AUM:
- figure-linear-model.R uses penaltyLearning::IntervalRegressionCV for initialization.
- figure-linear-model-zero-init.R uses zero vector for init.
R script with a OneFold function that computes train/valid/test error; it can be parallelized over 198 test folds on the cluster (a usage sketch follows the results list below).
Initial results on two data sets (ATAC, CTCF) show that:
- Train AUM decreases as a function of iterations (each iteration does a line search, so that is expected).
- IntervalRegressionCV init is much more accurate (in terms of test AUM, AUC, errors) than zero init. The best linear model is not as accurate as the best predictions, obtained by running gradient descent on just the predicted values (without a linear model).
- Using early stopping regularization (select number of iterations with min AUM on validation set) does not decrease test AUM using IntervalRegressionCV initialization.
- The linear model which is best in terms of test AUM, over all iterations, is not much better than the initial iteration, for these two data sets.
- Do we see any improvement on other test folds / data sets?
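A hypothetical usage sketch for the parallelization mentioned above; OneFold and its fold-index argument are assumptions based on the description, and mclapply is just one way to run it on a cluster node.

```r
library(parallel)
# Run OneFold on each of the 198 test folds, using all available cores;
# each element of fold.results is assumed to be a data frame of errors.
fold.results <- mclapply(1:198, OneFold, mc.cores = detectCores())
results <- do.call(rbind, fold.results)
```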
figure-compare-hinge-loss-data.R makes figure-compare-hinge-loss-data.csv
figure-compare-hinge-loss.R makes the figure below.
figure-neuroblastomaProcessed-combinations.R makes new figure that highlights counter-examples for the proposition (AUC=1 implies AUM=0) and shows that there are no counter-examples for the converse.
auc.improved.R copied from https://github.com/tdhock/feature-learning-benchmark/blob/master/auc.improved.R
figure-curveAlignment.R computes the derivative of the area under min(fp,fn); updated viz: http://ml.nau.edu/viz/2019-08-19-curveAlignment-aub-deriv/
figure-neuroblastomaProcessed-combinations-interactive.R makes http://ml.nau.edu/viz/2019-08-16-generalized-roc/
curveAlignment.R and figure-curveAlignment.R
http://members.cbio.mines-paristech.fr/~thocking/figure-max-auc/
figure-aub-convexity.R creates figures which show that the aub function (the earlier name for what the papers above call AUM) is continuous but not convex; a hedged demo follows:
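This demo reuses the aum_binary() helper from the top of this page: AUM evaluated along a segment between two prediction vectors is piecewise linear, hence continuous, but its kinks need not bend upward as convexity would require (some random draws may happen to look convex, so try a few seeds).

```r
set.seed(1)
y <- rbinom(10, 1, 0.5)
pred1 <- rnorm(10)
pred2 <- rnorm(10)
s.grid <- seq(0, 1, by = 0.01)
aum.path <- sapply(s.grid, function(s) {
  aum_binary((1 - s) * pred1 + s * pred2, y)  # AUM along the segment
})
plot(s.grid, aum.path, type = "l", xlab = "interpolation s", ylab = "AUM")
```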
figure-neuroblastomaProcessed-complex-loon.R has code for an interactive plot using loon.
figure-neuroblastomaProcessed-combinations.R creates the following figure which plots auc vs aub:
Note that the min AUM=0 has AUC=1, and the points with AUC>1 have AUM>0. Thus minimizing AUM seems like a reasonable criterion.
figure-neuroblastomaProcessed-complex.R creates http://members.cbio.mines-paristech.fr/~thocking/figure-neuroblastomaProcessed-complex/ which shows 8 labeled neuroblastoma data sequences with two different ROC curves / predictions. Strangely both achieve 0 errors, but the one with predictions in the finite interval has a highly non-monotonic ROC curve, and much smaller area inside the ROC polygon.
figure-neuroblastomaProcessed-combinations.R creates the following figure which shows the auc values for all of the 2^8 unique combinations of predicted values for 8 labeled profiles.
Each labeled profile has two minima: one in an infinite interval, and one in a finite interval. The panel titles show the difference d from the infinite interval limit to the predicted value, e.g. (-Inf, 1.2) with d=1 results in a predicted value of 0.2. The overall pattern is that d is relevant for AUC in the range 0.001 to 10, but it has no effect outside that range. Surprisingly, there are AUC values greater than one, which happens when there are cycles. One example is highlighted with a circle in the plot above, and the ROC curves are shown below.
https://github.com/tdhock/neuroblastoma-data/blob/master/figure-max-auc.R creates http://members.cbio.mines-paristech.fr/~thocking/figure-max-auc/