Commit c61ccc3

New post from 2022
1 parent 8935736 commit c61ccc3

4 files changed

+203
-181
lines changed

Lines changed: 78 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,78 @@
1+
---
2+
title: A Regularization Proof
3+
author: Brian Zhang
4+
date: '2022-09-19'
5+
slug: a-regularization-proof
6+
categories: []
7+
tags: []
8+
description: 'Investigating behavior of a function minimum as we add regularization.'
9+
---
10+
11+
Say we have a loss function $l(w)$. With no regularization, we might obtain the minimum at $w = w_0$. Now consider the setting with regularization:
12+
$$
13+
f_\lambda(w) = l(w) + \lambda R(w),
14+
$$
15+
where $R(w) \geq 0$ is some regularization function and $\lambda \geq 0$. What can we say if we consider the minimizing inputs $w_1$ for $f_{\lambda_1}(w)$ and $w_2$ for $f_{\lambda_2}(w)$, with $0 \leq \lambda_1 < \lambda_2$?
16+
$$
17+
w_1 = \operatorname*{argmin}_w \left[ l(w) + \lambda_1 R(w) \right],\\
18+
w_2 = \operatorname*{argmin}_w \left[ l(w) + \lambda_2 R(w) \right].
19+
$$
20+
21+
Intuitively, as we increase $\lambda$ from $\lambda_1$ to $\lambda_2$, the function $f_\lambda(w)$ places more importance on the regularization term $R(w)$. We should expect $l(w)$ evaluated at the optimum $w$ to increase, and the regularization term $R(w)$ evaluated at the optimum $w$ to decrease.
22+
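As a concrete check of this intuition, consider a one-dimensional example that is purely illustrative and not used in the argument below: assume $l(w) = (w - a)^2$ for some constant $a$ and $R(w) = w^2$. Setting the derivative of $f_\lambda$ to zero gives the minimizer $w_\lambda$ in closed form, along with the value of each term there:

$$
w_\lambda = \frac{a}{1 + \lambda}, \qquad
l(w_\lambda) = \frac{a^2 \lambda^2}{(1 + \lambda)^2}, \qquad
R(w_\lambda) = \frac{a^2}{(1 + \lambda)^2}, \qquad
f_\lambda(w_\lambda) = \frac{a^2 \lambda}{1 + \lambda}.
$$

For $\lambda \geq 0$, $l(w_\lambda)$ and $f_\lambda(w_\lambda)$ are nondecreasing in $\lambda$ while $R(w_\lambda)$ is nonincreasing, which is the behavior we expect in general.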
23+
By the properties of the optimum, we have
24+
\begin{gather}
25+
l(w_1) + \lambda_1 R(w_1) \leq l(w_2) + \lambda_1 R(w_2), \quad (1)\\
26+
l(w_2) + \lambda_2 R(w_2) \leq l(w_1) + \lambda_2 R(w_1). \quad (2)
27+
\end{gather}
28+
The only other information we have relating these terms is that $R(w) \geq 0$ (for all $w$) and $0 \leq \lambda_1 < \lambda_2$. So we work with what we have. First, leveraging $(1)$,
29+
\begin{align*}
30+
f_{\lambda_1}(w_1) &= l(w_1) + \lambda_1 R(w_1)\\
31+
&\leq l(w_2) + \lambda_1 R(w_2)\\
32+
&\leq l(w_2) + \lambda_2 R(w_2)\\
33+
&= f_{\lambda_2}(w_2),
34+
\end{align*}
35+
so the minimum value of the objective increases (or stays the same) as we increase $\lambda$. This also follows directly from the fact that $f_{\lambda_2}(w) \geq f_{\lambda_1}(w)$ for all $w$, which gives $f_{\lambda_2}(w_2) \geq f_{\lambda_1}(w_2) \geq f_{\lambda_1}(w_1)$.
36+
37+
The other inequalities are trickier. Observe (starting with $(2)$):
38+
\begin{align*}
39+
l(w_1) + \lambda_2 R(w_1) &\geq l(w_2) + \lambda_2 R(w_2)\\
40+
&= l(w_2) + (\lambda_1 + \lambda_2 - \lambda_1) R(w_2)\\
41+
&= \left[l(w_2) + \lambda_1 R(w_2)\right] + (\lambda_2 - \lambda_1) R(w_2)\\
42+
&\geq \left[l(w_1) + \lambda_1 R(w_1)\right] + (\lambda_2 - \lambda_1) R(w_2).
43+
\end{align*}
44+
Subtracting $(l(w_1) + \lambda_1 R(w_1))$ from both sides, we have
45+
$$
46+
(\lambda_2 - \lambda_1) R(w_1) \geq (\lambda_2 - \lambda_1) R(w_2).
47+
$$
48+
Since $\lambda_2 - \lambda_1 > 0$, we can divide both sides by it to obtain
49+
$$
50+
R(w_1) \geq R(w_2).
51+
$$
52+
In words, the regularization term evaluated at the minimizer (not including the factor of $\lambda$) decreases (or stays the same) as we increase $\lambda$.^[An alternate proof, by adding $(1)$ and $(2)$:
53+
$$
54+
l(w_1) + l(w_2) + \lambda_1 R(w_1) + \lambda_2 R(w_2) \leq l(w_1) + l(w_2) + \lambda_1 R(w_2) + \lambda_2 R(w_1),\\
55+
\lambda_1 R(w_1) + \lambda_2 R(w_2) \leq \lambda_1 R(w_2) + \lambda_2 R(w_1),\\
56+
(\lambda_2 - \lambda_1) R(w_2) \leq (\lambda_2 - \lambda_1) R(w_1),\\
57+
R(w_2) \leq R(w_1).
58+
$$
59+
]
60+
61+
Starting with $(1)$ and using $R(w_2) \leq R(w_1)$ together with $\lambda_1 \geq 0$, we additionally have
62+
\begin{align*}
63+
l(w_1) + \lambda_1 R(w_1) &\leq l(w_2) + \lambda_1 R(w_2)\\
64+
&\leq l(w_2) + \lambda_1 R(w_1)
65+
\end{align*}
66+
Subtracting $\lambda_1 R(w_1)$ from both sides, we obtain
67+
$$
68+
l(w_1) \leq l(w_2).
69+
$$
70+
In words, the loss evaluated at the minimizer increases (or stays the same) as we increase $\lambda$.^[An alternate proof, valid when $\lambda_1 > 0$, by adding $1/\lambda_1$ times $(1)$ to $1/\lambda_2$ times $(2)$:
71+
$$
72+
\frac{l(w_1)}{\lambda_1} + \frac{l(w_2)}{\lambda_2} + R(w_1) + R(w_2) \leq \frac{l(w_2)}{\lambda_1} + \frac{l(w_1)}{\lambda_2} + R(w_2) + R(w_1),\\
73+
\frac{l(w_1)}{\lambda_1} + \frac{l(w_2)}{\lambda_2} \leq \frac{l(w_2)}{\lambda_1} + \frac{l(w_1)}{\lambda_2},\\
74+
\left(\frac{1}{\lambda_1} - \frac{1}{\lambda_2}\right) l(w_1) \leq \left(\frac{1}{\lambda_1} - \frac{1}{\lambda_2}\right) l(w_2) ,\\
75+
l(w_1) \leq l(w_2).
76+
$$
77+
]
78+
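The three inequalities can also be checked numerically. The sketch below is not part of the original post: it assumes NumPy is available, restricts $w$ to a finite grid so that each argmin is exact over that grid, and uses a made-up nonconvex loss with $R(w) = |w|$. The helper name `regularization_path` is hypothetical.

```python
import numpy as np


def regularization_path(loss, reg, grid, lambdas):
    """Minimize loss(w) + lam * reg(w) over a finite grid for each lam.

    Returns an array with one row per lambda holding
    (l(w_lam), R(w_lam), f_lam(w_lam)) at the grid minimizer w_lam.
    """
    l_vals = loss(grid)
    r_vals = reg(grid)
    rows = []
    for lam in lambdas:
        idx = np.argmin(l_vals + lam * r_vals)  # exact minimizer on the grid
        rows.append((l_vals[idx], r_vals[idx], l_vals[idx] + lam * r_vals[idx]))
    return np.array(rows)


# Illustrative choices (assumptions, not from the post): a nonconvex loss
# and an absolute-value regularizer, which satisfies R(w) >= 0.
grid = np.linspace(-3.0, 3.0, 6001)
lambdas = np.linspace(0.0, 5.0, 51)  # increasing, starting at lambda = 0
path = regularization_path(
    lambda w: np.sin(3.0 * w) + 0.1 * (w - 2.0) ** 2,
    np.abs,
    grid,
    lambdas,
)
l_path, r_path, f_path = path.T

# Small tolerance to absorb floating-point rounding.
tol = 1e-9
assert np.all(np.diff(l_path) >= -tol)  # l(w_1) <= l(w_2)
assert np.all(np.diff(r_path) <= tol)   # R(w_1) >= R(w_2)
assert np.all(np.diff(f_path) >= -tol)  # f_{lambda_1}(w_1) <= f_{lambda_2}(w_2)
```

Because the proof uses only the optimality of $w_1$ and $w_2$, $R(w) \geq 0$, and $0 \leq \lambda_1 < \lambda_2$, the checks should pass even though this loss is not convex.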
Lines changed: 80 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,80 @@
1+
---
2+
title: A Regularization Proof
3+
author: Brian Zhang
4+
date: '2022-09-19'
5+
slug: a-regularization-proof
6+
categories: []
7+
tags: []
8+
description: 'Investigating behavior of a function minimum as we add regularization.'
9+
---
10+
11+
12+
13+
<p>Say we have a loss function <span class="math inline">\(l(w)\)</span>. With no regularization, we might obtain the minimum at <span class="math inline">\(w = w_0\)</span>. Now consider the setting with regularization:
14+
<span class="math display">\[
15+
f_\lambda(w) = l(w) + \lambda R(w),
16+
\]</span>
17+
where <span class="math inline">\(R(w) \geq 0\)</span> is some regularization function and <span class="math inline">\(\lambda \geq 0\)</span>. What can we say if we consider the minimizing inputs <span class="math inline">\(w_1\)</span> for <span class="math inline">\(f_{\lambda_1}(w)\)</span> and <span class="math inline">\(w_2\)</span> for <span class="math inline">\(f_{\lambda_2}(w)\)</span>, with <span class="math inline">\(0 \leq \lambda_1 &lt; \lambda_2\)</span>?
18+
<span class="math display">\[
19+
w_1 = \operatorname*{argmin}_w \left[ l(w) + \lambda_1 R(w) \right],\\
20+
w_2 = \operatorname*{argmin}_w \left[ l(w) + \lambda_2 R(w) \right].
21+
\]</span></p>
22+
<p>Intuitively, as we increase <span class="math inline">\(\lambda\)</span> from <span class="math inline">\(\lambda_1\)</span> to <span class="math inline">\(\lambda_2\)</span>, the function <span class="math inline">\(f_\lambda(w)\)</span> places more importance on the regularization term <span class="math inline">\(R(w)\)</span>. We should expect <span class="math inline">\(l(w)\)</span> evaluated at the optimum <span class="math inline">\(w\)</span> to increase, and the regularization term <span class="math inline">\(R(w)\)</span> evaluated at the optimum <span class="math inline">\(w\)</span> to decrease.</p>
23+
<p>By the properties of the optimum, we have
24+
<span class="math display">\[\begin{gather}
25+
l(w_1) + \lambda_1 R(w_1) \leq l(w_2) + \lambda_1 R(w_2), \quad (1)\\
26+
l(w_2) + \lambda_2 R(w_2) \leq l(w_1) + \lambda_2 R(w_1). \quad (2)
27+
\end{gather}\]</span>
28+
The only other information we have relating these terms is that <span class="math inline">\(R(w) \geq 0\)</span> (for all <span class="math inline">\(w\)</span>) and <span class="math inline">\(0 \leq \lambda_1 &lt; \lambda_2\)</span>. So we work with what we have. First, leveraging <span class="math inline">\((1)\)</span>,
29+
<span class="math display">\[\begin{align*}
30+
f_{\lambda_1}(w_1) &amp;= l(w_1) + \lambda_1 R(w_1)\\
31+
&amp;\leq l(w_2) + \lambda_1 R(w_2)\\
32+
&amp;\leq l(w_2) + \lambda_2 R(w_2)\\
33+
&amp;= f_{\lambda_2}(w_2),
34+
\end{align*}\]</span>
35+
so the minimum value of the objective increases (or stays the same) as we increase <span class="math inline">\(\lambda\)</span>. This also follows directly from the fact that <span class="math inline">\(f_{\lambda_2}(w) \geq f_{\lambda_1}(w)\)</span> for all <span class="math inline">\(w\)</span>, which gives <span class="math inline">\(f_{\lambda_2}(w_2) \geq f_{\lambda_1}(w_2) \geq f_{\lambda_1}(w_1)\)</span>.</p>
36+
<p>The other inequalities are trickier. Observe (starting with <span class="math inline">\((2)\)</span>):
37+
<span class="math display">\[\begin{align*}
38+
l(w_1) + \lambda_2 R(w_1) &amp;\geq l(w_2) + \lambda_2 R(w_2)\\
39+
&amp;= l(w_2) + (\lambda_1 + \lambda_2 - \lambda_1) R(w_2)\\
40+
&amp;= \left[l(w_2) + \lambda_1 R(w_2)\right] + (\lambda_2 - \lambda_1) R(w_2)\\
41+
&amp;\geq \left[l(w_1) + \lambda_1 R(w_1)\right] + (\lambda_2 - \lambda_1) R(w_2).
42+
\end{align*}\]</span>
43+
Subtracting <span class="math inline">\((l(w_1) + \lambda_1 R(w_1))\)</span> from both sides, we have
44+
<span class="math display">\[
45+
(\lambda_2 - \lambda_1) R(w_1) \geq (\lambda_2 - \lambda_1) R(w_2).
46+
\]</span>
47+
Since <span class="math inline">\(\lambda_2 - \lambda_1 &gt; 0\)</span>, we can divide both sides by it to obtain
48+
<span class="math display">\[
49+
R(w_1) \geq R(w_2).
50+
\]</span>
51+
In words, the regularization term evaluated at the minimizer (not including the factor of <span class="math inline">\(\lambda\)</span>) decreases (or stays the same) as we increase <span class="math inline">\(\lambda\)</span>.<a href="#fn1" class="footnote-ref" id="fnref1"><sup>1</sup></a></p>
52+
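<p>Starting with <span class="math inline">\((1)\)</span> and using <span class="math inline">\(R(w_2) \leq R(w_1)\)</span> together with <span class="math inline">\(\lambda_1 \geq 0\)</span>, we additionally have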
53+
<span class="math display">\[\begin{align*}
54+
l(w_1) + \lambda_1 R(w_1) &amp;\leq l(w_2) + \lambda_1 R(w_2)\\
55+
&amp;\leq l(w_2) + \lambda_1 R(w_1)
56+
\end{align*}\]</span>
57+
Subtracting <span class="math inline">\(\lambda_1 R(w_1)\)</span> from both sides, we obtain
58+
<span class="math display">\[
59+
l(w_1) \leq l(w_2).
60+
\]</span>
61+
In words, the loss evaluated at the minimizer increases (or stays the same) as we increase <span class="math inline">\(\lambda\)</span>.<a href="#fn2" class="footnote-ref" id="fnref2"><sup>2</sup></a></p>
62+
<div class="footnotes">
63+
<hr />
64+
<ol>
65+
<li id="fn1"><p>An alternate proof, by adding <span class="math inline">\((1)\)</span> with <span class="math inline">\((2)\)</span>:
66+
<span class="math display">\[
67+
l(w_1) + l(w_2) + \lambda_1 R(w_1) + \lambda_2 R(w_2) \leq l(w_1) + l(w_2) + \lambda_1 R(w_2) + \lambda_2 R(w_1),\\
68+
\lambda_1 R(w_1) + \lambda_2 R(w_2) \leq \lambda_1 R(w_2) + \lambda_2 R(w_1),\\
69+
(\lambda_2 - \lambda_1) R(w_2) \leq (\lambda_2 - \lambda_1) R(w_1),\\
70+
R(w_2) \leq R(w_1).
71+
\]</span><a href="#fnref1" class="footnote-back">↩︎</a></p></li>
72+
<li id="fn2"><p>An alternate proof, valid when <span class="math inline">\(\lambda_1 &gt; 0\)</span>, by adding <span class="math inline">\(1/\lambda_1\)</span> times <span class="math inline">\((1)\)</span> to <span class="math inline">\(1/\lambda_2\)</span> times <span class="math inline">\((2)\)</span>:
73+
<span class="math display">\[
74+
\frac{l(w_1)}{\lambda_1} + \frac{l(w_2)}{\lambda_2} + R(w_1) + R(w_2) \leq \frac{l(w_2)}{\lambda_1} + \frac{l(w_1)}{\lambda_2} + R(w_2) + R(w_1),\\
75+
\frac{l(w_1)}{\lambda_1} + \frac{l(w_2)}{\lambda_2} \leq \frac{l(w_2)}{\lambda_1} + \frac{l(w_1)}{\lambda_2},\\
76+
\left(\frac{1}{\lambda_1} - \frac{1}{\lambda_2}\right) l(w_1) \leq \left(\frac{1}{\lambda_1} - \frac{1}{\lambda_2}\right) l(w_2) ,\\
77+
l(w_1) \leq l(w_2).
78+
\]</span><a href="#fnref2" class="footnote-back">↩︎</a></p></li>
79+
</ol>
80+
</div>

packages.txt

Lines changed: 44 additions & 180 deletions
Original file line numberDiff line numberDiff line change
@@ -1,180 +1,44 @@
1-
[1] "Today's date is 2020-02-04"
2-
Package Version
3-
animation 2.5
4-
AnnotationDbi 1.28.2
5-
ape 5.3
6-
assertthat 0.2.0
7-
backports 1.1.2
8-
base64enc 0.1-3
9-
BB 2014.10-1
10-
beeswarm 0.2.3
11-
BH 1.65.0-1
12-
bindr 0.1
13-
bindrcpp 0.2
14-
Biobase 2.26.0
15-
BiocGenerics 0.12.1
16-
BiocInstaller 1.16.2
17-
bit 1.1-12
18-
bit64 0.9-7
19-
bitops 1.0-6
20-
blob 1.1.0
21-
blogdown 0.4
22-
bookdown 0.5
23-
brew 1.0-6
24-
broom 0.4.3
25-
calibrate 1.7.2
26-
callr 1.0.0
27-
caTools 1.17.1
28-
cellranger 1.1.0
29-
cli 1.0.0
30-
clipr 0.4.0
31-
coda 0.19-1
32-
colorspace 1.3-2
33-
commonmark 1.4
34-
crayon 1.3.4
35-
curl 3.1
36-
data.table 1.11.6
37-
DBI 0.7
38-
dbplyr 1.2.0
39-
debug 1.3.1
40-
deldir 0.1-14
41-
desc 1.1.1
42-
devtools 1.13.4
43-
dfoptim 2017.12-1
44-
dichromat 2.0-0
45-
digest 0.6.14
46-
dplyr 0.7.4
47-
edgeR 3.8.5
48-
ellipse 0.4.1
49-
evaluate 0.10.1
50-
farver 1.1.0
51-
FLtools 0.0.2
52-
forcats 0.2.0
53-
foreach 1.4.4
54-
gdata 2.18.0
55-
GenomeInfoDb 1.2.4
56-
getopt 1.20.2
57-
gganimate 1.0.0
58-
ggplot2 3.2.1
59-
ggrepel 0.8.1
60-
git2r 0.21.0
61-
glmnet 2.0-13
62-
glue 1.2.0
63-
gplots 3.0.1
64-
gtable 0.2.0
65-
gtools 3.5.0
66-
haven 1.1.0
67-
highr 0.6
68-
hms 0.4.0
69-
htmltools 0.3.6
70-
htmlwidgets 0.9
71-
httpuv 1.5.2
72-
httr 1.3.1
73-
igraph 1.2.1
74-
inline 0.3.14
75-
IRanges 2.0.1
76-
iterators 1.0.9
77-
itertools 0.1-3
78-
jsonlite 1.5
79-
knitr 1.20
80-
labeling 0.3
81-
later 0.8.0
82-
latex2exp 0.4.0
83-
lazyeval 0.2.1
84-
limma 3.22.4
85-
lineprof 0.1.9001
86-
lme4 1.1-17
87-
lubridate 1.7.1
88-
magrittr 1.5
89-
manipulate 1.0.1
90-
mapproj 1.2-5
91-
maps 3.2.0
92-
maptools 0.9-2
93-
markdown 0.8
94-
matrixStats 0.52.2
95-
memoise 1.1.0
96-
microbenchmark 1.4-3
97-
mime 0.5
98-
miniUI 0.1.1
99-
minqa 1.2.4
100-
mnormt 1.5-5
101-
modelr 0.1.1
102-
munsell 0.5.0
103-
mvbutils 2.7.4.1
104-
nloptr 1.0.4
105-
numDeriv 2016.8-1
106-
openssl 0.9.9
107-
optextras 2016-8.8
108-
optimx 2013.8.7
109-
optparse 1.6.0
110-
org.Hs.eg.db 3.0.0
111-
packrat 0.4.9-3
112-
pillar 1.1.0
113-
pkgconfig 2.0.1
114-
plogr 0.1-1
115-
plyr 1.8.4
116-
praise 1.0.0
117-
prettyunits 1.0.2
118-
profvis 0.3.4
119-
progress 1.2.0
120-
promises 1.0.1
121-
proto 1.0.0
122-
pryr 0.1.3
123-
psych 1.7.8
124-
purrr 0.2.4
125-
qqman 0.1.4
126-
quadprog 1.5-5
127-
R6 2.2.2
128-
Rcgmin 2013-2.21
129-
RColorBrewer 1.1-2
130-
Rcpp 1.0.0
131-
RcppEigen 0.3.3.3.1
132-
readr 1.1.1
133-
readxl 1.0.0
134-
rematch 1.0.1
135-
reprex 0.1.1
136-
reshape2 1.4.3
137-
reticulate 1.6
138-
rgdal 1.2-16
139-
rgeos 0.3-26
140-
rlang 0.3.1
141-
rmarkdown 1.9
142-
roxygen2 6.0.1
143-
rprojroot 1.3-2
144-
RSQLite 2.0
145-
rstudio 0.98.994
146-
rstudioapi 0.7
147-
rtweet 0.6.0
148-
rvest 0.3.2
149-
Rvmmin 2017-7.18
150-
S4Vectors 0.4.0
151-
scales 1.0.0
152-
selectr 0.3-1
153-
servr 0.8
154-
setRNG 2013.9-1
155-
sgt 2.0
156-
shapefiles 0.7
157-
shiny 1.3.2
158-
sourcetools 0.1.6
159-
sp 1.2-6
160-
stringi 1.1.6
161-
stringr 1.2.0
162-
svUnit 0.7-12
163-
testit 0.8
164-
testthat 2.0.0
165-
tibble 1.4.1
166-
tidyr 0.7.2
167-
tidyselect 0.2.3
168-
tidyverse 1.2.1
169-
tkrplot 0.0-23
170-
tweenr 1.0.1
171-
ucminf 1.1-4
172-
utf8 1.1.3
173-
viridisLite 0.2.0
174-
whisker 0.3-2
175-
withr 2.1.1
176-
xaringan 0.4.4
177-
XKCDdata 0.1.0
178-
xml2 1.1.1
179-
xtable 1.8-2
180-
yaml 2.1.16
1+
[1] "Today's date is 2022-09-20"
2+
Package Version
3+
base64enc 0.1-3
4+
blogdown 1.12
5+
bookdown 0.29
6+
bslib 0.4.0
7+
cachem 1.0.6
8+
digest 0.6.29
9+
eulerr 6.1.0
10+
evaluate 0.16
11+
fastmap 1.1.0
12+
fs 1.5.2
13+
GenSA 1.1.7
14+
glue 1.6.2
15+
gridExtra 2.3
16+
gtable 0.3.0
17+
highr 0.9
18+
htmltools 0.5.3
19+
httpuv 1.6.6
20+
jquerylib 0.1.4
21+
jsonlite 1.8.0
22+
knitr 1.40
23+
later 1.3.0
24+
magrittr 2.0.3
25+
markdown 1.1
26+
memoise 2.0.1
27+
mime 0.12
28+
polyclip 1.10-0
29+
polylabelr 0.2.0
30+
promises 1.2.0.1
31+
R6 2.5.1
32+
rappdirs 0.3.3
33+
Rcpp 1.0.7
34+
RcppArmadillo 0.10.6.0.0
35+
rlang 1.0.5
36+
rmarkdown 2.16
37+
rprojroot 2.0.3
38+
sass 0.4.2
39+
servr 0.24
40+
stringi 1.7.8
41+
stringr 1.4.1
42+
tinytex 0.41
43+
xfun 0.33
44+
yaml 2.3.5

public
