\documentclass[a4paper]{scrartcl}
\usepackage[
fancytheorems,
fancyproofs,
% noindent,
% nokoma
]{adam}
\title{Optimization}
\author{Adam Kelly (\texttt{ak2316@cam.ac.uk})}
\date{\today}
\allowdisplaybreaks
\begin{document}
\maketitle
This article constitutes my notes for the `Optimization' course, held in Easter 2021 at Cambridge. These notes are \emph{not a transcription of the lectures}, and differ significantly in quite a few areas. Still, all lectured material should be covered.
\section{Convex Optimization}
\subsection{Optimization Problems}
In this course, we care about problems that look like `minimize $f(x)$ subject to $x \in X$', or problems with constraints that look like `minimize\footnote{We will almost always talk about minimizing, since maximizing is basically the same thing.} $f(x)$ subject to $x \in X$ where $h(x) = b$'.
In such problems, we call $f$ the \vocab{objective function}, and if $x \in \R^n$, we call the components of $x$ the \vocab{decision variables}. Constraints such as those above can be given as a \vocab{functional constraint} $h(x) = b$ and a \vocab{regional constraint} $x \in X$. If both constraints hold for a given $x$, we say it lies in the \vocab{feasible set} $X(b)$. A problem is \vocab{feasible} if the feasible set is non-empty, and \vocab{bounded} if the objective function is bounded below on the feasible set.
A point $x^* \in X(b)$ is \vocab{optimal} if it minimizes $f$ over $X(b)$. The value $f(x^*)$ is the \vocab{optimal cost}.
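As a simple illustration, consider the problem of minimizing $f(x) = x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$ and $x \in X = \{x \in \R^2 : x_1, x_2 \geq 0\}$. Here $f$ is the objective function, $h(x) = x_1 + x_2 = 1$ is a functional constraint with $b = 1$, $x \in X$ is a regional constraint, and the feasible set is $X(1) = \{x \in X : x_1 + x_2 = 1\}$. The problem is feasible and bounded, with optimal point $x^* = (1/2, 1/2)$ and optimal cost $f(x^*) = 1/2$.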
\subsection{Convex Functions}
We will first study optimization problems involving convex functions without constraints. In particular, we are going to study the problem:
\begin{center}
Minimize $f(x)$, where $f : \R^n \rightarrow \R$ is a convex function.
\end{center}
The reason we will care specifically about convex functions is, as you will see, that local information about the function gives us global information about the function.
It is likely that you are already familiar with the notions of convex sets and convex functions.
\begin{definition}[Convex Set]
A set $S \subseteq \R^n$ is \vocab{convex} if for all $x, y \in S$ and all $\lambda \in [0, 1]$ we have $\lambda x + (1 - \lambda) y \in S$.
\end{definition}
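For example, any half-space $\{x \in \R^n : a \cdot x \leq \beta\}$ is convex: if $a \cdot x \leq \beta$ and $a \cdot y \leq \beta$, then for any $\lambda \in [0, 1]$ we have $a \cdot (\lambda x + (1 - \lambda) y) = \lambda (a \cdot x) + (1 - \lambda)(a \cdot y) \leq \beta$.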
Informally, a set is convex if the line segment between any two points is totally contained in the set. We can also define convexity for functions.
\begin{definition}[Convex Function]
A function $f: S \rightarrow \R$ is \vocab{convex} if $S$ is convex and for all $x, y \in S$ and $\lambda \in [0, 1]$ we have
$$
f(\lambda x + (1 - \lambda) y) \leq \lambda f(x) + (1 - \lambda) f(y).
$$
If this inequality is strict then the function is \vocab{strictly convex}. A function is called \vocab{concave} if $-f$ is convex.
\end{definition}
Again informally, this says that if we consider the graph of the function, then the chord between any two points on the graph lies on or above the graph.
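For example, an affine function $f(x) = a \cdot x + c$ satisfies the defining inequality with equality for every $x$, $y$ and $\lambda$, so it is convex (and also concave, though not strictly convex).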
In general, checking this definition by hand is hard, and we typically use some other conditions to check for convexity.
\begin{proposition}[First-Order Conditions for Convexity]
A differentiable function $f: \R^n \rightarrow \R$ is convex if and only if for all $x, y \in \R^n$ we have
$$
f(y) \geq f(x) + (y - x) \cdot \nabla f(x).
$$
\end{proposition}
\begin{proof}
We will first check that convexity implies that this condition holds by showing it holds for $n = 1$ and using that for the general case.
If $f: \R \rightarrow \R$ is convex, and $x, y \in \R$ then for any $t \in [0, 1]$ we have
\begin{align*}
f((1 - t)x + ty) &= f(x + t(y - x)) \\
& \leq (1 - t)f(x) + tf(y) \\
\implies f(y) & \geq f(x) + \frac{f(x + t(y - x)) - f(x)}{t}.
\end{align*}
Taking the limit as $t \rightarrow 0$, we get that $f(y) \geq f(x) + f'(x)(y - x)$.
Now for $f: \R^n \rightarrow \R$, taking any $x, y \in \R^n$ we can define the restricted function $g(t) = f((1 - t)x + ty)$. Since $f$ is convex, $g$ is also convex. We can then compute
$$
g'(t) = (y - x) \cdot \nabla f((1 - t) x + ty)
$$
So using the $n = 1$ case, we have $g(1) \geq g(0) + g'(0) (1 - 0)$, that is, $f(y) \geq f(x) + (y - x) \cdot \nabla f(x)$, so the condition holds as required.
Now we will show that the first-order conditions imply convexity.
For $t \in [0, 1]$, define $x_t = (1 - t) x + ty$. Applying the assumed inequality at $x_t$ gives
\begin{align*}
f(x) &\geq f(x_t) + (x - x_t) \cdot \nabla f(x_t), \\
f(y) &\geq f(x_t) + (y - x_t) \cdot \nabla f(x_t).
\end{align*}
Multiplying the first inequality by $(1 - t)$ and the second by $t$, then summing, the gradient terms cancel since $(1 - t)(x - x_t) + t(y - x_t) = 0$, giving
$$
(1 - t)f(x) + tf(y) \geq f(x_t),
$$
as required.
\end{proof}
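As a sanity check, for $f(x) = x^2$ on $\R$ the first-order condition reads $y^2 \geq x^2 + 2x(y - x)$, which rearranges to $(y - x)^2 \geq 0$, confirming that $f$ is convex.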
A possibly more familiar criterion for convexity is given by the second-order conditions. These involve the \emph{Hessian}, the generalization of the second derivative to functions on $\R^n$.
\begin{definition}[Hessian]
If $f: \R^n \rightarrow \R$ is a twice differentiable function, then the \vocab{Hessian matrix} $\nabla^2 f$ is the symmetric matrix of second-order partial derivatives given by
$$
(\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}.
$$
\end{definition}
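For example, if $f(x) = x_1^2 + x_1 x_2 + x_2^2$ then
$$
\nabla^2 f(x) = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
$$
for every $x \in \R^2$.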
In particular, we care about a specific property of the Hessian matrix.
\begin{definition}[Positive Semidefinite]
A symmetric matrix $A \in \R^{n \times n}$ is \vocab{positive semidefinite} if for all $x \in \R^n$ we have $x \cdot Ax \geq 0$, or equivalently, if all of the eigenvalues of $A$ are non-negative.
\end{definition}
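For example, the Hessian computed above is positive semidefinite: its eigenvalues are $1$ and $3$, or directly, $x \cdot \nabla^2 f(x)\, x = x_1^2 + (x_1 + x_2)^2 + x_2^2 \geq 0$.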
\begin{proposition}[Second-Order Conditions for Convexity]
A twice differentiable function $f:\R^n \rightarrow \R$ is convex if $\nabla^2 f(x)$ is positive semidefinite at all $x \in \R^n$.
\end{proposition}
\begin{proof}
Using the Taylor expansion of $f$, we have
$$
f(y) = f(x) + (y - x) \cdot \nabla f(x) + \frac{1}{2}(y - x) \cdot ( \nabla^2f(z)(y - x)),
$$
where $z = (1 - \lambda)x + \lambda y$ for some $\lambda \in [0, 1]$. Since $\nabla^2 f(z)$ is positive semidefinite, the final term is non-negative, so $f(y) \geq f(x) + (y - x) \cdot \nabla f(x)$, and hence $f$ is convex by the first-order conditions.
\end{proof}
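For example, the function $f(x) = e^{x_1} + e^{x_2}$ has Hessian $\nabla^2 f(x) = \operatorname{diag}(e^{x_1}, e^{x_2})$, which is positive semidefinite at every $x \in \R^2$ since its eigenvalues $e^{x_1}$ and $e^{x_2}$ are positive, so $f$ is convex.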
\subsection{Gradient Descent}
To solve our problem of minimizing $f(x)$ where $f : \R^n \rightarrow \R$ is a convex function, we can begin by noting that convexity implies that a local minimum is a global minimum. Indeed, if we find a point $x^*$ where the gradient of the function is zero, then by the first-order conditions $f(y) \geq f(x^*) + (y - x^*) \cdot \nabla f(x^*) = f(x^*)$ for all $y$, so $x^*$ is a global minimum.
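To give a rough, non-lectured illustration of this idea, here is a minimal sketch of fixed-step gradient descent in Python (the function \texttt{gradient\_descent}, the step size \texttt{eta} and the tolerance \texttt{tol} are illustrative choices, not part of the course):
\begin{verbatim}
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, tol=1e-8, max_iter=10000):
    """Minimise a differentiable convex function, given its gradient grad_f.

    Starting from x0, repeatedly step in the direction of steepest descent
    with a fixed step size eta, stopping once the gradient is numerically
    zero (a global minimum, by convexity) or after max_iter iterations.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break               # gradient is (numerically) zero
        x = x - eta * g         # move against the gradient
    return x

# Example: f(x) = x_1^2 + x_2^2 has gradient 2x and minimum at the origin.
x_star = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])
\end{verbatim}
For a suitably chosen step size, this iteration approaches a point where $\nabla f$ vanishes, which by the discussion above is a global minimum.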
% \section{Introduction}
% We look at
% \begin{enumerate}
% \item Convex Optimisation
% \item Lagrange method
% \item Linear programming
% \item Applicaitons of linear programming
% \end{enumerate}
\end{document}