
Commit

Word version update
fengdu78 committed Mar 27, 2018
1 parent 69c5a6b commit 7218aec
Showing 13 changed files with 47 additions and 45 deletions.
Binary file modified docx/week1.docx
Binary file modified docx/week5.docx
Binary file added docx/机器学习个人笔记完整版v5.1.docx
18 changes: 8 additions & 10 deletions html/week1.html

Large diffs are not rendered by default.

50 changes: 24 additions & 26 deletions html/week2.html

Large diffs are not rendered by default.

Binary file removed images/91259aa5d0d28587fb50b1a9b2a46bd8.png
Binary file removed images/a551794073098649c7d11a8d1d659a65.png
Binary file removed images/a6efa292d096f7b8290e4c2f13b28508.png
Binary file removed images/ef564af7fe890e454e557b47115092e4.png
Binary file removed images/f045799e2e3a9fd4aca785291fe8c82e.png
16 changes: 11 additions & 5 deletions markdown/week1.md
@@ -371,19 +371,25 @@ ${\theta_{j}}:={\theta_{j}}-\alpha \frac{\partial }{\partial {\theta_{j}}}J\left

To apply gradient descent to our earlier linear regression problem, the key is to work out the derivative of the cost function, namely:

![](../images/a6efa292d096f7b8290e4c2f13b28508.png)
$\frac{\partial }{\partial {{\theta }_{j}}}J({{\theta }_{0}},{{\theta }_{1}})=\frac{\partial }{\partial {{\theta }_{j}}}\frac{1}{2m}{{\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}({{x}^{(i)}})-{{y}^{(i)}} \right)}}^{2}}$

![](../images/a551794073098649c7d11a8d1d659a65.png)
$\frac{\partial }{\partial {{\theta }_{0}}}J({{\theta }_{0}},{{\theta }_{1}})=\frac{1}{m}{{\sum\limits_{i=1}^{m}{\left( {{h}_{\theta }}({{x}^{(i)}})-{{y}^{(i)}} \right)}}}$

When $j=0$:

![](../images/f045799e2e3a9fd4aca785291fe8c82e.png)
$\frac{\partial }{\partial {{\theta }_{1}}}J({{\theta }_{0}},{{\theta }_{1}})=\frac{1}{m}\sum\limits_{i=1}^{m}{\left( \left( {{h}_{\theta }}({{x}^{(i)}})-{{y}^{(i)}} \right)\cdot {{x}^{(i)}} \right)}$

When $j=1$:

The algorithm can then be rewritten as:

![](../images/ef564af7fe890e454e557b47115092e4.png)
**Repeat {**

​ ${\theta_{0}}:={\theta_{0}}-\alpha\frac{1}{m}\sum\limits_{i=1}^{m}{ \left({{h}_{\theta }}({{x}^{(i)}})-{{y}^{(i)}} \right)}$

​ ${\theta_{1}}:={\theta_{1}}-\alpha\frac{1}{m}\sum\limits_{i=1}^{m}{\left( \left({{h}_{\theta }}({{x}^{(i)}})-{{y}^{(i)}} \right)\cdot {{x}^{(i)}} \right)}$

**}**

The algorithm we have just used is sometimes called **batch gradient descent**. In machine learning, algorithms are not usually given names, but this one, "**batch gradient descent**", refers to the fact that every step of gradient descent uses all of the training examples: when computing the derivative term we must perform a summation, so every single gradient-descent update ends up evaluating a term that sums over all $m$ training examples. The name batch gradient descent therefore indicates that we consider the entire "batch" of training examples. In fact, there are other variants of gradient descent that are not of this "batch" type and do not look at the whole training set, instead working each time with only small subsets of it; we will cover those methods later in the course.
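As a concrete illustration of the "use all $m$ examples per step" idea, here is a minimal vectorized sketch in Octave (my own example, not code from the notes); `X` is assumed to be the m×2 design matrix with a leading column of ones, `y` the m×1 target vector, and `alpha` the learning rate:

```
function theta = batchGradientDescent(X, y, theta, alpha, num_iters)
  % One update per iteration; every update sums over all m examples.
  m = length(y);
  for iter = 1:num_iters
    h = X * theta;                                 % hypothesis for the whole batch
    theta = theta - (alpha / m) * (X' * (h - y));  % simultaneous update of theta_0, theta_1
  end
end
```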

@@ -504,7 +510,7 @@ $A{{A}^{-1}}={{A}^{-1}}A=I$

Example:

![](../images/91259aa5d0d28587fb50b1a9b2a46bd8.png)
${{\left[ \begin{matrix} a& b \\ c& d \\ e& f \\\end{matrix} \right]}^{T}}=\left[ \begin{matrix} a& c & e \\ b& d & f \\\end{matrix} \right]$
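A quick numeric check in Octave (my own example), using the transpose operator `'`:

```
A = [1 2; 3 4; 5 6];   % a 3x2 matrix
A'                     % its 2x3 transpose: [1 3 5; 2 4 6]
```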

Basic properties of the matrix transpose:

2 changes: 1 addition & 1 deletion markdown/week2.md
@@ -141,7 +141,7 @@ ${{{h}}_{\theta}}(x)={{\theta }_{0}}\text{+}{{\theta }_{1}}(size)+{{\theta}_{1}}

Or:

${{t{h}}_{\theta}}(x)={{\theta }_{0}}\text{+}{{\theta }_{1}}(size)+{{\theta }_{1}}\sqrt{size}$
${{{h}}_{\theta}}(x)={{\theta }_{0}}\text{+}{{\theta }_{1}}(size)+{{\theta }_{1}}\sqrt{size}$

Note: if we adopt a polynomial regression model, feature scaling is essential before running the gradient descent algorithm.
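For instance, mean normalization can be done in a few lines of Octave (my own sketch; `X` is assumed to be the m×n feature matrix without the bias column):

```
mu     = mean(X);            % 1 x n vector of per-feature means
sigma  = std(X);             % 1 x n vector of per-feature standard deviations
X_norm = (X - mu) ./ sigma;  % each feature now has roughly zero mean and unit scale
```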

6 changes: 3 additions & 3 deletions markdown/week5.md
@@ -23,13 +23,13 @@ $K$-class classification: $S_L=k, y_i = 1$ indicates assignment to class $i$; $(k>2)$

Recall that the cost function for the logistic regression problem was:

$$ J\left(\theta \right)=-\frac{1}{m}\left[\sum_\limits{i=1}^{m}{y}^{(i)}\log{h_\theta({x}^{(i)})}+\left(1-{y}^{(i)}\right)log\left(1-h_\theta\left({x}^{(i)}\right)\right)\right]+\frac{\lambda}{2m}\sum_\limits{j=1}^{n}{\theta_j}^{2} $$
$ J\left(\theta \right)=-\frac{1}{m}\left[\sum\limits_{i=1}^{m}{y}^{(i)}\log{h_\theta({x}^{(i)})}+\left(1-{y}^{(i)}\right)\log\left(1-h_\theta\left({x}^{(i)}\right)\right)\right]+\frac{\lambda}{2m}\sum\limits_{j=1}^{n}{\theta_j}^{2} $


In logistic regression we have only one output variable, a scalar, and only one dependent variable $y$; but in a neural network we can have many output variables: our $h_\theta(x)$ is a $K$-dimensional vector, and the dependent variables in our training set are vectors of the same dimension. Our cost function is therefore somewhat more complicated than in logistic regression:$\newcommand{\subk}[1]{ #1_k }$
$$h_\theta\left(x\right)\in \mathbb{R}^{K}$$ $${\left({h_\theta}\left(x\right)\right)}_{i}={i}^{th} \text{output}$$

$$J(\Theta) = -\frac{1}{m} \left[ \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{k} {y_k}^{(i)} \log \subk{(h_\Theta(x^{(i)}))} + \left( 1 - y_k^{(i)} \right) \log \left( 1- \subk{\left( h_\Theta \left( x^{(i)} \right) \right)} \right) \right] + \frac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_l} \sum\limits_{j=1}^{s_l+1} \left( \Theta_{ji}^{(l)} \right)^2$$
$J(\Theta) = -\frac{1}{m} \left[ \sum\limits_{i=1}^{m} \sum\limits_{k=1}^{K} {y_k}^{(i)} \log \subk{(h_\Theta(x^{(i)}))} + \left( 1 - y_k^{(i)} \right) \log \left( 1- \subk{\left( h_\Theta \left( x^{(i)} \right) \right)} \right) \right] + \frac{\lambda}{2m} \sum\limits_{l=1}^{L-1} \sum\limits_{i=1}^{s_l} \sum\limits_{j=1}^{s_{l+1}} \left( \Theta_{ji}^{(l)} \right)^2$

The idea behind this seemingly much more complicated cost function is still the same: we use the cost function to measure how far the algorithm's predictions deviate from the true values. The only difference is that for every row of features we now make $K$ predictions. In essence we can use a loop to predict $K$ different outputs for each row of features, then use another loop to pick the most likely of the $K$ predictions and compare it against the actual value in $y$.
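As a sketch of that comparison for a single training example (my own illustration, not the notes' code), with `y_i` the K×1 one-hot label vector and `h_i` the K×1 vector of network outputs:

```
% Unregularized cost contribution of one example; summing this over all m
% examples and dividing by m gives the first term of J(Theta) above.
cost_i = -sum(y_i .* log(h_i) + (1 - y_i) .* log(1 - h_i));
```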

@@ -81,7 +81,7 @@ $ D_{ij}^{(l)} :=\frac{1}{m}\Delta_{ij}^{(l)}+\lambda\Theta_{ij}^{(l)}$

$ D_{ij}^{(l)} :=\frac{1}{m}\Delta_{ij}^{(l)}$ $\text{if}\; j = 0$
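In code, the two cases are typically handled by treating the first column of each weight matrix (the bias terms, $j=0$) separately. A minimal Octave sketch following the formula as written above (my own illustration; `Delta` and `Theta` are the accumulator and weight matrices for one layer, and some implementations also scale the regularization term by $\frac{1}{m}$):

```
D = Delta / m;                                         % the j = 0 column gets no regularization
D(:, 2:end) = D(:, 2:end) + lambda * Theta(:, 2:end);  % regularize all remaining columns
```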

In Octave, if we want to use an optimization algorithm such as `fminuc` to solve for the weight matrices, we first need to unroll the matrices into a vector and, after the algorithm has found the optimal solution, convert it back into matrices.
In **Octave**, if we want to use an optimization algorithm such as `fminunc` to solve for the weight matrices, we first need to unroll the matrices into a vector and, after the algorithm has found the optimal solution, convert it back into matrices.

Suppose we have three weight matrices, Theta1, Theta2 and Theta3, of sizes 10×11, 10×11 and 1×11 respectively.
The following code can carry out this conversion:
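The code itself is not rendered in this diff view; a minimal sketch of such a conversion (my own reconstruction under the stated 10×11, 10×11 and 1×11 sizes, not necessarily the exact code in the file) might look like:

```
% Unroll the three weight matrices into a single parameter vector
thetaVec = [Theta1(:); Theta2(:); Theta3(:)];

% ... hand thetaVec to an optimizer such as fminunc ...

% Reshape the optimized vector back into matrices of the original sizes
Theta1 = reshape(thetaVec(1:110),   10, 11);
Theta2 = reshape(thetaVec(111:220), 10, 11);
Theta3 = reshape(thetaVec(221:231), 1, 11);
```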

0 comments on commit 7218aec
