Merge pull request #1 from datawhalechina/master
update
LeoLRH authored Apr 7, 2019
2 parents 3fdbab1 + 50c8f94 commit aa839e9
Showing 7 changed files with 214 additions and 12 deletions.
33 changes: 26 additions & 7 deletions README.md
@@ -16,13 +16,13 @@ https://datawhalechina.github.io/pumpkin-book/
- Chapter 5 [Neural Networks](https://datawhalechina.github.io/pumpkin-book/#/chapter5/chapter5)
- Chapter 6 [Support Vector Machines](https://datawhalechina.github.io/pumpkin-book/#/chapter6/chapter6)
- Chapter 7 [Bayes Classifiers](https://datawhalechina.github.io/pumpkin-book/#/chapter7/chapter7)
- Chapter 8 Ensemble Learning
- Chapter 8 [Ensemble Learning](https://datawhalechina.github.io/pumpkin-book/#/chapter8/chapter8)
- Chapter 9 Clustering
- Chapter 10 Dimensionality Reduction and Metric Learning
- Chapter 11 Feature Selection and Sparse Learning
- Chapter 12 Computational Learning Theory
- Chapter 13 Semi-Supervised Learning
- Chapter 14 Probabilistic Graphical Models
- Chapter 13 [Semi-Supervised Learning](https://datawhalechina.github.io/pumpkin-book/#/chapter13/chapter13)
- Chapter 14 [Probabilistic Graphical Models](https://datawhalechina.github.io/pumpkin-book/#/chapter14/chapter14)
- Chapter 15 Rule Learning
- Chapter 16 Reinforcement Learning

@@ -61,14 +61,33 @@ pumpkin-book
### Format for formula-derivation documents:
```
## Equation number
$$(LaTeX expression of the equation)$$
[Derivation]: (derivation steps) or [Explanation]: (explanation of the equation)
## Appendix
## Appendix (optional)
(appendix content)
```
For examples, see `docs/chapter2/chapter2.md` and `docs/chapter3/chapter3.md`.

## Follow us
For example:
<img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/example.png">


# Main contributors (in alphabetical order)
[@awyd234](https://github.com/awyd234)
[@Heitao5200](https://github.com/Heitao5200)
[@juxiao](https://github.com/juxiao)
[@LongJH](https://github.com/LongJH)
[@LilRachel](https://github.com/LilRachel)
[@Majingmin](https://github.com/Majingmin)
[@spareribs](https://github.com/spareribs)
[@sunchaothu](https://github.com/sunchaothu)
[@StevenLzq](https://github.com/StevenLzq)
[@Sm1les](https://github.com/Sm1les)
[@Ye980226](https://github.com/Ye980226)

# Follow us

<div align=center><img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/qrcode.jpeg" width = "250" height = "270"></div>

15 changes: 14 additions & 1 deletion docs/README.md
@@ -13,7 +13,20 @@
> Edition: 1st edition, January 2016<br>
> Errata: http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/MLbook2016.htm
## Follow us
# Main contributors (in alphabetical order)
[@awyd234](https://github.com/awyd234)
[@Heitao5200](https://github.com/Heitao5200)
[@juxiao](https://github.com/juxiao)
[@LongJH](https://github.com/LongJH)
[@LilRachel](https://github.com/LilRachel)
[@Majingmin](https://github.com/Majingmin)
[@spareribs](https://github.com/spareribs)
[@sunchaothu](https://github.com/sunchaothu)
[@StevenLzq](https://github.com/StevenLzq)
[@Sm1les](https://github.com/Sm1les)
[@Ye980226](https://github.com/Ye980226)

# Follow us

<div align=center><img src="https://raw.githubusercontent.com/datawhalechina/pumpkin-book/master/res/qrcode.jpeg" width = "250" height = "270"></div>

5 changes: 3 additions & 2 deletions docs/_sidebar.md
@@ -6,5 +6,6 @@
- [Chapter 5 Neural Networks](chapter5/chapter5.md)
- [Chapter 6 Support Vector Machines](chapter6/chapter6.md)
- [Chapter 7 Bayes Classifiers](chapter7/chapter7.md)


- [Chapter 8 Ensemble Learning](chapter8/chapter8.md)
- [Chapter 13 Semi-Supervised Learning](chapter13/chapter13.md)
- [Chapter 14 Probabilistic Graphical Models](chapter14/chapter14.md)
131 changes: 131 additions & 0 deletions docs/chapter14/chapter14.md
@@ -0,0 +1,131 @@
## 14.26

$$p(x^t)T(x^{t-1}|x^t)=p(x^{t-1})T(x^t|x^{t-1})$$

[Explanation]: Suppose the space in which the variable $x$ lives has $n$ states ($s_1,s_2,\ldots,s_n$). If a transition matrix $T$ ($n\times n$) defined on this space satisfies certain conditions, then the Markov process has a stationary distribution $\pi$ such that
$$
\begin{aligned}
\pi T=\pi
\end{aligned}
\tag{1}
$$
where $\pi$ is an $n$-dimensional vector containing the probabilities of $s_1,s_2,\ldots,s_n$. Conversely, if we want to sample a sequence of variables $x_1,x_2,\ldots,x_t$ that follows some distribution $\pi$, which transition matrix $T$ ($n\times n$) should we use?

In fact, the transition matrix only needs to satisfy the Markov detailed balance condition
$$
\begin{aligned}
\pi (i)T(i,j)=\pi (j)T(j,i)
\end{aligned}
\tag{2}
$$
which is exactly Eq. (14.26); the notation here differs slightly from the book's to ease understanding. The proof is as follows:
$$
\begin{aligned}
(\pi T)(j) = \sum _i \pi (i)T(i,j) = \sum _i \pi (j)T(j,i) = \pi(j)\sum _i T(j,i) = \pi(j)
\end{aligned}
\tag{3}
$$
Suppose the sampled sequence is $x_1,x_2,\ldots,x_{t-1},x_t$; the MH (Metropolis–Hastings) algorithm can then be used so that the transition from $x_{t-1}$ (say, state $s_i$) to $x_t$ (say, state $s_j$) satisfies Eq. (2).
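
As a quick numerical illustration of this condition (a sketch with a hand-picked 3-state $\pi$; none of the numbers come from the book), the following Python snippet builds a transition matrix satisfying Eq. (2) and checks that $\pi$ is indeed stationary:

```python
import numpy as np

# Hand-picked target distribution over 3 states (illustrative only).
pi = np.array([0.2, 0.3, 0.5])
n = len(pi)

# Metropolis-style construction with a uniform proposal:
# off-diagonal T(i,j) = (1/n) * min(1, pi(j)/pi(i)); the diagonal absorbs the rest.
T = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            T[i, j] = (1 / n) * min(1.0, pi[j] / pi[i])
    T[i, i] = 1.0 - T[i].sum()

# Detailed balance pi(i)T(i,j) = pi(j)T(j,i) holds entrywise ...
assert np.allclose(pi[:, None] * T, (pi[:, None] * T).T)
# ... and therefore pi T = pi, i.e. pi is a stationary distribution (Eq. (1)).
print(pi @ T)   # [0.2 0.3 0.5]
```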

## 14.28

$$A(x^* | x^{t-1}) = \min\left ( 1,\frac{p(x^*)Q(x^{t-1} | x^*) }{p(x^{t-1})Q(x^* | x^{t-1})} \right )$$

[Derivation]: This formula is essentially a trick borrowed from rejection sampling. Based on Eq. (14.27), it would suffice to take
$$
\begin{aligned}
A(x^* | x^{t-1}) &= p(x^*)Q(x^{t-1} | x^*) \\
A(x^{t-1} | x^*) &= p(x^{t-1})Q(x^* | x^{t-1})
\end{aligned}
\tag{4}
$$
for Eq. (14.26) to be satisfied. In practice, however, the two quantities on the right-hand side may be rather small—say 0.1 and 0.2—so only ten-odd percent of the hard-won samples would actually be used. We may as well scale the acceptance rates up to 0.5 and 1: the detailed balance condition still holds, and sample utilization improves greatly. Hence (4) can be improved to
$$
\begin{aligned}
A(x^* | x^{t-1}) &= \frac{p(x^*)Q(x^{t-1} | x^*)}{norm} \\
A(x^{t-1} | x^*) &= \frac{p(x^{t-1})Q(x^* | x^{t-1}) }{norm}
\end{aligned}
\tag{5}
$$
其中
$$
\begin{aligned}
norm = \max\left (p(x^{t-1})Q(x^* | x^{t-1}),p(x^*)Q(x^{t-1} | x^*) \right )
\end{aligned}
\tag{6}
$$
which is Eq. (14.28) in the book.
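
Below is a minimal Metropolis–Hastings sketch on a small discrete target (the 5-state target and the cyclic random-walk proposal are arbitrary choices for illustration). Because the proposal $Q$ is symmetric, the ratio in Eq. (14.28) reduces to $p(x^*)/p(x^{t-1})$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized target over 5 discrete states (MH never needs the normalizer).
p = np.array([1.0, 2.0, 4.0, 2.0, 1.0])

def propose(x):
    # Symmetric random walk on a cycle: one step left or right.
    return (x + rng.choice([-1, 1])) % len(p)

x, samples = 0, []
for t in range(100000):
    x_star = propose(x)
    A = min(1.0, p[x_star] / p[x])   # acceptance probability of Eq. (14.28); Q cancels
    if rng.random() < A:
        x = x_star
    samples.append(x)

print(np.bincount(samples) / len(samples))   # ≈ p / p.sum() = [0.1 0.2 0.4 0.2 0.1]
```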

## 14.32

$${\rm ln}p(x)=\mathcal{L}(q)+{\rm KL}(q \parallel p)$$

[Derivation]:

By the product rule $p(x,z)=p(z|x)\,p(x)$, we have $p(x)=\frac{p(x,z)}{p(z|x)}$.

Taking ${\rm ln}$ of both sides gives ${\rm ln}p(x)={\rm ln}\frac{p(x,z)}{p(z|x)}$. (1)

Since $q(z)$ is a probability density function, $\int q(z)dz=1$.

Multiplying both sides of this identity by ${\rm ln}p(x)$, and noting that ${\rm ln}p(x)$ does not depend on $z$ and can therefore be moved inside the integral, we get ${\rm ln}p(x)=\int q(z){\rm ln}p(x)dz$. Hence
$$
\begin{align}
{\rm ln}p(x)&=\int q(z){\rm ln}p(x)dz \\
&=\int q(z){\rm ln}\frac{p(x,z)}{p(z|x)}dz\qquad\text{(substituting Eq. (1))}\\
&=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\cdot\frac{q(z)}{p(z|x)}\bigg\}dz \\
&=\int q(z)\bigg({\rm ln}\frac{p(x,z)}{q(z)}-{\rm ln}\frac{p(z|x)}{q(z)}\bigg)dz \\
&=\int q(z){\rm ln}\bigg\{\frac{p(x,z)}{q(z)}\bigg\}dz-\int q(z){\rm ln}\frac{p(z|x)}{q(z)}dz \\
&=\mathcal{L}(q)+{\rm KL}(q \parallel p)\qquad\text{(by the definitions of }\mathcal{L}\text{ and }{\rm KL})
\end{align}
$$
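
The decomposition is easy to check numerically. In the sketch below $z$ is discrete and $x$ is fixed, so $p(x,z)$ is just a non-negative vector whose total mass is the evidence $p(x)$; the numbers are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

p_xz = rng.random(6)          # joint p(x, z) over 6 latent states, x fixed
p_x = p_xz.sum()              # evidence p(x)
p_z_given_x = p_xz / p_x      # posterior p(z | x)

q = rng.random(6)             # an arbitrary variational distribution q(z)
q /= q.sum()

L = np.sum(q * np.log(p_xz / q))           # ELBO  L(q)
KL = np.sum(q * np.log(q / p_z_given_x))   # KL(q || p(z|x))

print(np.log(p_x), L + KL)    # the two values coincide, as claimed
```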


## 14.36

$$
\begin{align}
\mathcal{L}(q)&=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}} \\
&=\int q_{j}\bigg\{\int {\rm ln}p({\rm \mathbf{x},\mathbf{z}})\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const} \\
&=\int q_{j}{\rm ln}\tilde{p}({\rm \mathbf{x},\mathbf{z_{j}}})d{\rm\mathbf{z_{j}}}-\int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_{j}}}+{\rm const}
\end{align}
$$

[Derivation]:
$$
\mathcal{L}(q)=\int \prod_{i}q_{i}\bigg\{ {\rm ln}p({\rm \mathbf{x},\mathbf{z}})-\sum_{i}{\rm ln}q_{i}\bigg\}d{\rm\mathbf{z}}=\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}-\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}
$$
The formula can be viewed as the difference of two integrals. We first derive the left integral $\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}}$.
$$
\begin{align}
\int\prod_{i}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} &= \int q_{j}\prod_{i\ne j}q_{i}{\rm ln}p({\rm \mathbf{x},\mathbf{z}})d{\rm\mathbf{z}} \\
&= \int q_{j}\bigg\{\int{\rm ln}p({\rm \mathbf{x},\mathbf{z}})\prod_{i\ne j}q_{i}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{j}}}\qquad \text{(integrate over the }{\rm\mathbf{z_{i}}},\,i\ne j,\text{ first, then over }{\rm\mathbf{z_{j}}})
\end{align}
$$
This is exactly the left integral in Eq. (14.36) of the book.

Now consider the right integral $\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}$.

Before doing so, let us first compute $\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}$:
$$
\begin{align}
\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&= \int q_{i^{\prime}}\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}\qquad \text{(pick one factor }q_{i^{\prime}}\text{ with }i^{\prime}\ne k\text{)} \\
&=\int q_{i^{\prime}}\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}d{\rm\mathbf{z_{i^{\prime}}}}
\end{align}
$$
The factor $\bigg\{\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}}\bigg\}$ does not involve $q_{i^{\prime}}$, so it can be moved outside the integral. Since $\int q_{i^{\prime}}d{\rm\mathbf{z_{i^{\prime}}}}=1$, we obtain
$$
\begin{align}
\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}}&=\int\prod_{i\ne i^{\prime}}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z_{i}}} \\
&= \int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad \text{(every variable other than }z_k\text{ can be eliminated in the same way)}
\end{align}
$$
With this result in hand, we return to the formula
$$
\begin{align}
\int\prod_{i}q_{i}\sum_{i}{\rm ln}q_{i}d{\rm\mathbf{z}}&= \int\prod_{i}q_{i}{\rm ln}q_{j}d{\rm\mathbf{z}} + \sum_{k\ne j}\int\prod_{i}q_{i}{\rm ln}q_{k}d{\rm\mathbf{z}} \\
&= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + \sum_{k\ne j}\int q_{k}{\rm ln}q_{k}d{\rm\mathbf{z_k}}\qquad \text{(by the result above)} \\
&= \int q_{j}{\rm ln}q_{j}d{\rm\mathbf{z_j}} + {\rm const} \qquad \text{(only }q_{j}\text{ matters here; the remaining terms are absorbed into }{\rm const})
\end{align}
$$
This is exactly the right integral in Eq. (14.36).
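
To see the mean-field machinery in action, the sketch below runs coordinate-ascent updates on a toy fully discrete joint with two latent variables (the table is random and purely illustrative). Each factor is updated as $q_j \propto \exp\{\mathbb{E}_{i\neq j}[{\rm ln}\,p(\mathbf{x},\mathbf{z})]\}$, the maximizer of the objective in Eq. (14.36); the ELBO then increases monotonically and stays below ${\rm ln}\,p(\mathbf{x})$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint p(x, z1, z2) over discrete z1 (4 states) and z2 (5 states); the
# observed x is fixed and absorbed into the table, so its total mass is p(x) = 0.3.
p_xz = 0.3 * rng.dirichlet(np.ones(20)).reshape(4, 5)
log_p = np.log(p_xz)

# Mean-field factors q(z) = q1(z1) q2(z2), initialized uniformly.
q1 = np.full(4, 1 / 4)
q2 = np.full(5, 1 / 5)

def elbo(q1, q2):
    q = np.outer(q1, q2)
    return np.sum(q * (log_p - np.log(q)))        # E_q[ln p(x,z) - ln q(z)]

for it in range(10):
    log_q1 = log_p @ q2                           # E_{q2}[ln p(x, z1, z2)]
    q1 = np.exp(log_q1 - log_q1.max()); q1 /= q1.sum()
    log_q2 = log_p.T @ q1                         # E_{q1}[ln p(x, z1, z2)]
    q2 = np.exp(log_q2 - log_q2.max()); q2 /= q2.sum()
    print(it, elbo(q1, q2))                       # non-decreasing

print("ln p(x) =", np.log(0.3))                   # upper bound on the ELBO
```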
25 changes: 23 additions & 2 deletions docs/chapter3/chapter3.md
@@ -31,8 +31,8 @@ $$
& = \cfrac{\sum_{i=1}^{m}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{m}(x_i-\bar{x})^2}
\end{aligned}
$$
If we let $ \mathbf{X}=(x_1,x_2,...,x_m) $, let $\mathbf{X}_{demean}$ be the de-meaned $ \mathbf{X} $, let $ \mathbf{y}=(y_1,y_2,...,y_m) $, and let $ \mathbf{y}_{demean} $ be the de-meaned $ \mathbf{y} $, where $ \mathbf{X} $, $ \mathbf{X}_{demean} $, $ \mathbf{y} $, $ \mathbf{y}_{demean} $ are all $m\times 1$ column vectors, substituting into the above gives:
$$ w=\cfrac{\mathbf{X}_{demean}\mathbf{y}_{demean}^T}{\mathbf{X}_{demean}\mathbf{X}_{demean}^T}$$
If we let $ \boldsymbol{x}=(x_1,x_2,...,x_m) $, let $ \boldsymbol{x}_{d} $ be the de-meaned $ \boldsymbol{x} $, let $ \boldsymbol{y}=(y_1,y_2,...,y_m) $, and let $ \boldsymbol{y}_{d} $ be the de-meaned $ \boldsymbol{y} $, where $ \boldsymbol{x} $, $ \boldsymbol{x}_{d} $, $ \boldsymbol{y} $, $ \boldsymbol{y}_{d} $ are all $m\times 1$ column vectors, substituting into the above gives:
$$ w=\cfrac{\boldsymbol{y}_{d}^T\boldsymbol{x}_{d}}{\boldsymbol{x}_d^T\boldsymbol{x}_{d}}$$
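As a quick sanity check of this closed form (a sketch with synthetic data, not taken from the book), one can compare it against NumPy's own least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.1, size=100)   # true slope 3, intercept 2

xd = x - x.mean()               # de-meaned x
yd = y - y.mean()               # de-meaned y
w = (yd @ xd) / (xd @ xd)       # the closed-form slope above
b = y.mean() - w * x.mean()     # intercept from the other least-squares condition

print(w, b)                     # ≈ 3.0, 2.0
print(np.polyfit(x, y, 1))      # same slope and intercept from NumPy
```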
## 3.10

$$ \cfrac{\partial E_{\hat{w}}}{\partial \hat{w}}=2\mathbf{X}^T(\mathbf{X}\hat{w}-\mathbf{y}) $$
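A minimal finite-difference check of this gradient on random data (shapes and values chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))      # design matrix (with the bias column appended, if any)
y = rng.normal(size=50)
w_hat = rng.normal(size=4)

grad = 2 * X.T @ (X @ w_hat - y)  # the closed-form gradient above

# Central finite differences of E = ||X w - y||^2
eps = 1e-6
fd = np.array([
    (np.sum((X @ (w_hat + eps * e) - y) ** 2)
     - np.sum((X @ (w_hat - eps * e) - y) ** 2)) / (2 * eps)
    for e in np.eye(4)
])
print(np.max(np.abs(grad - fd)))  # tiny, so the two agree
```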
@@ -107,3 +107,24 @@ $$\begin{aligned}
Since $\boldsymbol S_b=\boldsymbol S_b^T$ and $\boldsymbol S_w=\boldsymbol S_w^T$, we have:
$$\cfrac{\partial l(\boldsymbol w)}{\partial \boldsymbol w} = -2\boldsymbol S_b\boldsymbol w+2\lambda\boldsymbol S_w\boldsymbol w$$
Setting the derivative to zero yields Eq. (3.37).
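
Setting the derivative above to zero gives $\boldsymbol S_b\boldsymbol w=\lambda\boldsymbol S_w\boldsymbol w$, a generalized eigenvalue problem. The sketch below solves it with SciPy for two made-up symmetric matrices standing in for the scatter matrices (purely an illustration of the algebra):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Sb = A @ A.T                        # symmetric stand-in for S_b
B = rng.normal(size=(5, 5))
Sw = B @ B.T + 5 * np.eye(5)        # symmetric positive-definite stand-in for S_w

lams, W = eigh(Sb, Sw)              # solves S_b w = lambda S_w w
w = W[:, -1]                        # eigenvector of the largest generalized eigenvalue
print(np.allclose(Sb @ w, lams[-1] * Sw @ w))   # True
```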

## 3.43

$$\begin{aligned}
\boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
&= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
\end{aligned}$$
[Derivation]: From Eqs. (3.40), (3.41), and (3.42) we have:
$$\begin{aligned}
\boldsymbol S_b &= \boldsymbol S_t - \boldsymbol S_w \\
&= \sum_{i=1}^m(\boldsymbol x_i-\boldsymbol\mu)(\boldsymbol x_i-\boldsymbol\mu)^T-\sum_{i=1}^N\sum_{\boldsymbol x\in X_i}(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T \\
&= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x-\boldsymbol\mu)^T-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x-\boldsymbol\mu_i)^T\right)\right) \\
&= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left((\boldsymbol x-\boldsymbol\mu)(\boldsymbol x^T-\boldsymbol\mu^T)-(\boldsymbol x-\boldsymbol\mu_i)(\boldsymbol x^T-\boldsymbol\mu_i^T)\right)\right) \\
&= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(\boldsymbol x\boldsymbol x^T - \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T-\boldsymbol x\boldsymbol x^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
&= \sum_{i=1}^N\left(\sum_{\boldsymbol x\in X_i}\left(- \boldsymbol x\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol x^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol x\boldsymbol\mu_i^T+\boldsymbol\mu_i\boldsymbol x^T-\boldsymbol\mu_i\boldsymbol\mu_i^T\right)\right) \\
&= \sum_{i=1}^N\left(-\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol x^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu\boldsymbol\mu^T+\sum_{\boldsymbol x\in X_i}\boldsymbol x\boldsymbol\mu_i^T+\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol x^T-\sum_{\boldsymbol x\in X_i}\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
&= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T-m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
&= \sum_{i=1}^N\left(-m_i\boldsymbol\mu_i\boldsymbol\mu^T-m_i\boldsymbol\mu\boldsymbol\mu_i^T+m_i\boldsymbol\mu\boldsymbol\mu^T+m_i\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
&= \sum_{i=1}^Nm_i\left(-\boldsymbol\mu_i\boldsymbol\mu^T-\boldsymbol\mu\boldsymbol\mu_i^T+\boldsymbol\mu\boldsymbol\mu^T+\boldsymbol\mu_i\boldsymbol\mu_i^T\right) \\
&= \sum_{i=1}^N m_i(\boldsymbol\mu_i-\boldsymbol\mu)(\boldsymbol\mu_i-\boldsymbol\mu)^T
\end{aligned}$$
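The identity can also be verified numerically on synthetic clustered data (three made-up 2-D clusters; purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Three classes of 2-D samples with m_i = 30, 50, 20 points each.
Xs = [rng.normal(loc=c, size=(m, 2)) for c, m in zip([0.0, 2.0, 5.0], [30, 50, 20])]
X = np.vstack(Xs)
mu = X.mean(axis=0)                                   # global mean

St = sum(np.outer(x - mu, x - mu) for x in X)
Sw = sum(np.outer(x - Xi.mean(axis=0), x - Xi.mean(axis=0)) for Xi in Xs for x in Xi)
Sb = sum(len(Xi) * np.outer(Xi.mean(axis=0) - mu, Xi.mean(axis=0) - mu) for Xi in Xs)

print(np.allclose(St - Sw, Sb))                       # True, matching Eq. (3.43)
```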
17 changes: 17 additions & 0 deletions docs/chapter8/chapter8.md
@@ -0,0 +1,17 @@
## 8.3
$$
\begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned}
$$
[Derivation]: Assuming the base classifiers are mutually independent, let $X$ denote the number of the $T$ base classifiers that classify correctly; then $X \sim B(T, 1-\epsilon)$.
$$
\begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &= P(X \leqslant\lfloor T / 2\rfloor) \\ & \leqslant P(X \leqslant T / 2)
\\ & =P\left[X-(1-\epsilon) T \leqslant \frac{T}{2}-(1-\epsilon) T\right]
\\ & =P\left[X-(1-\epsilon) T \leqslant -\frac{T}{2}\left(1-2\epsilon\right)\right]
\end{aligned}
$$
By Hoeffding's inequality, $P(X-(1-\epsilon)T\leqslant -kT) \leqslant \exp (-2k^2T)$.
Setting $k=\frac {1-2\epsilon}{2}$ gives
$$
\begin{aligned} P(H(\boldsymbol{x}) \neq f(\boldsymbol{x})) &=\sum_{k=0}^{\lfloor T / 2\rfloor} \left( \begin{array}{c}{T} \\ {k}\end{array}\right)(1-\epsilon)^{k} \epsilon^{T-k} \\ & \leqslant \exp \left(-\frac{1}{2} T(1-2 \epsilon)^{2}\right) \end{aligned}
$$
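A quick numerical check of both the exact binomial tail and the Hoeffding bound (the values of $T$ and $\epsilon$ are arbitrary, and the simulation relies on the independence assumption stated above):

```python
import numpy as np
from scipy.stats import binom

T, eps = 25, 0.3                           # T base classifiers, each wrong with prob. eps

# Exact probability that the majority vote errs: at most floor(T/2) correct votes.
exact = binom.cdf(T // 2, T, 1 - eps)
bound = np.exp(-0.5 * T * (1 - 2 * eps) ** 2)

# Monte Carlo estimate under the same independence assumption.
votes = np.random.default_rng(0).random((100_000, T)) < (1 - eps)   # True = correct vote
mc = np.mean(votes.sum(axis=1) <= T // 2)

print(exact, mc, bound)                    # exact ≈ mc, and both sit below the bound
```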
Binary file added res/example.png