-
Notifications
You must be signed in to change notification settings - Fork 242
/
Copy pathStanford机器学习课程笔记1-监督学习.html
470 lines (433 loc) · 27.5 KB
/
Stanford机器学习课程笔记1-监督学习.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title></title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
margin: 0; padding: 0; vertical-align: baseline; border: none; }
table.sourceCode { width: 100%; line-height: 100%; }
td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
td.sourceCode { padding-left: 5px; }
code > span.kw { color: #007020; font-weight: bold; }
code > span.dt { color: #902000; }
code > span.dv { color: #40a070; }
code > span.bn { color: #40a070; }
code > span.fl { color: #40a070; }
code > span.ch { color: #4070a0; }
code > span.st { color: #4070a0; }
code > span.co { color: #60a0b0; font-style: italic; }
code > span.ot { color: #007020; }
code > span.al { color: #ff0000; font-weight: bold; }
code > span.fu { color: #06287e; }
code > span.er { color: #ff0000; font-weight: bold; }
</style>
<link rel="stylesheet" href="../stylesheets/Github.css" type="text/css" />
<title>Stanford机器学习课程笔记1-监督学习</title>
</head>
<body>
<div id="header"><center>
<p class="header_titleline">
<a href="../index.html" target="_self" title="主页">主页 </a><a href="../Search.html" target="_self" title="站内搜索">站内搜索 </a><a href="../Projects.html" target="_self" title="项目研究">项目研究 </a><a href="../Archives.html" target="_self" title="文章存档">文章存档 </a><a href="../README.html" target="_self" title="分类目录">分类目录 </a><a href="../AboutMe.html" target="_self" title="关于我">关于我 </a>
</p>
</center></div>
<h1>Stanford机器学习课程笔记1-监督学习</h1>
<h4>2015-04-08 / xiahouzuoxin</h4>
<h4>Tags: Maching Learning</h4>
转载请注明出处: <a href="http://xiahouzuoxin.github.io/notes/">http://xiahouzuoxin.github.io/notes/</a>
<div id="TOC">
<ul>
<li><a href="#课程计划">课程计划</a></li>
<li><a href="#linear-regression与预测问题">Linear Regression与预测问题</a><ul>
<li><a href="#locally-weighted-linear-regression">Locally Weighted Linear Regression</a></li>
</ul></li>
<li><a href="#logistic-regression与分类问题">Logistic Regression与分类问题</a><ul>
<li><a href="#特征映射与过拟合over-fitting">特征映射与过拟合(over-fitting)</a></li>
</ul></li>
</ul>
</div>
<!---title:Stanford机器学习课程笔记1-监督学习-->
<!---keywords:Maching Learning-->
<!---date:2015-04-08-->
<p>Stanford机器学习课程的主页是: <a href="http://cs229.stanford.edu/" class="uri">http://cs229.stanford.edu/</a></p>
<h2 id="课程计划">课程计划</h2>
<p>主讲人Andrew Ng是机器学习界的大牛,创办最大的公开课网站<a href="www.coursera.org">coursera</a>,前段时间还听说加入了百度。他讲的机器学习课程可谓每个学计算机的人必看。整个课程的大纲大致如下:</p>
<ol style="list-style-type: decimal">
<li>Introduction (1 class) Basic concepts.</li>
<li>Supervised learning. (6 classes) Supervised learning setup. LMS. Logistic regression. Perceptron. Exponential family. Generative learning algorithms. Gaussian discriminant analysis.<br />Naive Bayes. Support vector machines. Model selection and feature selection. Ensemble methods: Bagging, boosting, ECOC.</li>
<li>Learning theory. (3 classes) Bias/variance tradeoff. Union and Chernoff/Hoeffding bounds. VC dimension. Worst case (online) learning. Advice on using learning algorithms.</li>
<li>Unsupervised learning. (5 classes) Clustering. K-means. EM. Mixture of Gaussians. Factor analysis. PCA. MDS. pPCA. Independent components analysis (ICA).<br /></li>
<li>Reinforcement learning and control. (4 classes) MDPs. Bellman equations. Value iteration. Policy iteration. Linear quadratic regulation (LQR). LQG. Q-learning. Value function approximation. Policy search. Reinforce. POMDPs.</li>
</ol>
<p>本笔记主要是关于Linear Regression和Logistic Regression部分的学习实践记录。</p>
<h2 id="linear-regression与预测问题">Linear Regression与预测问题</h2>
<p>举了一个房价预测的例子,</p>
<table>
<thead>
<tr class="header">
<th align="left">Area(feet^2)</th>
<th align="left">#bedrooms</th>
<th align="left">Price(1000$)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">2014</td>
<td align="left">3</td>
<td align="left">400</td>
</tr>
<tr class="even">
<td align="left">1600</td>
<td align="left">3</td>
<td align="left">330</td>
</tr>
<tr class="odd">
<td align="left">2400</td>
<td align="left">3</td>
<td align="left">369</td>
</tr>
<tr class="even">
<td align="left">1416</td>
<td align="left">2</td>
<td align="left">232</td>
</tr>
<tr class="odd">
<td align="left">3000</td>
<td align="left">4</td>
<td align="left">540</td>
</tr>
<tr class="even">
<td align="left">3670</td>
<td align="left">4</td>
<td align="left">620</td>
</tr>
<tr class="odd">
<td align="left">4500</td>
<td align="left">5</td>
<td align="left">800</td>
</tr>
</tbody>
</table>
<p>Assume:房价与“面积和卧室数量”是线性关系,用线性关系进行放假预测。因而给出线性模型, <span class="math"><em>h</em><sub><em>θ</em></sub>(<em>x</em>) = ∑<em>θ</em><sup><em>T</em></sup><em>x</em></span> ,其中 <span class="math"><em>x</em> = [<em>x</em><sub>1</sub>, <em>x</em><sub>2</sub>]</span> , 分别对应面积和卧室数量。 为得到预测模型,就应该根据表中已有的数据拟合得到参数 <span class="math"><em>θ</em></span> 的值。课程通过从概率角度进行解释(主要用到了大数定律即“线性拟合模型的误差满足高斯分布”的假设,通过最大似然求导就能得到下面的表达式)为什么应该求解如下的最小二乘表达式来达到求解参数的目的,</p>
<p><img src="http://latex.codecogs.com/gif.latex? J(\theta)=\frac{1}{2}\sum_{i=1}^{m}(y_i-h_{\theta}(x_i))^2"></p>
<p>上述 <span class="math"><em>J</em>(<em>θ</em>)</span> 称为cost function, 通过 <span class="math"><em>m</em><em>i</em><em>n</em><em>J</em>(<em>θ</em>)</span> 即可得到拟合模型的参数。</p>
<p>解 <span class="math"><em>m</em><em>i</em><em>n</em><em>J</em>(<em>θ</em>)</span> 的方法有多种, 包括Gradient descent algorithm和Newton's method,这两种都是运筹学的数值计算方法,非常适合计算机运算,这两种算法不仅适合这里的线性回归模型,对于非线性模型如下面的Logistic模型也适用。除此之外,Andrew Ng还通过线性代数推导了最小均方的算法的闭合数学形式,</p>
<p><img src="http://latex.codecogs.com/gif.latex? \theta=(X^TX)^{-1}X^T\bold{y}"></p>
<p>Gradient descent algorithm执行过程中又分两种方法:batch gradient descent和stochastic gradient descent。batch gradient descent每次更新 <span class="math"><em>θ</em></span> 都用到所有的样本数据,而stochastic gradient descent每次更新则都仅用单个的样本数据。两者更新过程如下:</p>
<ol style="list-style-type: decimal">
<li><p>batch gradient descent</p>
<p><img src="http://latex.codecogs.com/gif.latex? \theta_j:=\theta_j+\alpha\sum_{i=1}^{m}(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}"></p></li>
<li><p>stochastic gradient descent</p>
<p>for i=1 to m</p>
<p><img src="http://latex.codecogs.com/gif.latex? \theta_j:=\theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}"></p></li>
</ol>
<p>两者只不过一个将样本放在了for循环上,一者放在了。事实证明,只要选择合适的学习率 <span class="math"><em>α</em></span> , Gradient descent algorithm总是能收敛到一个接近最优值的值。学习率选择过大可能造成cost function的发散,选择太小,收敛速度会变慢。</p>
<p>关于收敛条件,Andrew Ng没有在课程中提到更多,我给出两种收敛准则:</p>
<ol style="list-style-type: decimal">
<li>J小于一定值收敛</li>
<li>两次迭代之间的J差值,即|J-J_pre|<一定值则收敛</li>
</ol>
<p>下面是使用batch gradient descent拟合上面房价问题的例子,</p>
<pre class="sourceCode matlab"><code class="sourceCode matlab">clear all;
clc
<span class="co">%% 原数据</span>
x = [<span class="fl">2014</span>, <span class="fl">1600</span>, <span class="fl">2400</span>, <span class="fl">1416</span>, <span class="fl">3000</span>, <span class="fl">3670</span>, <span class="fl">4500</span>;...
<span class="fl">3</span>,<span class="fl">3</span>,<span class="fl">3</span>,<span class="fl">2</span>,<span class="fl">4</span>,<span class="fl">4</span>,<span class="fl">5</span>;];
y = [<span class="fl">400</span> <span class="fl">330</span> <span class="fl">369</span> <span class="fl">232</span> <span class="fl">540</span> <span class="fl">620</span> <span class="fl">800</span>];
error = Inf;
threshold = <span class="fl">4300</span>;
alpha = <span class="fl">10</span>^(-<span class="fl">10</span>);
x = [zeros(<span class="fl">1</span>,size(x,<span class="fl">2</span>)); x]; <span class="co">% x0 = 0,拟合常数项</span>
theta = [<span class="fl">0</span>;<span class="fl">0</span>;<span class="fl">0</span>]; <span class="co">% 常数项为0</span>
J = <span class="fl">1</span>/<span class="fl">2</span>*sum((y-theta'*x).^<span class="fl">2</span>);
costs = [];
while error > threshold
tmp = y-theta'*x;
theta(<span class="fl">1</span>) = theta(<span class="fl">1</span>) + alpha*sum(tmp.*x(<span class="fl">1</span>,:));
theta(<span class="fl">2</span>) = theta(<span class="fl">2</span>) + alpha*sum(tmp.*x(<span class="fl">2</span>,:));
theta(<span class="fl">3</span>) = theta(<span class="fl">3</span>) + alpha*sum(tmp.*x(<span class="fl">3</span>,:));
<span class="co">% J_last = J;</span>
J = <span class="fl">1</span>/<span class="fl">2</span>*sum((y-theta'*x).^<span class="fl">2</span>);
<span class="co">% error = abs(J-J_last);</span>
error = J;
costs =[costs, error];
end
<span class="co">%% 绘制</span>
figure,
subplot(<span class="fl">211</span>);
plot3(x(<span class="fl">2</span>,:),x(<span class="fl">3</span>,:),y, <span class="st">'*'</span>);
grid on;
xlabel(<span class="st">'Area'</span>);
ylabel(<span class="st">'#bedrooms'</span>);
zlabel(<span class="st">'Price(1000$)'</span>);
hold on;
H = theta'*x;
plot3(x(<span class="fl">2</span>,:),x(<span class="fl">3</span>,:),H,<span class="st">'r*'</span>);
hold on
hx(<span class="fl">1</span>,:) = zeros(<span class="fl">1</span>,<span class="fl">20</span>);
hx(<span class="fl">2</span>,:) = <span class="fl">1000</span>:<span class="fl">200</span>:<span class="fl">4800</span>;
hx(<span class="fl">3</span>,:) = <span class="fl">1</span>:<span class="fl">20</span>;
[X,Y] = meshgrid(hx(<span class="fl">2</span>,:), hx(<span class="fl">3</span>,:));
H = theta(<span class="fl">2</span>:<span class="fl">3</span>)'*[X(:)'; Y(:)'];
H = reshape(H,[<span class="fl">20</span>,<span class="fl">20</span>]);
mesh(X,Y,H);
legend(<span class="st">'原数据'</span>, <span class="st">'对原数据的拟合'</span>, <span class="st">'拟合平面'</span>);
subplot(<span class="fl">212</span>);
plot(costs, <span class="st">'.-'</span>);
grid on
title(<span class="st">'error=J(\theta)的迭代收敛过程'</span>);
xlabel(<span class="st">'迭代次数'</span>);
ylabel(<span class="st">'J(\theta)'</span>);</code></pre>
<p>拟合及收敛过程如下:</p>
<div class="figure">
<img src="../images/Stanford机器学习课程笔记1-监督学习/GradientDescentAlgorithm.png" />
</div>
<p>不管是梯度下降,还是线性回归模型,都是工具!!分析结果更重要,从上面的拟合平面可以看到,影响房价的主要因素是面积而非卧室数量。</p>
<p>很多情况下,模型并不是线性的,比如股票随时间的涨跌过程。这种情况下, <span class="math"><em>h</em><sub><em>θ</em></sub>(<em>x</em>) = <em>θ</em><sup><em>T</em></sup><em>x</em></span> 的假设不再成立。此时,有两种方案:</p>
<ol style="list-style-type: decimal">
<li>建立一个非线性的模型,比如指数或者其它的符合数据变化的模型</li>
<li>局部线性模型,对每段数据进行局部建立线性模型。Andrew Ng课堂上讲解了Locally Weighted Linear Regression,即局部加权的线性模型</li>
</ol>
<h3 id="locally-weighted-linear-regression">Locally Weighted Linear Regression</h3>
<p><img src="http://latex.codecogs.com/gif.latex? J(\theta)=\frac{1}{2}\sum_{i=1}^{m}w^{(i)}(y^{(i)}-h_{\theta}(x^{(i)}))^2"></p>
<p>其中权值的一种好的选择方式是:</p>
<p><img src="http://latex.codecogs.com/gif.latex? w^{(i)}=\bold{exp}(-\frac{(x^{(i)}-x)^2}{2\tau^2})"></p>
<h2 id="logistic-regression与分类问题">Logistic Regression与分类问题</h2>
<p>Linear Regression解决的是连续的预测和拟合问题,而Logistic Regression解决的是离散的分类问题。两种方式,但本质殊途同归,两者都可以算是指数函数族的特例。</p>
<p>在分类问题中,y取值在{0,1}之间,因此,上述的Liear Regression显然不适应。修改模型如下</p>
<p><img src="http://latex.codecogs.com/gif.latex? h_{\theta}(x)=g(\theta^Tx)=\frac{1}{1+\bold{e}^{-\theta^Tx}}"></p>
<p>该模型称为Logistic函数或Sigmoid函数。为什么选择该函数,我们看看这个函数的图形就知道了,</p>
<div class="figure">
<img src="../images/Stanford机器学习课程笔记1-监督学习/Sigmoid.png" />
</div>
<p>Sigmoid函数范围在[0,1]之间,参数 <span class="math"><em>θ</em></span> 只不过控制曲线的陡峭程度。以0.5为截点,>0.5则y值为1,< 0.5则y值为0,这样就实现了两类分类的效果。</p>
<p>假设 <span class="math"><em>P</em>(<em>y</em> = 1|<em>x</em>; <em>θ</em>) = <em>h</em><sub><em>θ</em></sub>(<em>x</em>)</span> , <span class="math"><em>P</em>(<em>y</em> = 0|<em>x</em>; <em>θ</em>) = 1 − <em>h</em><sub><em>θ</em></sub>(<em>x</em>)</span> , 写得更紧凑一些,</p>
<p><img src="http://latex.codecogs.com/gif.latex? P(y|x;\theta)=(h_{\theta}(x))^y(1-h_{\theta}(x))^{1-y}"></p>
<p>对m个训练样本,使其似然函数最大,则有</p>
<p><img src="http://latex.codecogs.com/gif.latex? \bold{max}L(\theta)=\bold{max}\prod_{i=1}{m}(h_{\theta}(x^{(i)}))^y^{(i)}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}"></p>
<p>同样的可以用梯度下降法求解上述的最大值问题,只要将最大值求解转化为求最小值,则迭代公式一模一样,</p>
<p><img src="http://latex.codecogs.com/gif.latex? \bold{min}J(\theta)=\bold{min}\{-\log\bold{L}(\theta)\}"></p>
<p>最后的梯度下降方式和Linear Regression一致。我做了个例子(<a href="../enclosure/Stanford机器学习课程笔记1-监督学习/LogisticInput.txt">数据集链接</a>),下面是Logistic的Matlab代码,</p>
<pre><code>function Logistic
clear all;
close all
clc
data = load('LogisticInput.txt');
x = data(:,1:2);
y = data(:,3);
% Plot Original Data
figure,
positive = find(y==1);
negtive = find(y==0);
hold on
plot(x(positive,1), x(positive,2), 'k+', 'LineWidth',2, 'MarkerSize', 7);
plot(x(negtive,1), x(negtive,2), 'bo', 'LineWidth',2, 'MarkerSize', 7);
% Compute Likelihood(Cost) Function
[m,n] = size(x);
x = [ones(m,1) x];
theta = zeros(n+1, 1);
[cost, grad] = cost_func(theta, x, y);
threshold = 0.1;
alpha = 10^(-1);
costs = [];
while cost > threshold
theta = theta + alpha * grad;
[cost, grad] = cost_func(theta, x, y);
costs = [costs cost];
end
% Plot Decision Boundary
hold on
plot_x = [min(x(:,2))-2,max(x(:,2))+2];
plot_y = (-1./theta(3)).*(theta(2).*plot_x + theta(1));
plot(plot_x, plot_y, 'r-');
legend('Positive', 'Negtive', 'Decision Boundary')
xlabel('Feature Dim1');
ylabel('Feature Dim2');
title('Classifaction Using Logistic Regression');
% Plot Costs Iteration
figure,
plot(costs, '*');
title('Cost Function Iteration');
xlabel('Iterations');
ylabel('Cost Fucntion Value');
end
function g=sigmoid(z)
g = 1.0 ./ (1.0+exp(-z));
end
function [J,grad] = cost_func(theta, X, y)
% Computer Likelihood Function and Gradient
m = length(y); % training examples
hx = sigmoid(X*theta);
J = (1./m)*sum(-y.*log(hx)-(1.0-y).*log(1.0-hx));
grad = (1./m) .* X' * (y-hx);
end</code></pre>
<div class="figure">
<img src="../images/Stanford机器学习课程笔记1-监督学习/LogisticRegression.png" />
</div>
<p>判决边界(Decesion Boundary)的计算是令h(x)=0.5得到的。当输入新的数据,计算h(x):h(x)>0.5为正样本所属的类,h(x)< 0.5 为负样本所属的类。</p>
<h3 id="特征映射与过拟合over-fitting">特征映射与过拟合(over-fitting)</h3>
<p>这部分在Andrew Ng课堂上没有讲,参考了网络上的资料。</p>
<p>上面的数据可以通过直线进行划分,但实际中存在那么种情况,无法直接使用直线判决边界(看后面的例子)。</p>
<p>为解决上述问题,必须将特征映射到高维,然后通过非直线判决界面进行划分。特征映射的方法将已有的特征进行多项式组合,形成更多特征,</p>
<p><img src="http://latex.codecogs.com/gif.latex? mapFeature=\left[\begin{array}{c}1 \\ x_1 \\ x_2 \\ x_1^2 \\ x_1x_2 \\ x_2^2 \end{array}\right]"></p>
<p>上面将二维特征映射到了2阶(还可以映射到更高阶),这便于形成非线性的判决边界。</p>
<p>但还存在问题,尽管上面方法便于对非线性的数据进行划分,但也容易由于高维特性造成过拟合。因此,引入泛化项应对过拟合问题。似然函数添加泛化项后变成,</p>
<p><img src="http://latex.codecogs.com/gif.latex? J(\theta)=\sum_{i=1}^{m}[-y^{(i)}\log h(x^{(i)})-(1-y^{(i)})\log(1-h(x^{(i)}))]+\frac{\lambda}{2}\sum_{j=1}^n\theta_j"></p>
<p>此时梯度下降算法发生改变,</p>
<p><img src="http://latex.codecogs.com/gif.latex? \theta_j=\theta_j+\alpha\left[\sum_{i=1}^{m}(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}-\lambda\theta_j\right]"></p>
<p>最后来个例子,<a href="../enclosure/Stanford机器学习课程笔记1-监督学习/ex2data2.txt">样本数据链接</a>,对应的含泛化项和特征映射的matlab分类代码如下:</p>
<pre class="sourceCode matlab"><code class="sourceCode matlab">function LogisticEx2
clear all;
close all
clc
data = load(<span class="st">'ex2data2.txt'</span>);
x = data(:,<span class="fl">1</span>:<span class="fl">2</span>);
y = data(:,<span class="fl">3</span>);
<span class="co">% Plot Original Data</span>
figure,
positive = find(y==<span class="fl">1</span>);
negtive = find(y==<span class="fl">0</span>);
subplot(<span class="fl">1</span>,<span class="fl">2</span>,<span class="fl">1</span>);
hold on
plot(x(positive,<span class="fl">1</span>), x(positive,<span class="fl">2</span>), <span class="st">'k+'</span>, <span class="st">'LineWidth'</span>,<span class="fl">2</span>, <span class="st">'MarkerSize'</span>, <span class="fl">7</span>);
plot(x(negtive,<span class="fl">1</span>), x(negtive,<span class="fl">2</span>), <span class="st">'bo'</span>, <span class="st">'LineWidth'</span>,<span class="fl">2</span>, <span class="st">'MarkerSize'</span>, <span class="fl">7</span>);
<span class="co">% Compute Likelihood(Cost) Function</span>
[m,n] = size(x);
x = mapFeature(x);
theta = zeros(size(x,<span class="fl">2</span>), <span class="fl">1</span>);
lambda = <span class="fl">1</span>;
[cost, grad] = cost_func(theta, x, y, lambda);
threshold = <span class="fl">0.53</span>;
alpha = <span class="fl">10</span>^(-<span class="fl">1</span>);
costs = [];
while cost > threshold
theta = theta + alpha * grad;
[cost, grad] = cost_func(theta, x, y, lambda);
costs = [costs cost];
end
<span class="co">% Plot Decision Boundary </span>
hold on
plotDecisionBoundary(theta, x, y);
legend(<span class="st">'Positive'</span>, <span class="st">'Negtive'</span>, <span class="st">'Decision Boundary'</span>)
xlabel(<span class="st">'Feature Dim1'</span>);
ylabel(<span class="st">'Feature Dim2'</span>);
title(<span class="st">'Classifaction Using Logistic Regression'</span>);
<span class="co">% Plot Costs Iteration</span>
<span class="co">% figure,</span>
subplot(<span class="fl">1</span>,<span class="fl">2</span>,<span class="fl">2</span>);plot(costs, <span class="st">'*'</span>);
title(<span class="st">'Cost Function Iteration'</span>);
xlabel(<span class="st">'Iterations'</span>);
ylabel(<span class="st">'Cost Fucntion Value'</span>);
end
function f=mapFeature(x)
<span class="co">% Map features to high dimension</span>
degree = <span class="fl">6</span>;
f = ones(size(x(:,<span class="fl">1</span>)));
for i = <span class="fl">1</span>:degree
for j = <span class="fl">0</span>:i
f(:, end+<span class="fl">1</span>) = (x(:,<span class="fl">1</span>).^(i-j)).*(x(:,<span class="fl">2</span>).^j);
end
end
end
function g=sigmoid(z)
g = <span class="fl">1.0</span> ./ (<span class="fl">1.0</span>+exp(-z));
end
function [J,grad] = cost_func(theta, X, y, lambda)
<span class="co">% Computer Likelihood Function and Gradient</span>
m = length(y); <span class="co">% training examples</span>
hx = sigmoid(X*theta);
J = (<span class="fl">1</span>./m)*sum(-y.*log(hx)-(<span class="fl">1.0</span>-y).*log(<span class="fl">1.0</span>-hx)) + (lambda./(<span class="fl">2</span>*m)*norm(theta(<span class="fl">2</span>:end))^<span class="fl">2</span>);
regularize = (lambda/m).*theta;
regularize(<span class="fl">1</span>) = <span class="fl">0</span>;
grad = (<span class="fl">1</span>./m) .* X' * (y-hx) - regularize;
end
function plotDecisionBoundary(theta, X, y)
<span class="co">%PLOTDECISIONBOUNDARY Plots the data points X and y into a new figure with</span>
<span class="co">%the decision boundary defined by theta</span>
<span class="co">% PLOTDECISIONBOUNDARY(theta, X,y) plots the data points with + for the </span>
<span class="co">% positive examples and o for the negative examples. X is assumed to be </span>
<span class="co">% a either </span>
<span class="co">% 1) Mx3 matrix, where the first column is an all-ones column for the </span>
<span class="co">% intercept.</span>
<span class="co">% 2) MxN, N>3 matrix, where the first column is all-ones</span>
<span class="co">% Plot Data</span>
<span class="co">% plotData(X(:,2:3), y);</span>
hold on
if size(X, <span class="fl">2</span>) <= <span class="fl">3</span>
<span class="co">% Only need 2 points to define a line, so choose two endpoints</span>
plot_x = [min(X(:,<span class="fl">2</span>))-<span class="fl">2</span>, max(X(:,<span class="fl">2</span>))+<span class="fl">2</span>];
<span class="co">% Calculate the decision boundary line</span>
plot_y = (-<span class="fl">1</span>./theta(<span class="fl">3</span>)).*(theta(<span class="fl">2</span>).*plot_x + theta(<span class="fl">1</span>));
<span class="co">% Plot, and adjust axes for better viewing</span>
plot(plot_x, plot_y)
<span class="co">% Legend, specific for the exercise</span>
legend(<span class="st">'Admitted'</span>, <span class="st">'Not admitted'</span>, <span class="st">'Decision Boundary'</span>)
axis([<span class="fl">30</span>, <span class="fl">100</span>, <span class="fl">30</span>, <span class="fl">100</span>])
else
<span class="co">% Here is the grid range</span>
u = linspace(-<span class="fl">1</span>, <span class="fl">1.5</span>, <span class="fl">50</span>);
v = linspace(-<span class="fl">1</span>, <span class="fl">1.5</span>, <span class="fl">50</span>);
z = zeros(length(u), length(v));
<span class="co">% Evaluate z = theta*x over the grid</span>
for i = <span class="fl">1</span>:length(u)
for j = <span class="fl">1</span>:length(v)
z(i,j) = mapFeature([u(i), v(j)])*theta;
end
end
z = z'; <span class="co">% important to transpose z before calling contour</span>
<span class="co">% Plot z = 0</span>
<span class="co">% Notice you need to specify the range [0, 0]</span>
contour(u, v, z, [<span class="fl">0</span>, <span class="fl">0</span>], <span class="st">'LineWidth'</span>, <span class="fl">2</span>)
end
end</code></pre>
<div class="figure">
<img src="../images/Stanford机器学习课程笔记1-监督学习/NonlinearLogistic.png" />
</div>
<p>我们再回过头来看Logistic问题:对于非线性的问题,只不过使用了一个叫Sigmoid的非线性映射成一个线性问题。那么,除了Sigmoid函数,还有其它的函数可用吗?Andrew Ng老师还讲了指数函数族。</p>
<div class="ds-thread" data-thread-key="Stanford机器学习课程笔记1-监督学习" data-title="Stanford机器学习课程笔记1-监督学习" data-url="xiahouzuoxin.github.io/notes/html/Stanford机器学习课程笔记1-监督学习.html"></div>
<script>window._bd_share_config={"common":{"bdSnsKey":{},"bdText":"","bdMini":"2","bdMiniList":false,"bdPic":"","bdStyle":"0","bdSize":"16"},"slide":{"type":"slide","bdImg":"5","bdPos":"right","bdTop":"300"},"image":{"viewList":["qzone","tsina","tqq","renren","weixin"],"viewText":"分享到:","viewSize":"16"},"selectShare":{"bdContainerClass":null,"bdSelectMiniList":["qzone","tsina","tqq","renren","weixin"]}};with(document)0[(getElementsByTagName('head')[0]||body).appendChild(createElement('script')).src='http://bdimg.share.baidu.com/static/api/js/share.js?v=89860593.js?cdnversion='+~(-new Date()/36e5)];</script>
<!-- 多说公共JS代码 start (一个网页只需插入一次) -->
<script type="text/javascript">
var duoshuoQuery = {short_name:"xiahouzuoxin"};
(function() {
var ds = document.createElement('script');
ds.type = 'text/javascript';ds.async = true;
ds.src = (document.location.protocol == 'https:' ? 'https:' : 'http:') + '//static.duoshuo.com/embed.js';
ds.charset = 'UTF-8';
(document.getElementsByTagName('head')[0]
|| document.getElementsByTagName('body')[0]).appendChild(ds);
})();
</script>
<!-- 多说公共JS代码 end -->
<div id="footer">
<p class="footer_subline">联系邮箱: xiahouzuoxin@163.com</p>
<p class="footer_subline">声明: 本站所有文章如非特别说明均为原创,转载请注明出处!
<script type="text/javascript">var cnzz_protocol = (("https:" == document.location.protocol) ? " https://" : " http://");document.write(unescape("%3Cspan id='cnzz_stat_icon_1253219218'%3E%3C/span%3E%3Cscript src='" + cnzz_protocol + "s4.cnzz.com/z_stat.php%3Fid%3D1253219218%26show%3Dpic' type='text/javascript'%3E%3C/script%3E"));</script>
</p>
</div>
<!-- 回到顶部 -->
<script>
lastScrollY=0;
function heartBeat(){
var diffY;
if (document.documentElement && document.documentElement.scrollTop)
diffY = document.documentElement.scrollTop;
else if (document.body)
diffY = document.body.scrollTop
else
{/*Netscape stuff*/}
percent=.1*(diffY-lastScrollY);
if(percent>0)percent=Math.ceil(percent);
else percent=Math.floor(percent);
document.getElementById("full").style.top=parseInt(document.getElementById("full").style.top)+percent+"px";
lastScrollY=lastScrollY+percent;
}
suspendcode="<div id=\"full\" style='right:1px;POSITION:absolute;TOP:600px;z-index:100'><a onclick='window.scrollTo(0,0);'><img src='../images/top.png'></a><br></div>"
document.write(suspendcode);
window.setInterval("heartBeat()",1);
</script>
</body>
</html>