Outline:

- Why use a bidirectional LSTM
- What is a bidirectional LSTM
- Example

---
#### Why use a bidirectional LSTM?

A unidirectional RNN infers later outputs from earlier context alone, but sometimes the preceding words are not enough. For example:

I don't feel well today, so I plan to ____ for a day.

Based only on "don't feel well", the blank might be filled with "see a doctor", "sleep", "take leave", and so on. But once the trailing "for a day" is taken into account, the range of choices narrows: "see a doctor" no longer fits, while options like "take leave" or "rest" become much more likely.

---

#### What is a bidirectional LSTM?

In a bidirectional recurrent neural network, the hidden layer keeps two values: A, computed by the forward pass, and A', computed by the backward pass. The final output y depends on both A and A'.

That is, in the forward pass the hidden state s_t depends on s_(t-1), while in the backward pass s_t depends on s_(t+1):
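The formula images are missing from this copy, so here is a sketch of the standard bidirectional RNN recurrences (following the first reference below; A_t is the forward state, i.e. the text's s_t, f and g are activation functions, and the weight names V, W, U are conventional, not from the lost figures):

```
y_t  = g(V A_t + V' A'_t)
A_t  = f(W A_{t-1} + U x_t)
A'_t = f(W' A'_{t+1} + U' x_t)
```

Note that the forward and backward passes use separate weight matrices (W vs. W', U vs. U').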
On some tasks a bidirectional LSTM performs better than a unidirectional LSTM.
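As a concrete illustration of the two passes, here is a minimal NumPy sketch (the function name, weight shapes, and dimensions are illustrative assumptions, not from the original):

```python
import numpy as np

def rnn_pass(x, W, U, b, reverse=False):
    """Simple tanh RNN over a sequence; reverse=True runs right-to-left."""
    T = x.shape[0]
    d_h = W.shape[0]
    states = np.zeros((T, d_h))
    h = np.zeros(d_h)
    steps = reversed(range(T)) if reverse else range(T)
    for t in steps:
        # each state depends on the previous (forward) or next (backward) state
        h = np.tanh(W @ h + U @ x[t] + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 3, 4
x = rng.normal(size=(T, d_in))

# separate weights for each direction, as in a real bidirectional network
Wf, Uf, bf = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_in)), np.zeros(d_h)
Wb, Ub, bb = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_in)), np.zeros(d_h)

A = rnn_pass(x, Wf, Uf, bf)                     # forward states: A[t] uses A[t-1]
A_rev = rnn_pass(x, Wb, Ub, bb, reverse=True)   # backward states: A'[t] uses A'[t+1]
y = np.concatenate([A, A_rev], axis=1)          # each timestep sees both directions
print(y.shape)  # (5, 8)
```

Concatenation is only one way to merge the two directions; summing or averaging the states is also common.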

---

#### Example

Below is a small Keras example of a bidirectional LSTM applied to sequence classification. Given 10 random numbers such as:

`0.63144003 0.29414551 0.91587952 0.95189228 0.32195638 0.60742236 0.83895793 0.18023048 0.84762691 0.29165514`

each position is labeled 1 once the cumulative sum exceeds a preset threshold, and 0 otherwise. With a threshold of 2.5, the labels for the input above are:

`0 0 0 1 1 1 1 1 1 1`
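The labeling rule can be checked directly; a minimal sketch, using only NumPy:

```python
import numpy as np

# the example sequence from above
seq = np.array([0.63144003, 0.29414551, 0.91587952, 0.95189228, 0.32195638,
                0.60742236, 0.83895793, 0.18023048, 0.84762691, 0.29165514])

# a position is labeled 1 once the running (cumulative) sum exceeds 2.5
labels = (np.cumsum(seq) > 2.5).astype(int)
print(labels)  # → [0 0 0 1 1 1 1 1 1 1]
```

The cumulative sum first exceeds 2.5 at the fourth element (0.63 + 0.29 + 0.92 + 0.95 ≈ 2.79), so the label flips from 0 to 1 there and stays 1 afterwards.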

The only difference from a unidirectional LSTM is the use of the Bidirectional wrapper (by default it concatenates the forward and backward outputs, so the 20 LSTM units yield 40 features per timestep):

`model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1)))`

```python
from random import random
from numpy import array
from numpy import cumsum
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import Bidirectional

# create a sequence classification instance
def get_sequence(n_timesteps):
    # create a sequence of random numbers in [0,1]
    X = array([random() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps / 4.0
    # determine the class outcome for each item in the cumulative sequence
    y = array([0 if x < limit else 1 for x in cumsum(X)])
    # reshape input and output data to be suitable for LSTMs
    X = X.reshape(1, n_timesteps, 1)
    y = y.reshape(1, n_timesteps, 1)
    return X, y

# define problem properties
n_timesteps = 10

# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(20, return_sequences=True), input_shape=(n_timesteps, 1)))
model.add(TimeDistributed(Dense(1, activation='sigmoid')))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc'])

# train LSTM: each "epoch" sees one freshly generated random sequence
for epoch in range(1000):
    # generate new random sequence
    X, y = get_sequence(n_timesteps)
    # fit model for one epoch on this sequence
    model.fit(X, y, epochs=1, batch_size=1, verbose=2)

# evaluate LSTM
# (predict_classes was removed from newer Keras; threshold the probabilities instead)
X, y = get_sequence(n_timesteps)
yhat = (model.predict(X, verbose=0) > 0.5).astype('int32')
for i in range(n_timesteps):
    print('Expected:', y[0, i], 'Predicted:', yhat[0, i])
```

---

References:
https://zybuluo.com/hanbingtao/note/541458
https://maxwell.ict.griffith.edu.au/spl/publications/papers/ieeesp97_schuster.pdf
http://machinelearningmastery.com/develop-bidirectional-lstm-sequence-classification-python-keras/

---
Recommended reading
[Index of past technical posts](http://blog.csdn.net/aliceyangxi1987/article/details/71911003)
You may find what you are looking for:
[Beginner questions][TensorFlow][Deep learning][Reinforcement learning][Neural networks][Machine learning][Natural language processing][Chatbots]