 \usepackage[hyperref]{acl2017}
 \usepackage{times}
 \usepackage{latexsym}
-
+\usepackage{graphicx}
 \usepackage{hyperref}
 \hypersetup{
     colorlinks=true,
@@ -234,6 +234,49 @@ \section{Hierarchical Probabilistic Neural Network Language Model} % (fold)
 % section hierarchical_probabilistic_neural_network_language_model (end)
 
 
+\section{Better Word Representations with Recursive Neural Networks for Morphology} % (fold)
+\label{sec:better_word_representations_with_recursive_neural_networks_for_morphology}
+
+    \textbf{Goal:}
+    The paper addresses the inaccuracy of vector representations for morphologically complex and rare words, which the authors attribute to models treating morphologically related words as unrelated atomic units. \cite{luong2013better}
+
+    \textbf{Approach:}
+    \begin{itemize}
+    \item
+        The authors treat each morpheme as a basic unit of a recursive neural network (RNN) and construct representations for morphologically complex words on the fly from their morphemes. By training a neural language model (NLM) and integrating RNN structures for complex words, they use contextual information to learn both morphemic semantics and their compositional properties.
+    \item
+        Notes that word2vec-style syntactic regularities such as $x_{apples} - x_{apple} \approx x_{cars} - x_{car}$ may not hold if the vector representation of a rare word is inaccurate to begin with (see the first sketch after this list).
+    \item
+        The proposed \texttt{morphoRNN} operates at the morpheme level rather than the word level. An example of this is illustrated in Figure \ref{fig:rnn-morphology}.
+        \begin{figure}[ht]
+            \centering
+            \includegraphics[width=.4\textwidth]{rnn-morphology}
+            \caption{The \texttt{morphoRNN} builds word representations from morpheme vectors.}
+            \label{fig:rnn-morphology}
+        \end{figure}
+    \item
+        A parent word vector is constructed by combining a stem vector and an affix vector, as shown in Equation \ref{eqn:parent-vector}, where $f$ is an elementwise activation function and $W_m$, $b_m$ are the composition parameters.
+        \begin{equation} \label{eqn:parent-vector}
+            p = f(W_m [x_{stem}; x_{affix}] + b_m)
+        \end{equation}
+    \item
+        The cost function is expressed as the squared Euclidean loss between the newly constructed representation $p_c(x_i)$ and the reference representation $p_r(x_i)$, plus an $L_2$ regularization term, as given in Equation \ref{eqn:cost-function-morphornn} (see the second sketch after this list).
+        \begin{equation} \label{eqn:cost-function-morphornn}
+            J(\theta) = \sum_{i=1}^N \left( \| p_c(x_i) - p_r(x_i) \|^2_2 \right) + \frac{\lambda}{2} \|\theta\|^2_2
+        \end{equation}
+    \item
+        The paper describes both context-insensitive and context-sensitive versions of the morphological RNN.
+    \item
+        As in a typical recursive neural network, training consists of a forward pass that computes activations at each node, followed by a backward pass that propagates errors back down the tree.
+    \end{itemize}
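+
+    A minimal sketch of the rare-word concern above, assuming NumPy; the toy vectors and the \texttt{cos} helper are illustrative stand-ins for embeddings from a trained model, not the paper's code:
+\begin{verbatim}
+import numpy as np
+
+def cos(a, b):
+    """Cosine similarity between two vectors."""
+    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
+
+# Toy embeddings; in practice these would come from a
+# trained model such as word2vec.
+rng = np.random.default_rng(1)
+vec = {w: rng.normal(size=50)
+       for w in ["apple", "apples", "car", "cars"]}
+
+# The regularity holds when the two difference vectors point
+# in the same direction; for a rare word whose vector is
+# poorly estimated, this similarity degrades.
+print(cos(vec["apples"] - vec["apple"],
+          vec["cars"] - vec["car"]))
+\end{verbatim}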
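+
+    And a sketch of the composition, loss, and a single gradient step for one node, again assuming NumPy; the morphemes, dimensionality, and learning rate are hypothetical choices, and the paper trains over the full morpheme tree of each word rather than one node:
+\begin{verbatim}
+import numpy as np
+
+rng = np.random.default_rng(0)
+d = 50                        # embedding dimensionality
+
+# Hypothetical morpheme embeddings (learned jointly in the
+# paper; random stand-ins here).
+morph = {m: rng.normal(size=d)
+         for m in ["un", "fortunate", "ly"]}
+W_m = rng.normal(size=(d, 2 * d)) * 0.01  # composition matrix
+b_m = np.zeros(d)                         # composition bias
+
+def compose(stem, affix):
+    """p = f(W_m [x_stem; x_affix] + b_m), with f = tanh."""
+    return np.tanh(W_m @ np.concatenate([stem, affix]) + b_m)
+
+# Build "unfortunately" recursively: ((un + fortunate) + ly).
+p = compose(morph["fortunate"], morph["un"])
+p = compose(p, morph["ly"])
+
+# Squared Euclidean loss against a reference vector p_r(x_i),
+# plus L2 regularization on theta = (W_m, b_m).
+p_ref = rng.normal(size=d)    # stand-in reference embedding
+lam = 1e-4
+J = np.sum((p - p_ref) ** 2) \
+    + lam / 2 * (np.sum(W_m ** 2) + np.sum(b_m ** 2))
+
+# Backprop through the top tanh node only, then one SGD step.
+x = np.concatenate([compose(morph["fortunate"], morph["un"]),
+                    morph["ly"]])           # input to top node
+dJ_dz = 2 * (p - p_ref) * (1 - p ** 2)      # chain rule, tanh
+W_m -= 0.01 * (np.outer(dJ_dz, x) + lam * W_m)
+b_m -= 0.01 * (dJ_dz + lam * b_m)
+print(J)
+\end{verbatim}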
+
+    \textbf{Analysis:}
+    The morphological RNN performs better than most other neural language models on word similarity tasks, and could be used to supplement existing word vectors.
+
+% section better_word_representations_with_recursive_neural_networks_for_morphology (end)
+
+
+
 \newpage
 
 \bibliographystyle{unsrt}