@@ -14,7 +14,7 @@
 
 \setlength\titlebox{5cm}
 
-\title{A Survey of Neural Network Techniques\\ for Feature Extraction from Text\\ \large CS 698 (Winter 2017) - Project}
+\title{A Survey of Neural Network Techniques\\ for Feature Extraction from Text}
 
 \author{
 Vineet John \\
@@ -319,7 +319,7 @@ \section{Linguistic Regularities in Continuous Space Word Representations} % (fold)
 $f(z) = \frac{1}{1 + e^{-z}}$ and $g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}$
 \begin{figure}[ht]
 \centering
-  \includegraphics[width=.4\textwidth]{rnn-lang-model}
+  \includegraphics[width=.4\textwidth]{rnn-lang-model.png}
 \caption{RNN Language Model}
 \label{fig:rnn-lang-model}
 \end{figure}
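The hunk above references the activations used in the RNN language model: a sigmoid $f$ for the hidden layer and a softmax $g$ for the output layer. As a minimal sketch of how the two fit together, the snippet below runs one step of an Elman-style recurrent language model in numpy; the matrix names (U, W, Out), the sizes, and the random initialization are illustrative assumptions, not details taken from the survey or the cited paper.

import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # g(z_m) = e^{z_m} / sum_k e^{z_k}, with max-subtraction for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Illustrative sizes: vocabulary V, hidden dimension H (assumed values)
V, H = 10, 8
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(H, V))    # input word  -> hidden
W = rng.normal(scale=0.1, size=(H, H))    # hidden      -> hidden (recurrence)
Out = rng.normal(scale=0.1, size=(V, H))  # hidden      -> output vocabulary

def rnn_lm_step(word_id, h_prev):
    """One step of an Elman-style RNN language model."""
    x = np.zeros(V)
    x[word_id] = 1.0                 # one-hot encoding of the current word
    h = sigmoid(U @ x + W @ h_prev)  # hidden state, f = sigmoid
    p_next = softmax(Out @ h)        # distribution over the next word, g = softmax
    return h, p_next

h, p = rnn_lm_step(word_id=3, h_prev=np.zeros(H))
print(p.sum())  # ~1.0, a valid probability distribution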
@@ -350,7 +350,7 @@ \section{Better Word Representations with Recursive Neural Networks for Morphology} % (fold)
 \texttt{morphoRNN} operates at the morpheme level rather than the word level. An example of this is illustrated in Figure \ref{fig:rnn-morphology}.
 \begin{figure}[ht]
 \centering
-  \includegraphics[width=.4\textwidth]{rnn-morphology}
+  \includegraphics[width=.4\textwidth]{rnn-morphology.png}
 \caption{morphoRNN}
 \label{fig:rnn-morphology}
 \end{figure}
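To make the morpheme-level composition concrete, here is a minimal numpy sketch in the spirit of what the figure depicts: a parent vector is built from a stem vector and an affix vector via one affine map followed by a tanh nonlinearity, applied recursively. The morphemes, dimensionality, and parameter shapes are assumptions for illustration and do not reproduce the cited model's exact parameterization or training objective.

import numpy as np

d = 5  # embedding dimension (illustrative)
rng = np.random.default_rng(1)

# Morpheme embeddings instead of whole-word embeddings; values are random placeholders.
morpheme_vec = {m: rng.normal(scale=0.1, size=d) for m in ["un", "fortunate", "ly"]}

# Composition parameters: one weight matrix over the concatenated [stem; affix]
# pair plus a bias, reused at every node of the recursive structure (an assumption).
W_m = rng.normal(scale=0.1, size=(d, 2 * d))
b_m = np.zeros(d)

def compose(stem, affix):
    """Parent vector from a stem vector and an affix vector."""
    return np.tanh(W_m @ np.concatenate([stem, affix]) + b_m)

# Build a vector for "unfortunately" bottom-up: ((un + fortunate) + ly)
parent = compose(morpheme_vec["un"], morpheme_vec["fortunate"])
word_vec = compose(parent, morpheme_vec["ly"])
print(word_vec.shape)  # (5,)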
@@ -388,7 +388,7 @@ \section{Efficient Estimation of Word Representations in Vector Space} % (fold)
 \textbf{Approach:}
 \begin{itemize}
 \item
-  The ideas presented in this paper build on the previous ideas presented by Bengio et.al.\cite{bengio2003neural}
+  The ideas presented in this paper build on those previously presented by \cite{bengio2003neural}.
 \item
 The objective was to obtain high-quality word embeddings that capture the syntactic and semantic characteristics of words in a manner that allows algebraic operations to proxy the distances in vector space.
 $$man - woman = king - queen$$ or $$tell - told = walk - walked$$
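The analogy relations above are typically evaluated by vector offset: the answer to "man is to king as woman is to ?" is the vocabulary word whose embedding is closest, by cosine similarity, to $king - man + woman$. Below is a toy sketch assuming hand-picked 2-dimensional vectors chosen so the relation holds exactly; real word2vec embeddings are learned, high-dimensional, and satisfy such relations only approximately.

import numpy as np

# Toy embeddings with axes loosely meaning "royalty" and "gender" (assumed values)
emb = {
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' via the vector offset v(b) - v(a) + v(c)."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))

print(analogy("man", "king", "woman"))  # -> queen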
@@ -440,7 +440,7 @@ \section{Distributed Representations of Words and Phrases and their Compositionality} % (fold)
 \end{equation}
 where $c$ is the window or context surrounding the current word being trained on.
 \item
-  As introduced by Morin, Bengio et.al.\cite{morin2005hierarchical}, a computationally efficient approximation of the full softmax is the hierarchical softmax. The hierarchical softmax uses a binary tree representation of the output layer with the W words as its leaves and, for each node, explicitly represents the relative probabilities of its child nodes. These define a random walk that assigns probabilities to words.
+  As introduced by \cite{morin2005hierarchical}, a computationally efficient approximation of the full softmax is the hierarchical softmax. The hierarchical softmax uses a binary tree representation of the output layer with the $W$ words as its leaves and, for each node, explicitly represents the relative probabilities of its child nodes. These define a random walk that assigns probabilities to words.
 \item
 The authors use a binary Huffman tree, as it assigns short codes to the frequent words, which results in fast training. It has been observed before that grouping words together by their frequency works well as a very simple speedup technique for neural network based language models.
 \item
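As a rough sketch of the hierarchical softmax idea discussed in this hunk: every word is a leaf of a binary tree, and its probability is the product of sigmoid-gated left/right decisions along the path from the root, so only roughly $\log_2 W$ inner products are evaluated per word instead of a softmax over the full vocabulary. The tiny hand-built tree, node names, and vector sizes below are assumptions for illustration; the paper builds a Huffman tree from corpus frequencies rather than using a fixed tree like this one.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny hand-built binary tree over a 4-word vocabulary. Each word is reached
# from the root by (internal node, direction) decisions; +1 = left, -1 = right.
#
#            n0
#          /    \
#        n1      "the"
#       /  \
#   "cat"   n2
#          /  \
#      "sat"  "mat"
paths = {
    "the": [("n0", -1)],
    "cat": [("n0", +1), ("n1", +1)],
    "sat": [("n0", +1), ("n1", -1), ("n2", +1)],
    "mat": [("n0", +1), ("n1", -1), ("n2", -1)],
}

d = 6  # embedding size (illustrative)
rng = np.random.default_rng(2)
node_vec = {n: rng.normal(scale=0.1, size=d) for n in ["n0", "n1", "n2"]}

def hierarchical_softmax_prob(word, hidden):
    """P(word | hidden) as a product of binary decisions along the tree path.

    Only len(path) ~ log2(|V|) sigmoids are evaluated, rather than a full
    softmax over the whole vocabulary.
    """
    p = 1.0
    for node, direction in paths[word]:
        p *= sigmoid(direction * (node_vec[node] @ hidden))
    return p

h = rng.normal(size=d)  # e.g. the embedding of a context word
total = sum(hierarchical_softmax_prob(w, h) for w in paths)
print(round(total, 6))  # 1.0: the leaf probabilities form a valid distribution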
@@ -483,7 +483,7 @@ \section{Glove: Global Vectors for Word Representation} % (fold)
 \item
 An additive shift is included in the logarithm, $$\log(X_{ik}) \Rightarrow \log(1 + X_{ik})$$ which maintains the sparsity of $X$ while avoiding divergences when computing the co-occurrence matrix.
 \item
-  The model obtained in the paper could be compared to a global skip-gram model as opposed to a fixed window-size skip-gram model as proposed by Mikolov et.al.\cite{mikolov2013efficient}.
+  The model obtained in the paper could be compared to a global skip-gram model, as opposed to the fixed window-size skip-gram model proposed by \cite{mikolov2013efficient}.
 \item
 The performance seems to increase monotonically with an increase in training data.
 \end{itemize}
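To see why the additive shift matters, the sketch below builds a small co-occurrence count matrix from a toy corpus (the corpus and the one-word window are arbitrary choices, not GloVe's setup) and applies $\log(1 + X)$: zero counts map to exactly zero, so the sparsity pattern of $X$ is preserved and no $-\infty$ entries appear, which a plain $\log(X)$ would produce.

import numpy as np

# Toy corpus and symmetric co-occurrence counts within a +/-1 word window (assumed setup)
corpus = "the cat sat on the mat".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

X = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(corpus):
    for j in (i - 1, i + 1):
        if 0 <= j < len(corpus):
            X[idx[w], idx[corpus[j]]] += 1

# The additive shift: log(1 + X_ik) instead of log(X_ik)
shifted = np.log1p(X)
print(np.isfinite(shifted).all())          # True: no -inf from zero counts
print(((shifted == 0) == (X == 0)).all())  # True: the zero (sparsity) pattern of X is kept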
@@ -535,7 +535,7 @@ \section{Acknowledgements} % (fold)
 % section acknowledgements (end)
 
 
-\bibliographystyle{unsrt}
+\bibliographystyle{acl_natbib}
 \bibliography{cs698_project_report}
 
 \end{document}