
Commit bcdd7c0

Author: Vineet John committed
Added NLP Task backgrounds
1 parent 642df96 commit bcdd7c0


project-report/cs698_project_report.tex

Lines changed: 61 additions & 0 deletions
@@ -82,6 +82,65 @@ \section{Goal} % (fold)
% section goal (end)

\section{NLP Tasks - Background} % (fold)
\label{sec:nlp_tasks_background}

\subsection{Part-of-Speech Tagging} % (fold)
\label{sub:part_of_speech_tagging}

\begin{itemize}
\item
POS tagging aims at labeling each word with a unique tag that indicates its syntactic role, such as noun, verb or adjective.
\item
The best POS taggers are based on classifiers trained on windows of text, which are then fed to a bidirectional decoding algorithm during inference (the windowing step is sketched after this list).
\item
In general, the models resemble a bi-directional dependency network and can be trained using a variety of methods, including support vector machines and bi-directional Viterbi decoders.
\end{itemize}
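
As a minimal sketch of the windowing step (our own illustration; the function name and padding token are assumptions, not taken from any cited tagger), the following Python fragment extracts the fixed-size token window such a classifier would consume at each position:

\begin{verbatim}
# Sketch: build the fixed-size window of tokens a window-based
# POS classifier consumes for position i.
def token_window(tokens, i, size=2):
    # Pad both sentence boundaries so every position has a full window.
    pad = ["<PAD>"] * size
    padded = pad + tokens + pad
    return padded[i : i + 2 * size + 1]

# token_window(["The", "cat", "sat"], 0)
# -> ['<PAD>', '<PAD>', 'The', 'cat', 'sat']
\end{verbatim}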
% subsection part_of_speech_tagging (end)

\subsection{Chunking} % (fold)
\label{sub:chunking}

\begin{itemize}
\item
Chunking aims at labeling segments of a sentence with syntactic constituents such as noun or verb phrases. It is also called shallow parsing and can be viewed as a generalization of part-of-speech tagging to phrases instead of words.
\item
The implementation of chunking usually requires an underlying POS implementation, after which the words are compounded or chunked by concatenation (see the sketch after this list).
\end{itemize}
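
As a small illustration of chunking by concatenation (the BIO tag scheme and the toy example are our own assumptions, not an implementation from the surveyed systems), the fragment below groups tokens into phrases from BIO-style chunk tags:

\begin{verbatim}
# Sketch: concatenate tokens into chunks given BIO tags (B-X begins
# a chunk of type X, I-X continues it, O is outside any chunk).
def group_bio(tokens, tags):
    chunks, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            current.append(tok)               # continue the open chunk
            continue
        if current:                           # close the open chunk
            chunks.append((label, " ".join(current)))
            current, label = [], None
        if tag.startswith("B-"):
            current, label = [tok], tag[2:]   # open a new chunk
        else:                                 # "O" token
            chunks.append(("O", tok))
    if current:
        chunks.append((label, " ".join(current)))
    return chunks

# group_bio(["He", "ate", "the", "red", "apple"],
#           ["B-NP", "B-VP", "B-NP", "I-NP", "I-NP"])
# -> [('NP', 'He'), ('VP', 'ate'), ('NP', 'the red apple')]
\end{verbatim}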
% subsection chunking (end)

\subsection{Named Entity Recognition} % (fold)
\label{sub:named_entity_recognition}

\begin{itemize}
\item
NER labels atomic elements in the sentence into categories such as ``PERSON'' or ``LOCATION''.
\item
Features used to train NER classifiers typically include POS tags, CHUNK tags, word prefixes and suffixes, and large lexicons of known named entities (a sketch of such a feature map follows this list).
\end{itemize}
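
To make the feature list concrete, here is a hedged sketch of a per-token feature map (the gazetteer contents and feature names are hypothetical, chosen only for illustration):

\begin{verbatim}
# Sketch: features for the token at position i, mirroring the list
# above: POS tag, chunk tag, affixes, and lexicon (gazetteer) lookup.
GAZETTEER = {"paris": "LOCATION", "obama": "PERSON"}  # toy lexicon

def ner_features(tokens, pos_tags, chunk_tags, i):
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "pos": pos_tags[i],               # POS tag feature
        "chunk": chunk_tags[i],           # CHUNK tag feature
        "prefix3": w[:3],                 # prefix feature
        "suffix3": w[-3:],                # suffix feature
        "is_capitalised": w[0].isupper(),
        "lexicon": GAZETTEER.get(w.lower(), "NONE"),
    }
\end{verbatim}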
% subsection named_entity_recognition (end)

\subsection{Semantic Role Labeling} % (fold)
\label{sub:semantic_role_labeling}

\begin{itemize}
\item
SRL aims at giving a semantic role to a syntactic constituent of a sentence.
\item
State-of-the-art SRL systems consist of several stages: producing a parse tree, identifying which parse tree nodes represent the arguments of a given verb, and finally classifying these nodes to compute the corresponding SRL tags.
\item
SRL systems usually entail numerous features such as the parts of speech and syntactic labels of words and nodes in the tree, the syntactic path to the verb in the parse tree, and whether a node in the parse tree is part of a noun or verb phrase (the path feature is sketched after this list).
\end{itemize}
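
As an illustration of the syntactic-path feature (the toy parse tree and the path notation are our own assumptions), the sketch below walks from a candidate argument node and from the verb up to the root and joins the two paths:

\begin{verbatim}
# Sketch: the syntactic path from an argument node to the verb,
# over a toy parse tree encoded as (label, children...) tuples.
def path_to_root(tree, target):
    label, children = tree[0], tree[1:]
    if label == target:
        return [label]
    for child in children:
        if isinstance(child, tuple):
            sub = path_to_root(child, target)
            if sub:
                return sub + [label]   # prepend ancestors on unwind
    return []

# "John gave Mary":  (S (NP John) (VP (V gave) (NP Mary)))
TREE = ("S", ("NP", "John"), ("VP", ("V", "gave"), ("NP", "Mary")))
up = path_to_root(TREE, "NP")    # ['NP', 'S']      argument up to root
down = path_to_root(TREE, "V")   # ['V', 'VP', 'S'] verb up to root
feature = "^".join(up) + "v" + "v".join(reversed(down[:-1]))
# feature == 'NP^SvVPvV'  (up to the root, then down to the verb)
\end{verbatim}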
% subsection semantic_role_labeling (end)

% section nlp_tasks_background (end)

\section{Document Vectorization} % (fold)
\label{sec:document_vectorization}

@@ -434,6 +493,8 @@ \section{Glove: Global Vectors for Word Representation} % (fold)
An additive shift is included in the logarithm, $$\log(X_{ik}) \Rightarrow \log(1 + X_{ik})$$ which maintains the sparsity of $X$ while avoiding divergences when computing the co-occurrence matrix (a short numeric check follows this list).
\item
The model obtained in the paper could be compared to a global skip-gram model, as opposed to the fixed window-size skip-gram model proposed by Mikolov et al.~\cite{mikolov2013efficient}.
\item
The performance seems to increase monotonically with the amount of training data.
\end{itemize}
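
A short numeric check of the additive shift (our own illustration, not from the paper): zero co-occurrence counts map to exactly zero, so the sparsity pattern of $X$ is preserved and $\log(0)$ is never evaluated.

\begin{verbatim}
import math

# log(1 + x) for a few co-occurrence counts x
for x in [0, 1, 9, 99]:
    print(x, math.log(1 + x))
# 0 0.0        <- zero entries stay zero, preserving sparsity
# 1 0.6931...
# 9 2.3025...  (= log(10))
# 99 4.6051...
\end{verbatim}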
% section glove_global_vectors_for_word_representation (end)
