
Commit 0e1c7e8

Some exposition on CRFs
1 parent 9515adf commit 0e1c7e8

File tree: 1 file changed (+32, -1)

Deep Learning for Natural Language Processing with Pytorch.ipynb

Lines changed: 32 additions & 1 deletion
@@ -1702,7 +1702,38 @@
 "collapsed": true
 },
 "source": [
-"This section will probably contain a BiLSTM Conditional Random Field for Named Entity Recognition."
+"Pytorch is a *dynamic* neural network kit. Another example of a dynamic kit is [Dynet](https://github.com/clab/dynet) (I mention this because working with Pytorch and Dynet is similar. If you see an example in Dynet, it will probably help you implement it in Pytorch). The opposite is the *static* toolkit, which includes Theano, Keras, TensorFlow, etc.\n",
+"The core difference is the following:\n",
+"* In a static toolkit, you define a computation graph once, compile it, and then stream instances to it.\n",
+"* In a dynamic toolkit, you define a computation graph *for each instance*. It is never compiled and is executed on the fly.\n",
+"\n",
+"Without a lot of experience, it is difficult to appreciate the difference.\n",
+"As an example, suppose we want to build a deep constituency parser.\n",
+"Suppose our model involves roughly the following steps:\n",
+"* We build the tree bottom up\n",
+"* Tag the root nodes (the words of the sentence)\n",
+"* From there, use a neural network and the embeddings of the words\n",
+"to find combinations that form constituents. Whenever you form a new constituent,\n",
+"use some sort of technique to get an embedding of the constituent.\n",
+"In this case, our network architecture will depend completely on the input sentence.\n",
+"In the sentence \"The green cat scratched the wall\", at some point in the model, we will want to combine\n",
+"the span $(i,j,r) = (1, 3, \\text{NP})$ (that is, an NP constituent spans word 1 to word 3, in this case \"The green cat\").\n",
+"\n",
+"However, another sentence might be \"Somewhere, the big fat cat scratched the wall\". In this sentence, we will want to form the constituent $(2, 5, \\text{NP})$ at some point.\n",
+"The constituents we will want to form depend on the instance. If we just compile the computation graph once, as in a static toolkit, it will be exceptionally difficult or impossible to program this logic. In a dynamic toolkit, though, there isn't just one pre-defined computation graph. There can be a new computation graph for each instance, so this problem goes away.\n",
+"\n",
+"Dynamic toolkits also have the advantage of being easier to debug, and the code more closely resembles the host language (by that I mean that Pytorch and Dynet look more like actual Python code than Keras or Theano)."
+]
+},
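To make the graph-per-instance point concrete, here is a minimal sketch (an editor's illustration, not part of this commit) of a toy bottom-up combiner in Pytorch. The `GreedyCombiner` module, its sizes, and the greedy merge rule are all invented for this example; the point is only that ordinary Python control flow over a variable-length sentence decides which graph autograd records, so the graph can differ for every instance.

```python
# Illustrative sketch only (not from the notebook): the graph is built by plain
# Python control flow, so its shape can differ for every input sentence.
import torch
import torch.nn as nn

torch.manual_seed(0)

class GreedyCombiner(nn.Module):
    """Toy bottom-up combiner: repeatedly merges the adjacent pair of
    embeddings whose combination scores highest, until one vector remains."""
    def __init__(self, dim):
        super().__init__()
        self.combine = nn.Linear(2 * dim, dim)  # embedding of a new "constituent"
        self.score = nn.Linear(dim, 1)          # score of a candidate merge

    def forward(self, word_embeds):             # list of (dim,) tensors
        nodes = list(word_embeds)
        while len(nodes) > 1:
            # One candidate per adjacent pair -- how many there are depends
            # on the sentence length, i.e. on this particular instance.
            cands = [torch.tanh(self.combine(torch.cat([a, b])))
                     for a, b in zip(nodes, nodes[1:])]
            scores = torch.stack([self.score(c) for c in cands]).squeeze(-1)
            best = torch.argmax(scores).item()
            nodes[best:best + 2] = [cands[best]]  # replace the pair by the new constituent
        return nodes[0]

dim = 8
model = GreedyCombiner(dim)
for sent_len in (6, 8):                          # two sentences of different lengths
    words = [torch.randn(dim) for _ in range(sent_len)]
    root = model(words)                          # a different computation graph each time
    root.sum().backward()                        # autograd handles each graph on the fly
```

Running it on the two sentence lengths builds two different graphs, and `backward()` works for each with no recompilation step in between.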
+{
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+"For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. Familiarity with CRFs is assumed. Although the name sounds scary, the model is just a CRF in which an LSTM provides the features.\n",
+"\n",
+"Let $\\textbf{y}$ be a tag sequence, and $\\textbf{w}$ a sequence of words. Recall that the CRF computes\n",
+"$$ P(\\textbf{y} | \\textbf{w}) = \\frac{ \\exp{ ( \\sum_i f(y_{i-1}, y_i, i, \\textbf{w}) \\cdot \\theta ) }}\n",
+"{\\sum_{\\textbf{y}'} \\exp{ ( \\sum_j f(y'_{j-1}, y'_j, j, \\textbf{w}) \\cdot \\theta ) }} $$"
 ]
 },
 {
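As a companion to the formula above, the following is a rough sketch (an editor's illustration, not from the notebook) of how its two pieces are typically computed in log space for a linear-chain CRF: the numerator is the score of one given tag sequence, and the denominator, the sum over all $\textbf{y}'$, is the partition function obtained with the forward algorithm. The random emission matrix stands in for the BiLSTM-provided features, and the helper names (`sequence_score`, `log_partition`) and sizes are invented here.

```python
# Illustrative sketch only: numerator and denominator of the CRF probability,
# computed in log space. The features f(y_{i-1}, y_i, i, w) are reduced to
# emission scores (stand-ins for BiLSTM outputs) plus a transition matrix.
import torch

def sequence_score(emissions, transitions, tags):
    """Log of the numerator: emission + transition scores along `tags`."""
    score = emissions[0, tags[0]]
    for i in range(1, len(tags)):
        score = score + transitions[tags[i - 1], tags[i]] + emissions[i, tags[i]]
    return score

def log_partition(emissions, transitions):
    """Log of the denominator: forward algorithm over all tag sequences y'."""
    alpha = emissions[0]                         # (num_tags,)
    for i in range(1, emissions.size(0)):
        # alpha[cur] = logsumexp over prev of (alpha[prev] + transitions[prev, cur]) + emission
        alpha = torch.logsumexp(alpha.unsqueeze(1) + transitions, dim=0) + emissions[i]
    return torch.logsumexp(alpha, dim=0)

seq_len, num_tags = 5, 3
emissions = torch.randn(seq_len, num_tags)       # would come from the BiLSTM
transitions = torch.randn(num_tags, num_tags)    # transitions[a, b]: score of tag a -> tag b
tags = torch.tensor([0, 2, 1, 1, 0])             # one candidate tag sequence y

log_p = sequence_score(emissions, transitions, tags) - log_partition(emissions, transitions)
print(float(log_p))                              # log P(y | w) for this toy example
```

Working in log space this way (log-numerator minus log-partition) is the standard trick for keeping the sums over exponentially many tag sequences numerically stable.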
