1702 | 1702 | "collapsed": true
1703 | 1703 | },
1704 | 1704 | "source": [
1705 | | - "This section will probably contain a BiLSTM Conditional Random Field for Named Entity Recognition." |
| 1705 | + "Pytorch is a *dynamic* neural network kit. Another example of a dynamic kit is [Dynet](https://github.com/clab/dynet) (I mention this because working with Pytorch and Dynet is similar. If you see an example in Dynet, it will probably help you implement it in Pytorch). The opposite is the *static* tool kit, which includes Theano, Keras, TensorFlow, etc.\n", |
| 1706 | + "The core difference is the following:\n", |
| 1707 | + "* In a static toolkit, you define a computation graph once, compile it, and then stream instances to it.\n", |
| 1708 | + "* In a dynamic toolkit, you define a computation graph *for each instance*. It is never compiled and is executed on-the-fly\n", |
| 1709 | + "\n", |
| 1710 | + "Without a lot of experience, it is difficult to appreciate the difference.\n", |
| 1711 | + "One example is to suppose we want to build a deep constituent parser.\n", |
| 1712 | + "Suppose our model involves roughly the following steps:\n", |
| 1713 | + "* We build the tree bottom up\n", |
| 1714 | + "* Tag the root nodes (the words of the sentence)\n", |
| 1715 | + "* From there, use a neural network and the embeddings of the words\n", |
| 1716 | + "to find combinations that form constituents. Whenever you form a new constituent,\n", |
| 1717 | + "use some sort of technique to get an embedding of the constituent.\n", |
| 1718 | + "In this case, our network architecture will depend completely on the input sentence.\n", |
| 1719 | + "In the sentence \"The green cat scratched the wall\", at some point in the model, we will want to combine\n", |
| 1720 | + "the span $(i,j,r) = (1, 3, \\text{NP})$ (that is, an NP constituent spans word 1 to word 3, in this case \"The green cat\").\n", |
| 1721 | + "\n", |
| 1722 | + "However, another sentence might be \"Somewhere, the big fat cat scratched the wall\". In this sentence, we will want to form the constituent $(2, 4, NP)$ at some point.\n", |
| 1723 | + "The constituents we will want to form will depend on the instance. If we just compile the computation graph once, as in a static toolkit, it will be exceptionally difficult or impossible to program this logic. In a dynamic toolkit though, there isn't just 1 pre-defined computation graph. There can be a new computation graph for each instance, so this problem goes away.\n", |
| 1724 | + "\n", |
| 1725 | + "Dynamic toolkits also have the advantage of being easier to debug and the code more closely resembling the host language (by that I mean that Pytorch and Dynet look more like actual Python code than Keras or Theano)." |
| 1726 | + ] |
| 1727 | + }, |
| 1728 | + { |
| 1729 | + "cell_type": "markdown", |
| 1730 | + "metadata": {}, |
| 1731 | + "source": [ |
| 1732 | + "For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. Familiarity with CRF's is assumed. Although this name sounds scary, all the model is is a CRF but where an LSTM provides the features.\n", |
| 1733 | + "\n", |
| 1734 | + "Let $\\textbf{y}$ be a tag sequence, and $\\textbf{w}$ a sequence of words. Recall that the CRF wants to compute\n", |
| 1735 | + "$$ P(\\textbf{y} | \\textbf{w}) = \\frac{ \\exp{ ( \\sum_i f(y_{i-1}, y_i, i, \\textbf{w}) \\cdot \\theta ) }}\n", |
| 1736 | + "{\\sum_{\\textbf{y'}} \\exp{ ( \\sum_j f(y'_{j-1}, y'_j, j, \\textbf{w} \\cdot \\theta } ) } $$" |
1706 | 1737 | ]
1707 | 1738 | },
1708 | 1739 | {