1702 | 1702 | "collapsed": true
1703 | 1703 | },
1704 | 1704 | "source": [
1705 | | - "This section will probably contain a BiLSTM Conditional Random Field for Named Entity Recognition." |
| 1705 | + "Pytorch is a *dynamic* neural network kit. Another example of a dynamic kit is [Dynet](https://github.com/clab/dynet) (I mention this because working with Pytorch and Dynet is similar. If you see an example in Dynet, it will probably help you implement it in Pytorch). The opposite is the *static* tool kit, which includes Theano, Keras, TensorFlow, etc.\n", |
| 1706 | + "The core difference is the following:\n", |
| 1707 | + "* In a static toolkit, you define a computation graph once, compile it, and then stream instances to it.\n", |
| 1708 | + "* In a dynamic toolkit, you define a computation graph *for each instance*. It is never compiled and is executed on-the-fly\n", |
| 1709 | + "\n", |
| 1710 | + "Without a lot of experience, it is difficult to appreciate the difference.\n", |
| 1711 | + "One example is to suppose we want to build a deep constituent parser.\n", |
| 1712 | + "Suppose our model involves roughly the following steps:\n", |
| 1713 | + "* We build the tree bottom up\n", |
| 1714 | + "* Tag the root nodes (the words of the sentence)\n", |
| 1715 | + "* From there, use a neural network and the embeddings of the words\n", |
| 1716 | + "to find combinations that form constituents. Whenever you form a new constituent,\n", |
| 1717 | + "use some sort of technique to get an embedding of the constituent.\n", |
| 1718 | + "In this case, our network architecture will depend completely on the input sentence.\n", |
| 1719 | + "In the sentence \"The green cat scratched the wall\", at some point in the model, we will want to combine\n", |
| 1720 | + "the span $(i,j,r) = (1, 3, \\text{NP})$ (that is, an NP constituent spans word 1 to word 3, in this case \"The green cat\").\n", |
| 1721 | + "\n", |
| 1722 | + "However, another sentence might be \"Somewhere, the big fat cat scratched the wall\". In this sentence, we will want to form the constituent $(2, 4, NP)$ at some point.\n", |
| 1723 | + "The constituents we will want to form will depend on the instance. If we just compile the computation graph once, as in a static toolkit, it will be exceptionally difficult or impossible to program this logic. In a dynamic toolkit though, there isn't just 1 pre-defined computation graph. There can be a new computation graph for each instance, so this problem goes away.\n", |
| 1724 | + "\n", |
| 1725 | + "Dynamic toolkits also have the advantage of being easier to debug and the code more closely resembling the host language (by that I mean that Pytorch and Dynet look more like actual Python code than Keras or Theano)." |
| 1726 | + ] |
| 1727 | + }, |
| 1728 | + { |
| 1729 | + "cell_type": "markdown", |
| 1730 | + "metadata": {}, |
| 1731 | + "source": [ |
| 1732 | + "For this section, we will see a full, complicated example of a Bi-LSTM Conditional Random Field for named-entity recognition. Familiarity with CRF's is assumed. Although this name sounds scary, all the model is is a CRF but where an LSTM provides the features.\n", |
| 1733 | + "\n", |
| 1734 | + "Let $\\textbf{y}$ be a tag sequence, and $\\textbf{w}$ a sequence of words. Recall that the CRF wants to compute\n", |
| 1735 | + "$$ P(\\textbf{y} | \\textbf{w}) = \\frac{ \\exp{ ( \\sum_i f(y_{i-1}, y_i, i, \\textbf{w}) \\cdot \\theta ) }}\n", |
| 1736 | + "{\\sum_{\\textbf{y'}} \\exp{ ( \\sum_j f(y'_{j-1}, y'_j, j, \\textbf{w} \\cdot \\theta } ) } $$" |
1706 | 1737 | ]
1707 | 1738 | },
1708 | 1739 | {