From 3db0d485ebbfc4390abcdd5a95deddb133716e6e Mon Sep 17 00:00:00 2001
From: Bill MacCartney
Date: Sun, 22 Mar 2020 23:17:21 -0700
Subject: [PATCH] Minor edits to rel_ext notebooks

Minor edits to rel_ext notebooks:

1. Fix extension on data file name.
2. Wording improvements.
3. Use * instead of _ for emphasis.
---
 rel_ext_01_task.ipynb        | 2 +-
 rel_ext_02_experiments.ipynb | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/rel_ext_01_task.ipynb b/rel_ext_01_task.ipynb
index 4659889..65c40b9 100644
--- a/rel_ext_01_task.ipynb
+++ b/rel_ext_01_task.ipynb
@@ -105,7 +105,7 @@
     "\n",
     "* Make sure your environment includes all the requirements for [the cs224u repository](https://github.com/cgpotts/cs224u).\n",
     "\n",
-    "* If you haven't already, download [the course data](http://web.stanford.edu/class/cs224u/data/data.zip), unpack it, and place it in the directory containing the course repository – the same directory as this notebook. (If you want to put it somewhere else, change `rel_ext_data_home` below.)"
+    "* If you haven't already, download [the course data](http://web.stanford.edu/class/cs224u/data/data.tgz), unpack it, and place it in the directory containing the course repository – the same directory as this notebook. (If you want to put it somewhere else, change `rel_ext_data_home` below.)"
    ]
   },
   {
diff --git a/rel_ext_02_experiments.ipynb b/rel_ext_02_experiments.ipynb
index 827831e..ef67e08 100644
--- a/rel_ext_02_experiments.ipynb
+++ b/rel_ext_02_experiments.ipynb
@@ -289,7 +289,7 @@
    "source": [
     "### Experiments\n",
     "\n",
-    "Now we need some functions to train models, make predictions, and evaluate the results. We'll start with `train_models()`. This function takes as arguments a dictionary of data splits, a list of featurizers, the name of the split on which to train, and a model factory, which is a function which initializes an `sklearn` classifier. It returns a dictionary holding the featurizers, the vectorizer that was used to generate the training matrix, and a dictionary holding the trained models, one per relation."
+    "Now we need some functions to train models, make predictions, and evaluate the results. We'll start with `train_models()`. This function takes as arguments a dictionary of data splits, a list of featurizers, the name of the split on which to train (by default, 'train'), and a model factory, which is a function which initializes an `sklearn` classifier (by default, a logistic regression classifier). It returns a dictionary holding the featurizers, the vectorizer that was used to generate the training matrix, and a dictionary holding the trained models, one per relation."
    ]
   },
   {
@@ -650,7 +650,7 @@
     "\n",
     "Another way to gain insight into our trained models is to use them to discover new relation instances that don't currently appear in the KB. In fact, this is the whole point of building a relation extraction system: to extend an existing KB (or build a new one) using knowledge extracted from natural language text at scale. Can the models we've trained do this effectively?\n",
     "\n",
-    "Because the goal is to discover new relation instances which are _true_ but _absent from the KB_, we can't evaluate this capability automatically. But we can generate candidate KB triples and manually evaluate them for correctness.\n",
+    "Because the goal is to discover new relation instances which are *true* but *absent from the KB*, we can't evaluate this capability automatically. But we can generate candidate KB triples and manually evaluate them for correctness.\n",
     "\n",
     "To do this, we'll start from corpus examples containing pairs of entities which do not belong to any relation in the KB (earlier, we described these as \"negative examples\"). We'll then apply our trained models to each pair of entities, and sort the results by probability assigned by the model, in order to find the most likely new instances for each relation."
    ]