Skip to content

Commit

Permalink
Merge pull request cgpotts#54 from cgpotts/20200322-minor-edits
Browse files Browse the repository at this point in the history
Minor edits to rel_ext notebooks
  • Loading branch information
cgpotts authored Mar 23, 2020
2 parents 4ef26f7 + 3db0d48 commit 3bdb224
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 3 deletions.
2 changes: 1 addition & 1 deletion rel_ext_01_task.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@
"\n",
"* Make sure your environment includes all the requirements for [the cs224u repository](https://github.com/cgpotts/cs224u).\n",
"\n",
"* If you haven't already, download [the course data](http://web.stanford.edu/class/cs224u/data/data.zip), unpack it, and place it in the directory containing the course repository – the same directory as this notebook. (If you want to put it somewhere else, change `rel_ext_data_home` below.)"
"* If you haven't already, download [the course data](http://web.stanford.edu/class/cs224u/data/data.tgz), unpack it, and place it in the directory containing the course repository – the same directory as this notebook. (If you want to put it somewhere else, change `rel_ext_data_home` below.)"
]
},
{
Expand Down
4 changes: 2 additions & 2 deletions rel_ext_02_experiments.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -289,7 +289,7 @@
"source": [
"### Experiments\n",
"\n",
"Now we need some functions to train models, make predictions, and evaluate the results. We'll start with `train_models()`. This function takes as arguments a dictionary of data splits, a list of featurizers, the name of the split on which to train, and a model factory, which is a function which initializes an `sklearn` classifier. It returns a dictionary holding the featurizers, the vectorizer that was used to generate the training matrix, and a dictionary holding the trained models, one per relation."
"Now we need some functions to train models, make predictions, and evaluate the results. We'll start with `train_models()`. This function takes as arguments a dictionary of data splits, a list of featurizers, the name of the split on which to train (by default, 'train'), and a model factory, which is a function which initializes an `sklearn` classifier (by default, a logistic regression classifier). It returns a dictionary holding the featurizers, the vectorizer that was used to generate the training matrix, and a dictionary holding the trained models, one per relation."
]
},
{
Expand Down Expand Up @@ -650,7 +650,7 @@
"\n",
"Another way to gain insight into our trained models is to use them to discover new relation instances that don't currently appear in the KB. In fact, this is the whole point of building a relation extraction system: to extend an existing KB (or build a new one) using knowledge extracted from natural language text at scale. Can the models we've trained do this effectively?\n",
"\n",
"Because the goal is to discover new relation instances which are _true_ but _absent from the KB_, we can't evaluate this capability automatically. But we can generate candidate KB triples and manually evaluate them for correctness.\n",
"Because the goal is to discover new relation instances which are *true* but *absent from the KB*, we can't evaluate this capability automatically. But we can generate candidate KB triples and manually evaluate them for correctness.\n",
"\n",
"To do this, we'll start from corpus examples containing pairs of entities which do not belong to any relation in the KB (earlier, we described these as \"negative examples\"). We'll then apply our trained models to each pair of entities, and sort the results by probability assigned by the model, in order to find the most likely new instances for each relation."
]
Expand Down

0 comments on commit 3bdb224

Please sign in to comment.