Update ACL link
insop committed Sep 6, 2020
1 parent 72d0933 commit bc79ad7
Showing 3 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions evaluation_methods.ipynb
@@ -373,7 +373,7 @@
"\n",
"1. As disussed briefly in [the NLI models notebook](nli_02_models.ipynb#Other-findings), [Leonid Keselman](https://leonidk.com/) observed [in his 2016 NLU course project](https://leonidk.com/stanford/cs224u.html) that one can do much better than chance on SNLI by processing only the hypothesis, ignoring the premise entirely. The exact interpretation of this is complex (we explore this a bit [in our NLI unit](nli_02_models.ipynb#Hypothesis-only-baselines) and [in our NLI bake-off](nli_wordentail.ipynb)), but it's certainly relevant for understanding how much a system has actually learned about reasoning from a premise to a conclusion.\n",
" \n",
"1. [Schwartz et al. (2017)](https://aclanthology.coli.uni-saarland.de/papers/W17-0907/w17-0907) develop a system for choosing between a coherent and incoherent ending for a story. Their best system achieves 75% accuracy by processing the story and the ending, but they achieve 72% using only stylistic features of the ending, ignoring the preceding story entirely. This puts the 75% – and the extent to which the system understands story completion – in a new light."
"1. [Schwartz et al. (2017)](https://www.aclweb.org/anthology/W17-0907) develop a system for choosing between a coherent and incoherent ending for a story. Their best system achieves 75% accuracy by processing the story and the ending, but they achieve 72% using only stylistic features of the ending, ignoring the preceding story entirely. This puts the 75% – and the extent to which the system understands story completion – in a new light."
]
},
{
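As an aside on the hypothesis-only point in the hunk above: the baseline is easy to reproduce in outline. Here is a minimal sketch, assuming SNLI examples are already loaded as `(premise, hypothesis, label)` triples; `train_examples` and `dev_examples` are hypothetical placeholders for whatever loader you use:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Premise-blind classifier: bag of uni/bigrams over the hypothesis only.
hypothesis_only = Pipeline([
    ("vec", CountVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Train and evaluate on the hypothesis alone, ignoring the premise entirely.
# `train_examples` and `dev_examples` are assumed to be lists of
# (premise, hypothesis, label) triples:
hypothesis_only.fit(
    [hyp for _, hyp, _ in train_examples],
    [label for _, _, label in train_examples])
print(hypothesis_only.score(
    [hyp for _, hyp, _ in dev_examples],
    [label for _, _, label in dev_examples]))
```

Keselman's observation is that a premise-blind model of this kind lands well above the roughly 33% chance rate for SNLI's three-way label set.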
@@ -914,7 +914,7 @@
"\n",
"Most deep learning models have their parameters initialized randomly, perhaps according to some heuristics related to the number of parameters ([Glorot and Bengio 2010](http://proceedings.mlr.press/v9/glorot10a.html)) or their internal structure ([Saxe et al. 2014](https://arxiv.org/abs/1312.6120)). This is meaningful largely because of the non-convex optimization problems that these models define, but it can impact simpler models that have multiple optimal solutions that still differ at test time. \n",
"\n",
"There is growing awareness that these random choices have serious consequences. For instance, [Reimers and Gurevych (2017)](https://aclanthology.coli.uni-saarland.de/papers/D17-1035/d17-1035) report that different initializations for neural sequence models can lead to statistically significant results, and they show that a number of recent systems are indistinguishable in terms of raw performance once this source of variation is taken into account.\n",
"There is growing awareness that these random choices have serious consequences. For instance, [Reimers and Gurevych (2017)](https://www.aclweb.org/anthology/D17-1035) report that different initializations for neural sequence models can lead to statistically significant results, and they show that a number of recent systems are indistinguishable in terms of raw performance once this source of variation is taken into account.\n",
"\n",
"This shouldn't surprise practitioners, who have long struggled with the question of what to do when a system experiences a catastrophic failure as a result of unlucky initialization. (I think the answer is to report this failure rate.)\n",
"\n",
2 changes: 1 addition & 1 deletion nli_01_task_and_data.ipynb
@@ -159,7 +159,7 @@
"* All the premises are captions from the [Flickr30K corpus](http://shannon.cs.illinois.edu/DenotationGraph/).\n",
"\n",
"\n",
"* Some of the sentences rather depressingly reflect stereotypes ([Rudinger et al. 2017](https://aclanthology.coli.uni-saarland.de/papers/W17-1609/w17-1609)).\n",
"* Some of the sentences rather depressingly reflect stereotypes ([Rudinger et al. 2017](https://www.aclweb.org/anthology/W17-1609)).\n",
"\n",
"\n",
"* 550,152 train examples; 10K dev; 10K test\n",
2 changes: 1 addition & 1 deletion vsm_03_retrofitting.ipynb
@@ -668,7 +668,7 @@
"\n",
"* If you think of the input VSM as a \"warm start\" for graph embedding algorithms, then you're essentially retrofitting. This connection opens up a number of new opportunities to go beyond the similarity-based semantics that underlies Faruqui et al.'s model. See [Lengerich et al. 2017](https://arxiv.org/pdf/1708.00112.pdf), section 3.2, for more on these connections.\n",
"\n",
"* [Mrkšić et al. 2016](https://aclanthology.coli.uni-saarland.de/papers/N16-1018/n16-1018) address the limitation of Faruqui et al's model that it assumes connected nodes in the graph are similar. In a graph with complex, varied edge semantics, this is likely to be false. They address the case of antonymy in particular.\n",
"* [Mrkšić et al. 2016](https://www.aclweb.org/anthology/N16-1018) address the limitation of Faruqui et al's model that it assumes connected nodes in the graph are similar. In a graph with complex, varied edge semantics, this is likely to be false. They address the case of antonymy in particular.\n",
"\n",
"* [Lengerich et al. 2017](https://arxiv.org/pdf/1708.00112.pdf) present a __functional retrofitting__ framework in which the edge meanings are explicitly modeled, and they evaluate instantiations of the framework with linear and neural edge penalty functions. (The Faruqui et al. model emerges as a specific instantiation of this framework.)"
]
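To make the retrofitting discussion concrete, here is a minimal sketch of the similarity-based update at the heart of Faruqui et al.'s model, with uniform alpha and beta weights; the Jacobi-style schedule and the toy lexicon are simplifications rather than the authors' exact implementation:

```python
import numpy as np

def retrofit(Q_hat, edges, alpha=1.0, beta=1.0, n_iters=10):
    """Pull each vector toward its original value (weight alpha) and
    toward its lexicon neighbors (weight beta).

    Q_hat : dict mapping word -> original vector (np.ndarray)
    edges : dict mapping word -> list of neighbor words from the lexicon
    """
    Q = {w: v.copy() for w, v in Q_hat.items()}
    for _ in range(n_iters):
        for w, neighbors in edges.items():
            neighbors = [u for u in neighbors if u in Q]
            if w not in Q or not neighbors:
                continue
            # Closed-form coordinate update for the retrofitting objective:
            num = alpha * Q_hat[w] + beta * sum(Q[u] for u in neighbors)
            Q[w] = num / (alpha + beta * len(neighbors))
    return Q

# Toy example: two lexicon neighbors are pulled toward each other.
Q_hat = {"hot": np.array([1.0, 0.0]), "warm": np.array([0.0, 1.0])}
Q = retrofit(Q_hat, {"hot": ["warm"], "warm": ["hot"]})
```

Note that this update only makes sense when edges encode similarity – precisely the limitation that Mrkšić et al. address for antonymy, and that functional retrofitting generalizes.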
