# Questionable Answers

This repository contains everything required to completely replicate the results presented in:

Matt Crane. "Questionable Answers in Question Answering Research: Reproducibility and Variability of Published Results". In: Transactions of the Association for Computational Linguistics 6 (2018), pp. 241–252. url: https://transacl.org/ojs/index.php/tacl/article/view/1299.

## Status

The upstream repository
[castorini/castor](//github.com/castorini/castor) has since had its history
rewritten, so the changesets referenced here no longer match the current
official repository.

Unfortunately, this repository was not forked in time to capture the `cf0e269`
SHA before the official repository's history was rewritten. As a result, if you
build from source, the image is built from a commit with a _different_ SHA. The
`setup.sh` script applies the missing change itself, and its contents can be
verified against [the official repository
diff](//github.com/castorini/castor/commit/ed4dba249712e8bbaf5ed7c1486dff52b472daf4).

Running `setup.sh build` builds the Docker images from source, including
applying the uncaptured change described above, while `setup.sh pull` pulls
the prebuilt Docker images.

## Requirements

#### Running on GPU

`nvidia-docker` is required to run the GPU-based experiments; version 1 was
used for the experiments in the paper. NVIDIA has since deprecated it in
favour of version 2. The results _should_ be the same, but to be certain,
[install version 1](//github.com/nvidia/nvidia-docker/wiki/Installation-(version-1.0)).

#### Building images

The embeddings used by the network should be downloaded from [Aliaksei Severyn's shared file
(520MB)](//drive.google.com/folderview?id=0B-yipfgecoSBfkZlY2FFWEpDR3M4Qkw5U055MWJrenE5MTBFVXlpRnd0QjZaMDQxejh1cWs&usp=sharing)
and placed in the working directory for this repository. The Docker image
builder verifies checksums to ensure that the same file is used.

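A similar check can be run by hand before starting a build. The following is a minimal sketch, not the repository's own code: it assumes SHA-256 and the `sha256sum` tool (the text above does not say which checksum algorithm the builder uses), and `verify_checksum` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Assumption: SHA-256 via sha256sum; the actual builder may use another algorithm.
verify_checksum() {
    file=$1
    expected=$2
    if [ ! -f "$file" ]; then
        echo "MISSING: $file" >&2
        return 2
    fi
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

It would be called as `verify_checksum <embeddings-file> <expected-digest>`, with both values supplied by the reader. The function returns 0 on a match, 1 on a mismatch, and 2 if the file is missing.
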
#### Pulling images

All of the generated Docker images are available online, so they can be
downloaded and run without being built from scratch. They are listed
[on Docker Hub](//hub.docker.com/r/snapbug/qqa/tags/).

By default, `setup.sh pull` pulls _all_ the tagged images. This can take a
substantial amount of disk space, even though the images have a lot in common.
If you only want to replicate, for example, the math library experiments, pull
the required images manually. See `run.sh` for which images each experiment
requires.

| Image         | Figure/Table     | Notes                                          |
|---------------|------------------|------------------------------------------------|
| `sha-*`       | Table 4          | See note above regarding `sha-cf0e269`         |
| `pytorch-*`   | Table 5          |                                                |
| `*mkl`        | Table 6          |                                                |
| `sha-cf0e269` | Table 7          |                                                |
| `sha-cf0e269` | Table 8          |                                                |
| `sha-cf0e269` | Figure 2 (left)  | Just the CPU seeds                             |
| `sha-cf0e269` | Figure 2 (right) | Just the GPU seeds                             |
| `sha-cf0e269` | Figure 2         | Both CPU and GPU seeds                         |
|               | Figure 3         | Use the output from the logs of `run.sh seeds` |
|               | Table 9          | Use the output from the logs of `run.sh seeds` |

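To pull only the images for one experiment, the tag patterns in the table can be applied to a list of tag names. The following is a minimal sketch, assuming the tag list has been copied by hand from the Docker Hub page. `select_tags` and `print_pull_commands` are hypothetical helpers, and the image name `snapbug/qqa` is taken from the Docker Hub link above.

```shell
#!/bin/sh
# Hypothetical helper: print the tags that match a shell glob pattern,
# one per line. Usage: select_tags PATTERN TAG...
select_tags() {
    pattern=$1
    shift
    for tag in "$@"; do
        # $pattern is deliberately unquoted so that case treats it as a glob.
        case $tag in
            $pattern) echo "$tag" ;;
        esac
    done
}

# Dry run: print the docker pull commands instead of executing them.
# Usage: print_pull_commands PATTERN TAG...
print_pull_commands() {
    select_tags "$@" | while read -r tag; do
        echo "docker pull snapbug/qqa:$tag"
    done
}
```

Called as `print_pull_commands '*mkl' <tag> <tag> ...`, this prints one `docker pull` line per matching tag. Printing rather than executing is a deliberate choice, so the commands can be reviewed before any disk space is used.
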
## Replication

`run.sh` replicates all the experiments in the paper using either the locally
built Docker images or the images pulled by `setup.sh`. It takes a single
argument that specifies which experiments to run.

| Argument  | Figure/Table     | Notes                                          |
|-----------|------------------|------------------------------------------------|
| all       |                  | All of the experiments                         |
| network   | Table 4          |                                                |
| pytorch   | Table 5          |                                                |
| mathlib   | Table 6          |                                                |
| thread    | Table 7          |                                                |
| gpu       | Table 8          |                                                |
| seeds-cpu | Figure 2 (left)  | Just the CPU seeds                             |
| seeds-gpu | Figure 2 (right) | Just the GPU seeds                             |
| seeds     | Figure 2         | Both CPU and GPU seeds                         |
|           | Figure 3         | Use the output from the logs of `run.sh seeds` |
|           | Table 9          | Use the output from the logs of `run.sh seeds` |

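A mistyped argument can be caught before anything starts. The sketch below is not part of `run.sh`: `is_valid_experiment` is a hypothetical helper that only checks a name against the arguments listed in the table above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for the experiment names documented
# in the argument table for run.sh.
is_valid_experiment() {
    case $1 in
        all|network|pytorch|mathlib|thread|gpu|seeds-cpu|seeds-gpu|seeds)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

For example, `is_valid_experiment mathlib && ./run.sh mathlib` would start the math library experiments only if the name is one of the documented ones.
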
**Log** files are generated in the form `qqa.[dataset].log.[experiment]`. At the
end of training, the network performs a feed-forward pass over the datasets;
the numbers reported in the paper are extracted from that output.

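The naming scheme makes it straightforward to collect the logs for a single experiment. The following is a minimal sketch: `logs_for_experiment` is a hypothetical helper, and it assumes the logs are in the directory given as the second argument (the current directory by default). What exactly appears in the `[experiment]` position is not specified here, so the first argument is matched unchanged.

```shell
#!/bin/sh
# Hypothetical helper: list log files named qqa.[dataset].log.[experiment]
# for one experiment. Usage: logs_for_experiment EXPERIMENT [DIRECTORY]
logs_for_experiment() {
    experiment=$1
    dir=${2:-.}
    for f in "$dir"/qqa.*.log."$experiment"; do
        # If nothing matches, the glob stays literal; skip that case.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Model files and logs from other experiments are ignored because the pattern fixes both the `log` component and the experiment suffix.
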
**Model** files are generated in the form
`qqa.[dataset].model.[experiment]`, to allow feed-forward verification or
re-creation of the results without retraining the network. In my experiments
these models are reproducible across different hardware setups, although I
would be interested in hearing of situations where they _aren't_.

## Issues

If you encounter any issues with the scripts or anything else in this
repository, either file an issue on GitHub or email me.