Commit 3688dab

Merge pull request #1 from Microsoft/data_v2
Update dataset v2 preprocessing script/readme
2 parents fadb45f + ba7d6d8 commit 3688dab

2 files changed: +372, -0 lines changed

README.md

Lines changed: 23 additions & 0 deletions
@@ -11,6 +11,7 @@ We present the setup for the WikiSQL experiments.
- Download a preprocessed dataset [link](https://1drv.ms/u/s!AryzSDJYB5TxnDWZtpb3ZjL3xBny) to `input/`
- Untar the file `tar -xvjf input.tar.bz2`
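
Where the `tar` command is not available, the same extraction step can be done with Python's standard `tarfile` module. This is a minimal sketch: `extract_archive` is a hypothetical helper name, not part of this repository, and the archive is assumed to come from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Extract a bzip2-compressed tar archive and return its member names.

    Hypothetical helper for illustration; assumes a trusted archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:bz2" matches the -j (bzip2) flag of `tar -xvjf`.
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        archive.extractall(path=dest)
    return names
```

For example, `extract_archive("input.tar.bz2")` unpacks the downloaded file into the current directory, which mirrors the `tar` command above.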

#### Reproduce Preprocess Steps

1. Download data from [WikiSQL](https://github.com/salesforce/WikiSQL).
@@ -36,6 +37,11 @@ python wikisql_data/scripts/relevance.py
6. Download pretrained embeddings from [glove](https://nlp.stanford.edu/projects/glove/) and [character n-gram embeddings](http://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/) and put them under ``input/``


#### Note: we use a new preprocessed dataset (v2) in the [Execution-Guided Decoding](https://arxiv.org/abs/1807.03100) paper
- The preprocessed dataset can be found [here](https://1drv.ms/u/s!AryzSDJYB5TxnF31OCt_4to7uY2t); its ``wikisql_train.dat``, ``wikisql_test.dat``, and ``wikisql_dev.dat`` files can be used directly for training.

Note: the version 2 dataset matches the v1.1 release of [WikiSQL](https://github.com/salesforce/WikiSQL). The preprocessing script ``wikisql_data/scripts/prepare_v2.py`` (Python 3 required) processes the WikiSQL v1.1 raw data and table files to generate ``wikisql_train.dat``, ``wikisql_test.dat``, and ``wikisql_dev.dat``.
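
To illustrate the kind of raw input such a preprocessing step reads, the sketch below loads WikiSQL-style JSON-lines files and attaches each question to the header of its table. The field names (`question`, `table_id`, `sql`, `id`, `header`) follow the public WikiSQL release. The function names and the joined record layout are hypothetical: this is not the logic of `prepare_v2.py`, and it does not produce the `.dat` format.

```python
import json


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def join_questions_with_tables(questions_path, tables_path):
    """Pair every question with the header of the table it refers to.

    Hypothetical helper: the record layout is for illustration only.
    """
    # Index tables by id so each question can be resolved with one lookup.
    tables = {table["id"]: table for table in read_jsonl(tables_path)}
    joined = []
    for example in read_jsonl(questions_path):
        table = tables[example["table_id"]]
        joined.append({
            "question": example["question"],
            "header": table["header"],
            "sql": example["sql"],
        })
    return joined
```

Assuming the WikiSQL archive has been unpacked into a `data/` directory, a call such as `join_questions_with_tables("data/train.jsonl", "data/train.tables.jsonl")` would return one record per training question.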

## Training
Meta + Sum loss training
@@ -130,6 +136,8 @@ $ python run.py --input-dir ./input --output-dir ${OUTDIR}_test \

If you use the code in your paper, then please cite it as:

```
@inproceedings{pshuang2018PT-MAML,
author = {Po{-}Sen Huang and
@@ -143,6 +151,21 @@ If you use the code in your paper, then please cite it as:
}
```

```
@inproceedings{2018executionguided,
  author    = {Chenglong Wang and
               Po{-}Sen Huang and
               Alex Polozov and
               Marc Brockschmidt and
               Rishabh Singh},
  title     = "{Execution-Guided Neural Program Decoding}",
  booktitle = {ICML workshop on Neural Abstract Machines \& Program Induction v2 (NAMPI)},
  year      = {2018}
}
```
and

