Commit fadb45f: Initial commit (0 parents)

30 files changed: +4879 −0 lines

.gitignore

Lines changed: 104 additions & 0 deletions
```
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
```

LICENSE

Lines changed: 21 additions & 0 deletions
```
MIT License

Copyright (c) Microsoft Corporation. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

README.md

Lines changed: 174 additions & 0 deletions
# Introduction

This is PointSQL, the source code for [Natural Language to Structured Query Generation via Meta-Learning](https://arxiv.org/abs/1803.02400)
and [Pointing Out SQL Queries From Text](https://www.microsoft.com/en-us/research/publication/pointing-sql-queries-text) from Microsoft Research.
We present the setup for the WikiSQL experiments.

# Training a New Model

## Data Pre-processing

- Download the preprocessed dataset ([link](https://1drv.ms/u/s!AryzSDJYB5TxnDWZtpb3ZjL3xBny)) to `input/`
- Untar the file: `tar -xvjf input.tar.bz2`

#### Reproducing the Preprocessing Steps

1. Download the data from [WikiSQL](https://github.com/salesforce/WikiSQL):
```
$ cd wikisql_data
$ wget https://github.com/salesforce/WikiSQL/raw/master/data.tar.bz2
$ tar -xvjf data.tar.bz2
```
2. Put the [lib directory](https://github.com/salesforce/WikiSQL/tree/master/lib) under `wikisql_data/scripts/`.
3. Run annotation using Stanza and preprocess the dataset:
```
$ cd wikisql_data/scripts/
$ python annotate.py
$ python prepare.py
```
4. Put the train/dev/test data into `input/data` for model training/testing.
5. Use the relevance function to prepare relevance files and put them under `input/nl2prog_input_support_rank`:
```
$ python wikisql_data/scripts/relevance.py
```
6. Download pretrained embeddings from [GloVe](https://nlp.stanford.edu/projects/glove/) and [character n-gram embeddings](http://www.logos.t.u-tokyo.ac.jp/~hassy/publications/arxiv2016jmt/) and put them under `input/`.
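
The relevance function in step 5 ranks candidate support examples for each question. As a purely hypothetical illustration of the idea (not the repo's actual `relevance.py`, which operates on the preprocessed WikiSQL files), a simple word-overlap relevance score could look like:

```python
# Hypothetical word-overlap relevance score; illustrative only,
# not the implementation used in wikisql_data/scripts/relevance.py.
def relevance(question, candidate):
    q = set(question.lower().split())
    c = set(candidate.lower().split())
    if not q or not c:
        return 0.0
    # Jaccard similarity over the two word sets
    return len(q & c) / len(q | c)

def top_k_support(question, pool, k=2):
    # Rank the candidate pool by relevance and keep the k best examples
    return sorted(pool, key=lambda c: relevance(question, c), reverse=True)[:k]
```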

## Training

Meta + Sum loss training:
```
$ OUTDIR=output/meta_sum
$ mkdir $OUTDIR
$ python run.py --input-dir ./input \
    --output-dir $OUTDIR \
    --config config/nl2prog.meta_2_0.001.rank.config \
    --meta_learning_rate 0.001 --gradient_clip_norm 5 \
    --num_layers 3 --num_meta_example 2 \
    --meta_learning --production
```
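
Conceptually, `--meta_learning` trains with the paper's pseudo-task MAML objective: take one gradient step on a few retrieved support examples, then update the initial parameters using the loss on the original (query) example. A toy first-order sketch in numpy, using a linear model with illustrative names rather than the repo's actual implementation:

```python
import numpy as np

def loss_and_grad(w, X, y):
    # Mean squared error of a linear model X @ w, and its gradient in w
    err = X @ w - y
    return float(err @ err) / len(y), 2.0 * (X.T @ err) / len(y)

def meta_step(w, support, query, inner_lr=0.01, outer_lr=0.01):
    # Inner step: adapt the initial parameters on the support examples
    _, g_s = loss_and_grad(w, *support)
    w_adapted = w - inner_lr * g_s
    # Outer step (first-order approximation): move the initial parameters
    # in the direction that lowers the query loss after adaptation
    _, g_q = loss_and_grad(w_adapted, *query)
    return w - outer_lr * g_q
```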

## Evaluation

- Due to a preprocessing error, we skip some development set examples (see `input/data/wikisql_err_dev.dat`) and test set examples (see `input/data/wikisql_err_test.dat`) and count them directly as incorrect.
- Run evaluation as follows (replace `model_zoo/meta_sum/table_nl_prog-40` with `$OUTDIR/table_nl_prog-??`, where `??` is the last checkpoint in the folder):

- Development set
```
$ mkdir -p ${OUTDIR}_dev
$ python run.py --input-dir ./input --output-dir ${OUTDIR}_dev \
    --config config/nl2prog.meta_2_0.001.rank.devconfig \
    --meta_learning --test-model model_zoo/meta_sum/table_nl_prog-40 --production
```
- Run execution for the development set as follows:
```
$ cp ${OUTDIR}_dev/test_top_1.log dev_top_1.log
$ python2 execute_dev.py
#Q2 (predition) result is wrong: 1254
#Q1 or Q2 fail to parse: 0
#Q1 (ground truth) exec to None: 20
#Q1 (ground truth) failed to execute: 0
Logical Form Accuracy: 0.631383269546
Execute Accuracy: 0.68277747403
```
- Test set
```
$ mkdir -p ${OUTDIR}_test
$ python run.py --input-dir ./input --output-dir ${OUTDIR}_test \
    --config config/nl2prog.meta_2_0.001.rank.testconfig \
    --meta_learning --test-model model_zoo/meta_sum/table_nl_prog-40 --production
```
- Run execution for the test set as follows:
```
$ cp ${OUTDIR}_test/test_top_1.log .
$ python2 execute.py
#Q2 (predition) result is wrong: 2556
#Q1 or Q2 fail to parse: 0
#Q1 (ground truth) exec to None: 48
#Q1 (ground truth) failed to execute: 0
Logical Form Accuracy: 0.628073829775
Execute Accuracy: 0.680379563733
```
- Baseline model on the test set
```
$ OUTDIR=output/base_sum
$ python run.py --input-dir ./input --output-dir ${OUTDIR}_test \
    --config config/nl2prog.testconfig \
    --test-model model_zoo/base_sum/table_nl_prog-79 --production
```
- Run execution for the baseline model on the test set as follows:
```
$ cp ${OUTDIR}_test/test_top_1.log .
$ python2 execute.py
#Q2 (predition) result is wrong: 2636
#Q1 or Q2 fail to parse: 0
#Q1 (ground truth) exec to None: 48
#Q1 (ground truth) failed to execute: 0
Logical Form Accuracy: 0.614592374009
Execute Accuracy: 0.668055314471
```

# Pre-trained Models

- Download the [pretrained model checkpoints](https://1drv.ms/u/s!AryzSDJYB5TxnDR5I4rYjLi4HUYz) to `model_zoo/`
- Run `tar -xvjf model_zoo.tar.bz2` to extract the pretrained models.

+ Meta + Sum loss: `model_zoo/meta_sum`
+ Base Sum loss: `model_zoo/base_sum`

# Requirements

- TensorFlow 1.4
- Python 3.6
- [Stanza](https://github.com/stanfordnlp/stanza)

# Citation

If you use this code in your paper, please cite:

```
@inproceedings{pshuang2018PT-MAML,
  author    = {Po{-}Sen Huang and
               Chenglong Wang and
               Rishabh Singh and
               Wen-tau Yih and
               Xiaodong He},
  title     = {Natural Language to Structured Query Generation via Meta-Learning},
  booktitle = {NAACL},
  year      = {2018},
}
```

and

```
@techreport{chenglong,
  author = {Wang, Chenglong and Brockschmidt, Marc and Singh, Rishabh},
  title  = {Pointing Out {SQL} Queries From Text},
  number = {MSR-TR-2017-45},
  year   = {2017},
  month  = {November},
  url    = {https://www.microsoft.com/en-us/research/publication/pointing-sql-queries-text/},
}
```

# Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

app/__init__.py

Whitespace-only changes.
