GitHub - vered1986/lexcomp: Evaluating Text Representations on Lexical Composition

Evaluating Text Representations on Lexical Composition

Dependencies

Python 3
argparse
allennlp (0.8.1)

Downloading data:

Download the pre-trained models using bash download.sh.

The VPC classification and LVC classification tasks need a copy of the British National Corpus (BNC). Please download the XML version from here, and update its path in the JSON files.
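Editing the corpus path by hand in several JSON files is easy to get wrong. The sketch below rewrites every string value that starts with an old directory prefix, in every `.json` file under a directory. It assumes the configs are plain JSON without comments and that the BNC location is stored as an ordinary path string; the helper name `replace_path_prefix` and the prefixes used in the example are hypothetical, not part of the repository.

```python
import json
from pathlib import Path


def _rewrite(value, old_prefix, new_prefix):
    """Recursively replace a leading path prefix in every string inside a JSON value."""
    if isinstance(value, str):
        if value.startswith(old_prefix):
            return new_prefix + value[len(old_prefix):]
        return value
    if isinstance(value, list):
        return [_rewrite(v, old_prefix, new_prefix) for v in value]
    if isinstance(value, dict):
        return {k: _rewrite(v, old_prefix, new_prefix) for k, v in value.items()}
    return value  # numbers, booleans and null are left untouched


def replace_path_prefix(config_dir, old_prefix, new_prefix):
    """Hypothetical helper: rewrite old_prefix -> new_prefix in all JSON files under config_dir.

    Returns the list of files that were actually changed.
    """
    changed = []
    for path in sorted(Path(config_dir).rglob("*.json")):
        original = json.loads(path.read_text(encoding="utf-8"))
        updated = _rewrite(original, old_prefix, new_prefix)
        if updated != original:
            path.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
            changed.append(path)
    return changed
```

For example, `replace_path_prefix("diagnostic_classifiers/experiments", "/old/corpora/bnc", "/data/corpora/bnc")` would update only the files that mention the old location. Note that the files are re-serialized, so their original whitespace is not preserved.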

Once you do, you will need to extract the sentences themselves:

python preprocessing/get_sentences_from_bnc.py \
    [/path/to/corpora]/bnc/2554/download/Texts/ \
    diagnostic_classifiers/data/vpc_classification/ \
    diagnostic_classifiers/data/vpc_classification
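For readers unfamiliar with the corpus format: in the BNC XML edition, each sentence is an `<s>` element whose descendants are word (`<w>`) and punctuation (`<c>`) elements, and each token's text usually already carries the space that follows it. The following is a minimal sketch of sentence extraction under that assumption, using only the standard library. It is not the repository's `get_sentences_from_bnc.py`, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def sentences_from_root(root):
    """Return the text of every <s> element below root, in document order.

    Assumes BNC XML edition markup, where token text includes its trailing space.
    """
    sentences = []
    for s in root.iter("s"):
        # itertext() also walks nested groups such as multiword <mw> elements
        text = "".join(s.itertext())
        sentences.append(" ".join(text.split()))  # collapse runs of whitespace
    return sentences


def extract_sentences(xml_text):
    """Convenience wrapper for an XML document held in a string."""
    return sentences_from_root(ET.fromstring(xml_text))
```

For a corpus file on disk, `sentences_from_root(ET.parse(path).getroot())` avoids reading the whole file into a string first.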

Running experiments:

To train all the models for a given task, e.g. NC literality, run:

bash diagnostic_classifiers/experiments/nc_literality/train.sh

To evaluate:

bash diagnostic_classifiers/experiments/nc_literality/evaluate.sh

To get the predictions for the test set:

bash diagnostic_classifiers/experiments/nc_literality/predict.sh
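Since evaluation and prediction presumably use the models produced by training, the three scripts are naturally run in that order. The sketch below chains them for one task and stops at the first failing stage. It assumes every task directory contains `train.sh`, `evaluate.sh` and `predict.sh`, as `nc_literality` does above; `stage_commands` and `run_task` are hypothetical helpers.

```python
import subprocess

STAGES = ("train", "evaluate", "predict")


def stage_commands(task, experiments_dir="diagnostic_classifiers/experiments"):
    """Build the bash command for each stage script of a task, in run order."""
    return [["bash", f"{experiments_dir}/{task}/{stage}.sh"] for stage in STAGES]


def run_task(task, dry_run=False):
    """Run train.sh, evaluate.sh and predict.sh in sequence.

    With check=True, a non-zero exit raises CalledProcessError, so later
    stages are skipped. With dry_run=True the commands are only printed.
    """
    commands = stage_commands(task)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_task("nc_literality", dry_run=True)` prints the three commands shown above without executing them.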

Adding a new task:

You will need to create a directory under diagnostic_classifiers/experiments containing the JSON files that specify the architecture and hyper-parameters. Each model requires a DatasetReader, a Model, and a Predictor. You can use the ones implemented in this repository or implement new ones to suit the specific model's needs.
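As a starting point, the sketch below writes a skeleton AllenNLP-style configuration for a new task. The top-level keys (`dataset_reader`, `train_data_path`, `validation_data_path`, `test_data_path`, `evaluate_on_test`, `model`, `iterator`, `trainer`) follow the usual AllenNLP 0.8 layout, but the registered type names, data file names, output location and hyper-parameter defaults are all hypothetical placeholders, and a real `model` block must also list your model's constructor arguments. Compare against the existing JSON files in this repository before relying on it.

```python
import json
from pathlib import Path


def write_task_config(experiments_dir, task, representation, reader_type, model_type,
                      data_dir, batch_size=32, num_epochs=50, patience=5):
    """Write a skeleton config to <experiments_dir>/<task>/<representation>.json.

    reader_type and model_type must be names you registered with
    @DatasetReader.register(...) and @Model.register(...) in your own code.
    """
    config = {
        "dataset_reader": {"type": reader_type},
        "train_data_path": f"{data_dir}/train.jsonl",  # hypothetical file names
        "validation_data_path": f"{data_dir}/val.jsonl",
        "test_data_path": f"{data_dir}/test.jsonl",
        "evaluate_on_test": True,
        "model": {"type": model_type},  # add the model's constructor arguments here
        "iterator": {"type": "basic", "batch_size": batch_size},
        "trainer": {
            "optimizer": {"type": "adam"},
            "num_epochs": num_epochs,
            "patience": patience,
            "cuda_device": -1,  # -1 trains on CPU; use a GPU index otherwise
        },
    }
    out_path = Path(experiments_dir) / task / f"{representation}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return out_path
```

Keeping one file per task and representation makes it straightforward to add one line per combination to the task's bash scripts.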

See the AllenNLP tutorial for additional instructions on configuring models.

If you'd like to create new data, follow the preprocessing instructions.

Adding a new representation:

You will need to implement a new TokenIndexer and either a TokenEmbedder or a TextFieldEmbedder. The indexer takes a sequence of words and returns their IDs; the embedder takes those IDs and returns the vectors. Look at the implementations in this repository and in the AllenNLP repository, and read the documentation there.
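To make the division of labour concrete, here is a deliberately framework-free illustration in plain Python. It does not use the AllenNLP API; the class and method names are hypothetical, and the real AllenNLP classes additionally deal with vocabulary construction, padding and tensors.

```python
import random


class ToyTokenIndexer:
    """Indexer role: maps words to integer IDs."""

    def __init__(self, vocabulary):
        self.padding_id, self.unknown_id = 0, 1  # reserved IDs
        self.word_to_id = {w: i + 2 for i, w in enumerate(vocabulary)}

    def tokens_to_ids(self, words):
        return [self.word_to_id.get(w, self.unknown_id) for w in words]


class ToyTokenEmbedder:
    """Embedder role: maps integer IDs to vectors through a lookup table."""

    def __init__(self, num_ids, dim, seed=0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_ids)]
        self.table[0] = [0.0] * dim  # the padding ID embeds to the zero vector

    def ids_to_vectors(self, ids):
        return [self.table[i] for i in ids]


indexer = ToyTokenIndexer(["olive", "oil", "baby"])
ids = indexer.tokens_to_ids(["olive", "oil", "tuna"])  # [2, 3, 1]; "tuna" is unknown
embedder = ToyTokenEmbedder(num_ids=len(indexer.word_to_id) + 2, dim=4)
vectors = embedder.ids_to_vectors(ids)  # three 4-dimensional vectors
```

A static lookup table like this embeds each ID independently of its neighbours; a contextualized representation would instead compute each vector from the whole ID sequence.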

You will also need to add a JSON file for each task + representation combination, and add the corresponding commands to that task's train.sh, evaluate.sh, and predict.sh scripts.
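If the bash scripts call the AllenNLP command-line interface directly (check the existing train.sh for the exact form used), the three lines for a new combination could be generated as below. The `train`, `evaluate` and `predict` subcommands, the `-s`, `--include-package`, `--predictor` and `--output-file` options, and the `model.tar.gz` archive name are standard AllenNLP; the package name, output directory, test file and predictor name are hypothetical placeholders.

```python
import shlex


def allennlp_commands(task, representation, test_file, predictor,
                      package="diagnostic_classifiers", root="diagnostic_classifiers"):
    """Return (train, evaluate, predict) shell lines for one task + representation pair."""
    config = f"{root}/experiments/{task}/{representation}.json"
    out_dir = f"{root}/output/{task}/{representation}"  # hypothetical output location
    archive = f"{out_dir}/model.tar.gz"  # archive written by `allennlp train`
    train = ["allennlp", "train", config, "-s", out_dir, "--include-package", package]
    evaluate = ["allennlp", "evaluate", archive, test_file,
                "--output-file", f"{out_dir}/test_metrics.json",
                "--include-package", package]
    predict = ["allennlp", "predict", archive, test_file, "--predictor", predictor,
               "--output-file", f"{out_dir}/test_predictions.jsonl",
               "--include-package", package]
    # shlex.quote protects paths that contain spaces or shell metacharacters
    return tuple(" ".join(shlex.quote(a) for a in cmd) for cmd in (train, evaluate, predict))
```

Each returned string can then be appended to the matching script, for example the first one to train.sh.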

Citation

Still a Pain in the Neck: Evaluating Text Representations on Lexical Composition

Vered Shwartz and Ido Dagan. arXiv 2019.
