This is the main repository for the NLP project of the group "Vesoljci" at FRI. The repository was created for a project that was part of the Natural Language Processing course at the University of Ljubljana, Faculty of Computer and Information Science.
To use this repository as intended, you should have an NVIDIA GPU with an appropriate NVIDIA driver and CUDA version compatible with PyTorch and TensorFlow.
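A quick way to check that the GPU stack is visible before running anything (this sketch assumes the NVIDIA driver and PyTorch are already installed):

```shell
# driver and CUDA version reported by the NVIDIA driver
nvidia-smi
# verify that PyTorch was built with CUDA support and can see the GPU
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```

If `torch.cuda.is_available()` prints `False`, the installed PyTorch build does not match the driver/CUDA setup and the scripts will fall back to (much slower) CPU execution.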
.
├── assets
├── config_files
├── data
│   ├── experiments
│   │   ├── experiment_1          # folder with dataset setup split into train and test data
│   │   │   ├── model_1
│   │   │   ├── model_2
│   │   │   ├── test.csv
│   │   │   └── train.csv
│   │   ├── experiment_2
│   │   │   ├── model_1           # model configuration, saved model, results, generated graphs
│   │   │   │   ├── annotaion.csv
│   │   │   │   ├── config_dict.json
│   │   │   │   ├── model.pt
│   │   │   │   ├── results.txt
│   │   │   │   └── graph.png
│   │   │   └── ...
│   │   └── ...
│   └── Termframe                 # Termframe dataset with its original folder structure
└── report
assets
: files used in README.md

config_files
: configuration parameters for each of the experiments

data
: folder with the Termframe dataset; all the pre-processed data is saved here after running the scripts, as well as a new experiments folder with different dataset setups and models that is created automatically by running the scripts

report
: folder with the scientific paper
- python>=3.7
- pytorch>=1.8.0
- cudatoolkit>=11.1
- dependencies in the environment.yml can be installed automatically with the command below (solving the environment might take a while) or manually (pip is recommended for installing transformers)
- pytorch and cudatoolkit versions have to be installed manually
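For the manual PyTorch/cudatoolkit step, one possible conda invocation is sketched below; the versions are illustrative assumptions and should be replaced with the pair that matches your driver (see the official PyTorch install selector):

```shell
# example only: install a pytorch/cudatoolkit pair matching your driver
# (cudatoolkit=11.1 here is an assumption, adjust to your setup)
conda install pytorch cudatoolkit=11.1 -c pytorch -c nvidia
```

Run this inside the activated environment so the packages land in `<env_name>` rather than the base environment.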
# create new environment
conda create --name <env_name> python=3.9
# activate environment
conda activate <env_name>
# install dependencies
conda env update -f environment.yml
# don't forget to manually install pytorch with cudatoolkit
# convert data from .tsv to .csv format
python convert_data.py
# prepare and split data for training and testing
# for sequence tagging
python prepare_data.py
# for relation extraction
python prepare_data_regions.py
# train sequence tagger
python train_sequence_tagging.py
# train relation extractor
python train_relation_extraction.py
# run experimental relation extraction (after training relation extractor)
python relation_extraction_growing_window.py
# install nltk packages (needs to be run only once, since the packages are saved locally)
python install_nltk_packages.py
# generate graphs
python generate_graphs.py