This repository provides a comprehensive toolkit for generating, tailoring, and using ABC Treebank data. The CLI and most of the functionality are implemented in Python 3.9. Some functionality is implemented in other languages, but all of it is wrapped in Python.
- System: Linux, tested with Debian 11 (bullseye)
- Need to be installed beforehand:
- Java SE (>= 8)
- m4
- sed
- GNU awk
- Ruby (>= 3.1)
- Python (>= 3.9) and the required packages listed in pyproject.toml
- Automatically prepared when building the Python bdist:
- Stanford Tregex (tested with 4.2.0)
- Optional dependencies:
- (language models)
Besides those named above,
are also required.
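As a rough pre-flight check for the prerequisites listed above, the following sketch looks for each tool on PATH and verifies the Python version. The executable names (in particular `gawk` for GNU awk) are assumptions and may differ on your system, and the script only tests for presence: it does not verify the Java or Ruby version.

```python
import shutil
import sys

# Assumed executable names for the prerequisites; adjust to your system.
REQUIRED_TOOLS = ["java", "m4", "sed", "gawk", "ruby"]


def find_missing(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True if the given version is Python 3.9 or later."""
    return tuple(version_info[:2]) >= (3, 9)


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
    if not python_is_supported():
        print("Python 3.9 or later is required.")
```

Stanford Tregex is not checked here, since it is prepared automatically during the build.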
A wheel package is provided for every release on the releases page.
Here is the traditional way of installing with pip. First, make sure that you are using pip version 20.3 or later, which is compatible with PEP 600:
pip --version
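If you prefer to script this check, a minimal sketch along the following lines compares the installed pip version against 20.3. The helper names `parse_version` and `is_at_least` are hypothetical, and the parsing is deliberately simple: it reads only the leading numeric components, so a pre-release such as `20.3b1` is treated like `20.3`.

```python
import re
from importlib import metadata

# Minimum pip version stated in this README.
MIN_PIP = (20, 3)


def parse_version(text):
    """Extract the leading numeric components, e.g. '20.3.1' -> (20, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_at_least(version_text, minimum=MIN_PIP):
    """Return True if `version_text` is not older than `minimum`."""
    return parse_version(version_text) >= minimum


if __name__ == "__main__":
    try:
        installed = metadata.version("pip")
    except metadata.PackageNotFoundError:
        print("pip is not installed in this environment.")
    else:
        verdict = "OK" if is_at_least(installed) else "too old, please upgrade"
        print(f"pip {installed}: {verdict}")
```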
Then install from the wheel package:
pip install <package>.whl
For the parsing support:
pip install '<package>.whl[parser]'
For the trainer support:
pip install '<package>.whl[parser, ml]'
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install '<package>.whl[parser, ml]'
# change the name and the extras
# according to the package version you get and what you need from extras
Using pipx is recommended, since it isolates the running environment and avoids affecting other Python packages.
pipx install './<package>.whl[extras]' --python python3.X
# fill in the blanks to meet your need
# invoke the CLI
abctk --help
NOTE: the path prefix "./" is essential; pipx won't work without it. See this GitHub issue.
This repository is organized with Poetry.
git clone https://github.com/ABCTreebank/ABCT-toolkit
cd ABCT-toolkit
poetry install
This repository also provides a VS Code devcontainer (based on Docker) containing all necessary components for development.
Refer to the help document of the abctk command.
See the LICENSE file.
This package reuses the following code:
- Prof. Alastair Butler's wrapper of Tsurgeon
- Prof. Yoshikawa Masashi's Python scripts for AllenNLP model training and parsing
- Mr. Vijith Assar's lit