Balancing Fluency and Adherence: Hybrid Fallback Term Injection in Low-Resource Terminology Translation
This repository contains the accompanying code for the paper:
Balancing Fluency and Adherence: Hybrid Fallback Term Injection in Low-Resource Terminology Translation
Kurt Abela¹, Marc Tanti², and Claudia Borg¹
¹Department of Artificial Intelligence, University of Malta
²Institute of Linguistics and Language Technology, University of Malta
Accepted at LoResMT 2026.
If you use this code or our findings in your research, please cite:
@inproceedings{abela2026balancing,
title={Balancing Fluency and Adherence: Hybrid Fallback Term Injection in Low-Resource Terminology Translation},
author={Abela, Kurt and Tanti, Marc and Borg, Claudia},
booktitle={Proceedings of the 9th Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT)},
year={2026}
}

This project explores strategies for injecting terminology into Machine Translation (MT) models for low-resource languages (Maltese and Slovak). We propose a Hybrid Fallback approach that combines the fluency of static constrained training (Acontextual Drill) with the high adherence of Constrained Beam Search (CBS), invoking the latter only when the static model fails to include the required terminology.
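The fallback decision itself can be sketched in a few lines. The snippet below is a minimal illustration, not code from this repository (the names missing_terms and hybrid_fallback are hypothetical): given the drill model's hypothesis, a CBS hypothesis, and the required target terms for a sentence, it keeps the fluent drill output unless a term is missing.

```python
def missing_terms(hypothesis, required_terms):
    """Return the required target terms absent from the hypothesis
    (simple case-insensitive substring match)."""
    hyp = hypothesis.lower()
    return [term for term in required_terms if term.lower() not in hyp]


def hybrid_fallback(drill_hyp, cbs_hyp, required_terms):
    """Keep the fluent static-drill output when it already contains every
    required term; otherwise fall back to the high-adherence CBS output."""
    if missing_terms(drill_hyp, required_terms):
        return cbs_hyp
    return drill_hyp
```

In practice the matching criterion (surface form vs. lemma, casing, tokenization) determines when the fallback triggers; the paper's exact criterion may differ from the substring check used here.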
To use these scripts, organize your project directory as follows. Note that language-specific data lives in subdirectories of data/ (e.g., processed/ for Maltese, slovak/ for Slovak), while models and results use separate top-level directories per language (models/ vs. models_sk/, results/ vs. results_sk/).
.
├── fairseq/ # Modified Fairseq library
├── scripts/ # Training, generation, and evaluation scripts
├── data/ # Training and test datasets (language-specific subdirectories)
│ ├── processed/ # Maltese processed data
│ └── slovak/ # Slovak processed data
├── models/ # Maltese model checkpoints
├── models_sk/ # Slovak model checkpoints
├── results/ # Maltese translation outputs and analysis
├── results_sk/ # Slovak translation outputs and analysis
└── static_vocabs/ # Pre-defined dictionaries for Fairseq
The scripts are designed to run in a Slurm-managed environment with CUDA support.
- Clone the repository:

  git clone <repository-url>
  cd Balancing-Fluency-and-Adherence

- Environment Setup: the scripts (train_baseline.sh, etc.) will automatically attempt to create a Conda environment and install dependencies. To set it up manually:

  conda create -n constrained_mt python=3.8 -y
  conda activate constrained_mt
  pip install -r requirements.txt
  cd fairseq
  pip install --editable ./
  python setup.py build_ext --inplace
All main workflows are provided as Slurm shell scripts in the scripts/ directory. Each script takes a language pair as its first argument (en-mt or en-sk).
Trains a standard Transformer model on the initial parallel dataset.
sbatch scripts/train_baseline.sh en-mt

Fine-tunes the baseline model using various strategies:
- M1_control: Fine-tuned on in-domain parallel data.
- M2_augmented: Fine-tuned on data with inline term annotations.
- M3_drill: Static constrained training (Acontextual Drill).
- M4_generic_drill: Drill training using only the terminology dictionary.
- M5_drill_seen: Drill training limited to terms present in the training set.
- M6_noun_drill: Drill training focused on noun-classified terms.
sbatch scripts/finetune_models.sh en-mt

Generates translations for all models, including Constrained Beam Search (CBS) variants and the Hybrid Fallback model.
sbatch scripts/generate_translations.sh en-mt

Calculates metrics (BLEU, chrF++, COMET, TIR) and performs significance testing. Results are saved in an Excel report.
sbatch scripts/evaluate.sh en-mt

The scripts use a PROJECT_ROOT variable that defaults to the current working directory ($(pwd)). If you run the scripts from a different location, update this path at the top of each shell script.
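For intuition about the adherence metric, the sketch below computes a term incorporation rate over a corpus. It assumes TIR is the fraction of required target terms that appear verbatim (case-insensitively) in the system output; the evaluation script's exact matching rules may differ.

```python
def term_incorporation_rate(hypotheses, term_lists):
    """Corpus-level fraction of required terms found (case-insensitive) in the outputs.

    hypotheses: list of system translations, one per sentence.
    term_lists: list of required target-term lists, aligned with hypotheses.
    """
    found = total = 0
    for hyp, terms in zip(hypotheses, term_lists):
        hyp_lower = hyp.lower()
        for term in terms:
            total += 1
            found += int(term.lower() in hyp_lower)
    return found / total if total else 0.0
```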