A framework for Relation-aware Data Augmentation in Aspect-Based Sentiment Analysis (ABSA).
RDA-ABSA (Relation-aware Data Augmentation for Aspect-Based Sentiment Analysis) is a pipeline designed to enhance ABSA performance through targeted data augmentation. This repository provides tools to generate high-quality augmented data, train augmentation models, and evaluate ABSA performance.
Requirements:
- Python 3.8+
- PyTorch-compatible GPU (recommended for faster training)
Install all required dependencies with: pip install scikit-learn transformers torch tqdm numpy pandas
- scikit-learn # For text vectorization, topic modeling, and evaluation metrics
- transformers # For Hugging Face models (tokenizers, pre-trained LLMs)
- torch # PyTorch deep learning framework
- tqdm # For progress bars in long-running tasks
- numpy # For numerical operations and array handling
- pandas # For data manipulation and processing
Follow these steps to execute the pipeline:
Generate candidate augmented texts from the original training data using DPO (Direct Preference Optimization) principles: python dpo_augmentation.py
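The actual generation logic lives in dpo_augmentation.py; the sketch below only illustrates the general idea of sampling several candidate rewrites per training sentence with an off-the-shelf causal LM. The model name, prompt template, and sampling parameters are placeholders, not the repository's settings.

```python
# Illustrative candidate generation (not the exact dpo_augmentation.py logic).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder: substitute the base LLM you actually use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def generate_candidates(sentence, aspect, n=4):
    # Ask the LM to paraphrase the sentence while keeping the aspect term
    # and its sentiment polarity unchanged.
    prompt = (f"Rewrite the sentence without changing the sentiment toward "
              f"the aspect '{aspect}':\n{sentence}\nRewrite:")
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True, top_p=0.9, temperature=0.8,
        max_new_tokens=64, num_return_sequences=n,
    )
    prompt_len = inputs["input_ids"].shape[1]
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip()
            for o in outputs]
```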
Evaluate the quality of augmented texts using two reward models to assign scores:
python reward1.py
python reward2.py
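reward1.py and reward2.py implement the actual reward models. As a rough illustration of the kind of signals they can provide, the sketch below scores an augmentation for (1) consistency with the gold sentiment label and (2) similarity to the original sentence; both reward definitions here are assumptions, not the repository's implementation.

```python
# Illustrative reward scoring (the real reward1.py / reward2.py may differ).
from transformers import pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Reward 1 (assumption): label consistency from an off-the-shelf sentiment classifier.
sentiment = pipeline("sentiment-analysis")

def reward_label_consistency(augmented_text, gold_label):
    pred = sentiment(augmented_text)[0]
    return pred["score"] if pred["label"].lower() == gold_label.lower() else 0.0

# Reward 2 (assumption): semantic closeness to the original sentence via TF-IDF cosine.
def reward_similarity(original_text, augmented_text):
    tfidf = TfidfVectorizer().fit([original_text, augmented_text])
    vecs = tfidf.transform([original_text, augmented_text])
    return float(cosine_similarity(vecs[0], vecs[1])[0, 0])
```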
Create a preference-aligned dataset by ranking augmented texts based on their reward scores: python preference_data.py
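A minimal sketch of how scored candidates can be turned into chosen/rejected preference pairs follows; the reward weighting and JSON field names are illustrative assumptions, not the schema used by preference_data.py.

```python
# Sketch of turning scored candidates into DPO-style preference pairs.
import json

def build_preference_pairs(records, w1=0.5, w2=0.5):
    pairs = []
    for rec in records:  # rec: {"prompt": ..., "candidates": [{"text", "reward1", "reward2"}, ...]}
        scored = sorted(rec["candidates"],
                        key=lambda c: w1 * c["reward1"] + w2 * c["reward2"])
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": scored[-1]["text"],   # highest combined reward
            "rejected": scored[0]["text"],  # lowest combined reward
        })
    return pairs

# Example (hypothetical file name):
# with open("preference_data.json", "w") as f:
#     json.dump(build_preference_pairs(records), f, indent=2)
```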
Fine-tune a language model to generate high-quality augmented texts using the preference dataset.
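For reference, the core DPO objective that this stage optimizes can be written in a few lines of PyTorch; the repository may rely on a library trainer instead, so this is only the conceptual loss.

```python
# Conceptual DPO loss: prefer the chosen augmentation over the rejected one,
# measured relative to a frozen reference model. beta is the usual DPO temperature.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the probability that the chosen response outranks the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```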
Use the trained augmentation model to produce final augmented data for ABSA training: python absa_augmentation.py
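The sketch below shows one way the fine-tuned generator could be applied to the training set and written out as JSON; the checkpoint path, prompt, and output schema are assumptions rather than the behavior of absa_augmentation.py.

```python
# Sketch of producing the final augmented set with the fine-tuned generator.
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/dpo-finetuned-model"  # hypothetical path
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

def augment_dataset(train_records, out_path="augmented_train.json"):
    augmented = []
    for rec in train_records:  # rec: {"sentence", "aspect", "label"}
        prompt = (f"Rewrite the sentence without changing the sentiment toward "
                  f"the aspect '{rec['aspect']}':\n{rec['sentence']}\nRewrite:")
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, do_sample=True, top_p=0.9, max_new_tokens=64)
        new_text = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                                    skip_special_tokens=True).strip()
        # Keep the original aspect and label so the new sample stays usable for ABSA.
        augmented.append({"sentence": new_text,
                          "aspect": rec["aspect"],
                          "label": rec["label"]})
    with open(out_path, "w") as f:
        json.dump(augmented, f, indent=2)
```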
Fine-tune the ABSA model on the combined original + augmented training data.
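One common way to fine-tune an ABSA classifier on the combined data is to encode each (sentence, aspect) pair with a BERT-style sequence classifier, as sketched below; the file names, backbone, and label set are assumptions.

```python
# Sketch of combining original and augmented data for ABSA fine-tuning.
import json
from transformers import AutoTokenizer, AutoModelForSequenceClassification

with open("train.json") as f:            # hypothetical original training file
    original = json.load(f)
with open("augmented_train.json") as f:  # output of the augmentation step
    augmented = json.load(f)
train_data = original + augmented        # simple concatenation of the two sets

label2id = {"negative": 0, "neutral": 1, "positive": 2}  # assumed label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(label2id))

# Encode sentence/aspect pairs; feed `encodings` and `labels` to your training loop
# or a transformers Trainer.
encodings = tokenizer([r["sentence"] for r in train_data],
                      [r["aspect"] for r in train_data],
                      truncation=True, padding=True)
labels = [label2id[r["label"]] for r in train_data]
```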
Assess the performance of the trained ABSA model on the test set: python absa_evaluation.py
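The reported accuracy and F1-score can be computed with scikit-learn; macro-averaged F1 is a common choice for ABSA and is shown here as an example.

```python
# Standard classification metrics for the ABSA test set.
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    return {"accuracy": accuracy_score(y_true, y_pred),
            "macro_f1": f1_score(y_true, y_pred, average="macro")}
```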
The pipeline produces:
- Augmented datasets in JSON format
- Trained models saved in Hugging Face format
- Evaluation metrics including accuracy and F1-score
Notes:
- Adjust hyperparameters based on your specific dataset characteristics
- The number of augmentations can be tuned based on the size of your original dataset
- All paths can be customized according to your file system structure