This repository contains the implementation of a Bengali news article summarization model, built on two transformer models:
- BanglaT5: A sequence-to-sequence transformer pretrained specifically on Bengali text for natural language generation tasks, including summarization.
- mT5 (multilingual Text-to-Text Transfer Transformer): A multilingual variant of T5 capable of handling a wide range of NLP tasks, including summarization, across many languages.
The model was trained on the Bengali News Summarization Dataset from Kaggle.
The performance of both summarization models is assessed using the following evaluation metrics:
- CER (Character Error Rate): Measures the percentage of characters that are incorrectly predicted in the generated summary compared to the reference summary.
- WER (Word Error Rate): Similar to CER but operates at the word level, measuring the percentage of words that differ between the generated and reference summaries.
- BLEU (Bilingual Evaluation Understudy): Evaluates the quality of the generated summary by comparing it to reference summaries using modified n-gram precision, with a brevity penalty that penalizes overly short outputs.
- Exact Match: Calculates the percentage of generated summaries that match the reference summaries exactly, without any differences.
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation): Measures the overlap of n-grams (unigrams, bigrams, etc.) between the generated and reference summaries.
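To make the metrics above concrete, here is a minimal pure-Python sketch of how CER, WER, exact match, and unigram ROUGE-1 can be computed. The function names (`edit_distance`, `cer`, `wer`, `exact_match`, `rouge1_f1`) are illustrative, not taken from this repository; in practice you would likely use a maintained library such as `jiwer` or Hugging Face `evaluate`, which also provide BLEU and the full ROUGE family.

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences via dynamic programming."""
    n = len(hyp)
    prev = list(range(n + 1))  # distances for the empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            # deletion, insertion, substitution/match
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[n]

def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def exact_match(references, hypotheses):
    """Fraction of generated summaries identical to their references."""
    pairs = list(zip(references, hypotheses))
    return sum(r == h for r, h in pairs) / max(len(pairs), 1)

def rouge1_f1(reference, hypothesis):
    """ROUGE-1 F1: unigram-count overlap between generated and reference text."""
    ref_counts, hyp_counts = Counter(reference.split()), Counter(hypothesis.split())
    overlap = sum((ref_counts & hyp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)
```

Because CER and WER count errors, lower is better; ROUGE-1 F1 and exact match measure agreement, so higher is better. Whitespace tokenization is a simplifying assumption here; real Bengali evaluation pipelines may apply language-specific normalization first.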