Objective: To apply the TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) method to identify the best pre-trained conversational AI model suitable for text generation and chatbot tasks.
Domain: Text Conversational Models (Roll Numbers ending with 4 or 9).
We evaluated 5 pre-trained models based on expert ratings (scale 1-10) across 5 key performance criteria.
- DialoGPT
- BlenderBot
- GPT-2
- T5
- ALBERT
| Criterion | Impact | Weight | Description |
|---|---|---|---|
| Response Quality | Benefit (+) | 0.30 | Coherence and relevance of generated replies. |
| Context Understanding | Benefit (+) | 0.25 | Ability to maintain context over turns. |
| Computational Cost | Cost (-) | 0.15 | Resources required for inference. |
| Model Size | Cost (-) | 0.10 | Memory footprint of the model. |
| Ease of Fine-Tuning | Benefit (+) | 0.20 | Flexibility for domain adaptation. |
The ranking was performed using the TOPSIS method, which selects the alternative closest to the positive ideal solution and farthest from the negative ideal solution.
- Normalization: Converted the decision matrix to a normalized scale.
- Weighting: Applied the predefined weights to the normalized matrix.
- Ideal Solutions: Determined the best (V+) and worst (V-) values for each criterion.
- Separation Measures: Calculated each alternative's Euclidean distance from the ideal best and ideal worst solutions.
- Scoring: Computed the final TOPSIS score ($S_i$) and ranked the models.
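The steps above can be sketched in plain Python. Note that the ratings matrix below is illustrative only (the original expert ratings are not reproduced in this report), so the resulting scores will not match the results table exactly; the weights and benefit/cost directions, however, are taken from the criteria table above.

```python
import math

# Criteria order: Response Quality, Context Understanding,
# Computational Cost, Model Size, Ease of Fine-Tuning
WEIGHTS = [0.30, 0.25, 0.15, 0.10, 0.20]
BENEFIT = [True, True, False, False, True]  # False = cost criterion

def topsis(matrix, weights, benefit):
    """Return the TOPSIS closeness coefficient for each row (alternative)."""
    n = len(weights)
    # 1. Vector normalization: r_ij = x_ij / sqrt(sum_i x_ij^2)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    # 2. Weighted normalized matrix
    V = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    # 3. Ideal best (V+) and worst (V-) per criterion:
    #    max for benefit criteria, min for cost criteria (and vice versa)
    cols = list(zip(*V))
    v_pos = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    v_neg = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 4. Euclidean separation from the ideal best and worst
    d_pos = [math.dist(row, v_pos) for row in V]
    d_neg = [math.dist(row, v_neg) for row in V]
    # 5. Closeness coefficient S_i = D_i^- / (D_i^+ + D_i^-); higher is better
    return [dn / (dp + dn) for dp, dn in zip(d_pos, d_neg)]

# Hypothetical 1-10 expert ratings (rows follow the model list above)
models = ["DialoGPT", "BlenderBot", "GPT-2", "T5", "ALBERT"]
ratings = [
    [7, 7, 6, 5, 7],   # DialoGPT
    [9, 9, 8, 8, 8],   # BlenderBot
    [6, 5, 5, 5, 6],   # GPT-2
    [7, 8, 7, 6, 8],   # T5
    [4, 4, 3, 3, 6],   # ALBERT
]

scores = topsis(ratings, WEIGHTS, BENEFIT)
for model, s in sorted(zip(models, scores), key=lambda t: t[1], reverse=True):
    print(f"{model}: {s:.3f}")
```

Because every $S_i$ is a ratio of distances, scores always fall in $[0, 1]$, with higher values indicating alternatives nearer the positive ideal.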
Based on the analysis, BlenderBot achieved the highest score, making it the most suitable model for this specific configuration of weights and criteria.
(Note: The table below is generated from the Python script)
| Model | TOPSIS Score | Rank |
|---|---|---|
| BlenderBot | 0.824 | 1 |
| T5 | 0.541 | 2 |
| DialoGPT | 0.485 | 3 |
| GPT-2 | 0.412 | 4 |
| ALBERT | 0.231 | 5 |
The following bar chart illustrates the comparative performance of the models.
BlenderBot is ranked as the best pre-trained conversational model in this analysis, largely due to its superior scores in Response Quality and Context Understanding, which were heavily weighted (combined 55%). While ALBERT had lower costs, its performance scores were not sufficient to offset the benefits provided by larger models like BlenderBot.
