oldIT2modIT

The task of this homework consists in translating a dataset from archaic italian to modern italian using LLMs and evaluating them using an LLM-as-a-Judge. Firstly we have created a NEW dataset called "oldIT2modIT" available on HuggingFace! It contains 200 old (ancient) Italian sentences and modern Italian sentences from authors in 1200-1300 period, as Dante Alighieri, Bono Giamboni, Torquato Tasso, Ludovico Ariosto and others. For this task we have used two type of LLMs (deepseek-r1-distill-llama-70b, gemma2-9b-it) using the Groq API and one Transformer-based approach (nllb-200-distilled-600M), both original and also doing a Supervised Fine Tuning (SFT) using the previous mentioned dataset. The evaluation procedure was done using Prometheus-Eval as LLM-as-a-Judge, using rubric scores on five different metrics: Meaning Preservation, Grammar, Modern Structural Effectiveness, Completeness and Lexical Modernization. The Prometheus-Eval evaluator needs also "gold" labels (it means the translations of the original dataset.csv), but we did not have them: then, we have also translated the original dataset.csv file using ChatGPT 4o and validating manually each sentence, creating the dataset_gold.csv dataset.

⚠️TAs Instructions⚠️

Clone the repository:

git clone "https://github.com/cybernetic-m/archaicIT2modernIT.git"

Make a Groq API Key:

It is needed to run the LLMs used for the translations. Go to https://console.groq.com/keys. Do the login and click on "Create API Key" in the top right corner.

Open in Colab hw2_Romano_LissaLattanzio:

Run the LLM-based Approach You can try to re-translate the dataset using the zero-shot or the few-shot by opening the LLM based approach in the notebook.
Run the Transformer-based Approach You can try the fine tuning of the transformer and the translation using the non fine tuned and fine tuned transformers in the transformer based aproach section.

Run the LLM-as-a-Judge You can try our tournament selection and absolute evaluations in the LLM-as-a-Judge section.

(Optional) Change the config.yaml You can change:

model: Change the models (llama, deepseek, etc.) and temperature value
prompt: Edit system and user prompts for both Italian and English translations
prometheus_judge: Change the system and user prompts for both relative and absolute evaluation modes
rubrics: Modify the criteria descriptions used for scoring (e.g. grammar, completeness)
data.input_file: Point to a different CSV input dataset
rate_limit: Adjust the allowed number of requests per minute

☁️ TAs GDrive Shared Folder ☁️

On the Caponata_Lovers_hw2_shared_folder you can see different folders where we uploaded the files of the translations, we actually made also the translations using english prompts but we cut them for time reason and GPU limits of colab for the prometheus approach.

translations folder: contains the following folders of translations:

"fewShot1_temp0_it": this folder contains the translations of deepseek-lama and gemma with 1-shot, temperature 0, language of the prompt in italian.
"fewShot2_temp0_it": this folder contains the translations of deepseek-lama and gemma with 2-shot and temperature 0, language of the prompt in italian.
"fewShot3_temp0_it": this folder contains the translations of deepseek-lama and gemma with 3-shot and temperature 0, language of the prompt in italian.
"fewShot4_temp0_it": this folder contains the translations of deepseek-lama and gemma with 4-shot and temperature 0, language of the prompt in italian.
"fewShot5_temp0_it": this folder contains the translations of deepseek-lama and gemma with 5-shot and temperature 0, language of the prompt in italian.
"zeroShot_temp0_it": this folder contains the translations of deepseek-lama and gemma with 0-shot and temperature 0, language of the prompt in italian.
"transformers": this folder contains the translations of the transformer fine tuned and non fine tuned.

evaluations folder: contains the following evaluations file:

"CaponataLovers-hw2_transl-judge-deepseek": this file contains the evaluations for deepseek for each of the 5 metrics
"CaponataLovers-hw2_transl-judge-gemma": this file contains the evaluations for gemma for each of the 5 metrics
"CaponataLovers-hw2_transl-judge-transformerFt": this file contains the evaluations for the fine tuned transformer for each of the 5 metrics
"CaponataLovers-hw2_transl-judge-transformerNonFt": this file contains the evaluations for the non fine tuned transformer for each of the 5 metrics

dataset_gold file contains our dataset of 200 sentences used for the fine tuning of the transformer.

Repository Structure

.
├── LICENSE
├── README.md
├── config.yaml
├── data
│   ├── dataset.csv
│   ├── dataset_gold.csv
│   ├── examples.csv
│   └── oldIT2modIT.csv
├── deepseek_tournament_winners.txt
├── evaluations
│   ├── deepseek_evaluation.jsonl
│   ├── evaluation_guidelines.txt
│   ├── gemma_evaluation.jsonl
│   ├── transformer_ft_evaluation.jsonl
│   └── transformer_non_ft_evaluation.jsonl
├── figures
│   └── metrics_comparison.png
├── gemma2_tournament_winners.txt
├── groq_api_key.txt
├── human_correlation
│   ├── deepseek_eval_correlation.jsonl
│   ├── gemma_eval_correlation.jsonl
│   ├── transformer_ft_eval_correlation.jsonl
│   └── transformer_non_ft_eval_correlation.jsonl
├── prompt
│   ├── PromptBuilder.py
│   └── evaluation.py
├── tournament
│   └── tournament.txt
├── translations
│   ├── fewShot1_temp0_it
│   │   ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_few-shot_k-1_it_temp-0.0.jsonl
│   │   └── CaponataLovers-hw2_transl-gemma2-9b-it_few-shot_k-1_it_temp-0.0.jsonl
│   ├── fewShot2_temp0_it
│   │   ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_few-shot_k-2_it_temp-0.0.jsonl
│   │   └── CaponataLovers-hw2_transl-gemma2-9b-it_few-shot_k-2_it_temp-0.0.jsonl
│   ├── fewShot3_temp0_it
│   │   ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_few-shot_k-3_it_temp-0.0.jsonl
│   │   └── CaponataLovers-hw2_transl-gemma2-9b-it_few-shot_k-3_it_temp-0.0.jsonl
│   ├── fewShot4_temp0_it
│   │   ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_few-shot_k-4_it_temp-0.0.jsonl
│   │   └── CaponataLovers-hw2_transl-gemma2-9b-it_few-shot_k-4_it_temp-0.0.jsonl
│   ├── fewShot5_temp0_it
│   │   ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_few-shot_k-5_it_temp-0.0.jsonl
│   │   └── CaponataLovers-hw2_transl-gemma2-9b-it_few-shot_k-5_it_temp-0.0.jsonl
│   ├── transformers
│   │   ├── CaponataLovers-hw2_transl-nllb-200-distilled-600M-finetuned.jsonl
│   │   └── CaponataLovers-hw2_transl-nllb-200-distilled-600M.jsonl
│   └── zeroShot_temp0_it
│       ├── CaponataLovers-hw2_transl-deepseek-r1-distill-llama-70b_zero-shot_it_temp-0.0.jsonl
│       └── CaponataLovers-hw2_transl-gemma2-9b-it_zero-shot_it_temp-0.0.jsonl
└── utils
   ├── cohen_kappa.py
   ├── config.py
   ├── evaluation.py
   ├── evaluation_prometheus.py
   └── translate.py

Authors

[1] Massimo Romano

[2] Antonio Lissa Lattanzio

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

oldIT2modIT

⚠️TAs Instructions⚠️

☁️ TAs GDrive Shared Folder ☁️

Repository Structure

Authors

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 180 Commits
data		data
evaluations		evaluations
figures		figures
human_correlation		human_correlation
images		images
prompt		prompt
tournament		tournament
translations		translations
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Report_HW2_NLP_Romano_LissaLattanzio.pdf		Report_HW2_NLP_Romano_LissaLattanzio.pdf
config.yaml		config.yaml
deepseek_tournament_winners.txt		deepseek_tournament_winners.txt
gemma2_tournament_winners.txt		gemma2_tournament_winners.txt
hw2_romano_lissalattanzio.ipynb		hw2_romano_lissalattanzio.ipynb

License

cybernetic-m/oldIT2modIT

Folders and files

Latest commit

History

Repository files navigation

oldIT2modIT

⚠️TAs Instructions⚠️

☁️ TAs GDrive Shared Folder ☁️

Repository Structure

Authors

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages