Notebook Guide:
- The Data Preparation notebook includes all the preprocessing and transformations used to convert the ToTTo tables to text.
- The Modeling Current Version notebook includes the modeling code for ToTTo.
- The QTSumm All notebook includes everything I did for QTSumm (preprocessing + modeling).
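
As a rough illustration of what "converting tables to text" can look like, the sketch below flattens the highlighted cells of a table into a single string. It is not the code from the Data Preparation notebook: the function name `linearize_table`, its parameters, and the tag format (`<page_title>`, `<cell>`, `<col_header>`) are all assumptions made for this example.

```python
def linearize_table(page_title, section_title, header, rows, highlighted):
    """Flatten the highlighted cells of a table into one text string.

    `highlighted` is a list of (row_index, col_index) pairs into `rows`.
    The tag format is a hypothetical choice, not the notebook's actual one.
    """
    parts = [
        f"<page_title> {page_title} </page_title>",
        f"<section_title> {section_title} </section_title>",
    ]
    for r, c in highlighted:
        # Pair each highlighted value with its column header for context.
        parts.append(
            f"<cell> {rows[r][c]} <col_header> {header[c]} </col_header> </cell>"
        )
    return " ".join(parts)


# Toy table, invented for illustration only.
example = linearize_table(
    page_title="Example page",
    section_title="Example section",
    header=["Year", "Title"],
    rows=[["2001", "First"], ["2005", "Second"]],
    highlighted=[(1, 0), (1, 1)],
)
print(example)
```

Keeping only the highlighted cells, together with their headers, matches the stated goal of summaries that include only the requested information.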
Info:
The goal is to provide table summaries that include only the requested information.
The evaluation metrics used are METEOR, ROUGE and SacreBLEU.
For ToTTo, the models were evaluated on the whole test set as well as on its 5 most populated domains.
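
To give an intuition for what these overlap-based metrics measure, here is a minimal sketch of a simplified ROUGE-1 F1 score in plain Python. It is a stand-in for illustration only, assuming lowercasing and whitespace tokenization with no stemming; it is not the implementation used in the notebooks, and the name `rouge1_f1` is made up for this example.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: unigram overlap between two strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Toy strings, invented for illustration only.
print(rouge1_f1("the cat sat", "the cat"))
```

Scores reported for the project should come from the established metric implementations, which handle tokenization and stemming more carefully than this sketch.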
This is a Capstone Project conducted in partnership with the company Incelligent IKE for the completion of my MSc in Data Science.
For Llama2 7B chat there are no available results, as inference requires more GPU memory than the available 40 GB.
For more information please check the Thesis Presentation file and the Thesis Report.
Demo for ToTTo (T5-base generated summaries vs provided reference summaries):
To verify that the given information is valid, the corresponding Wikipedia Tables are provided:
1st Table (Snippet):