
CSE8803_DLT_Project

This repository contains all the relevant data and code files for our DLT Project.

Made with Python · PyTorch · HuggingFace

-----------------------------------------------------

Abstract

Financial reports offer critical insights into a company's operations, yet their length (typically 30-40 pages) poses challenges for swift decision-making in dynamic markets. To address this, we leverage fine-tuned large language models (LLMs) to distill key indicators and operational metrics from these reports. We devised a method to locate the critical data and used the FinQA dataset to fine-tune both Llama 2 7B and T5 for customized question answering. We achieved 65% accuracy on the final numerical answer, which is competitive for numerical reasoning and calculation.
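
For illustration, a minimal sketch of the T5 fine-tuning step with HuggingFace Transformers. This is not the project's exact training script; the file path, dataset field names, and hyperparameters are assumptions.

```python
# Sketch: fine-tuning T5 on FinQA-style (question + context -> answer) pairs.
# Assumes transformers >= 4.22 and FinQA examples flattened into a local JSON
# file with "question", "context", and "answer" fields (an assumption).
from datasets import load_dataset
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

data = load_dataset("json", data_files={"train": "finqa_train.json"})

def preprocess(batch):
    # Concatenate question and (serialized) report context into one input.
    inputs = [f"question: {q}  context: {c}"
              for q, c in zip(batch["question"], batch["context"])]
    enc = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["answer"], max_length=32,
                       truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = data["train"].map(preprocess, batched=True,
                              remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="t5-finqa",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3,
                                  learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```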

-----------------------------------------------------

Methodology


Figure 1: Pipeline for numerical reasoning on annual reports with LLMs
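
As one example of the early pipeline stages (parse the PDF, chunk it, retrieve question-relevant passages to feed the fine-tuned model), a hedged sketch using LangChain. Module paths follow recent langchain/langchain-community releases; the chunk sizes, embedding model, and query are illustrative, not the project's exact settings.

```python
# Sketch: load an annual report PDF, split it into chunks, and retrieve the
# passages most relevant to a question via a FAISS vector index.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

pages = PyPDFLoader("annual_report.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(pages)

index = FAISS.from_documents(
    chunks,
    HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"))

# The top-k chunks become the context passed to the fine-tuned QA model.
relevant = index.similarity_search("What was the total revenue in 2022?", k=4)
context = "\n".join(doc.page_content for doc in relevant)
```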

-----------------------------------------------------

Conclusion

As part of this exercise we engineered a robust approach for numerical question answering from PDF reports, leveraging advanced natural language processing techniques based on T5, Llama 2, and LangChain. Despite the overall success, certain nuances required careful consideration. Parsing non-conventional tables presented challenges, as the approach has limitations in handling the unusual table structures commonly found in complex reports. Errors in the table-to-text serialization process posed another obstacle, demanding a refined post-processing pipeline to improve the accuracy of the generated answers. Certain aspects of question answering with large language models (LLMs) also exhibited subpar performance, necessitating ongoing work on model quality. In short, while our approach demonstrated strong results on numerical question answering from PDF reports, continual refinement is needed to address non-conventional tables, serialization errors, and LLM question-answering weaknesses. These insights guide our ongoing efforts to enhance the robustness and applicability of the approach.
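
Since table-to-text serialization was a recurring failure point, here is a minimal sketch of the kind of row-wise linearization commonly used for FinQA-style inputs. The project's exact template is not shown in this README, so this format is an assumption.

```python
# Sketch: linearize a table into sentences so a text-only LM can consume it.
def serialize_table(header: list[str], rows: list[list[str]]) -> str:
    sentences = []
    for row in rows:
        # Pair each cell with its column header: "Year is 2021, Revenue is ..."
        cells = ", ".join(f"{col} is {val}" for col, val in zip(header, row))
        sentences.append(f"Row: {cells}.")
    return " ".join(sentences)

print(serialize_table(
    ["Year", "Revenue", "Net income"],
    [["2021", "$4.2B", "$0.6B"], ["2022", "$5.1B", "$0.9B"]]))
# Row: Year is 2021, Revenue is $4.2B, Net income is $0.6B. Row: Year is ...
```

Cells that span multiple columns or nested headers break this simple header-cell pairing, which is one source of the serialization errors noted above.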

-----------------------------------------------------

Important Links:
