
CSE8803_DLT_Project

This repository contains all the relevant data and code files for our DLT Project.

Made with Python · PyTorch · HuggingFace

-----------------------------------------------------

Abstract

Financial reports offer critical insights into a company's operations, yet their length (typically 30-40 pages) poses challenges for swift decision-making in dynamic markets. To address this, we leverage fine-tuned large language models (LLMs) to distill key indicators and operational metrics from these reports. We devised a method to locate the critical data and used the FinQA dataset to fine-tune both Llama 2 7B and T5 for customized question answering. We achieved 65% accuracy on the final numerical answer, which is competitive for numerical reasoning and calculation.
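
For illustration, a minimal sketch of the T5 fine-tuning step with HuggingFace Transformers. This is not the project's exact training script; the file path, dataset field names, and hyperparameters are assumptions.

```python
# Sketch: fine-tuning T5 on FinQA-style (question + context -> answer) pairs.
# Assumes transformers >= 4.22 and FinQA examples flattened into a local JSON
# file with "question", "context", and "answer" fields (an assumption).
from datasets import load_dataset
from transformers import (AutoTokenizer, T5ForConditionalGeneration,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

data = load_dataset("json", data_files={"train": "finqa_train.json"})

def preprocess(batch):
    # Concatenate question and (serialized) report context into one input.
    inputs = [f"question: {q}  context: {c}"
              for q, c in zip(batch["question"], batch["context"])]
    enc = tokenizer(inputs, max_length=512, truncation=True)
    labels = tokenizer(text_target=batch["answer"], max_length=32,
                       truncation=True)
    enc["labels"] = labels["input_ids"]
    return enc

tokenized = data["train"].map(preprocess, batched=True,
                              remove_columns=data["train"].column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="t5-finqa",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3,
                                  learning_rate=3e-4),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```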

-----------------------------------------------------

Methodology


Figure 1: Pipeline for numerical reasoning on annual reports with LLMs
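
As one example of the early pipeline stages (parse the PDF, chunk it, retrieve question-relevant passages to feed the fine-tuned model), a hedged sketch using LangChain. Module paths follow recent langchain/langchain-community releases; the chunk sizes, embedding model, and query are illustrative, not the project's exact settings.

```python
# Sketch: load an annual report PDF, split it into chunks, and retrieve the
# passages most relevant to a question via a FAISS vector index.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

pages = PyPDFLoader("annual_report.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100).split_documents(pages)

index = FAISS.from_documents(
    chunks,
    HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"))

# The top-k chunks become the context passed to the fine-tuned QA model.
relevant = index.similarity_search("What was the total revenue in 2022?", k=4)
context = "\n".join(doc.page_content for doc in relevant)
```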

-----------------------------------------------------

Conclusion

As part of this exercise we engineered a robust approach for numerical question answering from PDF reports, leveraging advanced natural language processing techniques based on T5, Llama 2, and LangChain. Despite the overall success, certain nuances required careful consideration. Parsing non-conventional tables presented challenges, as the approach has limitations in handling the unusual table structures commonly found in complex reports. Errors in the table-to-text serialization process posed another obstacle, demanding a refined post-processing pipeline to improve the accuracy of the generated answers. Certain aspects of question answering with large language models (LLMs) also exhibited subpar performance, necessitating ongoing work on model quality. In short, while our approach demonstrated strong results on numerical question answering from PDF reports, continual refinement is needed to address non-conventional tables, serialization errors, and LLM question-answering weaknesses. These insights guide our ongoing efforts to enhance the robustness and applicability of the approach.
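
Since table-to-text serialization was a recurring failure point, here is a minimal sketch of the kind of row-wise linearization commonly used for FinQA-style inputs. The project's exact template is not shown in this README, so this format is an assumption.

```python
# Sketch: linearize a table into sentences so a text-only LM can consume it.
def serialize_table(header: list[str], rows: list[list[str]]) -> str:
    sentences = []
    for row in rows:
        # Pair each cell with its column header: "Year is 2021, Revenue is ..."
        cells = ", ".join(f"{col} is {val}" for col, val in zip(header, row))
        sentences.append(f"Row: {cells}.")
    return " ".join(sentences)

print(serialize_table(
    ["Year", "Revenue", "Net income"],
    [["2021", "$4.2B", "$0.6B"], ["2022", "$5.1B", "$0.9B"]]))
# Row: Year is 2021, Revenue is $4.2B, Net income is $0.6B. Row: Year is ...
```

Cells that span multiple columns or nested headers break this simple header-cell pairing, which is one source of the serialization errors noted above.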

-----------------------------------------------------

Important Links:
