FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability
This repository contains 2 samples (sample-1, sample-2) from the dataset introduced in the paper FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability (accepted at the 4th Financial Narrative Processing Workshop, co-located with LREC 2022, Marseille, France).
In addition, the repository includes the data collection & cleaning scripts, the embedding extraction & model development script, and a starter example. The model, along with its weights, can be downloaded from Hugging Face.
The embeddings & labels of the full dataset are available in the embeddings_and_labels directory. Several model artifacts, obtained by training classifiers such as Logistic Regression, GBM, and Random Forest on the entire dataset, are available in the models directory.
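To illustrate what a Logistic Regression classifier does with embedding vectors and binary readability labels (0 = not readable, 1 = readable), here is a minimal, dependency-free sketch using batch gradient descent. It is not the repository's model development script: the function names, hyperparameters, and the toy 2-D vectors standing in for definition embeddings are all hypothetical, and the real embeddings are far higher-dimensional.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> List[float]:
    """Fit weights by batch gradient descent on the mean log loss.

    The returned list holds one weight per feature, followed by the bias.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            # zip stops at len(xi), so the bias w[-1] is added separately.
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            err = sigmoid(z) - yi  # d(log loss)/dz for one sample
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return 1 (readable) if P(readable | x) >= 0.5, else 0 (not readable)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if sigmoid(z) >= 0.5 else 0


# Hypothetical 2-D vectors standing in for definition embeddings.
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
y = [0, 0, 1, 1]
weights = train_logistic_regression(X, y)
print([predict(weights, x) for x in X])  # [0, 0, 1, 1]
```

In practice a library implementation would be used instead; the sketch only shows the shape of the problem: one weight per embedding dimension plus a bias, thresholded at probability 0.5.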
The raw version of the full dataset is available at https://huggingface.co/datasets/sohomghosh/FinRAD_Financial_Readability_Assessment_Dataset. To access it, please also send a request by filling out this form. Alternatively, you can re-create the raw dataset using the data collection & cleaning scripts.
Primary Columns:
"terms": This is the financial term
"definitions": This is the definition corresponding to the financial term
"source": This represents the source from which the term and the definition were obtained.
"assigned_readability": This is the manually assigned readability label: 0 means not readable, 1 means readable.
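The file format of the samples and of the full dataset is not specified here. Assuming a CSV file whose header uses the column names above, a minimal loader using only the standard library might look like the sketch below. `load_finrad_rows` and `PRIMARY_COLUMNS` are hypothetical names, and the two rows are placeholders, not data from FinRAD.

```python
import csv
import io
from collections import Counter
from typing import Dict, List, TextIO

# Primary columns as listed above.
PRIMARY_COLUMNS = ["terms", "definitions", "source", "assigned_readability"]


def load_finrad_rows(handle: TextIO) -> List[Dict[str, object]]:
    """Read CSV rows, check the primary columns, and cast the label to int."""
    reader = csv.DictReader(handle)
    missing = [c for c in PRIMARY_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows: List[Dict[str, object]] = []
    for row in reader:
        label = int(row["assigned_readability"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label} for term {row['terms']!r}")
        rows.append({**row, "assigned_readability": label})
    return rows


# Hypothetical placeholder rows, not taken from the dataset.
sample = io.StringIO(
    "terms,definitions,source,assigned_readability\n"
    "term A,placeholder definition of term A,investopedia,1\n"
    'term B,"placeholder definition, with a comma",palgrave,0\n'
)
rows = load_finrad_rows(sample)
print(Counter(r["assigned_readability"] for r in rows))
```

Any extra columns (such as the readability scores described next) are kept in each row dictionary unchanged.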
Other Columns:
"flesch_reading_ease", "flesch_kincaid_grade", "smog_index", "coleman_liau_index", "automated_readability_index", "dale_chall_readability_score", "linsear_write_formula", "gunning_fog"
These are readability scores computed using the textstat library.
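textstat computes each of these scores with a single call (for example `textstat.automated_readability_index(text)`). To show what one of them measures, here is a sketch of the standard Automated Readability Index formula in plain Python. The word, character, and sentence counting below is a simplifying assumption; textstat tokenizes differently, so its values may not match exactly.

```python
import re


def automated_readability_index(text: str) -> float:
    """Automated Readability Index:
    4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0.0
    # Count letters and digits only (no spaces or punctuation).
    characters = sum(ch.isalnum() for ch in text)
    # Rough sentence count: runs of ., ! or ?; at least one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 4.71 * characters / len(words) + 0.5 * len(words) / sentences - 21.43


print(round(automated_readability_index("The cat sat on the mat."), 3))  # -5.085
```

Higher ARI values correspond to text that needs more years of schooling to read, which is why longer words and longer sentences both push the score up.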
Metadata of the "source" column
Tag | Description | Assigned Readability |
---|---|---|
prin | Principles of Corporate Finance by Richard A. Brealey, Stewart C. Myers, Franklin Allen | 0 |
zvi | Investments by Zvi Bodie, Alex Kane, Alan J. Marcus | 0 |
sam | Economics Textbook by Paul Samuelson and William Nordhaus | 1 |
opod | Options, Futures, and Other Derivatives, Global Edition by John C. Hull | 0 |
fmi | Financial Markets and Institutions by Frederic S. Mishkin, Stanley Eakins | 0 |
ncert_keec111 | NCERT Indian Economic Development Economics Class 11 | 1 |
ncert_kest | NCERT Statistics for Economics Class 12 | 1 |
ncert | NCERT Introduction to Macroeconomics Class 12 | 1 |
ncert_class12_econ | NCERT Introduction to Microeconomics Class 12 | 1 |
investopedia | Investopedia Data Dictionary | 1 |
economist | The Economist terms dictionary | 1 |
6_8_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
9_12_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
pre_louis | Glossary of Economics and Personal Finance Terms from Federal Reserve Bank of St. Louis | 1 |
palgrave | The Palgrave Macmillan Dictionary of Finance, Investment and Banking by Erik Banks | 0 |
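The table assigns a single readability label to each source tag. A minimal sketch of that lookup, which could be used to sanity-check that each row's "assigned_readability" agrees with its "source", is shown below. `SOURCE_READABILITY`, `readability_for_source`, and `label_mismatches` are hypothetical helpers, not part of the repository, and the example rows are placeholders.

```python
from typing import Dict, List, Mapping

# Source tag -> assigned readability, transcribed from the table above
# (0 = not readable, 1 = readable).
SOURCE_READABILITY: Dict[str, int] = {
    "prin": 0,
    "zvi": 0,
    "sam": 1,
    "opod": 0,
    "fmi": 0,
    "ncert_keec111": 1,
    "ncert_kest": 1,
    "ncert": 1,
    "ncert_class12_econ": 1,
    "investopedia": 1,
    "economist": 1,
    "6_8_louis": 1,
    "9_12_louis": 1,
    "pre_louis": 1,
    "palgrave": 0,
}


def readability_for_source(source: str) -> int:
    """Look up the label for a source tag; raise KeyError for unknown tags."""
    try:
        return SOURCE_READABILITY[source]
    except KeyError:
        raise KeyError(f"unknown source tag: {source!r}") from None


def label_mismatches(rows: List[Mapping[str, object]]) -> List[Mapping[str, object]]:
    """Return rows whose assigned_readability disagrees with their source's label."""
    return [
        r for r in rows
        if int(r["assigned_readability"]) != readability_for_source(str(r["source"]))
    ]


# Hypothetical rows: the second one contradicts the table.
rows = [
    {"source": "zvi", "assigned_readability": "0"},
    {"source": "investopedia", "assigned_readability": "0"},
]
print(label_mismatches(rows))
```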
If you find this repository helpful, feel free to cite our publication [FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability](http://www.lrec-conf.org/proceedings/lrec2022/workshops/FNP/pdf/2022.fnp-1.1.pdf):
@InProceedings{ghosh-EtAl:2022:FNP,
author = {Ghosh, Sohom and Sengupta, Shovon and Naskar, Sudip Kumar and Singh, Sunny Kumar},
title = {FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability},
booktitle = {Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022},
month = {June},
year = {2022},
address = {Marseille, France},
publisher = {European Language Resources Association},
pages = {1--9},
url = {http://www.lrec-conf.org/proceedings/lrec2022/workshops/FNP/pdf/2022.fnp-1.1.pdf}
}
You may also cite our demo/tool FinRead, presented at ICON 2021. The artifacts of this demo are available in the old_model_FinRead directory.
New model trained on 13K+ instances (using Logistic Regression): Hugging Face Spaces link
Old model trained on 8K+ instances (using a LightGBM classifier): Google Colab link
@inproceedings{ghosh-etal-2021-finread,
title = "{F}in{R}ead: A Transfer Learning Based Tool to Assess Readability of Definitions of Financial Terms",
author = "Ghosh, Sohom and
Sengupta, Shovon and
Naskar, Sudip and
Singh, Sunny Kumar",
booktitle = "Proceedings of the 18th International Conference on Natural Language Processing (ICON)",
month = dec,
year = "2021",
address = "National Institute of Technology Silchar, Silchar, India",
publisher = "NLP Association of India (NLPAI)",
url = "https://aclanthology.org/2021.icon-main.81",
pages = "658--659"
}
Contact: sohom1ghosh@gmail.com
For any part of this work to which a license applies, this work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. See LICENSE.CC-BY-NC-SA-4.0.