HKEx Announcement Classifier is a project covering data exploration, analysis and the training of a recurrent neural network (RNN) that classifies disclosure announcements submitted by listed companies on the Hong Kong Stock Exchange (HKEx) with ~93% validation accuracy.
In this project, I scraped data corresponding to 6 types of announcements, namely:
(1) Trading Halt;
(2) Notifiable Transactions;
(3) Connected Transactions;
(4) Announcement of Annual Results;
(5) Notice of Annual General Meeting; and
(6) Takeover Offers.
The ability to accurately identify categories of legal text data is an immensely useful building block for legaltech or quantitative trading applications.
For example:
- Accurately classifying legal texts can help lawyers drastically increase their efficiency in AI-assisted legal due diligence.
- The ability to classify important disclosure announcements on different stock exchanges is incredibly useful in news-based quantitative trading algorithms.
A future iteration of this project could cover other types of documents, such as loan agreements, lease agreements and joint venture agreements. The jurisdictional scope could also extend beyond Hong Kong to offshore jurisdictions (British Virgin Islands, Cayman Islands, etc.) and international trading hubs (US, UK, EU, China) for a more powerful legal text classifier.
The data exploration includes bigram analysis and average word count analysis across the different types of announcements. The implementation and markdown write-up can be found here.
Bigrams for Announcements of Annual General Meetings:
Bigrams for Announcements of Annual Results:
Bigrams for Announcements of Trading Halts:
Bigrams for Announcements of Connected Transactions:
Bigrams for Announcements of Notifiable Transactions:
Bigrams for Announcements of Takeover Offers:
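As a rough illustration of the two analyses mentioned above, the following is a minimal sketch of bigram counting and average word count per announcement category. It is not the code from the write-up: the `tokenize`, `top_bigrams` and `average_word_count` helpers, the simple regex tokenizer and the toy `corpus` are all assumptions made for the example.

```python
import re
from collections import Counter
from statistics import mean


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_bigrams(documents, n=10):
    """Count adjacent word pairs across all documents; return the n most common."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        counts.update(zip(tokens, tokens[1:]))
    return counts.most_common(n)


def average_word_count(documents):
    """Mean number of word tokens per document."""
    return mean(len(tokenize(doc)) for doc in documents)


# Hypothetical toy corpus: category name -> list of announcement texts.
corpus = {
    "Trading Halt": [
        "Trading in the shares has been halted pending an announcement.",
        "Trading in the shares will be halted with effect from today.",
    ],
    "Notice of Annual General Meeting": [
        "Notice is hereby given that the annual general meeting will be held.",
    ],
}

for category, docs in corpus.items():
    print(category, average_word_count(docs), top_bigrams(docs, n=3))
```

On real announcements, stop-word removal before counting would probably give more distinctive bigrams per category.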
I implemented a recurrent neural network with an embedding layer, initialised with 300-dimensional pre-trained GloVe embeddings, followed by a bidirectional LSTM layer. The details of my implementation can be found here.
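To make the embedding step more concrete, here is a minimal sketch of how pre-trained GloVe vectors could be turned into the weight matrix for an embedding layer. It is written in plain Python for illustration only and is not taken from the implementation linked above. The `load_glove` and `build_embedding_matrix` helpers, the `word_index` vocabulary and the 3-dimensional toy vectors are hypothetical; a real pipeline would read a 300-dimensional GloVe text file and would most likely use NumPy and a deep learning framework.

```python
import io


def load_glove(lines):
    """Parse GloVe's text format: a token followed by its space-separated vector components."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(word_index, vectors, dim):
    """Row i holds the vector for the word with index i.

    Row 0 (reserved for padding) and words missing from GloVe stay all-zero.
    """
    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        vec = vectors.get(word)
        if vec is not None and len(vec) == dim:
            matrix[idx] = vec
    return matrix


# Toy 3-dimensional stand-in for a real 300-dimensional GloVe file.
fake_glove = io.StringIO("trading 0.1 0.2 0.3\nhalt -0.4 0.5 0.6\n")
vectors = load_glove(fake_glove)

# Hypothetical tokenizer vocabulary; index 0 is left for padding.
word_index = {"trading": 1, "halt": 2, "hkex": 3}

matrix = build_embedding_matrix(word_index, vectors, dim=3)
print(matrix)
```

The resulting matrix would be passed as the initial weights of the embedding layer, whose output then feeds the bidirectional LSTM.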
The model architecture is as follows:
The trained neural network correctly classified 93.55% of announcements in the validation set. It also correctly identified new announcements on the HKEx, provided that the categories of such announcements were within the training data.
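For reference, validation accuracy is the share of validation announcements whose predicted category matches the true one. The sketch below shows that calculation, plus a per-category breakdown, on a handful of made-up labels. The `accuracy` and `per_class_accuracy` helpers and the sample labels are assumptions for illustration and do not reproduce the project's results.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_accuracy(y_true, y_pred):
    """For each true category, the fraction of its examples predicted correctly (recall)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only.
y_true = ["Trading Halt", "Trading Halt", "Takeover Offers", "Connected Transactions"]
y_pred = ["Trading Halt", "Takeover Offers", "Takeover Offers", "Connected Transactions"]

print(accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))
```

A per-category view like this can show whether errors cluster in categories with similar wording, which a single overall figure hides.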
Further improvements to the model would include training on more data from different jurisdictions and from different types of legal documents, so as to create a more general and more accurate legal text classifier.