Commit bbb4869

Create README.md
Description of the deployed example
1 parent b60fa78 commit bbb4869

1 file changed: +58 -0 lines changed

README.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
# Transcribe SFCR tables with Mistral AI

Example Python code to transcribe tables from regulatory filings into digital form. To run these examples you will need an [Anaconda environment](https://www.anaconda.com/) and a Mistral [API key](https://docs.mistral.ai/getting-started/quickstart/).
In this example we transcribed the balance sheet table from the Solvency and Financial Condition Reports (SFCR) that companies need to file every year.

As a sample we took the 18 main life insurance companies operating in the Italian market.

## Companies in scope

- Credemvita S.p.A.
- AXA MPS Assicurazioni Vita
- CRÈDIT AGRICOLE VITA
- Società Reale Mutua di Assicurazioni
- Cardif Vita S.p.A.
- MEDIOLANUM VITA S.p.A.
- Generali Italia S.p.A.
- Banco BPM Vita S.p.A.
- HDI ASSICURAZIONI S.p.A.
- Gruppo Assicurativo Poste Vita
- FIDEURAM VITA S.P.A.
- CNP Vita Assicura S.p.A.
- ITAS VITA
- Helvetia Vita S.p.A.
- Vittoria Assicurazioni S.p.A.
- GROUPAMA ASSICURAZIONI S.P.A.
- UniCredit Allianz Vita S.p.A.
- Zurich Investments Life S.p.A.

## Description of the process

The extraction is performed in 5 phases.

### Phase 1: Find the reports and identify the relevant tables (manually).
1) Identify the new SFCR report and save it into the Input folder.
2) Identify the pages that contain the tables of interest.
3) Fill in the map for the company's run in master_list.csv.

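
The map compiled in Phase 1 could be read back along the following lines. This is only a sketch: the column names (`company`, `pdf_file`, `pages`), the page-range syntax, and the sample rows are assumptions made for illustration, not the actual layout of master_list.csv.

```python
import csv
import io

# Hypothetical contents of master_list.csv. The column names, file names and
# page numbers are illustrative assumptions, not the repository's real data.
SAMPLE_MASTER_LIST = """company,pdf_file,pages
Credemvita S.p.A.,Input/credemvita_sfcr.pdf,112-113
ITAS VITA,Input/itas_vita_sfcr.pdf,98
"""


def parse_pages(spec):
    """Expand a page spec such as '98' or '1-2;5' into a list of page numbers."""
    pages = []
    for part in spec.split(";"):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-")
            pages.extend(range(int(first), int(last) + 1))
        elif part:
            pages.append(int(part))
    return pages


def load_master_list(handle):
    """Read the map and return {company: (pdf_file, [page numbers])}."""
    reader = csv.DictReader(handle)
    return {
        row["company"]: (row["pdf_file"], parse_pages(row["pages"]))
        for row in reader
    }


# In practice the handle would come from open("master_list.csv", newline="").
master = load_master_list(io.StringIO(SAMPLE_MASTER_LIST))
print(master["Credemvita S.p.A."])
```

Keeping the page numbers in the map means the manual step is done once per report, and the notebooks can loop over the companies without further input.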
### Phase 2: Run the Extraction notebook (released on 23-September-2025).
The notebook performs the following steps (with slight modifications depending on the table format):
1) Save the page with the table into a separate folder, Single_pdf.
2) Use either a Python package or a specialized LLM to create a digital equivalent of the table.
3) Fix the systematic errors that prevent the table from being saved as a DataFrame.
4) Save the DataFrame into the Output folder.

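
As an illustration of the "fix errors, then save" steps, the sketch below assumes the transcription arrives as a pipe-delimited Markdown table, which is a common output format for OCR models but is not confirmed by this README. The function name and the sample table are hypothetical. It uses only the standard library; the resulting list of dictionaries can be passed to `pandas.DataFrame` if pandas is installed.

```python
# Hypothetical transcription output. The last row is deliberately ragged,
# the kind of defect that can stop a table from loading cleanly.
SAMPLE_TABLE = """
| Item | Solvency II value |
|---|---:|
| Total assets | 1.250 |
| Total liabilities | 1.100 |
| Excess of assets over liabilities |
"""


def parse_markdown_table(text):
    """Turn a pipe-delimited Markdown table into a list of row dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip the separator row (e.g. |---|---:|) between header and body.
        if all(cell and set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    header, body = rows[0], rows[1:]
    # Pad or trim ragged rows so that each one matches the header width.
    width = len(header)
    fixed = [(row + [""] * width)[:width] for row in body]
    return [dict(zip(header, row)) for row in fixed]


table_rows = parse_markdown_table(SAMPLE_TABLE)
for row in table_rows:
    print(row)
```

Padding short rows with empty strings keeps the row in the dataset, so the gap stays visible for the later validation and manual inspection phases instead of being silently dropped.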
### Phase 3: Run the Processing notebook.
The notebook applies fixes to the DataFrame to bring the transcribed numbers closer to the reported ones, then joins all the tables into a single dataset.

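
One kind of fix this phase might apply is normalising how amounts are written. The helper below is a hypothetical sketch, not the notebook's code. It assumes amounts are reported in whole units (no decimals), that dots or spaces act as thousands separators, and that negatives appear either in parentheses or with a leading minus sign.

```python
import re


def parse_amount(raw):
    """Convert a transcribed cell such as '1.234.567' or '(12.345)' to an int.

    Returns None for empty cells and dash placeholders.
    Assumption: whole units only, so every non-digit character is a separator.
    """
    text = raw.strip()
    if text in {"", "-", "–"}:
        return None
    negative = (text.startswith("(") and text.endswith(")")) or text.startswith("-")
    digits = re.sub(r"[^0-9]", "", text)
    if not digits:
        return None  # nothing numeric survived, e.g. a stray label
    value = int(digits)
    return -value if negative else value


for cell in ["1.234.567", "(12.345)", " 1 000 ", "-500", "-"]:
    print(repr(cell), "->", parse_amount(cell))
```

If a report does use decimals, stripping every non-digit character would inflate the value, so that case needs a separate rule.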
### Phase 4: Run the Cross-Validation notebook.
The notebook applies a series of tests that check the numbers for internal consistency and flags potential errors.

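
A minimal example of such a consistency test, assuming the dataset holds total assets, total liabilities and the excess of assets over liabilities for each company. On a Solvency II balance sheet the excess should equal assets minus liabilities. The field names, the tolerance and the figures below are hypothetical and only illustrate the idea.

```python
def check_balance_sheet(row, tolerance=1):
    """Return a list of flags for one company's transcribed balance sheet.

    The tolerance absorbs rounding differences in the reported figures.
    """
    flags = []
    implied = row["total_assets"] - row["total_liabilities"]
    reported = row["excess_of_assets_over_liabilities"]
    if abs(implied - reported) > tolerance:
        flags.append(
            f"{row['company']}: assets minus liabilities is {implied}, "
            f"but the reported excess is {reported}"
        )
    return flags


# Made-up figures: the second row simulates two transposed digits.
consistent = {
    "company": "Company A",
    "total_assets": 1250,
    "total_liabilities": 1100,
    "excess_of_assets_over_liabilities": 150,
}
suspect = {
    "company": "Company B",
    "total_assets": 1250,
    "total_liabilities": 1100,
    "excess_of_assets_over_liabilities": 510,
}

for row in (consistent, suspect):
    for flag in check_balance_sheet(row):
        print(flag)
```

A check like this cannot say which of the three numbers was mistranscribed. It only narrows down where the manual inspection should look.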
### Phase 5: Final modifications to the table and a manual inspection.

## Contact
A version of this process is used by OSM to extract data for our actuarial models. One of the benefits of releasing our code is the feedback and improvement ideas we receive. If you have any, contact us at gregor@osmodelling.com.

## License
MIT license
