The goal of this project is to create a learning-based system that takes an image of a math formula and returns the corresponding LaTeX code.
We need paired data for the network to learn. Luckily, there is a lot of LaTeX code on the internet, e.g. on Wikipedia and arXiv. We also use the formulae from the im2latex-170k dataset.
The formulae are rendered in the following fonts:
- Latin Modern Math
- GFSNeohellenicMath.otf
- Asana Math
- XITS Math
- Cambria Math
To render the math in many different fonts, we compile it with XeLaTeX, generate a PDF, and finally convert it to a PNG. This requires some third-party tools:
- XeLaTeX
- ImageMagick with Ghostscript
- Node.js to run KaTeX
- de-macro >= 1.4
- Python 3.7+ & dependencies
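
The rendering steps described above (XeLaTeX, then PDF, then PNG) can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names `build_tex`, `build_commands` and `render` are hypothetical, the LaTeX template is an assumption, and it assumes that `xelatex` and ImageMagick's `convert` (with Ghostscript) are on the `PATH` and that the chosen font is installed.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumed minimal template: unicode-math lets XeLaTeX use an OpenType math font.
TEMPLATE = r"""\documentclass[12pt]{article}
\usepackage{amsmath}
\usepackage{unicode-math}
\setmathfont{@FONT@}
\pagestyle{empty}
\begin{document}
\[ @FORMULA@ \]
\end{document}
"""


def build_tex(formula: str, font: str) -> str:
    """Fill the template with a formula and a math font name."""
    return TEMPLATE.replace("@FONT@", font).replace("@FORMULA@", formula)


def build_commands(tex_path: Path, png_path: Path, dpi: int = 200) -> List[List[str]]:
    """Return the two external commands: .tex -> .pdf, then .pdf -> .png."""
    pdf_path = tex_path.with_suffix(".pdf")
    return [
        # Compile with XeLaTeX, writing the PDF next to the .tex file.
        ["xelatex", "-interaction=nonstopmode",
         f"-output-directory={tex_path.parent}", str(tex_path)],
        # Rasterize with ImageMagick and trim the empty page margins.
        ["convert", "-density", str(dpi), str(pdf_path), "-trim", str(png_path)],
    ]


def render(formula: str, font: str, work_dir: str, name: str = "formula") -> Path:
    """Write the .tex file and run both commands; raises if a tool fails."""
    out = Path(work_dir)
    out.mkdir(parents=True, exist_ok=True)
    tex_path = out / f"{name}.tex"
    png_path = out / f"{name}.png"
    tex_path.write_text(build_tex(formula, font), encoding="utf-8")
    for cmd in build_commands(tex_path, png_path):
        subprocess.run(cmd, check=True)
    return png_path
```

With the tools installed, a call such as `render(r"\frac{a}{b}", "XITS Math", "out")` would be expected to leave `out/formula.png` on disk. On ImageMagick 7 the `convert` entry point may need to be replaced with `magick`.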
Contributions of any kind are welcome.
Code taken and modified from im2markup and arxiv_leaks.