Texotic is a Python library that converts images of equations into LaTeX code, using ONNXRuntime for inference.
It is a modified fork of RapidLatexOCR. TODO: finish this description.
- Works completely offline; the models are pre-downloaded
- Bug fixes:
- Rewrote code using Numba:
- TODO: document speed up
- Removed usage of deprecated NumPy methods
- Stronger type checking via up-to-date type hint syntax; removed dependence on the built-in `typing` module
- More comprehensive error handling and logging
- More documentation and usage examples
- Added more options for model inference
- Refactored code
- `rapid_latex_ocr` is a tool to convert formula images to LaTeX format.
- The inference code in the repo is modified from LaTeX-OCR. All models have been converted to ONNX format and the inference code has been simplified, so inference is faster and easier to deploy.
- The repo contains only inference code for ONNX-format models, based on `ONNXRuntime` or `OpenVINO`; it does not contain model training code. If you want to train your own model, please go to LaTeX-OCR.
- If it helps you, please give the project a star ⭐ or sponsor a cup of coffee (click the link in Sponsor at the top of the page).
- Contributions that make this tool better are welcome.
- ☆ Model Conversion Notes
- Rewrite the LaTeX-OCR GUI version based on `rapid_latex_ocr`
- Add a demo on Hugging Face
- Integrate other better models
- Add support for OpenVINO
- Install the `rapid_latex_ocr` library with pip. Because packaging the models into the whl package would exceed the PyPI limit (100M), the models need to be downloaded separately.

      pip install rapid_latex_ocr

- Download the models (Google Drive | Baidu NetDisk). When initializing, just specify the model paths; see the next part for details.

  | model name | size |
  | --- | --- |
  | `image_resizer.onnx` | 37.1M |
  | `encoder.onnx` | 84.8M |
  | `decoder.onnx` | 48.5M |
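Before initializing, it can help to confirm that the downloaded files are in place. The sketch below checks a directory for the three model files listed above plus the `tokenizer.json` that the usage example passes to `LatexOCR`. The helper name `find_missing_models` and the `models` directory are assumptions made for illustration; they are not part of the library.

```python
from pathlib import Path

# The three .onnx names come from the model list above; tokenizer.json is the
# file that the usage example passes to LatexOCR.
REQUIRED_FILES = (
    "image_resizer.onnx",
    "encoder.onnx",
    "decoder.onnx",
    "tokenizer.json",
)


def find_missing_models(model_dir: str) -> list[str]:
    """Return the required file names that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # "models" is an assumed location; point this at your download folder.
    missing = find_missing_models("models")
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Running the check first turns a missing download into one readable message instead of a failure during model loading.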
- Used by Python script:

      from rapid_latex_ocr import LatexOCR

      # Paths to the separately downloaded model files
      image_resizer_path = 'models/image_resizer.onnx'
      encoder_path = 'models/encoder.onnx'
      decoder_path = 'models/decoder.onnx'
      tokenizer_json = 'models/tokenizer.json'

      model = LatexOCR(image_resizer_path=image_resizer_path,
                       encoder_path=encoder_path,
                       decoder_path=decoder_path,
                       tokenizer_json=tokenizer_json)

      # The model is called with the raw bytes of the image
      img_path = "tests/test_files/6.png"
      with open(img_path, "rb") as f:
          data = f.read()

      result, elapse = model(data)

      print(result)
      # {\frac{x^{2}}{a^{2}}}-{\frac{y^{2}}{b^{2}}}=1

      print(elapse)
      # 0.4131628000000003
- Used by command line:

      $ rapid_latex_ocr -h
      usage: rapid_latex_ocr [-h] [-img_resizer IMAGE_RESIZER_PATH]
                             [-encdoer ENCODER_PATH] [-decoder DECODER_PATH]
                             [-tokenizer TOKENIZER_JSON]
                             img_path

      positional arguments:
        img_path              Only img path of the formula.

      optional arguments:
        -h, --help            show this help message and exit
        -img_resizer IMAGE_RESIZER_PATH, --image_resizer_path IMAGE_RESIZER_PATH
        -encdoer ENCODER_PATH, --encoder_path ENCODER_PATH
        -decoder DECODER_PATH, --decoder_path DECODER_PATH
        -tokenizer TOKENIZER_JSON, --tokenizer_json TOKENIZER_JSON

      $ rapid_latex_ocr tests/test_files/6.png \
          --image_resizer_path models/image_resizer.onnx \
          --encoder_path models/encoder.onnx \
          --decoder_path models/decoder.onnx \
          --tokenizer_json models/tokenizer.json
      # ('{\\frac{x^{2}}{a^{2}}}-{\\frac{y^{2}}{b^{2}}}=1', 0.47902780000000034)
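The command-line result shown above looks like a Python tuple of the LaTeX string and the elapsed time. Assuming the tool prints exactly that form, a script that captures the output could recover both values as sketched below. The helper name `parse_cli_output` is hypothetical, and the sample line is copied from the example output above.

```python
import ast


def parse_cli_output(line: str) -> tuple[str, float]:
    """Parse a "('<latex>', <seconds>)" line into (latex, seconds)."""
    # literal_eval only accepts Python literals, so it does not execute code.
    value = ast.literal_eval(line.strip())
    if not (isinstance(value, tuple) and len(value) == 2):
        raise ValueError(f"Unexpected output format: {line!r}")
    latex, elapsed = value
    if not isinstance(latex, str) or not isinstance(elapsed, (int, float)):
        raise ValueError(f"Unexpected output types: {line!r}")
    return latex, float(elapsed)


# Sample line copied from the command-line example above.
sample = r"('{\\frac{x^{2}}{a^{2}}}-{\\frac{y^{2}}{b^{2}}}=1', 0.47902780000000034)"
latex, elapsed = parse_cli_output(sample)
print(latex)    # {\frac{x^{2}}{a^{2}}}-{\frac{y^{2}}{b^{2}}}=1
print(elapsed)  # 0.47902780000000034
```

The doubled backslashes in the printed tuple are string escaping, so the parsed LaTeX contains single backslashes and can be pasted into a document directly.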
- 2023-09-13 v0.0.4 update:
  - Merge PR #5
  - Optimize code
- 2023-07-15 v0.0.1 update:
  - First release
- Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Please make sure to update tests as appropriate.
If you want to sponsor the project, click the Buy me a coffee image. Please add a note (e.g. your GitHub account name) so that you can be added to the sponsorship list below.
This project is released under the MIT license.

