Commit b3a31f3

move colab notebooks to github

1 parent 4056b86 commit b3a31f3

File tree

4 files changed: +306 -3 lines changed

.gitignore (+2 -1)

@@ -128,7 +128,8 @@ dmypy.json
 # Pyre type checker
 .pyre/
 
-notebooks/
+notebooks/**
+!notebooks/LaTeX_OCR*.ipynb
 .ipynb_checkpoints/
 dataset/data/**
 wandb/
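The rule pair above ignores everything under `notebooks/` except the two Colab notebooks added in this commit. The logic can be sanity-checked with a minimal Python sketch, using `fnmatch` merely as a stand-in for real gitignore matching (whose `**` and anchoring semantics differ; `scratch.ipynb` is a made-up example file):

```python
from fnmatch import fnmatch

def is_ignored(path: str) -> bool:
    # "notebooks/*" stands in for the notebooks/** ignore rule;
    # the second pattern plays the role of the ! negation rule.
    ignored = fnmatch(path, "notebooks/*")
    kept = fnmatch(path, "notebooks/LaTeX_OCR*.ipynb")
    return ignored and not kept

print(is_ignored("notebooks/scratch.ipynb"))         # True: ignored
print(is_ignored("notebooks/LaTeX_OCR_test.ipynb"))  # False: un-ignored by the ! rule
```

For the real matcher, `git check-ignore -v <path>` reports which rule wins for a given file.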

README.md (+2 -2)

@@ -1,6 +1,6 @@
 # pix2tex - LaTeX OCR
 
-[![GitHub](https://img.shields.io/github/license/lukas-blecher/LaTeX-OCR)](https://github.com/lukas-blecher/LaTeX-OCR) [![PyPI](https://img.shields.io/pypi/v/pix2tex?logo=pypi)](https://pypi.org/project/pix2tex) [![PyPI - Downloads](https://img.shields.io/pypi/dm/pix2tex?logo=pypi)](https://pypi.org/project/pix2tex) [![GitHub all releases](https://img.shields.io/github/downloads/lukas-blecher/LaTeX-OCR/total?color=blue&logo=github)](https://github.com/lukas-blecher/LaTeX-OCR/releases) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1ba_qCGJl29dFQqfBjdqMik3o_EqPE4fr)
+[![GitHub](https://img.shields.io/github/license/lukas-blecher/LaTeX-OCR)](https://github.com/lukas-blecher/LaTeX-OCR) [![PyPI](https://img.shields.io/pypi/v/pix2tex?logo=pypi)](https://pypi.org/project/pix2tex) [![PyPI - Downloads](https://img.shields.io/pypi/dm/pix2tex?logo=pypi)](https://pypi.org/project/pix2tex) [![GitHub all releases](https://img.shields.io/github/downloads/lukas-blecher/LaTeX-OCR/total?color=blue&logo=github)](https://github.com/lukas-blecher/LaTeX-OCR/releases) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukas-blecher/LaTeX-OCR/blob/master/notebooks/LaTeX_OCR_test.ipynb)
 
 The goal of this project is to create a learning based system that takes an image of a math formula and returns corresponding LaTeX code.
 
@@ -33,7 +33,7 @@ The model works best with images of smaller resolution. That's why I added a pre
 
 Always double check the result carefully. You can try to redo the prediction with an other resolution if the answer was wrong.
 
-## Training the model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1MqZSKzSgEnJB9lU7LyPma4bo4J3dnj1E)
+## Training the model [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lukas-blecher/LaTeX-OCR/blob/master/notebooks/LaTeX_OCR_training.ipynb)
 
 1. First we need to combine the images with their ground truth labels. I wrote a dataset class (which needs further improving) that saves the relative paths to the images with the LaTeX code they were rendered with. To generate the dataset pickle file run
 
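The README edits swap Google Drive badge links for Colab's GitHub loader, whose URL simply mirrors the repository blob path. A tiny sketch of composing such a link, with the pattern inferred from the two new badge URLs in the diff:

```python
# Build a Colab-on-GitHub URL from a repo blob path
# (pattern as it appears in the updated README badges).
def colab_url(user: str, repo: str, branch: str, path: str) -> str:
    return (
        "https://colab.research.google.com/github/"
        f"{user}/{repo}/blob/{branch}/{path}"
    )

url = colab_url("lukas-blecher", "LaTeX-OCR", "master",
                "notebooks/LaTeX_OCR_test.ipynb")
print(url)
```

Unlike a Drive link, this URL tracks whatever revision of the notebook is on the named branch, which is presumably the point of moving the notebooks into the repo.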
notebooks/LaTeX_OCR_test.ipynb (+102, new file)

@@ -0,0 +1,102 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "LaTeX OCR test.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# LaTeX OCR\n",
        "In this colab you can convert an image of an equation into LaTeX code.\n",
        "## How?\n",
        "Execute the cell titled \"Setup\". The first time an error will show up. Simply execute the cell again. Everything should be fine now.\n",
        "\n",
        "Next, execute the cell below and upload the image(s).\n",
        "\n",
        "> Note: You can probably also run this project locally and with a GUI. Follow the steps on [GitHub](https://github.com/lukas-blecher/LaTeX-OCR)"
      ],
      "metadata": {
        "id": "aaAqi3wku23I"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "DQM_PKeCuzWR"
      },
      "outputs": [],
      "source": [
        "#@title Setup\n",
        "%reload_ext autoreload\n",
        "%autoreload\n",
        "import PIL\n",
        "!pip install Pillow -U -qq\n",
        "if int(PIL.__version__[0]) < 9:\n",
        "    print('Mandatory restart: Execute this cell again!')\n",
        "    import os\n",
        "    os.kill(os.getpid(), 9)\n",
        "!pip install pix2tex -qq\n",
        "!pip install opencv-python-headless==4.1.2.30 -U -qq\n",
        "\n",
        "def upload_files():\n",
        "    from google.colab import files\n",
        "    from io import BytesIO\n",
        "    uploaded = files.upload()\n",
        "    return [(name, BytesIO(b)) for name, b in uploaded.items()]\n",
        "\n",
        "from pix2tex import cli as pix2tex\n",
        "from PIL import Image\n",
        "args = pix2tex.initialize()\n",
        "\n",
        "from IPython.display import HTML, Math\n",
        "display(HTML(\"<script src='https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/\"\n",
        "             \"latest.js?config=default'></script>\"))\n",
        "table = r'\\begin{array} {l|l} %s \\end{array}'"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "imgs = upload_files()\n",
        "predictions = []\n",
        "for name, f in imgs:\n",
        "    img = Image.open(f)\n",
        "    math = pix2tex.call_model(*args, img)\n",
        "    print(math)\n",
        "    predictions.append('\\\\mathrm{%s} & \\\\displaystyle{%s}'%(name, math))\n",
        "Math(table%'\\\\\\\\'.join(predictions))"
      ],
      "metadata": {
        "id": "CjrR3O07u3uH"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        ""
      ],
      "metadata": {
        "id": "ZqCH-4XoCkMO"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
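The notebook's second cell joins the per-image predictions into a single LaTeX array so that MathJax renders one "filename | formula" row per upload. That string logic can be run in isolation; the file names and formulas below are made-up sample data standing in for real model output:

```python
# The display logic from the notebook, isolated: format each prediction as a
# "filename & formula" row, then join the rows into one LaTeX array.
table = r'\begin{array} {l|l} %s \end{array}'
samples = [("eq1.png", r"E=mc^2"), ("eq2.png", r"a^2+b^2=c^2")]  # made-up data
predictions = ['\\mathrm{%s} & \\displaystyle{%s}' % (name, math)
               for name, math in samples]
latex = table % '\\\\'.join(predictions)  # '\\\\' is the LaTeX row separator \\
print(latex)
```

In Colab, the resulting string is handed to `IPython.display.Math`, which the MathJax script loaded in the setup cell then typesets.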

notebooks/LaTeX_OCR_training.ipynb (+200, new file)

@@ -0,0 +1,200 @@
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "LaTeX-OCR training.ipynb",
      "provenance": [],
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Train a LaTeX OCR model\n",
        "In this brief notebook I show how you can finetune/train an OCR model.\n",
        "\n",
        "I've opted to mix in handwritten data into the regular pdf LaTeX images. For that I started out with the released pretrained model and continued training on the slightly larger corpus."
      ],
      "metadata": {
        "id": "YtR1GhYwnLnu"
      }
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "r396ah-Q3EQc"
      },
      "source": [
        "!pip install pix2tex -qq"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dZ4PLwkb3RIs"
      },
      "source": [
        "import os\n",
        "!mkdir -p LaTeX-OCR\n",
        "os.chdir('LaTeX-OCR')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cUsTlxXV3Mot"
      },
      "source": [
        "!pip install gpustat -q\n",
        "!pip install opencv-python-headless==4.1.2.30 -U -q\n",
        "!pip install --upgrade --no-cache-dir gdown -q"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# check what GPU we have\n",
        "!gpustat"
      ],
      "metadata": {
        "id": "uhLzh5vyaCaL"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "aAz37dDU21zu"
      },
      "source": [
        "!mkdir -p dataset/data\n",
        "!mkdir images\n",
        "# Google Drive ids\n",
        "# handwritten: 13vjxGYrFCuYnwgDIUqkxsNGKk__D_sOM\n",
        "# pdf - images: 176PKaCUDWmTJdQwc-OfkO0y8t4gLsIvQ\n",
        "# pdf - math: 1QUjX6PFWPa-HBWdcY-7bA5TRVUnbyS1D\n",
        "!gdown -O dataset/data/crohme.zip --id 13vjxGYrFCuYnwgDIUqkxsNGKk__D_sOM\n",
        "!gdown -O dataset/data/pdf.zip --id 176PKaCUDWmTJdQwc-OfkO0y8t4gLsIvQ\n",
        "!gdown -O dataset/data/pdfmath.txt --id 1QUjX6PFWPa-HBWdcY-7bA5TRVUnbyS1D\n",
        "os.chdir('dataset/data')\n",
        "!unzip -q crohme.zip\n",
        "!unzip -q pdf.zip\n",
        "# split handwritten data into val set and train set\n",
        "os.chdir('images')\n",
        "!mkdir ../valimages\n",
        "!ls | shuf -n 1000 | xargs -i mv {} ../valimages\n",
        "os.chdir('../../..')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "Now we generate the datasets. We can string multiple datasets together to get one large lookup table. The only thing saved in these pkl files are image sizes, image location and the ground truth latex code. That way we can serve batches of images with the same dimensionality."
      ],
      "metadata": {
        "id": "2BMuIqRIqG-8"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!python -m pix2tex.dataset.dataset -i dataset/data/images dataset/data/train -e dataset/data/CROHME_math.txt dataset/data/pdfmath.txt -o dataset/data/train.pkl"
      ],
      "metadata": {
        "id": "1JebcEarl-g6"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!python -m pix2tex.dataset.dataset -i dataset/data/valimages dataset/data/val -e dataset/data/CROHME_math.txt dataset/data/pdfmath.txt -o dataset/data/val.pkl"
      ],
      "metadata": {
        "id": "x_Orutb37xHD"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# download the weights we want to fine tune\n",
        "!curl -L -o weights.pth https://github.com/lukas-blecher/LaTeX-OCR/releases/download/v0.0.1/weights.pth"
      ],
      "metadata": {
        "id": "I3iOyEEBbw58"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# If using wandb\n",
        "!pip install -q wandb\n",
        "# you can cancel this if you don't want to use it or don't have a W&B acc.\n",
        "#!wandb login"
      ],
      "metadata": {
        "id": "vow2NnpHmWt0"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# generate colab specific config (set 'debug' to true if wandb is not used)\n",
        "!echo {backbone_layers: [2, 3, 7], betas: [0.9, 0.999], batchsize: 10, bos_token: 1, channels: 1, data: dataset/data/train.pkl, debug: true, decoder_args: {'attn_on_attn': true, 'cross_attend': true, 'ff_glu': true, 'rel_pos_bias': false, 'use_scalenorm': false}, dim: 256, encoder_depth: 4, eos_token: 2, epochs: 50, gamma: 0.9995, heads: 8, id: null, load_chkpt: 'weights.pth', lr: 0.001, lr_step: 30, max_height: 192, max_seq_len: 512, max_width: 672, min_height: 32, min_width: 32, model_path: checkpoints, name: mixed, num_layers: 4, num_tokens: 8000, optimizer: Adam, output_path: outputs, pad: false, pad_token: 0, patch_size: 16, sample_freq: 2000, save_freq: 1, scheduler: StepLR, seed: 42, temperature: 0.2, test_samples: 5, testbatchsize: 20, tokenizer: dataset/tokenizer.json, valbatches: 100, valdata: dataset/data/val.pkl} > colab.yaml"
      ],
      "metadata": {
        "id": "OnsNCLp84QSY"
      },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "c8NU5j2k3z36"
      },
      "source": [
        "!python -m pix2tex.train --config colab.yaml"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        ""
      ],
      "metadata": {
        "id": "g3DU9KxubWgq"
      },
      "execution_count": null,
      "outputs": []
    }
  ]
}
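The training notebook's markdown cell notes that the pkl files store only image size, image location, and ground-truth LaTeX, so batches can be served with uniform dimensionality. That bucketing idea can be sketched in a few lines; the `(width, height, path, latex)` records below are hypothetical stand-ins, not the actual pix2tex pkl layout:

```python
from collections import defaultdict

# Hypothetical records standing in for pkl entries: (width, height, path, latex).
samples = [
    (128, 64, "img0.png", r"\frac{a}{b}"),
    (128, 64, "img1.png", r"x^{2}"),
    (256, 64, "img2.png", r"\sum_{i} i"),
]

# Group samples by image dimensions so each batch shares one tensor shape.
buckets = defaultdict(list)
for w, h, path, tex in samples:
    buckets[(w, h)].append((path, tex))

for size in sorted(buckets):
    print(size, len(buckets[size]))  # every batch drawn from a bucket is uniform
```

Grouping this way avoids padding images of wildly different sizes into one tensor, at the cost of some buckets yielding short final batches.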

0 commit comments