takmin committed Oct 23, 2024
1 parent f10c689 commit 0f26613
Showing 3 changed files with 46 additions and 1 deletion.
12 changes: 12 additions & 0 deletions docker/cli.patch
@@ -0,0 +1,12 @@
--- /usr/local/lib/python3.8/dist-packages/pix2tex/cli.py 2024-10-23 19:35:59.000000000 +0900
+++ cli.py 2024-10-23 19:46:37.053516037 +0900
@@ -44,7 +44,8 @@
         ratios = [a/b for a, b in zip(img.size, max_dimensions)]
         if any([r > 1 for r in ratios]):
             size = np.array(img.size)//max(ratios)
-            img = img.resize(size.astype(int), Image.BILINEAR)
+            size = tuple(size.astype(int))
+            img = img.resize(size, Image.BILINEAR)
     if min_dimensions is not None:
         # hypothesis: there is a dim in img smaller than min_dimensions, and return a proper dim >= min_dimensions
         padded_size = [max(img_dim, min_dim) for img_dim, min_dim in zip(img.size, min_dimensions)]
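
The hunk above converts the NumPy array to a plain tuple before calling `resize`. A minimal sketch of why that matters, assuming the size check behaves like the `if self.size == size and ...` line shown in the README traceback. The helper `is_noop_resize` and the image dimensions are hypothetical stand-ins for illustration, not part of Pillow or pix2tex:

```python
import numpy as np


def is_noop_resize(current_size, requested_size, box):
    """Hypothetical stand-in for the size check seen in the traceback."""
    if current_size == requested_size and box == (0, 0) + current_size:
        return True
    return False


img_size = (1000, 500)        # made-up image size (width, height)
max_dimensions = (500, 500)   # made-up limit, for illustration only
box = (0, 0) + img_size

# Same arithmetic as minmax_size: the result is a float ndarray.
ratios = [a / b for a, b in zip(img_size, max_dimensions)]
size = np.array(img_size) // max(ratios)

# Before the patch: an ndarray reaches the comparison. `tuple == ndarray`
# yields an element-wise boolean array, and `and` cannot reduce it to one bool.
try:
    is_noop_resize(img_size, size.astype(int), box)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# After the patch: a plain tuple compares to a single bool.
fixed_size = tuple(size.astype(int))
fixed_result = is_noop_resize(img_size, fixed_size, box)

print(error_message)
print(fixed_result)
```

Whether the real `Image.resize` raises depends on the installed Pillow version, so the sketch imitates the comparison rather than calling Pillow.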
1 change: 1 addition & 0 deletions docker/dockerbuild-fix.sh
@@ -0,0 +1 @@
docker build -t latexocr-wrapper -f docker/Dockerfile-fix .
34 changes: 33 additions & 1 deletion readme.md
@@ -3,7 +3,7 @@
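## Overview
Reads the layout information output by the NDL OCR Wrapper tool (or EasyOCR Wrapper) and runs character recognition on all formula regions in a single batch.

- Formula recognition uses LaTeX-OCR(https://github.com/lukas-blecher/LaTeX-OCR).
+ Formula recognition uses [LaTeX-OCR](https://github.com/lukas-blecher/LaTeX-OCR).


## Software structure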
@@ -27,6 +27,38 @@ $ sh ./docker/dockerbuild.sh
```


### Bug-fix version

The following error may occur while LaTeX-OCR is running.
```
Traceback (most recent call last):
  File "/root/latexocr/proc_latexocr.py", line 96, in <module>
    recog_text_in_math(args.input, args.json, args.output)
  File "/root/latexocr/proc_latexocr.py", line 81, in recog_text_in_math
    block["LINES"] = latexocr(math_lines, img, 0.1)
  File "/root/latexocr/proc_latexocr.py", line 44, in latexocr
    copy_line["STRING"] = model(crop_img)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.8/dist-packages/pix2tex/cli.py", line 122, in __call__
    img = pad(minmax_size(input_image.resize((w, h), Image.Resampling.BILINEAR if r > 1 else Image.Resampling.LANCZOS), self.args.max_dimensions, self.args.min_dimensions))
  File "/usr/local/lib/python3.8/dist-packages/pix2tex/cli.py", line 47, in minmax_size
    img = img.resize(size.astype(int), Image.BILINEAR)
  File "/usr/local/lib/python3.8/dist-packages/PIL/Image.py", line 2297, in resize
    if self.size == size and box == (0, 0) + self.size:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

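This error corresponds to the following issue:
https://github.com/lukas-blecher/LaTeX-OCR/issues/392


In that case, build with the bug-fix script dockerbuild-fix.sh. (If a future LaTeX-OCR update fixes the problem, this workaround will no longer be needed.)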
```
$ sh ./docker/dockerbuild-fix.sh
```


## Usage

With docker/run_docker.sh, you can start the Docker container, run formula recognition, and release the container in a single step for all the images in the input folder.
