===========
OCR backend
===========

Mayan EDMS ships with an OCR backend that uses the FLOSS engine Tesseract
(https://github.com/tesseract-ocr/tesseract/), but it can use other engines.
To support another engine, create a wrapper that subclasses the
``OCRBackendBase`` class defined in ``mayan/apps/ocr/classes``. This subclass
should expose the ``execute`` method. For an example, see how the Tesseract
backend is implemented in the file ``mayan/apps/ocr/backends/tesseract.py``.

Once you have created your own backend, add the ``OCR_BACKEND`` option to your
``local.py`` settings and point it to the class path of your new OCR backend.

The default value of ``OCR_BACKEND`` is ``"ocr.backends.tesseract.Tesseract"``.

.. note::

    Refer to the :doc:`../chapters/settings` topic for information on how to
    create your own Python settings files.

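The steps above can be sketched as follows. This is a minimal illustration
under stated assumptions, not the actual Mayan EDMS API: the ``OCRBackendBase``
defined here is a stand-in so that the example runs on its own, the arguments
and return value of ``execute`` are guesses, and ``PlaceholderOCR`` and
``myapp.backends`` are hypothetical names. Check
``mayan/apps/ocr/backends/tesseract.py`` for the real interface before writing
a backend.

.. code-block:: python

    class OCRBackendBase:
        """Stand-in for the real base class so this sketch runs on its own."""

        def execute(self, *args, **kwargs):
            raise NotImplementedError


    class PlaceholderOCR(OCRBackendBase):
        """Hypothetical backend; a real one would call an OCR engine."""

        def execute(self, *args, **kwargs):
            # The real arguments (for example the page to process) are not
            # documented here, so they are accepted and ignored. Returning
            # the text is also an assumption.
            return 'placeholder text'


    # In ``local.py`` (hypothetical class path):
    OCR_BACKEND = 'myapp.backends.PlaceholderOCR'

In a real project the stand-in base class would be replaced by an import of
the actual ``OCRBackendBase``, and ``OCR_BACKEND`` would name the importable
path of your own class.
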
To add OCR support for more languages when using Tesseract, install the
corresponding language files. On a Debian-based OS, this command lists the
available language files:

.. code-block:: console

    apt-cache search tesseract-ocr

If using the Docker image, pass the environment variable
``MAYAN_APT_INSTALLS`` with the name of the corresponding Tesseract language
package. Example:

.. code-block:: console

    -e MAYAN_APT_INSTALLS='tesseract-ocr-deu'