Add dependency on native Tesseract OCR executable for pytesseract
The pytesseract package needs to have the `tesseract` executable available at runtime to work.

By default, the Python package looks for the `tesseract` executable on the PATH. That lookup fails here because the executable we pull in from Nix is not on the PATH at runtime, so we need to override `tesseract_cmd` with the full path to that executable. I did this with a patch [based on how pytesseract is set up in Nixpkgs][1].

The patching code feels a bit fiddly. I don't know the idiomatic way to do this sort of thing.

I included a test that fails if pytesseract cannot find the `tesseract` executable. The test passed for me with both `preferWheels = true` and `preferWheels = false`, but the test suite here includes only the `preferWheels = true` case. I am not sure whether it makes sense to have both, given that the patching code had to differ slightly depending on whether the source was a wheel.

[1]: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/python-modules/pytesseract/tesseract-binary.patch
TikhonJelvis committed Oct 17, 2023
1 parent fe0dcb4 commit 9ecb280
Showing 5 changed files with 150 additions and 0 deletions.
28 changes: 28 additions & 0 deletions overrides/default.nix
@@ -2141,6 +2141,34 @@ lib.composeManyExtensions [
      buildInputs = (old.buildInputs or [ ]) ++ [ pkgs.taglib ];
    });

    pytesseract =
      let
        pytesseract-cmd-patch = pkgs.writeText "pytesseract-cmd.patch" ''
          --- a/pytesseract/pytesseract.py
          +++ b/pytesseract/pytesseract.py
          @@ -27,7 +27,7 @@
           from PIL import Image


          -tesseract_cmd = 'tesseract'
          +tesseract_cmd = '${pkgs.tesseract4}/bin/tesseract'

           numpy_installed = find_loader('numpy') is not None
           if numpy_installed:
        '';
      in
      super.pytesseract.overridePythonAttrs (old: {
        buildInputs = (old.buildInputs or [ ]) ++ [ pkgs.tesseract4 ];
        patches = (old.patches or [ ]) ++ lib.optionals (!(old.src.isWheel or false)) [ pytesseract-cmd-patch ];

        # apply patch in postInstall if the source is a wheel
        postInstall = lib.optionalString (old.src.isWheel or false) ''
          pushd "$out/${self.python.sitePackages}"
          patch -p1 < "${pytesseract-cmd-patch}"
          popd
        '';
      });

    pytezos = super.pytezos.overridePythonAttrs (old: {
      buildInputs = (old.buildInputs or [ ]) ++ [ pkgs.libsodium ];
    });
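
The commit message asks about a more idiomatic way to do the patching. One alternative that is common in Nixpkgs is `substituteInPlace`, which rewrites a string in a file and so avoids carrying a separate patch for the wheel and sdist cases. The following is only a sketch under assumptions: the helper name `setTesseractCmd` is hypothetical, and the attribute is written as it would appear inside the same overlay (with `self`, `super`, `pkgs`, and `lib` in scope). It has not been tested against this repository.

```nix
pytesseract = super.pytesseract.overridePythonAttrs (old:
  let
    # Hypothetical helper: returns a shell snippet that rewrites the default
    # command in the given file to the absolute path of the Nix-built binary.
    setTesseractCmd = file: ''
      substituteInPlace "${file}" \
        --replace "tesseract_cmd = 'tesseract'" "tesseract_cmd = '${pkgs.tesseract4}/bin/tesseract'"
    '';
    isWheel = old.src.isWheel or false;
  in
  {
    buildInputs = (old.buildInputs or [ ]) ++ [ pkgs.tesseract4 ];

    # sdist: edit the unpacked source tree before the build.
    postPatch = (old.postPatch or "") + lib.optionalString (!isWheel)
      (setTesseractCmd "pytesseract/pytesseract.py");

    # wheel: there is no source tree to patch, so edit the installed module.
    postInstall = (old.postInstall or "") + lib.optionalString isWheel
      (setTesseractCmd "$out/${self.python.sitePackages}/pytesseract/pytesseract.py");
  });
```

A string substitution is less strict than a patch: if upstream changes the `tesseract_cmd` line, the replacement may no longer match. Newer Nixpkgs releases provide `--replace-fail`, which turns a missing match into a build error, and the test added in this commit would also catch that case.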
1 change: 1 addition & 0 deletions tests/default.nix
@@ -160,6 +160,7 @@ in
    rpds-py-no-wheel = callTest ./rpds-py-no-wheel { };
    contourpy-wheel = callTest ./contourpy-wheel { };
    contourpy-no-wheel = callTest ./contourpy-no-wheel { };
    pytesseract = callTest ./pytesseract { };
  } // lib.optionalAttrs (!stdenv.isDarwin) {
    # pyqt5 = (callTest ./pyqt5 { });

13 changes: 13 additions & 0 deletions tests/pytesseract/default.nix
@@ -0,0 +1,13 @@
{ poetry2nix, runCommand, python3 }:
let
  env = poetry2nix.mkPoetryEnv {
    python = python3;
    pyproject = ./pyproject.toml;
    poetrylock = ./poetry.lock;
    preferWheels = true;
  };
  py = env.python;
in
runCommand "pytesseract-test" { } ''
  ${env}/bin/python -c 'import pytesseract; print(pytesseract.get_tesseract_version())' > $out
''
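
The commit message notes that the test also passed with `preferWheels = false` but that only the wheel case is in the test suite. If the sdist path were to be covered as well, a second test could look roughly like the sketch below. The directory name `tests/pytesseract-no-wheel` is an assumption modeled on the existing `contourpy-no-wheel` entry, and the directory would need its own copies of `pyproject.toml` and `poetry.lock`.

```nix
# Hypothetical tests/pytesseract-no-wheel/default.nix (not part of this commit).
{ poetry2nix, runCommand, python3 }:
let
  env = poetry2nix.mkPoetryEnv {
    python = python3;
    pyproject = ./pyproject.toml;
    poetrylock = ./poetry.lock;
    # Build from the sdist so the `patches` branch of the override is exercised.
    preferWheels = false;
  };
in
runCommand "pytesseract-no-wheel-test" { } ''
  # The build fails if pytesseract cannot run the tesseract executable.
  ${env}/bin/python -c 'import pytesseract; print(pytesseract.get_tesseract_version())' > $out
''
```

It would then be registered in `tests/default.nix` next to the existing entry, for example as `pytesseract-no-wheel = callTest ./pytesseract-no-wheel { };`.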
99 changes: 99 additions & 0 deletions tests/pytesseract/poetry.lock

9 changes: 9 additions & 0 deletions tests/pytesseract/pyproject.toml
@@ -0,0 +1,9 @@
[tool.poetry]
name = "pytesseract-test"
version = "0.1.0"
description = ""
authors = ["Your Name <you@example.com>"]

[tool.poetry.dependencies]
python = "^3.10"
pytesseract = "*"
