BigIskander/Handwriting-keyboard-for-Linux-tesseract

Handwriting-keyboard-for-Linux.

This is a program written for the Linux desktop environment.

To recognize handwritten patterns, the program uses an OCR engine.

At the moment the program supports two OCR engines: Tesseract OCR and PaddleOCR.

To send keyboard input, the program uses xdotool or ydotool.

You can find compiled .deb, .rpm and .AppImage packages on the releases page.

These are the instructions for version 2.1. Instructions for version 2.0 are in the v2.0 branch of this repository; instructions for version 1 are in the v1 branch.

How to use the program

  1. Launch the program, with or without command line options.
  2. Write text on the canvas using a mouse or a stylus (on a graphics tablet).
  3. Press the 'recognize' button.
  4. Click the recognized text. The program will either type this text or copy it to the clipboard and paste it by triggering a ctrl+V (or shift+ctrl+V) keypress.

Note: before using the program you need to install its dependencies.

If the program's window is in focus, then before sending keyboard input the program triggers an alt+Tab keypress to return focus to the previously active window, and only then sends the input (this does not apply when the '--skip-taskbar' option is set).
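The order of operations described above can be sketched with xdotool as follows. This is only an illustration, not the program's actual code: the helper names run and send_text are hypothetical, and the DRY_RUN switch is an assumption added so that the sketch prints the commands instead of sending real key presses.

```shell
#!/bin/sh
# Sketch only: mimics "return focus with alt+Tab, then type the text".
# DRY_RUN=1 (the default here) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# run CMD [ARGS...]: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# send_text TEXT IN_FOCUS: IN_FOCUS is "yes" when the keyboard window
# currently has focus and focus must go back to the previous window first.
send_text() {
    text="$1"
    in_focus="$2"
    if [ "$in_focus" = "yes" ]; then
        run xdotool key alt+Tab
    fi
    run xdotool type "$text"
}

send_text "hello" yes
```

With DRY_RUN=0 and xdotool installed in an X11 session, the same calls would send the key presses for real.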

Command line options

Usage: handwriting-keyboard-t [OPTIONS]

Options:
      --use-paddle-ocr...
          Use PaddleOCR to recognize handwriting pattern. 
          By default program uses Tesseract OCR.

  -l, --lang <lang>
          Language used to recognize handwriting pattern. Value depends on OCR engine. 
          Default value is 'chi_all' for Tesseract OCR and 'ch' for PaddleOCR.

      --tessdata-dir <tessdata-dir>
          A directory with *.traineddata files for Tesseract OCR engine. 
          Tesseract OCR specific.

  -a, --automode...
          Automatically send recognize text request to OCR engine after every stroke.

      --use-tmp-file...
          Save canvas as temporary image file and send path of this file to OCR engine. 
          By default program sends image data to OCR engine via stdin.

      --use-ydotool...
          Use ydotool to send keyboard input. By default program uses xdotool.

      --use-clipboard...
          Copy text to clipboard and paste it by triggering a ctrl+V 
          (or shift+ctrl+V) keypress. 
          By default program will try to type text 
          (ydotool only supports typing latin characters).

      --use-shift...
          Trigger shift+ctrl+V keypress to paste text from clipboard. 
          By default program uses ctrl+V. 
          Only applied when '--use-clipboard' option is set.

      --return-focus...
          Program will return focus to previous window by triggering alt+Tab keypress 
          every time when program gains focus (after mouseup event inside the window). 
          Will not work if option '--skip-taskbar' is set.

      --return-keyboard...
          After sending keyboard input program will trigger alt+Tab keypress 
          to return focus back to keyboard's window. 
          Will not work if option '--skip-taskbar' is set.

      --fly-to-bottom...
          At launch program window will fly to the bottom of the screen 
          and resize to screen width.

      --skip-taskbar...
          Program window will skip taskbar.

      --dark-theme...
          Use dark theme. Change colors of the application to dark theme.

      --show-grid...
          Show grid. Shows helper grid in canvas background.

      --allow-undo...
          Allow undo. Allows undo function and shows undo button.

      --stroke-autocorrect...
          Stroke autocorrection function for Chinese language. Experimental function.

      --common-punctuation...
          Show common Chinese punctuation.

      --debug...
          Output some debug info in console.

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version

Example of using command line options:

handwriting-keyboard-t --tessdata-dir=/home/user/ --lang=chi_sim -a

In the example above, the program uses Tesseract OCR (the default OCR engine) with training data from the folder "/home/user/" and the language "chi_sim" (Simplified Chinese), that is, the file "/home/user/chi_sim.traineddata". Because it was launched with the "-a" option, the program also sends a request to tesseract-ocr automatically after every stroke.
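Since the example relies on the file "/home/user/chi_sim.traineddata" being present, it can help to check for it before launching. The following is a minimal sketch; check_traineddata is a hypothetical helper name, not part of the program, and it only assumes that the file is named after the language code as in the example.

```shell
#!/bin/sh
# check_traineddata DIR LANG: succeeds if DIR/LANG.traineddata exists.
check_traineddata() {
    dir="${1%/}"   # drop one trailing slash, if any
    lang="$2"
    file="$dir/$lang.traineddata"
    if [ -f "$file" ]; then
        echo "found: $file"
    else
        echo "missing: $file" >&2
        return 1
    fi
}

# Usage with the values from the example (left commented out so that
# sourcing this sketch does not start the program):
# check_traineddata /home/user/ chi_sim &&
#     handwriting-keyboard-t --tessdata-dir=/home/user/ --lang=chi_sim -a
```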

Installing dependencies

  1. Install an OCR engine: Tesseract OCR and/or PaddleOCR.

    • on a Debian-based Linux system you can install Tesseract OCR from the repository:
      sudo apt install tesseract-ocr
      
      for other Linux distributions you can find instructions in the Tesseract GitHub repository: https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract
    • to install PaddleOCR, I would recommend that you:
      1. set up a conda environment by following this instruction, because PaddleOCR's Python dependencies might conflict with an existing Python environment
      2. then activate this conda environment and install PaddleOCR:
        conda activate paddle_env
        
        python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
        
        pip install "paddleocr>=2.0.1"
        
        For other setups and instructions, visit PaddleOCR's official website https://paddlepaddle.github.io/PaddleOCR/main/en/index.html
  2. Install xdotool and/or ydotool.

Notes about dependencies

  1. If you use the program with Tesseract OCR, I would recommend installing tesseract 4 (instead of tesseract 5), because the results are more accurate with tesseract 4 (at least for recognizing handwritten Chinese text).

  2. If you use the program with Tesseract OCR, you also need to download model data for tesseract-ocr and copy the .traineddata files to the tesseract-ocr data folder (for example, for tesseract-ocr 4.0 this is /usr/share/tesseract-ocr/4.00/tessdata/). Alternatively, you can put these files in any folder you like and run the program with the --tessdata-dir command line option pointing to that folder.

  3. By default the program uses Tesseract OCR with the language set to chi_all; the *.traineddata files for it can be downloaded by this link.

  4. PaddleOCR is significantly more accurate than Tesseract OCR, at least in recognizing Chinese characters; however, it is also slower, at least on my hardware.

  5. If you use the program with PaddleOCR: PaddleOCR downloads model data on first use, after which it can be used offline. The list of available languages can be found by this link.

  6. If you use the program with PaddleOCR and it is installed in a conda environment, you need to activate the conda environment first and then launch this program.

  7. As for sending keyboard input:

    • xdotool - supports only the X11 desktop environment
    • ydotool - works in both X11 and Wayland desktop environments; ydotool can type only Latin characters, and the ydotoold process must be running (in the background or in a separate terminal) for ydotool to work
    • instead of typing, the program can copy the text to the clipboard and paste it by triggering a ctrl+V (or shift+ctrl+V) keypress
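As a quick way to see which of the dependencies mentioned above are installed, something like the following sketch can be used. The helper names have and report_tools are hypothetical, and the command names are assumptions: tesseract, xdotool, ydotool and ydotoold are the usual binary names, and paddleocr is the command line script installed by the paddleocr pip package. How the program itself locates these tools is not covered here.

```shell
#!/bin/sh
# have CMD: succeeds if CMD can be found on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# report_tools: print one "name: found" or "name: not found" line per tool.
report_tools() {
    for tool in tesseract paddleocr xdotool ydotool ydotoold; do
        if have "$tool"; then
            echo "$tool: found"
        else
            echo "$tool: not found"
        fi
    done
}

report_tools
```

If PaddleOCR lives in a conda environment, run the check after activating that environment, otherwise paddleocr will be reported as not found.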

Notes about Wayland

This program can work in a Wayland desktop environment; however, it is not fully supported.

Some specifics of using this program in Wayland:

  1. The always-on-top property of the window cannot be set programmatically. As a workaround, you can set it manually by right-clicking the window's title bar and checking the 'Always on top' option (this should work at least in GNOME-based desktop environments).

  2. The set_accept_focus(false) window property does not work in Wayland. The program's window will gain focus anyway when you interact with it.

  3. The --skip-taskbar command line option will not work in Wayland.

  4. The --fly-to-bottom command line option will not position the program's window correctly.

  5. xdotool does not support Wayland. For that reason I would recommend using this program with ydotool instead.

  6. If you use this program in a Wayland desktop environment, I would recommend launching it with the --use-ydotool and --use-clipboard command line options.
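The recommendation in the last item can be turned into a small launcher sketch that picks the flags from the session type. This is an illustration under assumptions: extra_flags is a hypothetical helper, and the session type is read from the XDG_SESSION_TYPE environment variable, which most login managers set but which may be missing on some systems. The sketch prints the resulting command line rather than running it.

```shell
#!/bin/sh
# extra_flags SESSION_TYPE: print the flags suggested above for that session.
extra_flags() {
    case "$1" in
        wayland) echo "--use-ydotool --use-clipboard" ;;
        *)       echo "" ;;
    esac
}

# Fall back to x11 when the variable is not set (an assumption of this sketch).
session="${XDG_SESSION_TYPE:-x11}"

# Word splitting of the unquoted expansion is intended here.
echo handwriting-keyboard-t $(extra_flags "$session")
```

Remember that ydotool needs the ydotoold process to be running before the program is launched with --use-ydotool.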

Some technical details

The program is written using the Tauri framework https://tauri.app/

The script from https://github.com/ChenYuHo/handwriting.js is used to make a writing canvas.

To recognize handwritten patterns, the program uses Tesseract OCR or PaddleOCR.

To send keyboard input, the program uses xdotool or ydotool.

To run from source or compile the program, you need to install Node.js 20 or newer, as well as Rust.

Install Node.js dependencies:

npm install

Run the program in a development environment:

npm run tauri dev

Run the program in a development environment with CLI (command line) options:

npm run tauri dev -- -- -- cli_options

Compile the program:

npm run tauri build

An older version of this program, which uses the Google API instead, is available at: https://github.com/BigIskander/Handwriting-keyboard-for-Linux.

Recommended IDE Setup