Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Updated Jan 23, 2026 - Python
Powerful handwritten text recognition. A simple-to-use, unofficial implementation of the paper "TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models".
An image-to-text model based on the Transformer that can also be used for OCR tasks.
AutoRegressive Transformer Model for Optical Character Recognition in Farsi
A service for extracting and indexing archival document images
A converter from handwritten equations to LaTeX equations, built using the trocr-base-stage1 model, the CROHME dataset, and a custom-built SentencePiece tokenizer (LaTeX).
I integrated the TrOCR model with the Django framework
This is a public dataset for training OCR models on the British Parliament's historic records. It is intended for digital humanities researchers.
This script runs TrOCR transcriptions on any image that has a PAGE XML file with recognized layout. It follows a three-step workflow: crop all the lines, run TrOCR on them, and then recombine the results of step 2 to generate a new .xml file containing the recognized lines.
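The three-step workflow described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code. It assumes the layout file stores each line as a `TextLine` element with a `Coords points="x,y x,y ..."` polygon and that recognized text belongs in `TextEquiv/Unicode`. The names `line_boxes`, `transcribe_page`, and the `recognize` callback are hypothetical. In a real pipeline, `recognize` would crop the page image to the box (for example with Pillow's `Image.crop`) and run a TrOCR model on the crop.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def _namespace(root: ET.Element) -> str:
    """Return the '{uri}' prefix of the root tag, or '' if it has none."""
    return root.tag.split("}")[0] + "}" if root.tag.startswith("{") else ""


def line_boxes(root: ET.Element) -> Iterator[Tuple[ET.Element, Box]]:
    """Step 1: yield each TextLine with the bounding box of its polygon."""
    ns = _namespace(root)
    for line in root.iter(f"{ns}TextLine"):
        coords = line.find(f"{ns}Coords")
        if coords is None or not coords.get("points"):
            continue  # no polygon, nothing to crop
        points = [
            tuple(int(float(v)) for v in pair.split(","))
            for pair in coords.get("points").split()
        ]
        xs, ys = zip(*points)
        yield line, (min(xs), min(ys), max(xs), max(ys))


def transcribe_page(xml_text: str, recognize: Callable[[Box], str]) -> str:
    """Steps 2 and 3: recognize each line, then write the text back."""
    root = ET.fromstring(xml_text)
    ns = _namespace(root)
    for line, box in line_boxes(root):
        text = recognize(box)  # step 2: hypothetical OCR callback
        # Step 3: store the result in TextEquiv/Unicode, creating
        # the elements if the layout file does not have them yet.
        equiv = line.find(f"{ns}TextEquiv")
        if equiv is None:
            equiv = ET.SubElement(line, f"{ns}TextEquiv")
        unicode_el = equiv.find(f"{ns}Unicode")
        if unicode_el is None:
            unicode_el = ET.SubElement(equiv, f"{ns}Unicode")
        unicode_el.text = text
    return ET.tostring(root, encoding="unicode")
```

Passing the recognizer in as a callback keeps the XML handling independent of the OCR model, so the same skeleton can be exercised with a stub function before any model is loaded.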