Stars
This repository evaluates author embeddings along a writing-style axis.
A PyTorch implementation of DTrOCR: Decoder-only Transformer for Optical Character Recognition
🎉🌩️ Dynamic DNS (DDNS) service based on Cloudflare! Access your home network remotely via a custom domain name without a static IP!
🇧🇪 BelGPT-2: the 1st GPT model pretrained in French.
Lord of Large Language Models Web User Interface
BookNLP, a natural language processing pipeline for books
Evaluation of the Optical Character Recognition (OCR) capabilities of GPT-4V(ision)
Implementation of KatKit as presented at DH2024
Responsible Datasets in Context
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
Custom recipe and utilities for document processing
🧬 A VS Code extension for annotating data with Prodigy
Code and prompt templates for the "Post-OCR Correction with OpenAI’s GPT Models on Challenging English Prosody Texts" short-paper submission to DocEng 2024.
Distribute and run LLMs with a single file.
Software to detect text reuse with BLAST.
Instruction Tuning with GPT-4
Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.
Infinite Photorealistic Worlds using Procedural Generation
Catalog of abusive language data (PLoS 2020)
The Broad Twitter Corpus, an NER dataset in English stratified for time, location, social media genre, socioeconomic factors (COLING 2016)
🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library
Plumb a PDF for detailed information about each char, rectangle, line, et cetera — and easily extract text and tables.
🍳 Recipes for Prodigy, our fully scriptable annotation tool