Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and…

Python 46,363 7,993 Updated Feb 11, 2025

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 40,245 4,932 Updated Feb 12, 2025

floodsung / Deep-Learning-Papers-Reading-Roadmap

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

Python 38,734 7,343 Updated Nov 27, 2022

google-research / bert

TensorFlow code and pre-trained models for BERT

Python 38,642 9,657 Updated Jul 23, 2024

lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,754 4,617 Updated Feb 11, 2025

hankcs / HanLP

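Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and phrase extraction, automatic summarization, text classification and clustering, pinyin and simplified-traditional conversion, natural language processing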

Python 34,408 10,357 Updated Jan 15, 2025

OpenBMB / ChatDev

Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration)

Python 26,156 3,312 Updated Dec 30, 2024

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 25,624 1,948 Updated Feb 11, 2025

sebastianruder / NLP-progress

Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

Python 22,786 3,619 Updated Jul 28, 2024

kaixindelele / ChatPaper

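Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow, using ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.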

Python 18,717 1,941 Updated Apr 4, 2024

Belval / TextRecognitionDataGenerator

A synthetic data generator for text recognition

Python 3,397 996 Updated Jul 18, 2024

ownthink / Jiagu

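Jiagu, a deep learning natural language processing tool: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering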

Python 3,351 614 Updated May 7, 2022

ankush-me / SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.

Python 2,056 623 Updated Aug 9, 2023

jiesutd / NCRFpp

NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

Python 1,891 446 Updated Jun 30, 2022

BLKSerene / Wordless

An Integrated Corpus Tool With Multilingual Support for the Study of Language, Literature, and Translation

Python 707 92 Updated Feb 1, 2025

baudm / parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Python 620 130 Updated May 29, 2024

HillZhang1999 / MuCGEC

Open-source release of the MuCGEC Chinese error correction dataset and SOTA text correction models; Code & Data for our NAACL 2022 Paper "MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction"

Python 518 65 Updated Jun 9, 2023

nusnlp / m2scorer

MaxMatch (M^2) Scorer - Evaluation program for grammatical error correction systems.

Python 149 37 Updated Sep 27, 2022

MaksTarnavskyi / gector-large

Forked from grammarly/gector

Improved version of GECToR

Python 60 6 Updated Jul 24, 2023

Python 2 Updated Aug 4, 2020

ZongXuquan imagoodman-aa

Stars

deepseek-ai / DeepSeek-V3

abi / screenshot-to-code

PaddlePaddle / PaddleOCR