Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 5,694 471 Updated Oct 28, 2024

An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conv…

Jupyter Notebook 1,883 180 Updated Aug 19, 2024
Python 153 17 Updated Oct 1, 2024

Faker is a Python package that generates fake data for you.

Python 17,705 1,929 Updated Oct 24, 2024

similarity: a text similarity calculation toolkit for Java. Written in Java, it can be used for text similarity calculation, sentiment analysis, and similar tasks, and works out of the box.

Java 1,429 325 Updated Feb 3, 2024
Python 171 9 Updated May 31, 2024

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 1,726 204 Updated Sep 21, 2024

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Python 6,717 496 Updated Oct 28, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,798 458 Updated May 3, 2024

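Carefully collated Chinese classical texts and poetry: the Four Books and Five Classics, the Four Great Classical Novels, the Classic of Poetry (Shijing), the Songs of Chu (Chuci), the Complete Tang Poems, the Complete Song Ci, Three Hundred Tang Poems, Three Hundred Song Ci, the Twenty-Four Histories......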

81 32 Updated Feb 19, 2021

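Similarities: a toolkit for similarity calculation and semantic search. Supports text-to-text, text-to-image, and image-to-image search over hundreds of millions of records; written in Python 3 and works out of the box.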

Python 768 74 Updated Oct 9, 2024

MD5 links for a Chinese book corpus

Python 210 23 Updated Jan 31, 2024

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…

Python 1,221 111 Updated Apr 3, 2024

distributed trainer for LLMs

Python 538 76 Updated May 20, 2024

Democratizing Internet-scale financial data.

Jupyter Notebook 1,123 200 Updated Jul 1, 2024

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,556 349 Updated Oct 17, 2024

Chinese sensitive-word detection implemented with a DFA (deterministic finite automaton)

Python 93 28 Updated May 23, 2022

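A high-performance component for detecting and filtering sensitive words (illegal words/profanity), with Traditional/Simplified Chinese conversion, full-width/half-width conversion, Chinese-character-to-pinyin conversion, fuzzy search, and other features.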

JavaScript 4,750 854 Updated Sep 26, 2024

A very comprehensive parallel corpus of Classical Chinese (ancient texts) and modern Chinese

Python 1,167 267 Updated Apr 21, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python 13,842 1,123 Updated Sep 24, 2024

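The first Chinese version of the Llama 2 13B model (base + Chinese dialogue SFT, enabling fluent multi-turn natural-language interaction between humans and machines)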

89 7 Updated Aug 21, 2023

Tuning LLMs with no tears💦; Sample Design Engineering (SDE) for more efficient downstream-tuning.

HTML 965 99 Updated Apr 27, 2024

MOSS-RLHF

Python 1,284 101 Updated Mar 3, 2024

Train transformer language models with reinforcement learning.

Python 9,856 1,246 Updated Oct 27, 2024

Inference Llama 2 in one file of pure C

C 17,368 2,068 Updated Aug 6, 2024

Firefly Chinese LLaMA-2 large language model; supports continued pretraining of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models

Python 396 31 Updated Oct 21, 2023

Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models

Python 7,080 579 Updated Sep 23, 2024

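Llama Chinese community: the Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning materials are collected in real time, and all code has been updated for Llama3. Aims to build the best Chinese Llama large model; fully open source and available for commercial use.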

Python 13,877 1,247 Updated Sep 5, 2024

Code and data of the CCL 2022 paper "Automatic Construction of Sentence Pattern Structure Treebank".

Python 5 Updated Oct 28, 2022

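A fairly comprehensive library of classical Chinese poems, ci, and prose: 210,000 poems with annotations and commentary, more than 10,000 poets with introductions and biographies, introductions to more than 1,600 cipai (ci tune patterns), analyses of more than 70 Chinese dynasties, and nearly 200 classification tags for classical poetry and prose.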

JavaScript 311 96 Updated Sep 11, 2023