
Starred repositories

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,661 354 Updated Dec 7, 2024

A quick guide (especially) for trending instruction finetuning datasets

2,890 187 Updated Nov 28, 2023

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 39,695 6,503 Updated Dec 9, 2024

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python 23,102 5,612 Updated Aug 14, 2024

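MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.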

3,738 262 Updated Feb 18, 2025

Chinese dictionary (Simplified / Traditional). Chinese / Chinese-English dictionaries.

HTML 157 28 Updated Apr 15, 2024

Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

Jupyter Notebook 1,422 55 Updated May 13, 2024

A curation of awesome tools, documents and projects about LLM Security.

1,082 118 Updated Feb 23, 2025

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 21,463 2,781 Updated Aug 15, 2024

Official Repository for "The Curious Case of Neural Text Degeneration"

HTML 160 17 Updated Apr 18, 2023

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python 1,812 146 Updated Dec 30, 2024

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Python 971 57 Updated Oct 24, 2024

H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/

Python 4,198 438 Updated Feb 26, 2025

12 Weeks, 24 Lessons, AI for All!

Jupyter Notebook 36,223 6,457 Updated Feb 13, 2025

🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports Multi AI Providers (OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), Knowledge Base (file upload / knowledge managemen…

TypeScript 56,533 12,056 Updated Feb 28, 2025
Python 2,521 310 Updated May 19, 2024

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,320 96 Updated Mar 3, 2024

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 58,841 5,977 Updated Aug 24, 2024

A lexicon of sensitive words and stop words commonly used on the Internet.

1,361 636 Updated Jun 4, 2024

A lexicon of sensitive words commonly used on the Internet.

346 187 Updated Dec 4, 2018

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 39,697 5,945 Updated Feb 28, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,974 4,639 Updated Feb 28, 2025

Shadowrocket (小火箭) configuration files, modules, scripts, module, sgmodule, illustrated tutorials, rules, traffic routing, cracks, unlocks

JavaScript 5,428 287 Updated Feb 28, 2025
Python 783 79 Updated Sep 14, 2023

Train transformer language models with reinforcement learning.

Python 12,070 1,633 Updated Feb 27, 2025

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 13,557 960 Updated Feb 14, 2025

Examples and guides for using the OpenAI API

MDX 62,034 9,995 Updated Feb 20, 2025

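A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.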

18,556 1,784 Updated Sep 19, 2024

Forum for discussing Internet censorship circumvention

Python 3,589 82 Updated Dec 12, 2024

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…

Python 1,274 114 Updated Apr 3, 2024