thsno02

Starred repositories

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,661 354 Updated Dec 7, 2024

A quick guide (especially) for trending instruction finetuning datasets

2,891 187 Updated Nov 28, 2023

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 39,703 6,505 Updated Dec 9, 2024

Code for the paper "Language Models are Unsupervised Multitask Learners"

Python 23,104 5,612 Updated Aug 14, 2024

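MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.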

3,740 262 Updated Feb 18, 2025

Chinese dictionary (Simplified / Traditional). Chinese / Chinese-English dictionaries.

HTML 157 28 Updated Apr 15, 2024

Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?

Jupyter Notebook 1,422 55 Updated May 13, 2024

A curation of awesome tools, documents and projects about LLM Security.

1,083 118 Updated Feb 23, 2025

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Python 21,465 2,782 Updated Aug 15, 2024

Official Repository for "The Curious Case of Neural Text Degeneration"

HTML 160 17 Updated Apr 18, 2023

[ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.

Python 1,813 146 Updated Dec 30, 2024

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Python 971 57 Updated Oct 24, 2024

H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/

Python 4,198 438 Updated Feb 26, 2025

12 Weeks, 24 Lessons, AI for All!

Jupyter Notebook 36,227 6,457 Updated Feb 13, 2025

🤯 Lobe Chat - an open-source, modern-design AI chat framework. Supports Multi AI Providers( OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), Knowledge Base (file upload / knowledge managemen…

TypeScript 56,564 12,060 Updated Feb 28, 2025
Python 2,521 310 Updated May 19, 2024

Secrets of RLHF in Large Language Models Part I: PPO

Python 1,320 96 Updated Mar 3, 2024

🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…

Python 58,848 5,978 Updated Aug 24, 2024

Lexicon of sensitive words and stop words commonly used on the Internet

1,361 637 Updated Jun 4, 2024

Lexicon of sensitive words commonly used on the Internet

346 187 Updated Dec 4, 2018

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 39,750 5,954 Updated Feb 28, 2025

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,976 4,637 Updated Feb 28, 2025

Shadowrocket ("Little Rocket") configuration files, modules, scripts (module, sgmodule), illustrated tutorials, rules, traffic routing, cracking, unlocking

JavaScript 5,429 287 Updated Feb 28, 2025
Python 783 79 Updated Sep 14, 2023

Train transformer language models with reinforcement learning.

Python 12,082 1,635 Updated Feb 28, 2025

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python 13,559 960 Updated Feb 14, 2025

Examples and guides for using the OpenAI API

MDX 62,036 9,996 Updated Feb 20, 2025

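A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, vertical-domain fine-tunes and applications, datasets, and tutorials.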

18,572 1,787 Updated Sep 19, 2024

Forum for discussing Internet censorship circumvention

Python 3,590 82 Updated Dec 12, 2024

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model, training data, evaluation data, evaluation me…

Python 1,274 114 Updated Apr 3, 2024