CMU - Pittsburgh, US - https://stiglidu.github.io/
Stars
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
- An easy-to-use, scalable, and high-performance RLHF framework (70B+ PPO full tuning, iterative DPO, LoRA, RingAttention, and RFT)
- Super-efficient RLHF training of LLMs with parameter reallocation
- 🔍 An LLM-based multi-agent framework for web search engines (like Perplexity.ai Pro and SearchGPT)
- 🙌 OpenHands: Code Less, Make More
- SGLang is a fast serving framework for large language models and vision language models.
- Meta Lingua: a lean, efficient, and easy-to-hack codebase for LLM research.
- Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
- A generative world for general-purpose robotics & embodied AI learning.
- Simple and efficient PyTorch-native transformer training and inference (batched)
- [NeurIPS D&B Track 2024] Source code for the paper "Constrained Human-AI Cooperation: An Inclusive Embodied Social Intelligence Challenge"
- Entropy-based sampling and parallel CoT decoding
- ORLM: Training Large Language Models for Optimization Modeling
- g1: Using Llama 3.1 70B on Groq to create o1-like reasoning chains
- Repository for the paper "Stream of Search: Learning to Search in Language"
- A framework for few-shot evaluation of language models.
- Code for the paper "Evaluating Large Language Models Trained on Code"
- RewardBench: the first evaluation tool for reward models.
- A high-throughput and memory-efficient inference and serving engine for LLMs
- A curated list of fellowships for graduate students in Computer Science and related fields.
- Dataset and benchmark for assessing LLMs in translating natural-language descriptions of planning problems into PDDL
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" by Zhiheng Xi et al.
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision