Stars
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
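Python open-source project "The Road to Self-Taught Programming", a step-by-step tutorial series: AI lab, curated videos, data structures, study guides, machine learning in practice, deep learning in practice, web crawlers, big-tech interview experiences, programmer life, and resource sharing.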
A framework for few-shot evaluation of language models.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
[NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
We present the first systematic study on the scaling property of raw agents instantiated by LLMs. We find that performance scales with the increase in the number of agents, using the simple(st) way…
CycleQD is a framework for parameter space model merging.