Stars
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
CycleQD is a framework for parameter space model merging.
A framework for few-shot evaluation of language models.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Awesome-LLM: a curated list of Large Language Models
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
[ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
The Ultimate Guide to Making Money with AI Side Hustles: Learn how to leverage AI for some cool side gigs and rake in some extra cash. Check out the English versi…
Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
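Python open-source project "The Road to Self-Taught Programming", a step-by-step tutorial: AI lab, curated videos, data structures, study guides, machine learning in practice, deep learning in practice, web crawlers, big-tech interview experiences, programmer life, and resource sharing.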
We present the first systematic study on the scaling property of raw agents instantiated by LLMs. We find that performance scales with the increase in the number of agents, using the simple(st) way…
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
A platform for building proxies to bypass network restrictions.
Solutions to Introduction to Algorithms, Third Edition (collected from other git repositories, for my own study and reference)