Skip to content
#

humaneval

Here are 14 public repositories matching this topic...

SkyCode是一个多语言开源编程大模型,采用GPT3模型结构,支持Java, JavaScript, C, C++, Python, Go, shell等多种主流编程语言,并能理解中文注释。模型可以对代码进行补全,拥有强大解题能力,使您从编程中解放出来,专心于解决更重要的问题。| SkyCode is an open source programming model, which adopts the GPT3 model structure. It supports Java, JavaScript, C, C++, Python, Go, shell and other languages, and can understand Chinese comments.

  • Updated Mar 2, 2023

Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.

  • Updated Dec 3, 2025
  • Python

Dataset management library for ML experiments—loaders for SciFact, FEVER, GSM8K, HumanEval, MMLU, TruthfulQA, HellaSwag; git-like versioning with lineage tracking; transformation pipelines; quality validation with schema checks and duplicate detection; GenStage streaming for large datasets. Built for reproducible AI research.

  • Updated Dec 6, 2025
  • Elixir

Improve this page

Add a description, image, and links to the humaneval topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the humaneval topic, visit your repo's landing page and select "manage topics."

Learn more