swe-bench

Here are 3 public repositories matching this topic...

JARVIS-Xs / SE-Agent

SE-Agent is a self-evolution framework for LLM code agents. It enables trajectory-level evolution that exchanges information across reasoning paths via Revision, Recombination, and Refinement, expanding the search space and escaping local optima. On SWE-bench Verified, it achieves SOTA performance.

mcts code-fix swe-agent test-time-scaling claude-code code-agent swe-bench self-evolve

Updated Sep 23, 2025
Python

abhaymundhara / llm-benchmark-suite

Benchmark suite for evaluating LLMs and SLMs on coding and SE tasks. Features HumanEval, MBPP, SWE-bench, and BigCodeBench with an interactive Streamlit UI. Supports cloud APIs (OpenAI, Anthropic, Google) and local models via Ollama. Tracks pass rates, latency, token usage, and costs.

python benchmark evaluation gemini openai code-generation claude streamlit humaneval llm ollama swe-bench mbpp bigcodebench

Updated Dec 3, 2025
Python

gk1712 / llm-council

🤖 Unite multiple LLMs in a "Council" to provide ranked responses, enhancing query accuracy and insight through collective intelligence.

developer-tools code-review chatbots cursor ai-agents coding-assistant anthropic ai-conversations ollama llm-chat coding-agents coderabbit vibe-coding vibecoding claude-code swe-bench claude-code-hooks llm-council

Updated Jan 10, 2026
Python
