
llm-benchmarking

Here are 21 public repositories matching this topic...

We introduce a benchmark for testing how well LLMs can find vulnerabilities in cryptographic protocols. By combining LLMs with symbolic reasoning tools like Tamarin, we aim to improve the efficiency and thoroughness of protocol analysis, paving the way for future AI-powered cybersecurity defenses.

  • Updated Nov 4, 2024
  • Haskell
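
The entry above describes pairing an LLM with the Tamarin symbolic prover for protocol analysis. The following is a minimal, illustrative sketch of that loop, not code from the repository: it assumes a local tamarin-prover install and an OpenAI-compatible API key, and the model name, prompt, file paths, and output check are all placeholder assumptions.

    # Hypothetical sketch: an LLM proposes a candidate lemma for a Tamarin
    # protocol model, and the tamarin-prover CLI is run to check the theory.
    import subprocess
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def propose_lemma(spthy_source: str) -> str:
        """Ask the LLM for one candidate lemma that might expose a weakness."""
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{
                "role": "user",
                "content": "Propose one Tamarin lemma (spthy syntax) that could "
                           "expose a weakness in this protocol:\n\n" + spthy_source,
            }],
        )
        return resp.choices[0].message.content

    def check_with_tamarin(theory_path: str) -> bool:
        """Run tamarin-prover over the theory; crude check of its summary output."""
        result = subprocess.run(
            ["tamarin-prover", "--prove", theory_path],
            capture_output=True, text=True,
        )
        return "falsified" not in result.stdout

    if __name__ == "__main__":
        with open("protocol.spthy") as f:   # placeholder protocol model
            source = f.read()
        print(propose_lemma(source))
        print("all lemmas verified:", check_with_tamarin("protocol.spthy"))
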

Biologically and economically aligned AI safety benchmarks for LLMs with a simplified observation format. The benchmark themes include sustainability, multi-objective homeostasis, (multi-objective) diminishing returns, complementary goods, and multi-agent resource sharing.

  • Updated Feb 2, 2025
  • Python
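
To make the themes in the entry above concrete, here is a small illustrative sketch, not taken from the repository, of how a multi-objective episode score could combine a homeostatic objective (stay near a setpoint) with a diminishing-returns objective; the setpoints, utilities, and additive aggregation are assumptions for illustration only.

    # Hypothetical scoring sketch: homeostasis + diminishing returns.
    import math

    def homeostatic_utility(level: float, setpoint: float) -> float:
        """Penalize deviation from a target level in either direction."""
        return -abs(level - setpoint)

    def diminishing_returns_utility(amount: float) -> float:
        """Each extra unit is worth less than the previous one (log utility)."""
        return math.log1p(max(amount, 0.0))

    def episode_score(resources: dict[str, float],
                      setpoints: dict[str, float]) -> float:
        """Sum per-objective utilities into a single benchmark score."""
        score = 0.0
        for name, amount in resources.items():
            if name in setpoints:
                score += homeostatic_utility(amount, setpoints[name])
            else:
                score += diminishing_returns_utility(amount)
        return score

    # Example: "food" has a homeostatic target of 2.0; "gold" has diminishing returns.
    print(episode_score({"food": 3.0, "gold": 10.0}, {"food": 2.0}))
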
