prorok9898

Popular repositories

ERR-EVAL Public

🔍 Evaluate AI models' ability to detect ambiguity and manage uncertainty with the ERR-EVAL benchmark for reliable epistemic reasoning.

Python

prorok9898.github.io Public

🔍 Evaluate AI models' reliability against ambiguity and uncertainty with the ERR-EVAL benchmark, ensuring accurate and calibrated responses in challenging scenarios.