A unified evaluation framework for large language models
Updated Feb 20, 2026 · Python
Corruption and Perturbation Robustness (ICLR 2019)
Benchmarking Generalized Out-of-Distribution Detection
PyBullet CartPole and Quadrotor environments—with CasADi symbolic a priori dynamics—for learning-based control and RL
A Harder ImageNet Test Set (CVPR 2021)
Code and information for face image quality assessment with SER-FIQ
🔥🔥🔥[AAAI 2026 Oral] Official Implementation of Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Diffusion Classifier leverages pretrained diffusion models to perform zero-shot classification without additional training
alpha-beta-CROWN: An Efficient, Scalable and GPU Accelerated Neural Network Verifier (winner of VNN-COMP 2021, 2022, 2023, 2024, 2025)
Benchmark your model on out-of-distribution datasets with carefully collected human comparison data (NeurIPS 2021 Oral)
auto_LiRPA: An Automatic Linear Relaxation based Perturbation Analysis Library for Neural Networks and General Computational Graphs
TensorFlow implementation of "Compounding the Performance Improvements of Assembled Techniques in a Convolutional Neural Network"
[NeurIPS 2023] RoboDepth: Robust Out-of-Distribution Depth Estimation under Corruptions
A repository and benchmark for online test-time adaptation.
ImageNet-R(endition) and DeepAugment (ICCV 2021)
Self-Supervised Learning for OOD Detection (NeurIPS 2019)
Extend Python list operations using .NET's LINQ syntax for clean and fast coding.
A preliminary evaluation of ChatGPT/GPT-4 for machine translation.
Code & data accompanying the NeurIPS 2020 paper "Iterative Deep Graph Learning for Graph Neural Networks: Better and Robust Node Embeddings".
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296