A Survey of Reliability Testing for Deep Learning Models

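This GitHub repository is a curated list of resources on reliability testing for deep learning models. For more details and the classification criteria, please refer to our survey paper.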

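Why study reliability testing? Deep learning models are widely used across many domains because of their strong performance, but they often exhibit unexpected erroneous behavior when faced with uncertain inputs, which can have catastrophic consequences in safety-critical applications such as autonomous driving systems. The reliability of deep models has therefore drawn broad attention from both academia and industry. Before a deep model is deployed, it urgently needs to be tested systematically: test samples are generated, a test report is derived from the model's outputs, and the report is used to assess the model's reliability and uncover potential defects early. Although deep learning testing has been applied in many domains, a survey of methods that covers testing comprehensively along four aspects (task performance, security, fairness, and privacy) is still lacking.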

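Task performance testing

Model accuracy testing
Training degree testing

Security testing

Inference-stage security testing
Training-stage security testing
Test sample selection methods

Fairness and privacy testing

Fairness testing
Privacy testing

Applications of reliability testing

Autonomous driving
Speech recognition
Natural language processing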

Security testing methods

Model accuracy testing

Classifier variability: Accounting for training and testing [pdf]

SynEva: Evaluating ML Programs by Mirror Program Synthesis [pdf]

Training degree testing

Perturbed Model Validation: A New Framework to Validate Model Relevance [pdf]

Detecting Overfitting via Adversarial Examples [pdf]

Circuit-Based Intrinsic Methods to Detect Overfitting [pdf]

Test data reuse for evaluation of adaptive machine learning algorithms: over-fitting to a fixed 'test' dataset and a potential solution [pdf]

MODE: Automated neural network model debugging via state differential analysis and input selection [pdf]

Inference-stage security testing

Coverage-based testing methods

DeepXplore: Automated Whitebox Testing of Deep Learning Systems [pdf]

DLFuzz: Differential Fuzzing Testing of Deep Learning Systems [pdf]

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing [pdf]

DeepTest: Automated Testing of Deep-Neural-Network-driven Autonomous Cars [pdf]

DeepGauge: Multi-Granularity Testing Criteria for Deep Learning Systems [pdf]

DeepHunter: A coverage-guided fuzz testing framework for deep neural networks [pdf]

Effective White-Box Testing of Deep Neural Networks with Adaptive Neuron-Selection Strategy [pdf]

DeepCT: Tomographic Combinatorial Testing for Deep Learning Systems [pdf]

Testing Deep Neural Networks [pdf]

Concolic Testing for Deep Neural Networks [pdf]

DeepCruiser: Automated Guided Testing for Stateful Deep Learning Systems [pdf]

DeepStellar: Model-Based Quantitative Analysis of Stateful Deep Learning Systems [pdf]

testRNN: Coverage-guided Testing on Recurrent Neural Networks [pdf]

Limitations of coverage-based methods

Structural Coverage Criteria for Neural Networks Could Be Misleading [pdf]

Is Neuron Coverage a Meaningful Measure for Testing Deep Neural Networks? [pdf]

There is Limited Correlation between Coverage and Robustness for Deep Neural Networks [pdf]

Mutation-based testing methods

An Analysis and Survey of the Development of Mutation Testing [pdf]

DeepMutation: Mutation Testing of Deep Learning Systems [pdf]

DeepMutation++: a Mutation Testing Framework for Deep Learning Systems [pdf]

DeepCrime: mutation testing of deep learning systems based on real faults [pdf]

MuNN: Mutation Analysis of Neural Networks [pdf]

DEEPMETIS: Augmenting a Deep Learning Test Set to Increase its Mutation Score [pdf]

Repair-based testing methods

Apricot: A Weight-Adaptation Approach to Fixing Deep Learning Models [pdf]

Plum: Exploration and Prioritization of Model Repair Strategies for Fixing Deep Learning Models [pdf]

DeepCorrect: Correcting DNN Models against Image Distortions [pdf]

DeepFault: Fault Localization for Deep Neural Networks [pdf]

RobOT: Robustness-Oriented Testing for Deep Learning Systems [pdf]

Fuzz Testing based Data Augmentation to Improve Robustness of Deep Neural Networks [pdf]

DialTest: automated testing for recurrent-neural-network-driven dialogue systems [pdf]

DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment [pdf]

TauMed: test augmentation of deep learning in medical diagnosis [pdf]

Training-stage security testing

Offline detection

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks [pdf]

TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems [pdf]

Scalable Backdoor Detection in Neural Networks [pdf]

DeepInspect: A Black-box Trojan Detection and Mitigation Framework for Deep Neural Networks [pdf]

Detecting AI Trojans Using Meta Neural Analysis [pdf]

Online detection

ABS: Scanning Neural Networks for Back-doors by Artificial Brain Stimulation [pdf]

EX-RAY: Distinguishing Injected Backdoor from Natural Features in Neural Networks by Examining Differential Feature Symmetry [pdf]

Test sample selection methods

DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks [pdf]

Input Prioritization for Testing Neural Networks [pdf]

A Noise-Sensitivity-Analysis-Based Test Prioritization Technique for Deep Neural Networks [pdf]

Neuron Activation Frequency Based Test Case Prioritization [pdf]

Test Selection for Deep Learning Systems [pdf]

Prioritizing Test Inputs for Deep Neural Networks via Mutation Analysis [pdf]

Guiding Deep Learning System Testing using Surprise Adequacy [pdf]

Multiple-boundary clustering and prioritization to promote neural network retraining [pdf]

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models [pdf]

Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions [pdf]

Fairness testing

Individual fairness

Automated Directed Fairness Testing [pdf]

Automated Test Generation to Detect Individual Discrimination in AI Models [pdf]

White-box fairness testing through adversarial sampling [pdf]

Group fairness

Fairness Testing: Testing Software for Discrimination [pdf]

Privacy testing

DP-Finder: Finding Differential Privacy Violations by Sampling and Optimization [pdf]

Testing Differential Privacy with Dual Interpreters [pdf]

Applications of reliability testing

Autonomous driving

DeepBillboard: Systematic Physical-World Testing of Autonomous Driving Systems [pdf]

Model-based Exploration of the Frontier of Behaviours for Deep Learning System Testing [pdf]

Automated Test Cases Prioritization for Self-driving Cars in Virtual Environments [pdf]

Speech recognition

CrossASR: Efficient differential testing of automatic speech recognition via text-to-speech [pdf]

CrossASR++: A Modular Differential Testing Framework for Automatic Speech Recognition [pdf]

Natural language processing

Metamorphic testing for machine translations: MT4MT [pdf]

Automatic Testing and Improvement of Machine Translation [pdf]

Online model repositories

Caffe Model Zoo

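Caffe is a deep learning framework designed with expression, speed, and modularity in mind. The Caffe Model Zoo collects Caffe models built by many researchers and engineers using a variety of architectures and data for different tasks. These pretrained models can be applied to many tasks and research problems, from simple regression to large-scale visual classification to speech and robotics applications. [Web]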

ONNX Model Zoo

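Open Neural Network Exchange (ONNX) is an open standard format for representing machine learning models. ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format, so that AI developers can use models with a variety of frameworks, tools, runtimes, and compilers. The ONNX Model Zoo is a collection of pretrained, state-of-the-art models in ONNX format contributed by community members. The models cover ten kinds of tasks across multiple domains, including image classification, object detection, and machine translation. [Web]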

BigML model market

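BigML is a consumable, programmable, and scalable machine learning platform that makes it easy to solve and automate classification, regression, time series forecasting, cluster analysis, anomaly detection, association discovery, and topic modeling tasks. BigML supports a wide range of predictive applications across industries, including aerospace, automotive, energy, entertainment, financial services, food, healthcare, IoT, pharmaceuticals, transportation, telecommunications, and more. [Web]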

Amazon SageMaker

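Amazon SageMaker is a machine learning service platform from Amazon. By bringing together a broad set of capabilities purpose-built for machine learning, it helps data scientists and developers quickly prepare, build, train, and deploy high-quality machine learning models. SageMaker removes the heavy lifting from each step of the machine learning process, making it easier to develop high-quality models. It provides all the components used for machine learning in a single toolset, so models can reach production faster, with less effort and at lower cost. [Web]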

Common toolkits

Themis

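Galhotra et al. proposed Themis, an open-source fairness testing tool for detecting causal bias. It measures whether discrimination exists by generating efficient test suites. Given a schema describing valid system inputs, Themis automatically generates discrimination tests. Application scenarios include financial loans, medical diagnosis and treatment, promotional behavior, and the criminal justice system. [Web]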
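The idea of schema-driven causal discrimination testing can be illustrated with a minimal sketch. This is not Themis's API: the function `causal_discrimination_score`, the `schema` dictionary, and the two toy models are hypothetical names introduced only for illustration. The sketch samples inputs from a schema, changes only the protected attribute, and reports the fraction of inputs whose prediction changes.

```python
import random


def causal_discrimination_score(model, schema, protected, n_samples=1000, seed=0):
    """Fraction of sampled inputs whose prediction changes when only
    the protected attribute is altered (hypothetical helper, not Themis's API)."""
    rng = random.Random(seed)
    flipped = 0
    for _ in range(n_samples):
        # Draw one valid input from the schema: one value per attribute.
        x = {name: rng.choice(values) for name, values in schema.items()}
        base = model(x)
        # Try every other value of the protected attribute, all else fixed.
        for alt in schema[protected]:
            if alt == x[protected]:
                continue
            if model({**x, protected: alt}) != base:
                flipped += 1
                break
    return flipped / n_samples


# Toy schema describing valid inputs (assumed for this example).
schema = {"gender": ["f", "m"], "income": list(range(0, 100, 10))}


def biased_model(x):
    # Applies a stricter income threshold to one group.
    threshold = 50 if x["gender"] == "m" else 70
    return x["income"] >= threshold


def fair_model(x):
    # Ignores the protected attribute entirely.
    return x["income"] >= 50


if __name__ == "__main__":
    print("fair:  ", causal_discrimination_score(fair_model, schema, "gender"))
    print("biased:", causal_discrimination_score(biased_model, schema, "gender"))
```

A model that ignores the protected attribute scores exactly zero under this measure, while the biased toy model scores above zero because some incomes fall between the two thresholds.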

mltest

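mltest is a testing framework for writing unit tests for TensorFlow-based machine learning systems. With minimal setup, it runs a comprehensive set of tests for common machine learning problems, including checks that variables change, that variables stay constant, that logits fall within a range, that outputs depend on inputs, and that tensors contain no NaN or Inf values. Unfortunately, the release of TensorFlow 2.0 broke most of the tool's functionality. [Web]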
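To make the kinds of checks listed above concrete, here is a small framework-free sketch in plain Python. It does not use mltest or TensorFlow; `sgd_step`, `changed`, `all_finite`, and `in_range` are hypothetical helpers written for this example, and the model is an assumed toy linear regressor.

```python
import math


def sgd_step(weights, x, y, lr=0.1, frozen=()):
    """One gradient step on squared error for y_hat = sum(w_i * x_i).
    Indices listed in `frozen` are left untouched."""
    y_hat = sum(w * xi for w, xi in zip(weights, x))
    grad = [2 * (y_hat - y) * xi for xi in x]
    return [w if i in frozen else w - lr * g
            for i, (w, g) in enumerate(zip(weights, grad))]


def changed(before, after):
    """Per-variable flag: did the value change after the step?"""
    return [b != a for b, a in zip(before, after)]


def all_finite(values):
    """True if no value is NaN or +/-Inf."""
    return all(math.isfinite(v) for v in values)


def in_range(values, low, high):
    """True if every value lies in [low, high]."""
    return all(low <= v <= high for v in values)


before = [0.5, -0.3, 0.8]
after = sgd_step(before, x=[1.0, 2.0, 3.0], y=1.0, frozen={2})

if __name__ == "__main__":
    # Trainable variables should change; the frozen one should not.
    print(changed(before, after))
    # The updated weights should contain no NaN or Inf.
    print(all_finite(after))
```

The same pattern carries over to a real framework: snapshot the variables, run one training step, then assert on which variables changed and whether any output is non-finite or out of range.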

torchtest

Inspired by mltest and offering similar functionality, torchtest is a testing framework for writing unit tests for PyTorch-based machine learning systems. [Web]
