Given the success of neural networks in recent years, and especially of deep architectures, their use has been expanding into ever more critical application areas such as security, autonomous driving, and healthcare. In contrast to previous, well-documented and thoroughly tested approaches, we still have little understanding of what such models learn and when they may fail. The question that naturally arises is whether we can trust such systems to undertake safety-critical tasks. Furthermore, from a commercial perspective, companies employing such models in their products should be able to present them to customers in an understandable way, not only to increase customers' propensity to buy the product, but also in light of recent European Union legislation (the 2016 General Data Protection Regulation, art. 22) that essentially requires accountable models.
Building on the work of Nguyen et al. (2014), who showed that convolutional neural networks can misclassify, with high confidence, images that are unrecognizable to humans, we aim to show that recurrent neural networks (RNNs) can also exhibit this type of behavior. For this purpose, we will use approaches similar to those of the aforementioned work (evolutionary algorithms) to generate inputs that expose critical failures of an RNN, and we will employ visualization techniques that allow users to investigate and understand such failures and to associate them with the network's input.
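To make the intended procedure concrete, the following is a minimal sketch, assuming PyTorch, a character-level RNN classifier, and a simple mutation-plus-selection evolutionary loop. The model here is an untrained stand-in, and the vocabulary, sequence length, population size, and mutation rate are illustrative assumptions rather than part of the proposed method; in the actual work, the trained network under investigation and task-appropriate variation operators would take their place.

```python
# Sketch: evolutionary search for inputs an RNN classifies with high
# confidence, regardless of whether they are meaningful to humans.
# All components below are illustrative placeholders, not the proposal's
# actual model or hyperparameters.
import random
import string

import torch
import torch.nn as nn

VOCAB = string.ascii_lowercase + " "   # assumed character vocabulary
SEQ_LEN = 30                           # assumed input length
NUM_CLASSES = 5                        # assumed number of output classes
TARGET_CLASS = 0                       # class whose confidence we maximize


class CharRNNClassifier(nn.Module):
    """Stand-in character-level RNN classifier (untrained)."""

    def __init__(self, vocab_size: int, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 32)
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_CLASSES)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.embed(x)            # (batch, seq, 32)
        _, (h, _) = self.rnn(emb)      # final hidden state
        return self.head(h[-1])        # (batch, NUM_CLASSES) logits


def encode(text: str) -> torch.Tensor:
    return torch.tensor([[VOCAB.index(c) for c in text]])


def confidence(model: nn.Module, text: str) -> float:
    """Probability the model assigns to TARGET_CLASS for this input."""
    with torch.no_grad():
        probs = torch.softmax(model(encode(text)), dim=-1)
    return probs[0, TARGET_CLASS].item()


def mutate(text: str, rate: float = 0.1) -> str:
    """Random character replacement -- the only variation operator here."""
    return "".join(random.choice(VOCAB) if random.random() < rate else c
                   for c in text)


def evolve(model: nn.Module, pop_size: int = 50, generations: int = 200) -> str:
    """Keep the top 20% of the population each generation and refill it
    with mutated copies, maximizing confidence on TARGET_CLASS."""
    population = ["".join(random.choice(VOCAB) for _ in range(SEQ_LEN))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda t: confidence(model, t),
                        reverse=True)
        parents = ranked[: pop_size // 5]
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=lambda t: confidence(model, t))


if __name__ == "__main__":
    model = CharRNNClassifier(len(VOCAB))
    model.eval()
    best = evolve(model)
    print(f"fooling candidate: {best!r}  confidence={confidence(model, best):.3f}")
```

The selection criterion (confidence on a single target class) mirrors the high-confidence misclassifications reported by Nguyen et al.; the candidate inputs found this way would then be fed into the visualization techniques described above to relate the failure back to the network's input.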
We will begin our research in the context of Natural Language Processing, since it is an area where RNNs have been extensively studied, and we aim to extend it to autonomous systems, e.g. self-driving vehicles, at a later stage.