ConTrust: Robust Contrastive Explanations for Deep Neural Networks

(started 2022, duration 4 years)


Deep Neural Networks (DNNs) are increasingly used in automated decision-making. However, the reasoning behind their outputs is often unintelligible to humans. Understanding their predictions has therefore become crucial for their safe deployment in any domain where accuracy and explainability are paramount.


The field of Explainable Artificial Intelligence (XAI) is concerned with providing methods and tools that improve the interpretability of DNNs. A widely recognised means to this end is contrastive explanations, i.e., arguments that support or contrast the decisions taken by a DNN. While several approaches exist to generate such explanations, they often lack robustness, i.e., they may produce completely different explanations for similar events. This is troubling: a lack of robustness may indicate that the explanations do not capture the underlying decision-making process of the DNN and therefore cannot be trusted.


My proposal is to tackle these problems with an approach based on Verification of Neural Networks (VNN). VNN is concerned with providing methods that either certify that a DNN satisfies a given property or generate counterexamples showing the circumstances under which the property is violated. Crucially, a large body of VNN research focuses on certifying the robustness of DNN predictions, and efficient tools have been developed for this purpose.

In this project, I will extend techniques for the verification of DNNs and develop new explainability methods to generate robust contrastive explanations for DNNs. This approach is motivated both by several similarities between the two research areas and by the lack of effective solutions within XAI for generating robust explanations.


This research is funded by Imperial College London under the ICRF fellowship scheme.