ConTrust: Robust Contrastive Explanations for Deep Neural Networks
(started Aug 2022, duration 4 years)
The area of Explainable Artificial Intelligence (XAI) is concerned with providing methods and tools to improve the interpretability of learned models such as Deep Neural Networks (DNNs). A widely recognised factor contributing to this end is the availability of contrastive explanations, i.e., arguments supporting or contrasting the decisions taken by a DNN. While several approaches exist to generate such explanations, they often lack robustness: they may produce completely different explanations for similar inputs. This phenomenon has troubling implications, as a lack of robustness may indicate that explanations do not capture the underlying decision-making process of a DNN and thus cannot be trusted.
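As a toy illustration of this fragility, consider a hypothetical 1-D model that predicts class 1 exactly when |x| > 1. The nearest counterfactual for a class-0 input is whichever decision boundary (+1 or -1) is closer, so two almost-identical inputs can receive opposite explanations. This is a minimal sketch with an assumed toy model, not an example from the project itself:

```python
def predict(x):
    """Toy 1-D 'network': class 1 iff |x| > 1 (a shape two ReLUs can realise)."""
    return int(abs(x) > 1.0)

def counterfactual(x, eps=1e-6):
    """Nearest 1-D input with the opposite prediction for a class-0 input."""
    assert predict(x) == 0
    return 1.0 + eps if x >= 0 else -1.0 - eps

# Two inputs that differ by 0.02 receive counterfactuals roughly 2.0 apart:
print(counterfactual(0.01))   # ~  1.0  ("increase x")
print(counterfactual(-0.01))  # ~ -1.0  ("decrease x")
```

The two explanations recommend opposite changes even though the inputs are nearly indistinguishable, which is precisely the kind of instability the project targets.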
My proposal is to tackle these problems using an approach based on Verification of Neural Networks (VNN). VNN provides methods either to certify that a DNN satisfies a given property, or to generate counterexamples showing the circumstances under which violations occur. Crucially, a large body of VNN research focuses on certifying the robustness of predictions made by DNNs, and efficient tools have been developed for this purpose.
In this project, I will extend techniques for the verification of DNNs and develop new explainability methods to generate robust contrastive explanations. Such an approach is motivated by several similarities between the two research areas, but also by the lack of effective solutions within XAI for generating robust explanations.
- Towards Robust Contrastive Explanations for Human-Neural Multi-agent Systems. F. Leofante, A. Lomuscio. Twenty-second International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2023).
- Formalising the Robustness of Counterfactual Explanations for Neural Networks. J. Jiang*, F. Leofante*, A. Rago, F. Toni. Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023). * Equal contribution.
This research is funded by Imperial College London under the ICRF fellowship scheme.