2

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models

Recent studies indicate that NLU models are prone to rely on shortcut features for prediction. As a result, these models could potentially fail to generalize to real-world out-of-distribution scenarios. In this work, we show that the shortcut …

A Unified Taylor Framework for Revisiting Attribution Methods

Attribution methods have been developed to understand the decision making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built upon …

Mitigating Gender Bias in Captioning Systems

Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain gender bias found in web corpora. As a result, …

Differentiated Explanation of Deep Neural Networks with Skewed Distributions

We propose a simple but efficent approch for the differentiated explanations of black-box classifiers. To do this, we introduce a trainable relevance estimator that produces relevance scores in a skewed distribution. Specifically, we present the …

Techniques for Interpretable Machine Learning

Interpretable machine learning tackles the important problem that humans cannot understand the behaviors of complex machine learning models and how these models arrive at a particular decision. Although many approaches have been proposed, a …

Fairness in Deep Learning: A Computational Perspective

Deep learning is increasingly being used in high-stake decision making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination behaviors with respect to protected groups, potentially posing …

Learning Credible DNNs via Incorporating Prior Knowledge and Model Local Explanation

Recent studies have shown that state-of-the-art DNNs are not always credible, despite their impressive performance on the hold-out test set of a variety of tasks. These models tend to exploit dataset shortcuts to make predictions, rather than learn …