Trustworthy NLP
Our team works to advance Trustworthy NLP, with a focus on developing more robust, fair, and explainable large language models (LLMs). Through innovations in model training, evaluation, and interpretation, we aim to build LLMs that are reliable, unbiased, and transparent, and to address key challenges in deploying LLMs responsibly. Below, we summarize our work on the robustness, fairness, and explainability of LLMs.
Shortcut Learning of Large Language Models in Natural Language Understanding
Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu
Communications of the ACM (CACM), 2023
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du, Subho Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah
The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Mitigating Shortcuts in Language Models with Soft Label Encoding
Zirui He, Huiqi Deng, Haiyan Zhao, Ninghao Liu, Mengnan Du
arXiv:2309.09380, 2023
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
A Survey on Fairness in Large Language Models
Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang
arXiv:2308.10149, 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li, Mengnan Du, Xin Wang, Ying Wang
The 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du
arXiv:2309.01029, 2023 [GitHub]
M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models
Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong
NeurIPS Datasets and Benchmarks Track, 2023