Trustworthy NLP
Our team has made significant contributions to advancing Trustworthy NLP, with a focus on developing more robust, fair, and explainable large language models (LLMs). Through innovations in model training, evaluation, and interpretation, we aim to build LLMs that are reliable, unbiased, and transparent, helping to overcome key challenges in deploying LLMs responsibly. Below, we summarize our work on the robustness, fairness, and explainability of LLMs.
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, Mengnan Du
arXiv:2410.00153, 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du
arXiv:2402.10688, 2024
Quantifying Multilingual Performance of Large Language Models Across Languages
Zihao Li, Yucheng Shi, Zirui Liu, Fan Yang, Ninghao Liu, Mengnan Du
arXiv:2404.11553, 2024
The Impact of Reasoning Step Length on Large Language Models
Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du
ACL Findings, 2024 [Code]
Exploring the Alignment Landscape: LLMs and Geometric Deep Models in Protein Representation
Dong Shu, Bingbing Duan, Kai Guo, Kaixiong Zhou, Jiliang Tang, Mengnan Du
arXiv:2411.05316, 2024
Exploring Multilingual Probing in Large Language Models: A Cross-Language Analysis
Daoyang Li, Mingyu Jin, Qingcheng Zeng, Haiyan Zhao, Mengnan Du
arXiv:2409.14459, 2024
DemoShapley: Valuation of Demonstrations for In-Context Learning
Shan Xie, Man Luo, Chadly Daniel Stern, Mengnan Du, Lu Cheng
arXiv:2410.07523, 2024
Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?
Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang
COLING, 2025 [Code] [Website]
Usable XAI: 10 Strategies Towards Exploiting Explainability in the LLM Era
Xuansheng Wu, Haiyan Zhao, Yaochen Zhu, Yucheng Shi, Fan Yang, Tianming Liu, Xiaoming Zhai, Wenlin Yao, Jundong Li, Mengnan Du, Ninghao Liu
arXiv:2403.08946, 2024 [Code]
Explainability for Large Language Models: A Survey
Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, Mengnan Du
ACM Transactions on Intelligent Systems and Technology (TIST), 2024 [GitHub]
M4: A Unified XAI Benchmark for Faithfulness Evaluation of Feature Attribution Methods across Metrics, Modalities and Models
Xuhong Li, Mengnan Du, Jiamin Chen, Yekun Chai, Himabindu Lakkaraju, Haoyi Xiong
NeurIPS Datasets and Benchmarks Track, 2023
Strategic Demonstration Selection for Improved Fairness in LLM In-Context Learning
Jingyu Hu, Weiru Liu, Mengnan Du
EMNLP (Main Track), 2024 [Code] [Website]
Enhancing Fairness in In-Context Learning: Prioritizing Minority Samples in Demonstrations
Jingyu Hu, Mengnan Du
ICLR Tiny Papers Track, 2024
A Survey on Fairness in Large Language Models
Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang
arXiv:2308.10149, 2023
Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases
Yingji Li, Mengnan Du, Xin Wang, Ying Wang
ACL, 2023
Mitigating Shortcuts in Language Models with Soft Label Encoding
Zirui He, Huiqi Deng, Haiyan Zhao, Ninghao Liu, Mengnan Du
COLING, 2024 [Code]
Unveiling Project-Specific Bias in Neural Code Models
Zhiming Li, Yanzhou Li, Tianlin Li, Mengnan Du, Bozhi Wu, Yushi Cao, Xiaofei Xie, Yi Li, Yang Liu
COLING, 2024
Shortcut Learning of Large Language Models in Natural Language Understanding
Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu
Communications of the ACM (CACM), 2023
Robustness Challenges in Model Distillation and Pruning for Natural Language Understanding
Mengnan Du, Subho Mukherjee, Yu Cheng, Milad Shokouhi, Xia Hu, Ahmed Hassan Awadallah
EACL, 2023
Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models
Mengnan Du, Varun Manjunatha, Rajiv Jain, Ruchi Deshpande, Franck Dernoncourt, Jiuxiang Gu, Tong Sun, Xia Hu
NAACL, 2021