Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models

Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, Qi Liu

Abstract

Large Language Models (LLMs) have shown remarkable capabilities in various natural language processing tasks. However, LLMs may rely on dataset biases as shortcuts for prediction, which can significantly impair their robustness and generalization capabilities. This paper presents Shortcut Suite, a comprehensive test suite designed to evaluate the impact of shortcuts on LLMs’ performance, incorporating six shortcut types, five evaluation metrics, and four prompting strategies. Our extensive experiments yield several key findings: 1) LLMs demonstrate varying reliance on shortcuts for downstream tasks, which significantly impairs their performance. 2) Larger LLMs are more likely to utilize shortcuts under zero-shot and few-shot in-context learning prompts. 3) Chain-of-thought prompting notably reduces shortcut reliance and outperforms other prompting strategies, while few-shot prompts generally underperform compared to zero-shot prompts. 4) LLMs often exhibit overconfidence in their predictions, especially when dealing with datasets that contain shortcuts. 5) LLMs generally have a lower explanation quality in shortcut-laden datasets, with errors falling into three types: distraction, disguised comprehension, and logical fallacy. Our findings offer new insights for evaluating robustness and generalization in LLMs and suggest potential directions for mitigating the reliance on shortcuts.

Anthology ID:: 2024.emnlp-main.679
Volume:: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:: November
Year:: 2024
Address:: Miami, Florida, USA
Editors:: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:: EMNLP
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 12188–12200
Language:
URL:: https://aclanthology.org/2024.emnlp-main.679
DOI:: 10.18653/v1/2024.emnlp-main.679
Bibkey:
Cite (ACL):: Yu Yuan, Lili Zhao, Kai Zhang, Guangting Zheng, and Qi Liu. 2024. Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 12188–12200, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):: Do LLMs Overcome Shortcut Learning? An Evaluation of Shortcut Challenges in Large Language Models (Yuan et al., EMNLP 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.emnlp-main.679.pdf

PDF Cite Search