
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Théo Gigant, Camille Guinaudeau, Marc Decombas, Frederic Dufaux


Abstract
Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be independent of reference quality; however, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance, while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.
Anthology ID:
2024.emnlp-main.1078
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
19355–19368
URL:
https://aclanthology.org/2024.emnlp-main.1078
Cite (ACL):
Théo Gigant, Camille Guinaudeau, Marc Decombas, and Frederic Dufaux. 2024. Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 19355–19368, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics (Gigant et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.1078.pdf
Software:
2024.emnlp-main.1078.software.zip