On the blind spots of model-based evaluation metrics for text generation

T He, J Zhang, T Wang, S Kumar, K Cho… - arXiv preprint arXiv:2212.10020, 2022 - arxiv.org
In this work, we explore a useful but often neglected methodology for robustness analysis of
text generation evaluation metrics: stress tests with synthetic data. Basically, we design and
synthesize a wide range of potential errors and check whether they result in a
commensurate drop in the metric scores. We examine a range of recently proposed
evaluation metrics based on pretrained language models, for the tasks of open-ended
generation, translation, and summarization. Our experiments reveal interesting …
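The following is a minimal sketch, not the authors' code, of the stress-test idea described above: apply a synthetic error to a candidate text, score both the original and the perturbed candidate against the same reference with a model-based metric, and check whether the score drops by a commensurate amount. The choice of BERTScore as the metric and the two example perturbations (truncation and word shuffling) are illustrative assumptions, not the specific errors or metrics studied in the paper.

```python
# Sketch of a stress test for a model-based evaluation metric.
# Assumptions: BERTScore is used as an example metric (pip install bert-score);
# the perturbation functions below are illustrative, not the paper's error set.
import random

from bert_score import score


def truncate_error(text: str, keep_ratio: float = 0.5) -> str:
    """Synthetic 'truncation' error: drop the tail of the candidate."""
    words = text.split()
    return " ".join(words[: max(1, int(len(words) * keep_ratio))])


def shuffle_error(text: str, seed: int = 0) -> str:
    """Synthetic 'word order' error: shuffle the candidate's words."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


def stress_test(candidate: str, reference: str) -> None:
    """Score the original and perturbed candidates and print their F1 scores."""
    variants = {
        "original": candidate,
        "truncated": truncate_error(candidate),
        "shuffled": shuffle_error(candidate),
    }
    cands = list(variants.values())
    refs = [reference] * len(cands)
    # bert_score.score returns (precision, recall, F1) tensors, one entry per pair.
    _, _, f1 = score(cands, refs, lang="en", verbose=False)
    for name, f in zip(variants, f1.tolist()):
        print(f"{name:>10}: F1 = {f:.4f}")


if __name__ == "__main__":
    stress_test(
        candidate="the quick brown fox jumps over the lazy dog near the river",
        reference="a quick brown fox leaps over a lazy dog by the river",
    )
```

If the metric assigns the shuffled or truncated candidate a score close to the original's, that perturbation exposes a blind spot of the metric in the sense used by the paper.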
