Improved Named Entity Recognition for Noisy Call Center Transcripts

Sam Davidson, Jordan Hosier, Yu Zhou, Vijay Gurbani


Abstract
We explore the application of state-of-the-art NER algorithms to ASR-generated call center transcripts. Previous work in this domain focused on the use of a BiLSTM-CRF model which relied on Flair embeddings; however, such a model is unwieldy in terms of latency and memory consumption. In a production environment, end users require low-latency models which can be readily integrated into existing pipelines. To that end, we present two different models which can be utilized based on the latency and accuracy requirements of the user. First, we propose a set of models which utilize state-of-the-art Transformer language models (RoBERTa) to develop a high-accuracy NER system trained on a custom annotated set of call center transcripts. We then use our best-performing Transformer-based model to label a large number of transcripts, which we use to pretrain a BiLSTM-CRF model and further fine-tune on our annotated dataset. We show that this model, while not as accurate as its Transformer-based counterpart, is highly effective in identifying items which require redaction for privacy law compliance. Further, we propose a new general annotation scheme for NER in the call-center environment.
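
The abstract names redaction for privacy compliance as the downstream use of the NER output. As a rough illustration only, the Python sketch below shows how token-level predictions might be turned into a redacted transcript. It assumes BIO-style tags; the tag names (`PER`, `LOC`), the `redact` helper, and the `[REDACTED]` mask are hypothetical and are not taken from the paper, whose annotation scheme is not reproduced on this page.

```python
from typing import List


def redact(tokens: List[str], tags: List[str], mask: str = "[REDACTED]") -> List[str]:
    """Replace each tagged entity span with a single mask token.

    Assumes BIO-style tags ("B-X", "I-X", "O"); this is an illustrative
    convention, not necessarily the scheme used in the paper.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    out: List[str] = []
    in_entity = False
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # Non-entity token: keep it and close any open span.
            out.append(token)
            in_entity = False
        elif tag.startswith("B-") or not in_entity:
            # Start of a new span (or a stray I- tag): emit one mask.
            out.append(mask)
            in_entity = True
        # Otherwise the token continues the current span and is dropped.
    return out


# Hypothetical ASR-style transcript with made-up tags.
tokens = "my name is john smith and i live in chicago".split()
tags = ["O", "O", "O", "B-PER", "I-PER", "O", "O", "O", "O", "B-LOC"]
redacted = redact(tokens, tags)
print(" ".join(redacted))
```

Collapsing each span to one mask, rather than masking token by token, avoids revealing how many words the original entity contained; whether that matters depends on the compliance requirement.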
Anthology ID:
2021.wnut-1.40
Volume:
Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)
Month:
November
Year:
2021
Address:
Online
Editors:
Wei Xu, Alan Ritter, Tim Baldwin, Afshin Rahimi
Venue:
WNUT
Publisher:
Association for Computational Linguistics
Pages:
361–370
URL:
https://aclanthology.org/2021.wnut-1.40
DOI:
10.18653/v1/2021.wnut-1.40
Cite (ACL):
Sam Davidson, Jordan Hosier, Yu Zhou, and Vijay Gurbani. 2021. Improved Named Entity Recognition for Noisy Call Center Transcripts. In Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021), pages 361–370, Online. Association for Computational Linguistics.
Cite (Informal):
Improved Named Entity Recognition for Noisy Call Center Transcripts (Davidson et al., WNUT 2021)
PDF:
https://aclanthology.org/2021.wnut-1.40.pdf
Data
CoNLL 2003