Authors
- Miller, Corey (Rev, USA)
- Jetté, Migüel (Rev, USA)
- Kokotov, Dan (CNaught, USA)
Abstract
As automatic speech recognition (ASR) has improved, it has become a viable tool for content transcription. Prior to the use of ASR for this task, content transcription was achieved through human effort alone. Despite improvements, ASR performance is as yet imperfect, especially in more challenging conditions (eg multiple speakers, noise, nonstandard accents). Given this, a promising way forward is a human-in-the-loop (HIL) approach. This contribution describes our work with HIL ASR on the transcription task. Traditionally, ASR performance has been measured using word error rate (WER). This measure may not be sufficient to describe the full set of errors that a speech-to-text (STT) pipeline designed for transcription can make, such as those involving capitalisation, punctuation, and inverse text normalisation (ITN). It is therefore the case that improved WER does not always lead to increased productivity, and the inclusion of ASR in HIL may adversely affect productivity if it contains too many errors. Rev.com provides a convenient laboratory to explore these questions. Originally, the company provided transcriptions of audio and video content executed solely by humans (known as Revvers). More recently, ASR was introduced in an HIL workflow where Revvers postedited an ASR first draft. We provide an analysis of the interaction between metrics of ASR accuracy and the productivity of our 72,000+ Revvers transcribing more than 15,000 hours of media every week. To do this, we utilise two measures of transcriptionist productivity: transcriber real time factor (RTF) and words per minute (WPM). Through our work, we hope to focus attention on the human productivity and quality of experience (QoE) aspects of improvements in ASR and related technologies. Given the broad scope of content transcription applications and the still elusive objective of perfect machine performance, keeping the human in the loop in both practice and mind is critical.
This paper provides an overview of human and machine transcription and Rev’s marketplace, followed by an analysis of the relationship between ASR accuracy and transcriptionist productivity, and concludes with suggestions for future work.
Suggested Citation
Miller, Corey & Jetté, Migüel & Kokotov, Dan, 2022.
"Human–machine collaboration in transcription,"
Journal of AI, Robotics & Workplace Automation, Henry Stewart Publications, vol. 2(1), pages 24-36, September.
Handle:
RePEc:aza:airwa0:y:2022:v:2:i:1:p:24-36