Quantitative Tools for Time Series Analysis in Natural Language Processing: A Practitioners Guide
Wolfgang Schmal
Papers from arXiv.org
Abstract:
Natural language processing tools have become frequently used in social sciences such as economics, political science, and sociology. Many publications apply topic modeling to elicit latent topics in text corpora and their development over time. Here, most publications rely on visual inspections and draw inference on changes, structural breaks, and developments over time. We suggest using univariate time series econometrics to introduce more quantitative rigor that can strengthen the analyses. In particular, we discuss the econometric topics of non-stationarity as well as structural breaks. This paper serves as a comprehensive practitioners guide to provide researchers in the social and life sciences as well as the humanities with concise advice on how to implement econometric time series methods to thoroughly investigate topic prevalences over time. We provide coding advice for the statistical software R throughout the paper. The application of the discussed tools to a sample dataset completes the analysis.
Date: 2024-04
New Economics Papers: this item is included in nep-big and nep-ecm
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://arxiv.org/pdf/2404.18499 Latest version (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2404.18499
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().