Modelling Non-stationary 'Big Data'
Jennifer Castle, Jurgen Doornik and David Hendry
No 905, Economics Series Working Papers from University of Oxford, Department of Economics
Abstract:
Seeking substantive relationships among vast numbers of spurious connections when modelling Big Data requires an appropriate approach. Big Data are useful if they can increase the probability that the data generation process is nested in the postulated model and increase the power of specification and mis-specification tests, yet do not raise the chances of adventitious significance. Simply choosing the best-fitting equation, or trying hundreds of empirical fits and selecting a preferred one (perhaps contradicted by others that go unreported), is not going to lead to a useful outcome. Wide-sense non-stationarity (including both distributional shifts and integrated data) must be taken into account. The paper discusses the use of principal components analysis to identify cointegrating relations as a route to handling that aspect of non-stationary Big Data, along with indicator saturation to handle distributional shifts. It then models the monthly UK unemployment rate using both macroeconomic and Google Trends data, searching over 3000 explanatory variables and yet identifying a parsimonious, well-specified and theoretically interpretable model.
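As a rough illustration of the principal-components idea mentioned in the abstract, the sketch below runs PCA on the levels of three simulated series that share one random-walk trend. It is not the authors' procedure: the data are synthetic, and the names `pca_levels` and `lag1_autocorr`, the loadings and the noise scale are all assumptions made for this example. The point it shows is only that the leading component tends to absorb the common stochastic trend, while the trailing components behave more like stationary combinations and so are candidates for cointegrating relations.

```python
import numpy as np

def pca_levels(X):
    """PCA on standardized levels.

    Returns eigenvalues (descending), eigenvectors (as columns, same order)
    and the standardized data. Hypothetical helper for this sketch.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    cov = Z.T @ Z / len(Z)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order], Z

def lag1_autocorr(x):
    """Sample first-order autocorrelation of a 1-D series."""
    x = x - x.mean()
    return float(x[1:] @ x[:-1] / (x @ x))

# Synthetic data (an assumption of this sketch): three series driven by
# one common random-walk trend plus independent stationary noise.
rng = np.random.default_rng(0)
T = 500
trend = np.cumsum(rng.normal(size=T))
loadings = np.array([1.0, 0.5, -0.8])
X = np.outer(trend, loadings) + rng.normal(scale=0.5, size=(T, 3))

eigvals, eigvecs, Z = pca_levels(X)
scores = Z @ eigvecs                 # principal-component scores
share = eigvals / eigvals.sum()      # share of variance per component

ac_first = lag1_autocorr(scores[:, 0])   # persistent: tracks the trend
ac_last = lag1_autocorr(scores[:, -1])   # far less persistent

print("variance shares:", np.round(share, 3))
print("lag-1 autocorrelation, first vs last PC:", round(ac_first, 3), round(ac_last, 3))
```

In applied work the stationarity of the trailing components would need formal testing rather than a comparison of autocorrelations, and the number of common trends is not known in advance.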
Keywords: Cointegration; Big Data; Model Selection; Outliers; Indicator Saturation; Autometrics
JEL-codes: C51 Q54
Date: 2020-04-15
New Economics Papers: this item is included in nep-big, nep-ecm, nep-ets and nep-ore
Downloads:
https://ora.ox.ac.uk/objects/uuid:fdad75eb-06e4-4a2c-8401-0b8236cec292 (text/html)
Related works:
Journal Article: Modelling non-stationary ‘Big Data’ (2021) 
Persistent link: https://EconPapers.repec.org/RePEc:oxf:wpaper:905