
nep-ecm New Economics Papers
on Econometrics
Issue of 2019‒07‒15
twenty-one papers chosen by
Sune Karlsson
Örebro universitet

  1. Random Forest Estimation of the Ordered Choice Model By Lechner, Michael; Okasa, Gabriel
  2. Heteroskedasticity-Robust Inference in Linear Regression Models By Jochmans, K.
  3. Should We Trust Clustered Standard Errors? A Comparison with Randomization-Based Methods By Lourenço S. Paz; James E. West
  4. On Testing Continuity and the Detection of Failures By Matthew Backus; Sida Peng
  5. An improved approach for estimating large losses in insurance analytics and operational risk using the g-and-h distribution By Marco Bee; Julien Hambuckers; Luca Trapin
  6. Semi-parametric Realized Nonlinear Conditional Autoregressive Expectile and Expected Shortfall By Chao Wang; Richard Gerlach
  7. Simulation smoothing for nowcasting with large mixed-frequency VARs By Sebastian Ankargren; Paulina Jonéus
  8. Modified-Likelihood Estimation of Fixed-Effect Models for Dyadic Data By Jochmans, K.
  9. Realized variance modeling: decoupling forecasting from estimation By Fabrizio Cipollini; Giampiero M. Gallo; Alessandro Palandri
  10. Identification of a Class of Health-Outcome Distributions under a Common Form of Partial Data Observability By John Mullahy
  11. Forecasting security's volatility using low-frequency historical data, high-frequency historical data and option-implied volatility By Huiling Yuan; Yong Zhou; Zhiyuan Zhang; Xiangyu Cui
  12. Dynamic Factor Models By Catherine Doz; Peter Fuleky
  13. Policy Targeting under Network Interference By Davide Viviano
  14. Treatment Effects with Heterogeneous Externalities By Arduini, Tiziano; Patacchini, Eleonora; Rainone, Edoardo
  15. Realized Volatility Forecasting: Robustness to Measurement Errors By Fabrizio Cipollini; Giampiero M. Gallo; Edoardo Otranto
  16. Profile-Score Adjustments for Incidental-Parameter Problems By Dhaene, G.; Jochmans, K.
  17. Full Information Estimation of Household Income Risk and Consumption Insurance By Arpita Chatterjee; James Morley; Aarti Singh
  18. Empirical Process Results for Exchangeable Arrays By Laurent Davezies; Xavier D'Haultfoeuille; Yannick Guyonvarch
  19. Dynamic time series clustering via volatility change-points By Nick Whiteley
  20. Selection between Exponential and Lindley distributions By Shovan Chowdhury
  21. Causal Inference By LeRoy, Stephen F.

  1. By: Lechner, Michael; Okasa, Gabriel
    Abstract: In econometrics, so-called ordered choice models are popular when interest is in the estimation of the probabilities of particular values of categorical outcome variables with an inherent ordering, conditional on covariates. In this paper we develop a new machine learning estimator based on the random forest algorithm for such models without imposing any distributional assumptions. The proposed Ordered Forest estimator provides a flexible method for estimating the conditional choice probabilities that can naturally deal with nonlinearities in the data, while taking the ordering information explicitly into account. In contrast to common machine learning estimators, it enables the estimation of marginal effects and inference on them, thus providing the same output as classical econometric estimators based on ordered logit or probit models. An extensive simulation study examines the finite sample properties of the Ordered Forest and reveals its good predictive performance, particularly in settings with multicollinearity among the predictors and nonlinear functional forms. An empirical application further illustrates the estimation of the marginal effects and their standard errors and demonstrates the advantages of the flexible estimation compared to a parametric benchmark model.
    Keywords: Ordered choice models, random forests, probabilities, marginal effects, machine learning
    JEL: C14 C25 C40
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:usg:econwp:2019:08&r=all
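    A sketch of the cumulative-probability idea behind such an ordered forest: estimate P(Y <= k | X) with one binary forest per threshold and difference the fitted cumulative probabilities. The helper below and its simulated data are illustrative assumptions, not the authors' implementation; the paper additionally covers marginal effects and inference, which are not sketched here.

```python
# Minimal ordered-forest-style probability estimator (illustrative sketch).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def ordered_forest_probs(X, y, X_new, n_trees=500, seed=0):
    """Estimate P(Y = k | X) for an ordered outcome y in {1, ..., K} by
    fitting one regression forest per cumulative indicator 1{y <= k} and
    differencing the fitted cumulative probabilities."""
    classes = np.sort(np.unique(y))
    K = len(classes)
    cum = np.zeros((X_new.shape[0], K))
    cum[:, -1] = 1.0                          # P(Y <= K | X) = 1 by construction
    for j, k in enumerate(classes[:-1]):      # K - 1 forests for the thresholds
        rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
        rf.fit(X, (y <= k).astype(float))
        cum[:, j] = rf.predict(X_new)
    cum = np.clip(np.sort(cum, axis=1), 0.0, 1.0)   # crude monotonicity fix
    probs = np.diff(np.hstack([np.zeros((len(cum), 1)), cum]), axis=1)
    return probs                              # rows sum to one

# Example on simulated data with a nonlinear latent index
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
latent = X[:, 0] + np.sin(X[:, 1]) + rng.normal(size=1000)
y = np.digitize(latent, [-1.0, 0.0, 1.0]) + 1     # ordered classes 1..4
p_hat = ordered_forest_probs(X[:800], y[:800], X[800:])
```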
  2. By: Jochmans, K.
    Abstract: This paper considers inference in heteroskedastic linear regression models with many control variables. The slope coefficients on these variables are nuisance parameters. Our setting allows their number to grow with the sample size, possibly at the same rate, in which case they are not consistently estimable. A prime example of this setting is a model with many (possibly multi-way) fixed effects. The presence of many nuisance parameters introduces an incidental-parameter problem in the usual heteroskedasticity-robust estimators of the covariance matrix, rendering them biased and inconsistent. Hence, tests based on these estimators are size distorted even in large samples. An alternative covariance-matrix estimator that is conditionally unbiased and remains consistent is presented and supporting simulation results are provided.
    Keywords: bias, fixed effects, heteroskedasticity, inference, leave-one-out estimator, many regressors, unbalanced regressor design, robust covariance matrix, size control, statistical leverage
    Date: 2019–06–25
    URL: http://d.repec.org/n?u=RePEc:cam:camdae:1957&r=all
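    The role of statistical leverage in this problem can be illustrated with a leverage-corrected (HC2/jackknife-type) covariance estimator. This is a generic textbook correction shown for intuition only, not the conditionally unbiased estimator proposed in the paper; the simulated many-regressor design is an assumption.

```python
# OLS with an HC2 (leverage-adjusted) robust covariance matrix (sketch).
import numpy as np

def hc2_covariance(y, X):
    """Return OLS coefficients and the HC2 robust covariance matrix,
    which rescales squared residuals by 1 / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    u = y - X @ beta
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)       # leverages h_ii
    meat = (X * (u**2 / (1.0 - h))[:, None]).T @ X
    return beta, XtX_inv @ meat @ XtX_inv

# Example: many controls relative to n inflate leverages, which is exactly
# when uncorrected (HC0) standard errors become unreliable.
rng = np.random.default_rng(0)
n, k = 200, 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X[:, 1] + rng.normal(size=n) * (1.0 + np.abs(X[:, 1]))   # heteroskedastic
beta, V = hc2_covariance(y, X)
se = np.sqrt(np.diag(V))
```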
  3. By: Lourenço S. Paz; James E. West
    Abstract: We compare the precision of critical values obtained under conventional sampling-based methods with those obtained using sample order statistics computed through draws from a randomized counterfactual based on the null hypothesis. When based on a small number of draws (200), critical values in the extreme left and right tail (0.005 and 0.995) contain a small bias toward failing to reject the null hypothesis, which quickly dissipates with additional draws. The precision of randomization-based critical values compares favorably with conventional sampling-based critical values when the number of draws is approximately 7 times the sample size for a basic OLS model using homoskedastic data, but considerably less in models based on clustered standard errors or the classic Differences-in-Differences. Randomization-based methods dramatically outperform conventional methods for treatment effects in Differences-in-Differences specifications with unbalanced panels and a small number of treated groups.
    JEL: C18 C33
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:25926&r=all
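    The randomization-based construction can be sketched for the simplest case of a binary treatment under a sharp null of no effect; the helper name, number of draws, and simulated data below are assumptions, not the authors' code.

```python
# Randomization-based critical values for an OLS slope (illustrative sketch).
import numpy as np

def randomization_critical_values(y, d, n_draws=2000, alphas=(0.005, 0.995), seed=0):
    """Re-assign the treatment d at random, re-estimate the OLS slope each
    time, and return the requested quantiles of the resulting null
    distribution together with the slope under the observed assignment."""
    rng = np.random.default_rng(seed)

    def slope(d_):
        X = np.column_stack([np.ones_like(d_), d_])
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    draws = np.array([slope(rng.permutation(d)) for _ in range(n_draws)])
    return np.quantile(draws, alphas), slope(d)

# Example: homoskedastic data with no true treatment effect
rng = np.random.default_rng(1)
n = 200
d = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + rng.normal(size=n)
(crit_lo, crit_hi), beta_hat = randomization_critical_values(y, d)
reject = (beta_hat < crit_lo) or (beta_hat > crit_hi)   # two-sided 1% test
```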
  4. By: Matthew Backus; Sida Peng
    Abstract: Estimation of discontinuities is pervasive in applied economics: from the study of sheepskin effects to prospect theory and “bunching” of reported income on tax returns, models that predict discontinuities in outcomes are uniquely attractive for empirical testing. However, existing empirical methods often rely on assumptions about the number of discontinuities, the type, the location, or the underlying functional form of the model. We develop a nonparametric approach to the study of arbitrary discontinuities — point discontinuities as well as jump discontinuities in the nth derivative, where n = 0,1,... — that does not require such assumptions. Our approach exploits the development of false discovery rate control methods for lasso regression as proposed by G’Sell et al. (2015). This framework affords us the ability to construct valid tests for both the null of continuity as well as the significance of any particular discontinuity without the computation of nonstandard distributions. We illustrate the method with a series of Monte Carlo examples and by replicating prior work detecting and measuring discontinuities, in particular Lee (2008), Card et al. (2008), Reinhart and Rogoff (2010), and Backus et al. (2018b).
    JEL: C01 C20 C52
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26016&r=all
  5. By: Marco Bee; Julien Hambuckers; Luca Trapin
    Abstract: In this paper, we study the estimation of parameters for g-and-h distributions. These distributions find applications in modeling highly skewed and fat-tailed data, like extreme losses in the banking and insurance sector. We first introduce two estimation methods: a numerical maximum likelihood technique, and an indirect inference approach with a bootstrap weighting scheme. In a realistic simulation study, we show that indirect inference is computationally more efficient and provides better estimates when the data exhibit extreme features. Empirical applications to insurance and operational losses confirm these findings.
    Keywords: Intractable likelihood, indirect inference, skewed distribution, tail modeling, bootstrap
    JEL: C15 C46 C51 G22
    Date: 2019
    URL: http://d.repec.org/n?u=RePEc:trn:utwprg:2019/11&r=all
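    The g-and-h family is defined through a quantile transform of a standard normal variate, which is why simulation (the building block of a simulation-based estimator such as indirect inference) is cheap even though the likelihood is intractable. A minimal simulation sketch using the usual Tukey parameterization, not necessarily the paper's notation:

```python
# Simulating from a g-and-h distribution (illustrative sketch).
import numpy as np

def g_and_h_sample(n, a=0.0, b=1.0, g=0.5, h=0.1, rng=None):
    """Draw n variates X = a + b * T_{g,h}(Z), Z ~ N(0, 1), where
    T_{g,h}(z) = ((exp(g z) - 1) / g) * exp(h z^2 / 2); g controls skewness
    and h controls tail heaviness."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)
    core = z if g == 0 else (np.exp(g * z) - 1.0) / g
    return a + b * core * np.exp(h * z**2 / 2.0)

# Skewed, fat-tailed "losses": compare the median with extreme quantiles
losses = g_and_h_sample(100_000, a=1.0, b=2.0, g=0.8, h=0.2,
                        rng=np.random.default_rng(42))
print(np.quantile(losses, [0.5, 0.99, 0.999]))
```

    An indirect-inference estimator would repeatedly draw such samples at candidate parameter values and match auxiliary statistics (for example, a set of empirical quantiles) computed on the observed losses.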
  6. By: Chao Wang; Richard Gerlach
    Abstract: A joint conditional autoregressive expectile and Expected Shortfall framework is proposed. The framework is extended through incorporating a measurement equation which models the contemporaneous dependence between the realized measures and the latent conditional expectile. Nonlinear threshold specification is further incorporated into the proposed framework. A Bayesian Markov Chain Monte Carlo method is adapted for estimation, whose properties are assessed and compared with maximum likelihood via a simulation study. One-day-ahead VaR and ES forecasting studies, with seven market indices, provide empirical support to the proposed models.
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1906.09961&r=all
  7. By: Sebastian Ankargren; Paulina Jonéus
    Abstract: There is currently an increasing interest in large vector autoregressive (VAR) models. VARs are popular tools for macroeconomic forecasting, and the use of larger models has been shown to often improve forecasting ability compared to more traditional small-scale models. Mixed-frequency VARs deal with data sampled at different frequencies while remaining within the VAR framework. Estimation of mixed-frequency VARs makes use of simulation smoothing, but under the standard procedure these models quickly become prohibitive in nowcasting situations as the size of the model grows. We propose two algorithms that improve the computational efficiency of the simulation smoothing algorithm. Our preferred choice is an adaptive algorithm, which augments the state vector as necessary to also sample monthly variables that are missing at the end of the sample. For large VARs, we find considerable improvements in speed using our adaptive algorithm. The algorithm therefore provides a crucial building block for bringing mixed-frequency VARs to the high-dimensional regime.
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1907.01075&r=all
  8. By: Jochmans, K.
    Abstract: We consider point estimation and inference based on modifications of the profile likelihood in models for dyadic interactions between n agents featuring agent-specific parameters. This setup covers the β-model of network formation and generalizations thereof. The maximum-likelihood estimator of such models has bias and standard deviation of O(n⁻¹) and so is asymptotically biased. Estimation based on modified likelihoods leads to estimators that are asymptotically unbiased and likelihood-ratio tests that exhibit correct size. We apply the modifications to versions of the β-model for network formation and of the Bradley-Terry model for paired comparisons.
    Keywords: asymptotic bias, β-model, Bradley-Terry model, dyadic data, fixed effects, modified profile likelihood, paired comparisons, matching, network formation, undirected random graph
    Date: 2019–06–25
    URL: http://d.repec.org/n?u=RePEc:cam:camdae:1958&r=all
  9. By: Fabrizio Cipollini (Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti", Università di Firenze); Giampiero M. Gallo (Italian Court of Audits, and New York University in Florence); Alessandro Palandri (Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti", Università di Firenze)
    Abstract: In this paper we evaluate the in-sample fit and out-of-sample forecasts of various combinations of realized variance models and estimation criteria. Our empirical findings highlight that: independently of the econometrician’s forecasting loss function, certain estimation criteria perform significantly better than others; the simple ARMA modeling of the log realized variance generates forecasts superior to those of the HAR family, for any of the forecasting loss functions considered; the (2,1) parameterizations with negative lag-2 coefficient emerge as the benchmark specifications generating the best forecasts and approximating long-run dependence as well as the HAR family does.
    Keywords: Variance modeling; Variance forecasting; Heterogeneous Autoregressive (HAR) model; Multiplicative Error Model (MEM); Realized variance space
    JEL: C32 C53 C58 G17
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:fir:econom:wp2019_05&r=all
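    The log-ARMA benchmark singled out above is straightforward to reproduce in outline. A minimal sketch with a hypothetical data file and naive exponentiation of the log forecasts (ignoring the log-normal/Jensen correction):

```python
# ARMA(2,1) forecasting of log realized variance (illustrative sketch).
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def forecast_log_rv(rv, horizon=5):
    """Fit an ARMA(2,1) with a constant to log(RV_t) and return forecasts
    mapped back to the variance scale by naive exponentiation."""
    log_rv = np.log(rv)
    fit = ARIMA(log_rv, order=(2, 0, 1)).fit()
    return np.exp(fit.forecast(steps=horizon)), fit.params

# Usage (hypothetical CSV with a 'rv' column of daily realized variances):
# rv = pd.read_csv("realized_variance.csv", index_col=0, parse_dates=True)["rv"]
# var_forecast, params = forecast_log_rv(rv, horizon=10)
```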
  10. By: John Mullahy
    Abstract: This paper suggests analytical strategies for obtaining informative parameter bounds when multivariate health-outcome data are partially observed in a particular yet common manner. One familiar context is where M>1 health outcomes' respective totals across N>1 time periods are observed but where questions of interest involve features—probabilities, moments, etc.—of their unobserved joint distribution at each of the N time periods. For instance, one might wish to understand the distribution of any type of unhealthy day experienced over a month but have access only to the separate totals of physically unhealthy and mentally unhealthy days that are experienced. After demonstrating methods to bound, or partially identify, such distributions and related parameters under several sampling assumptions, the paper proceeds to derive bounds on partial effects involving exogenous covariates. These results are applied in three empirical exercises. Whether the proposed bounds prove to be sufficiently narrow to usefully inform decisionmakers can only be determined in context, although it is suggested in the paper's conclusion that the issues considered in this paper are likely to become increasingly important for analysts.
    JEL: C25 I1
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:nbr:nberwo:26011&r=all
  11. By: Huiling Yuan; Yong Zhou; Zhiyuan Zhang; Xiangyu Cui
    Abstract: Low-frequency historical data, high-frequency historical data and option data are three major sources that can be used to forecast the underlying security's volatility. In this paper, we propose two econometric models that integrate the three information sources. In the GARCH-Itô-OI model, we assume that the option-implied volatility can influence the security's future volatility, and the option-implied volatility is treated as an observable exogenous variable. In the GARCH-Itô-IV model, we assume that the option-implied volatility cannot influence the security's volatility directly, and the relationship between the option-implied volatility and the security's volatility is constructed to extract useful information about the underlying security. After providing the quasi-maximum likelihood estimators for the parameters and establishing their asymptotic properties, we also conduct a series of simulation and empirical analyses to compare the proposed models with other popular models in the literature. We find that when the sampling interval of the high-frequency data is 5 minutes, the GARCH-Itô-OI and GARCH-Itô-IV models have better forecasting performance than other models.
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1907.02666&r=all
  12. By: Catherine Doz (Paris School of Economics and University Paris); Peter Fuleky (Department of Economics, University of Hawaii at Manoa, UHERO)
    Abstract: Dynamic factor models are parsimonious representations of relationships among time series variables. With the surge in data availability, they have proven to be indispensable in macroeconomic forecasting. This chapter surveys the evolution of these models from their pre-big-data origins to the large-scale models of recent years. We review the associated estimation theory, forecasting approaches, and several extensions of the basic framework.
    Keywords: dynamic factor models, big data, two-step estimation, time domain, frequency domain, structural breaks
    JEL: C32 C38 C53
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:hae:wpaper:2019-4&r=all
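    The classical two-step approach surveyed in the chapter (principal-components factors, then a time-series model for the factors) can be sketched as follows; the dimensions, lag length, and simulated panel are illustrative assumptions.

```python
# Two-step dynamic factor model: PCA factors + VAR forecast (sketch).
import numpy as np
from statsmodels.tsa.api import VAR

def two_step_dfm_forecast(X, n_factors=3, lags=2, horizon=1):
    """Extract factors by principal components from a T x N panel X
    (standardized column by column) and forecast them with a VAR."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xs, rowvar=False))
    loadings = eigvec[:, ::-1][:, :n_factors]     # largest eigenvalues first
    factors = Xs @ loadings                       # T x n_factors
    var_fit = VAR(factors).fit(lags)
    factor_fc = var_fit.forecast(factors[-lags:], steps=horizon)
    return factor_fc, loadings

# Usage with a simulated panel: 200 periods, 50 series, 3 common factors
rng = np.random.default_rng(0)
F = rng.normal(size=(200, 3)).cumsum(axis=0) * 0.1
X = F @ rng.normal(size=(3, 50)) + rng.normal(size=(200, 50))
fc, lam = two_step_dfm_forecast(X, n_factors=3, lags=2, horizon=4)
```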
  13. By: Davide Viviano
    Abstract: The empirical analysis of experiments and quasi-experiments often seeks to determine the optimal allocation of treatments that maximizes social welfare. In the presence of interference, spillover effects lead to a new formulation of the statistical treatment choice problem. This paper develops a novel method to construct individual-specific optimal treatment allocation rules under network interference. Several features make the proposed methodology particularly appealing for applications: we construct targeting rules that depend on an arbitrary set of individual, neighbors' and network characteristics, and we allow for general constraints on the policy function; we consider heterogeneous direct and spillover effects, arbitrary, possibly non-linear, regression models, and we propose estimators that are robust to model misspecification; the method flexibly accommodates cases where researchers only observe local information on the network. From a theoretical perspective, we establish the first set of guarantees on the utilitarian regret under interference, and we show that the method achieves the minimax optimal rate in scenarios of practical and theoretical interest. We discuss the empirical performance in simulations and we illustrate our method by investigating the role of social networks in micro-finance decisions.
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1906.10258&r=all
  14. By: Arduini, Tiziano; Patacchini, Eleonora; Rainone, Edoardo
    Abstract: This paper proposes a new method for estimating heterogeneous externalities in policy analysis when social interactions take the linear-in-means form. We establish that the parameters of interest can be identified and consistently estimated using specific functions of the share of the eligible population. We also study the finite sample performance of the proposed estimators using Monte Carlo simulations. The method is illustrated using data on the PROGRESA program. We find that more than 50 percent of the effects of the program on schooling attendance are due to externalities, which are heterogeneous within and between poor and nonpoor households.
    Keywords: Indirect Treatment Effect; Program evaluation; Two-Stage Least Squares
    JEL: C13 C21 D62
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:cpr:ceprdp:13781&r=all
  15. By: Fabrizio Cipollini (Dipartimento di Statistica, Informatica, Applicazioni "G. Parenti", Università di Firenze); Giampiero M. Gallo (Corte dei Conti and NYU in Florence, Italy); Edoardo Otranto (Università di Messina, Italy)
    Abstract: In this paper, we reconsider the issue of measurement errors affecting the estimates of a dynamic model for the conditional expectation of realized variance, arguing that heteroskedasticity of such errors may be adequately represented with a multiplicative error model. Empirically, we show that quarticity/quadratic terms capturing attenuation bias are highly significant within an HAR model, but their importance is greatly diminished within an AMEM, and more so when regime-specific dynamics account for a faster mean reversion when volatility is high. Model Confidence Sets confirm such robustness both in- and out-of-sample.
    Keywords: Realized volatility, Forecasting, Measurement errors, HAR, AMEM, Markov switching, Volatility of volatility
    JEL: C22 C51 C53 C58
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:fir:econom:wp2019_04&r=all
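    For reference, a bare-bones HAR regression with an optional quarticity interaction (a HARQ-style attenuation-bias term) can be set up as follows; the column names, lag windows, and HAC bandwidth are assumptions, and the paper's AMEM and Markov-switching specifications are not sketched here.

```python
# HAR design matrix with an optional sqrt(RQ) * RV interaction (sketch).
import numpy as np
import pandas as pd
import statsmodels.api as sm

def har_design(rv, rq=None):
    """Daily, weekly (5-day) and monthly (22-day) lagged averages of realized
    variance, plus sqrt(RQ_{t-1}) * RV_{t-1} if realized quarticity is given."""
    X = pd.DataFrame({
        "rv_d": rv.shift(1),
        "rv_w": rv.rolling(5).mean().shift(1),
        "rv_m": rv.rolling(22).mean().shift(1),
    })
    if rq is not None:
        X["rq_rv_d"] = np.sqrt(rq.shift(1)) * rv.shift(1)
    return sm.add_constant(X)

# Usage with hypothetical daily pd.Series `rv` (realized variance) and `rq`
# (realized quarticity) sharing the same date index:
# X = har_design(rv, rq)
# fit = sm.OLS(rv, X, missing="drop").fit(cov_type="HAC", cov_kwds={"maxlags": 22})
# print(fit.summary())
```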
  16. By: Dhaene, G.; Jochmans, K.
    Abstract: We propose a scheme of iterative adjustments to the profile score to deal with incidental-parameter bias in models for stratified data with few observations on a large number of strata. The first-order adjustment is based on a calculation of the profile-score bias and evaluation of this bias at maximum-likelihood estimates of the incidental parameters. If the bias does not depend on the incidental parameters, the first-order adjusted profile score is fully recentered, solving the incidental-parameter problem. Otherwise, it is approximately recentered, alleviating the incidental-parameter problem. In the latter case, the adjustment can be iterated to give higher-order adjustments, possibly until convergence. The adjustments are generally applicable (e.g. not requiring parameter orthogonality) and lead to estimates that generally improve on maximum likelihood. We examine a range of nonlinear models with covariates. In many of them, we obtain an adjusted profile score that is exactly unbiased. In the others, we obtain approximate bias adjustments that yield much improved estimates, relative to maximum likelihood, even when there are only two observations per stratum.
    Keywords: adjusted profile score, bias reduction, incidental parameters
    Date: 2019–06–25
    URL: http://d.repec.org/n?u=RePEc:cam:camdae:1959&r=all
  17. By: Arpita Chatterjee (UNSW Business School, UNSW); James Morley (University of Sydney); Aarti Singh (University of Sydney)
    Abstract: Blundell, Pistaferri, and Preston (2008) report an estimate of household consumption insurance with respect to permanent income shocks of 36%. Their estimate is distorted by an error in their code and is not robust to the choice of GMM weighting scheme. We propose instead to use quasi-maximum likelihood estimation (QMLE), which produces a more precise and significantly higher estimate of consumption insurance at 55%. For sub-groups by age and education, differences between estimates are even more pronounced. Monte Carlo experiments with non-Normal shocks demonstrate that QMLE is more accurate than GMM, especially given a smaller sample size.
    Keywords: consumption insurance; weighting schemes; quasi maximum likelihood
    JEL: E21 C13 C33
    Date: 2019–07
    URL: http://d.repec.org/n?u=RePEc:swe:wpaper:2019-07&r=all
  18. By: Laurent Davezies; Xavier D'Haultfoeuille; Yannick Guyonvarch
    Abstract: Exchangeable arrays are natural ways to model common forms of dependence between units of a sample. Jointly exchangeable arrays are well suited to dyadic data, where observed random variables are indexed by two units from the same population. Examples include trade flows between countries or relationships in a network. Separately exchangeable arrays are well suited to multiway clustering, where units sharing the same cluster (e.g. geographical areas or sectors of activity when considering individual wages) may be dependent in an unrestricted way. We prove uniform laws of large numbers and central limit theorems for such exchangeable arrays. We obtain these results under the same moment restrictions and conditions on the class of functions as with i.i.d. data. As a result, convergence and asymptotic normality of nonlinear estimators can be obtained under the same regularity conditions as with i.i.d. data. We also show the convergence of bootstrap processes adapted to such arrays.
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1906.11293&r=all
  19. By: Nick Whiteley
    Abstract: This note outlines a method for clustering time series based on a statistical model in which volatility shifts at unobserved change-points. The model accommodates some classical stylized features of returns and its relation to GARCH is discussed. Clustering is performed using a probability metric evaluated between posterior distributions of the most recent change-point associated with each series. This implies series are grouped together at a given time if there is evidence the most recent shifts in their respective volatilities were coincident or closely timed. The clustering method is dynamic, in that groupings may be updated in an online manner as data arrive. Numerical results are given analyzing daily returns of constituents of the S&P 500.
    Date: 2019–06
    URL: http://d.repec.org/n?u=RePEc:arx:papers:1906.10372&r=all
  20. By: Shovan Chowdhury (Indian Institute of Management, Kozhikode)
    Abstract: Exponential and Lindley distributions are quite effective in analyzing positively skewed data. While the two distributions exhibit some distinguishable characteristics, they are also very close to each other for certain ranges of the parameter values. In this paper, we intend to discriminate between the exponential and Lindley distribution functions considering the ratio of the maximized likelihood functions. The asymptotic distribution of the logarithm of the maximized likelihood ratio has been obtained to determine the minimum sample size required to discriminate between the two distributions for a given probability of correct selection and a distance measure. Some numerical results are obtained to validate the asymptotic results. It is also observed that the asymptotic results work quite well even for small sample sizes. One data analysis is performed to demonstrate the results.
    Keywords: Asymptotic distribution; Likelihood ratio test; Probability of correct selection; Kolmogorov-Smirnov distance; Lindley distribution
    Date: 2019–03
    URL: http://d.repec.org/n?u=RePEc:iik:wpaper:316&r=all
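    Both maximized likelihoods in this selection problem have closed-form ingredients, so the log-ratio statistic is simple to compute. A minimal sketch using the standard MLE formulas (notation is the usual one, not taken from the paper):

```python
# Ratio-of-maximized-likelihoods selection: exponential vs. Lindley (sketch).
import numpy as np

def exp_vs_lindley_logratio(x):
    """Return T = l_Lindley - l_Exponential at the respective MLEs.
    Positive T favors the Lindley model, negative T the exponential."""
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()

    # Exponential(lambda): MLE lambda = 1 / xbar
    loglik_exp = n * np.log(1.0 / xbar) - n

    # Lindley(theta): f(x) = theta^2 / (theta + 1) * (1 + x) * exp(-theta x)
    theta = (-(xbar - 1.0) + np.sqrt((xbar - 1.0) ** 2 + 8.0 * xbar)) / (2.0 * xbar)
    loglik_lin = (n * (2.0 * np.log(theta) - np.log(theta + 1.0))
                  + np.log1p(x).sum() - theta * x.sum())

    return loglik_lin - loglik_exp

# Example: data generated from an exponential distribution
rng = np.random.default_rng(0)
T = exp_vs_lindley_logratio(rng.exponential(scale=2.0, size=500))
print("choose Lindley" if T > 0 else "choose exponential", round(T, 3))
```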
  21. By: LeRoy, Stephen F.
    Keywords: Physical Sciences and Mathematics
    Date: 2019–07–10
    URL: http://d.repec.org/n?u=RePEc:cdl:ucsbec:qt6pc1x9r6&r=all

This nep-ecm issue is ©2019 by Sune Karlsson. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at http://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.