
GAM(L)A: An econometric model for interpretable Machine Learning

Author

Listed:

Emmanuel Flachaire
Gilles Hacheme
Sullivan Hué
Sébastien Laurent

Abstract

Despite their high predictive performance, random forest and gradient boosting are often regarded as black boxes, or uninterpretable models, which has raised concerns among practitioners and regulators. As an alternative, we propose using partial linear models, which are inherently interpretable. Specifically, this article introduces GAM-lasso (GAMLA) and GAM-autometrics (GAMA), jointly denoted GAM(L)A. GAM(L)A combines parametric and non-parametric functions to accurately capture the linearities and non-linearities between the dependent and explanatory variables, together with a variable selection procedure to control overfitting. Estimation relies on a two-step procedure that builds on the double residual method. We illustrate the predictive performance and interpretability of GAM(L)A on a regression problem and a classification problem. The results show that GAM(L)A outperforms parametric models augmented by quadratic, cubic and interaction effects. Moreover, they suggest that the performance of GAM(L)A is not significantly different from that of random forest and gradient boosting.
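
To make the double residual idea concrete, here is a minimal sketch on simulated data. It is not the authors' implementation: it assumes a partial linear model of the form y = X beta + g(z) + noise, swaps in a simple Nadaraya-Watson kernel smoother where GAM(L)A uses a GAM, and uses plain OLS in the second step where GAM(L)A applies lasso or Autometrics. All function names, the bandwidth and the simulated data are hypothetical.

```python
import numpy as np

def kernel_smooth(z, target, bandwidth=0.05):
    """Nadaraya-Watson estimate of E[target | z] at the sample points.

    Stand-in smoother (assumption): Gaussian kernel with a fixed bandwidth.
    """
    diffs = (z[:, None] - z[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target

def double_residual(y, X, z, bandwidth=0.05):
    """Two-step estimate of beta in y = X @ beta + g(z) + noise."""
    # Step 1: remove the part of y and of each column of X explained by z.
    y_res = y - kernel_smooth(z, y, bandwidth)
    X_res = X - kernel_smooth(z, X, bandwidth)
    # Step 2: regress the residuals of y on the residuals of X (plain OLS here).
    beta, *_ = np.linalg.lstsq(X_res, y_res, rcond=None)
    return beta

def simulate(n=500, seed=0):
    """Hypothetical data: two linear regressors plus a sine effect of z."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.0, 1.0, n)
    X = np.column_stack([z + rng.normal(size=n), rng.normal(size=n)])
    beta_true = np.array([1.5, -2.0])
    y = X @ beta_true + np.sin(2 * np.pi * z) + 0.1 * rng.normal(size=n)
    return y, X, z, beta_true

y, X, z, beta_true = simulate()
beta_hat = double_residual(y, X, z)
print("true beta:     ", beta_true)
print("estimated beta:", beta_hat)
```

Because both y and X are purged of their dependence on z before the second step, the linear coefficients can be estimated without specifying the functional form of g. In this sketch the second step is where a variable selection procedure would slot in.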

Suggested Citation

Emmanuel Flachaire & Gilles Hacheme & Sullivan Hué & Sébastien Laurent, 2022. "GAM(L)A: An econometric model for interpretable Machine Learning," Papers 2203.11691, arXiv.org.

Handle: RePEc:arx:papers:2203.11691

References listed on IDEAS

Castle Jennifer L. & Doornik Jurgen A & Hendry David F., 2011. "Evaluating Automatic Model Selection," Journal of Time Series Econometrics, De Gruyter, vol. 3(1), pages 1-33, February.
- Jennifer Castle & David Hendry & Jurgen A. Doornik, 2010. "Evaluating Automatic Model Selection," Economics Series Working Papers 474, University of Oxford, Department of Economics.
Christophe Hurlin & Christophe Perignon & Sébastien Saurin, 2021. "The Fairness of Credit Scoring Models," Working Papers hal-03501452, HAL.
- Christophe HURLIN & Christophe PERIGNON & Sébastien SAURIN, 2021. "The Fairness of Credit Scoring Models," LEO Working Papers / DR LEO 2912, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
- Christophe Hurlin & Christophe Pérignon & Sébastien Saurin, 2022. "The Fairness of Credit Scoring Models," Papers 2205.10200, arXiv.org, revised Feb 2024.
- Hurlin, Christophe & Pérignon, Christophe & Saurin, Sébastien, 2021. "The Fairness of Credit Scoring Models," HEC Research Papers Series 1411, HEC Paris.
Christophe Hurlin & Christophe Pérignon, 2019. "Machine learning et nouvelles sources de données pour le scoring de crédit," Revue d'économie financière, Association d'économie financière, vol. 0(3), pages 21-50.
- Christophe Hurlin & Christophe Pérignon, 2019. "Machine Learning et nouvelles sources de données pour le scoring de crédit," Working Papers halshs-02377886, HAL.
- Christophe HURLIN & Christophe PERIGNON, 2019. "Machine Learning et nouvelles sources de données pour le scoring de crédit," LEO Working Papers / DR LEO 2739, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
- Christophe Hurlin & Christophe Pérignon, 2019. "Machine learning et nouvelles sources de données pour le scoring de crédit," Post-Print hal-03532418, HAL.
Zou, Hui, 2006. "The Adaptive Lasso and Its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 101, pages 1418-1429, December.
Harrison, David Jr. & Rubinfeld, Daniel L., 1978. "Hedonic housing prices and the demand for clean air," Journal of Environmental Economics and Management, Elsevier, vol. 5(1), pages 81-102, March.
Robinson, Peter M, 1988. "Root-N-Consistent Semiparametric Regression," Econometrica, Econometric Society, vol. 56(4), pages 931-954, July.
Hendry, David F., 2000. "Econometrics: Alchemy or Science?: Essays in Econometric Methodology," OUP Catalogue, Oxford University Press, number 9780198293545.
Gunnarsson, Björn Rafn & vanden Broucke, Seppe & Baesens, Bart & Óskarsdóttir, María & Lemahieu, Wilfried, 2021. "Deep learning for credit scoring: Do or don’t?," European Journal of Operational Research, Elsevier, vol. 295(1), pages 292-305.
White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
B Baesens & T Van Gestel & S Viaene & M Stepanova & J Suykens & J Vanthienen, 2003. "Benchmarking state-of-the-art classification algorithms for credit scoring," Journal of the Operational Research Society, Palgrave Macmillan;The OR Society, vol. 54(6), pages 627-635, June.
Søren Johansen & Bent Nielsen, 2016. "Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 321-348, June.
Desai, Vijay S. & Crook, Jonathan N. & Overstreet, George A., 1996. "A comparison of neural networks and linear scoring models in the credit union environment," European Journal of Operational Research, Elsevier, vol. 95(1), pages 24-37, November.
Bertrand Candelon & Elena-Ivona Dumitrescu & Christophe Hurlin, 2012. "How to Evaluate an Early-Warning System: Toward a Unified Statistical Framework for Assessing Financial Crises Forecasting Methods," IMF Economic Review, Palgrave Macmillan;International Monetary Fund, vol. 60(1), pages 75-113, April.
- Candelon, B. & Dumitrescu, E-I. & Hurlin, C., 2010. "How to evaluate an early warning system? Towards a united statistical framework for assessing financial crises forecasting methods," Research Memorandum 046, Maastricht University, Maastricht Research School of Economics of Technology and Organization (METEOR).
- Bertrand Candelon & Elena Ivona Dumitrescu & Christophe Hurlin, 2012. "How to Evaluate an Early Warning System? Towards a Unified Statistical Framework for Assessing Financial Crises Forecasting Methods," Post-Print hal-01385900, HAL.
Paleologo, Giuseppe & Elisseeff, André & Antonini, Gianluca, 2010. "Subagging for credit scoring models," European Journal of Operational Research, Elsevier, vol. 201(2), pages 490-499, March.
Bracke, Philippe & Datta, Anupam & Jung, Carsten & Sen, Shayak, 2019. "Machine learning explainability in finance: an application to default risk analysis," Bank of England working papers 816, Bank of England.
Daniel W. Apley & Jingyu Zhu, 2020. "Visualizing the effects of predictor variables in black box supervised learning models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 82(4), pages 1059-1086, September.
Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2018. "Econometrics and Machine Learning," Economie et Statistique / Economics and Statistics, Institut National de la Statistique et des Etudes Economiques (INSEE), issue 505-506, pages 147-169.
- Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2018. "Econometrics and Machine Learning," Post-Print hal-02163979, HAL.
Søren Johansen & Bent Nielsen, 2016. "Rejoinder: Asymptotic Theory of Outlier Detection Algorithms for Linear Time Series Regression Models," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 43(2), pages 374-381, June.
Vincent Boucher & Yann Bramoullé, 2020. "Binary Outcomes and Linear Interactions," AMSE Working Papers 2038, Aix-Marseille School of Economics, France.
- Vincent Boucher & Yann Bramoullé, 2020. "Binary Outcomes and Linear Interactions," Working Papers halshs-03031767, HAL.
- Bramoullé, Yann & Boucher, Vincent, 2020. "Binary Outcomes and Linear Interactions," CEPR Discussion Papers 15505, C.E.P.R. Discussion Papers.
Michael C. Lovell, 1963. "Seasonal Adjustment of Economic Time Series and Multiple Regression," Cowles Foundation Discussion Papers 151, Cowles Foundation for Research in Economics, Yale University.
Godfrey, Leslie G, 1978. "Testing for Higher Order Serial Correlation in Regression Equations When the Regressors Include Lagged Dependent Variables," Econometrica, Econometric Society, vol. 46(6), pages 1303-1310, November.
Peter R. Hansen & Asger Lunde & James M. Nason, 2011. "The Model Confidence Set," Econometrica, Econometric Society, vol. 79(2), pages 453-497, March.
- Peter R. Hansen & Asger Lunde & James M. Nason, 2010. "The Model Confidence Set," CREATES Research Papers 2010-76, Department of Economics and Business Economics, Aarhus University.
Finlay, Steven, 2011. "Multiple classifier architectures and their application to credit risk assessment," European Journal of Operational Research, Elsevier, vol. 210(2), pages 368-378, April.
David F. Hendry & Søren Johansen, 2011. "The Properties of Model Selection when Retaining Theory Variables," Discussion Papers 11-25, University of Copenhagen. Department of Economics.
- David F. Hendry & Søren Johansen, 2011. "The Properties of Model Selection when Retaining Theory Variables," CREATES Research Papers 2011-36, Department of Economics and Business Economics, Aarhus University.
Lessmann, Stefan & Baesens, Bart & Seow, Hsin-Vonn & Thomas, Lyn C., 2015. "Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research," European Journal of Operational Research, Elsevier, vol. 247(1), pages 124-136.
Jurgen A. Doornik & Henrik Hansen, 2008. "An Omnibus Test for Univariate and Multivariate Normality," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 70(s1), pages 927-939, December.
- Jurgen A Doornik & Henrik Hansen, "undated". "An omnibus test for univariate and multivariate normality," Economics Papers W4&91., Economics Group, Nuffield College, University of Oxford.
Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
- Elena Ivona Dumitrescu & Sullivan Hué & Christophe Hurlin & Sessi Tokpavi, 2022. "Machine Learning for Credit Scoring: Improving Logistic Regression with Non Linear Decision Tree Effects," Post-Print hal-03331114, HAL.
Kozodoi, Nikita & Jacob, Johannes & Lessmann, Stefan, 2022. "Fairness in credit scoring: Assessment, implementation and profit implications," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1083-1094.
Hal R. Varian, 2014. "Big Data: New Tricks for Econometrics," Journal of Economic Perspectives, American Economic Association, vol. 28(2), pages 3-28, Spring.
Nikita Kozodoi & Johannes Jacob & Stefan Lessmann, 2021. "Fairness in Credit Scoring: Assessment, Implementation and Profit Implications," Papers 2103.01907, arXiv.org, revised Jun 2022.
Arthur Charpentier & Emmanuel Flachaire & Antoine Ly, 2017. "Econométrie et Machine Learning," Papers 1708.06992, arXiv.org, revised Mar 2018.
Castle, Jennifer L. & Hendry, David F., 2010. "A low-dimension portmanteau test for non-linearity," Journal of Econometrics, Elsevier, vol. 158(2), pages 231-245, October.
- Jennifer Castle & David Hendry, 2010. "A Low-Dimension Portmanteau Test for Non-linearity," Economics Series Working Papers 471, University of Oxford, Department of Economics.
Castle, Jennifer L. & Clements, Michael P. & Hendry, David F., 2013. "Forecasting by factors, by variables, by both or neither?," Journal of Econometrics, Elsevier, vol. 177(2), pages 305-319.
Fan J. & Li R., 2001. "Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties," Journal of the American Statistical Association, American Statistical Association, vol. 96, pages 1348-1360, December.
Engle, Robert F, 1982. "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation," Econometrica, Econometric Society, vol. 50(4), pages 987-1007, July.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Sullivan Hué, 2022. "GAM(L)A: An econometric model for interpretable machine learning," French Stata Users' Group Meetings 2022 19, Stata Users Group.
Elena Ivona DUMITRESCU & Sullivan HUE & Christophe HURLIN & Sessi TOKPAVI, 2020. "Machine Learning or Econometrics for Credit Scoring: Let’s Get the Best of Both Worlds," LEO Working Papers / DR LEO 2839, Orleans Economics Laboratory / Laboratoire d'Economie d'Orleans (LEO), University of Orleans.
- Elena Dumitrescu & Sullivan Hué & Christophe Hurlin & Sessi Tokpavi, 2021. "Machine Learning or Econometrics for Credit Scoring: Let's Get the Best of Both Worlds," Working Papers hal-02507499, HAL.
Dumitrescu, Elena & Hué, Sullivan & Hurlin, Christophe & Tokpavi, Sessi, 2022. "Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects," European Journal of Operational Research, Elsevier, vol. 297(3), pages 1178-1192.
- Elena Ivona Dumitrescu & Sullivan Hué & Christophe Hurlin & Sessi Tokpavi, 2022. "Machine Learning for Credit Scoring: Improving Logistic Regression with Non Linear Decision Tree Effects," Post-Print hal-03331114, HAL.
David F. Hendry & Grayham E. Mizon, 2016. "Improving the teaching of econometrics," Cogent Economics & Finance, Taylor & Francis Journals, vol. 4(1), pages 1170096-117, December.
- David Hendry & Grayham E. Mizon, 2016. "Improving the Teaching of Econometrics," Economics Series Working Papers 785, University of Oxford, Department of Economics.
Jurgen A. Doornik & David F. Hendry & Steve Cook, 2015. "Statistical model selection with “Big Data”," Cogent Economics & Finance, Taylor & Francis Journals, vol. 3(1), pages 1045216-104, December.
- David Hendry & Jurgen A. Doornik, 2014. "Statistical Model Selection with 'Big Data'," Economics Series Working Papers 735, University of Oxford, Department of Economics.
Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2023. "Robust Discovery of Regression Models," Econometrics and Statistics, Elsevier, vol. 26(C), pages 31-51.
- Jennifer L. Castle & Jurgen A. Doornik & David F. Hendry, 2020. "Robust Discovery of Regression Models," Economics Papers 2020-W04, Economics Group, Nuffield College, University of Oxford.
Tigges, Maximilian & Mestwerdt, Sönke & Tschirner, Sebastian & Mauer, René, 2024. "Who gets the money? A qualitative analysis of fintech lending and credit scoring through the adoption of AI and alternative data," Technological Forecasting and Social Change, Elsevier, vol. 205(C).
Castle, Jennifer L. & Doornik, Jurgen A. & Hendry, David F., 2021. "Modelling non-stationary ‘Big Data’," International Journal of Forecasting, Elsevier, vol. 37(4), pages 1556-1575.
- Jennifer Castle & Jurgen Doornik & David Hendry, 2020. "Modelling Non-stationary 'Big Data'," Economics Series Working Papers 905, University of Oxford, Department of Economics.
Ericsson Neil R., 2016. "Testing for and estimating structural breaks and other nonlinearities in a dynamic monetary sector," Studies in Nonlinear Dynamics & Econometrics, De Gruyter, vol. 20(4), pages 377-398, September.
Jennifer Castle & David Hendry, 2013. "Semi-automatic Non-linear Model selection," Economics Series Working Papers 654, University of Oxford, Department of Economics.
David F. Hendry & Felix Pretis, 2013. "Anthropogenic influences on atmospheric CO2," Chapters, in: Roger Fouquet (ed.), Handbook on Energy and Climate Change, chapter 12, pages 287-326, Edward Elgar Publishing.
- David Hendry & Felix Pretis, 2011. "Anthropogenic Influences on Atmospheric CO2," Economics Series Working Papers 584, University of Oxford, Department of Economics.
Chen, Yujia & Calabrese, Raffaella & Martin-Barragan, Belen, 2024. "Interpretable machine learning for imbalanced credit scoring datasets," European Journal of Operational Research, Elsevier, vol. 312(1), pages 357-372.
Dangxing Chen & Weicheng Ye & Jiahui Ye, 2022. "Interpretable Selective Learning in Credit Risk," Papers 2209.10127, arXiv.org.
Ericsson, Neil R., 2016. "Eliciting GDP forecasts from the FOMC’s minutes around the financial crisis," International Journal of Forecasting, Elsevier, vol. 32(2), pages 571-583.
- Neil R. Ericsson, 2015. "Eliciting GDP Forecasts from the FOMC’s Minutes Around the Financial Crisis," International Finance Discussion Papers 1152, Board of Governors of the Federal Reserve System (U.S.).
- Neil R. Ericsson, 2015. "Eliciting GDP Forecasts from the FOMC’s Minutes Around the Financial Crisis," Working Papers 2015-003, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.
Camila Epprecht & Dominique Guegan & Álvaro Veiga, 2013. "Comparing variable selection techniques for linear regression: LASSO and Autometrics," Documents de travail du Centre d'Economie de la Sorbonne 13080, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
Jennifer L. Castle & David F. Hendry & Andrew B. Martinez, 2022. "The Historical Role of Energy in UK Inflation and Productivity and Implications for Price Inflation in 2022," Working Papers 2022-001, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.
- Jennifer L. Castle & David F. Hendry & Andrew B. Martinez, 2022. "The historical role of energy in UK inflation and productivity and implications for price inflation in 2022," Economics Series Working Papers 983, University of Oxford, Department of Economics.
David F. Hendry, 2020. "First in, First out: Econometric Modelling of UK Annual CO_2 Emissions, 1860–2017," Economics Papers 2020-W02, Economics Group, Nuffield College, University of Oxford.
Camila Epprecht & Dominique Guegan & Álvaro Veiga & Joel Correa da Rosa, 2017. "Variable selection and forecasting via automated methods for linear models: LASSO/adaLASSO and Autometrics," Université Paris1 Panthéon-Sorbonne (Post-Print and Working Papers) halshs-00917797, HAL.
Stillwagon, Josh R., 2016. "Non-linear exchange rate relationships: An automated model selection approach with indicator saturation," The North American Journal of Economics and Finance, Elsevier, vol. 37(C), pages 84-109.
- Josh R. Stillwagon, 2014. "Non-Linear Exchange Rate Relationships: An Automated Model Selection Approach with Indicator Saturation," Working Papers 1405, Trinity College, Department of Economics.
Andrew B. Martinez, 2020. "Forecast Accuracy Matters for Hurricane Damage," Econometrics, MDPI, vol. 8(2), pages 1-24, May.
- Andrew B. Martinez, 2020. "Forecast Accuracy Matters for Hurricane Damages," Working Papers 2020-003, The George Washington University, Department of Economics, H. O. Stekler Research Program on Forecasting.

More about this item

NEP fields

This paper has been announced in the following NEP Reports:

NEP-BIG-2022-04-18 (Big Data)
NEP-CMP-2022-04-18 (Computational Economics)
NEP-ECM-2022-04-18 (Econometrics)
