
IDEAS home Printed from https://ideas.repec.org/p/nbr/nberwo/26517.html

Text Selection

Author

Listed:
  • Bryan T. Kelly
  • Asaf Manela
  • Alan Moreira
Abstract
Text data is ultra-high dimensional, which makes machine learning techniques indispensable for textual analysis. Text is often selected—journalists, speechwriters, and others craft messages to target their audiences’ limited attention. We develop an economically motivated high dimensional selection model that improves learning from text (and from sparse counts data more generally). Our model is especially useful when the choice to include a phrase is more interesting than the choice of how frequently to repeat it. It allows for parallel estimation, making it computationally scalable. A first application revisits the partisanship of US congressional speech. We find that earlier spikes in partisanship manifested in increased repetition of different phrases, whereas the upward trend starting in the 1990s is due to entirely distinct phrase selection. Additional applications show how our model can backcast, nowcast, and forecast macroeconomic indicators using newspaper text, and that it substantially improves out-of-sample fit relative to alternative approaches.
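The distinction the abstract draws between whether a phrase is included and how often it is repeated can be illustrated with a small descriptive sketch. This is not the authors' estimator: it only tabulates, for each group of documents, the share of documents that include each phrase and the mean count given inclusion. The function name `margin_rates`, the toy count matrix, and the group labels are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Stats = Tuple[float, Optional[float]]


def margin_rates(counts: List[List[int]], groups: List[str]) -> Dict[str, List[Stats]]:
    """For each group and phrase, return (inclusion rate, mean count given inclusion).

    counts[i][j] is the number of times phrase j appears in document i.
    groups[i] is the (hypothetical) group label of document i.
    The mean count is None when no document in the group includes the phrase.
    """
    # Collect the rows of the count matrix belonging to each group.
    by_group: Dict[str, List[List[int]]] = {}
    for row, g in zip(counts, groups):
        by_group.setdefault(g, []).append(row)

    out: Dict[str, List[Stats]] = {}
    for g, rows in by_group.items():
        stats: List[Stats] = []
        for j in range(len(rows[0])):
            # Counts of phrase j restricted to documents that include it.
            positive = [row[j] for row in rows if row[j] > 0]
            inclusion_rate = len(positive) / len(rows)
            mean_positive = sum(positive) / len(positive) if positive else None
            stats.append((inclusion_rate, mean_positive))
        out[g] = stats
    return out


# Toy data (assumed): four documents, three phrases, two groups.
counts = [[3, 0, 1], [1, 0, 0], [0, 2, 1], [0, 5, 0]]
groups = ["A", "A", "B", "B"]
rates = margin_rates(counts, groups)
```

In this toy example the two groups differ mainly in which phrases they include (phrase 0 versus phrase 1), which is the kind of variation the abstract describes as selection, as opposed to differences in repetition of the same phrases.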

Suggested Citation

  • Bryan T. Kelly & Asaf Manela & Alan Moreira, 2019. "Text Selection," NBER Working Papers 26517, National Bureau of Economic Research, Inc.
  • Handle: RePEc:nbr:nberwo:26517
    Note: AP

    Download full text from publisher

    File URL: http://www.nber.org/papers/w26517.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Giannone, Domenico & Reichlin, Lucrezia & Small, David, 2008. "Nowcasting: The real-time informational content of macroeconomic data," Journal of Monetary Economics, Elsevier, vol. 55(4), pages 665-676, May.
    2. A. Chudik & G. Kapetanios & M. Hashem Pesaran, 2018. "A One Covariate at a Time, Multiple Testing Approach to Variable Selection in High‐Dimensional Linear Regression Models," Econometrica, Econometric Society, vol. 86(4), pages 1479-1512, July.
    3. Zhiguo He & Arvind Krishnamurthy, 2012. "A Model of Capital and Crises," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 79(2), pages 735-777.
    4. Boot, Tom & Nibbering, Didier, 2019. "Forecasting using random subspace methods," Journal of Econometrics, Elsevier, vol. 209(2), pages 391-406.
    5. Lubos Pástor & Pietro Veronesi, 2012. "Uncertainty about Government Policy and Stock Prices," Journal of Finance, American Finance Association, vol. 67(4), pages 1219-1264, August.
    6. Matthew Gentzkow & Jesse M. Shapiro, 2006. "Media Bias and Reputation," Journal of Political Economy, University of Chicago Press, vol. 114(2), pages 280-316, April.
    7. A. Belloni & D. Chen & V. Chernozhukov & C. Hansen, 2012. "Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain," Econometrica, Econometric Society, vol. 80(6), pages 2369-2429, November.
    8. Alexandre Belloni & Victor Chernozhukov & Christian Hansen, 2014. "Inference on Treatment Effects after Selection among High-Dimensional Controls," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 81(2), pages 608-650.
    9. Scott R. Baker & Nicholas Bloom & Steven J. Davis, 2016. "Measuring Economic Policy Uncertainty," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 131(4), pages 1593-1636.
    10. Mullahy, John, 1986. "Specification and testing of some modified count data models," Journal of Econometrics, Elsevier, vol. 33(3), pages 341-365, December.
    11. Kathleen Weiss Hanley & Gerard Hoberg, 2019. "Dynamic Interpretation of Emerging Risks in the Financial Sector," The Review of Financial Studies, Society for Financial Studies, vol. 32(12), pages 4543-4603.
    12. Tobias Adrian & Erkko Etula & Tyler Muir, 2014. "Financial Intermediaries and the Cross-Section of Asset Returns," Journal of Finance, American Finance Association, vol. 69(6), pages 2557-2596, December.
    13. Reichlin, Lucrezia & Giannone, Domenico & Small, David, 2005. "Nowcasting GDP and Inflation: The Real Time Informational Content of Macroeconomic Data Releases," CEPR Discussion Papers 5178, C.E.P.R. Discussion Papers.
    14. Ruben Durante & Ekaterina Zhuravskaya, 2018. "Attack When the World Is Not Watching? US News and the Israeli-Palestinian Conflict," Journal of Political Economy, University of Chicago Press, vol. 126(3), pages 1085-1133.
    15. Gerard Hoberg & Gordon Phillips, 2016. "Text-Based Network Industries and Endogenous Product Differentiation," Journal of Political Economy, University of Chicago Press, vol. 124(5), pages 1423-1465.
    16. James H. Stock & Mark W. Watson, 2012. "Generalized Shrinkage Methods for Forecasting Using Many Predictors," Journal of Business & Economic Statistics, Taylor & Francis Journals, vol. 30(4), pages 481-493, June.
    17. Cepni, Oguzhan & Güney, I. Ethem & Swanson, Norman R., 2019. "Nowcasting and forecasting GDP in emerging markets using global financial and macroeconomic diffusion indexes," International Journal of Forecasting, Elsevier, vol. 35(2), pages 555-572.
    18. Bryan Kelly & Seth Pruitt, 2013. "Market Expectations in the Cross-Section of Present Values," Journal of Finance, American Finance Association, vol. 68(5), pages 1721-1756, October.
    19. Hodrick, Robert J, 1992. "Dividend Yields and Expected Stock Returns: Alternative Procedures for Inference and Measurement," The Review of Financial Studies, Society for Financial Studies, vol. 5(3), pages 357-386.
    20. Shihao Gu & Bryan Kelly & Dacheng Xiu, 2020. "Empirical Asset Pricing via Machine Learning," The Review of Financial Studies, Society for Financial Studies, vol. 33(5), pages 2223-2273.
    21. Xavier Gabaix, 2014. "A Sparsity-Based Model of Bounded Rationality," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1661-1710.
    22. A. Belloni & V. Chernozhukov & I. Fernández‐Val & C. Hansen, 2017. "Program Evaluation and Causal Inference With High‐Dimensional Data," Econometrica, Econometric Society, vol. 85, pages 233-298, January.
    23. Lily Fang & Joel Peress, 2009. "Media Coverage and the Cross‐section of Stock Returns," Journal of Finance, American Finance Association, vol. 64(5), pages 2023-2052, October.
    24. Matthew Gentzkow & Jesse M. Shapiro & Matt Taddy, 2019. "Measuring Group Differences in High‐Dimensional Choices: Method and Application to Congressional Speech," Econometrica, Econometric Society, vol. 87(4), pages 1307-1340, July.
    25. Jiang, Fuwei & Lee, Joshua & Martin, Xiumin & Zhou, Guofu, 2019. "Manager sentiment and stock returns," Journal of Financial Economics, Elsevier, vol. 132(1), pages 126-149.
    26. Heckman, James, 2013. "Sample selection bias as a specification error," Applied Econometrics, Russian Presidential Academy of National Economy and Public Administration (RANEPA), vol. 31(3), pages 129-137.
    27. Kim, Hyun Hak & Swanson, Norman R., 2014. "Forecasting financial and macroeconomic variables using data reduction methods: New empirical evidence," Journal of Econometrics, Elsevier, vol. 178(P2), pages 352-367.
    28. Markus K. Brunnermeier & Yuliy Sannikov, 2014. "A Macroeconomic Model with a Financial Sector," American Economic Review, American Economic Association, vol. 104(2), pages 379-421, February.
    29. Jon Kleinberg & Jens Ludwig & Sendhil Mullainathan & Ziad Obermeyer, 2015. "Prediction Policy Problems," American Economic Review, American Economic Association, vol. 105(5), pages 491-495, May.
    30. Greene, William, 2007. "Functional Form and Heterogeneity in Models for Count Data," Foundations and Trends(R) in Econometrics, now publishers, vol. 1(2), pages 113-218, August.
    31. Diebold, Francis X & Mariano, Roberto S, 2002. "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, American Statistical Association, vol. 20(1), pages 134-144, January.
    32. Friedman, Jerome H. & Hastie, Trevor & Tibshirani, Rob, 2010. "Regularization Paths for Generalized Linear Models via Coordinate Descent," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 33(i01).
    33. Ralph S. J. Koijen & Motohiro Yogo, 2019. "A Demand System Approach to Asset Pricing," Journal of Political Economy, University of Chicago Press, vol. 127(4), pages 1475-1515.
    34. Thomas Eisensee & David Strömberg, 2007. "News Droughts, News Floods, and U. S. Disaster Relief," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 122(2), pages 693-728.
    35. Gallant, A Ronald & Nychka, Douglas W, 1987. "Semi-nonparametric Maximum Likelihood Estimation," Econometrica, Econometric Society, vol. 55(2), pages 363-390, March.
    36. He, Zhiguo & Kelly, Bryan & Manela, Asaf, 2017. "Intermediary asset pricing: New evidence from many asset classes," Journal of Financial Economics, Elsevier, vol. 126(1), pages 1-35.
    37. Patrick Bajari & Denis Nekipelov & Stephen P. Ryan & Miaoyu Yang, 2015. "Machine Learning Methods for Demand Estimation," American Economic Review, American Economic Association, vol. 105(5), pages 481-485, May.
    38. Manela, Asaf & Moreira, Alan, 2017. "News implied volatility and disaster concerns," Journal of Financial Economics, Elsevier, vol. 123(1), pages 137-162.
    39. Susan Athey & Guido Imbens & Thai Pham & Stefan Wager, 2017. "Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges," American Economic Review, American Economic Association, vol. 107(5), pages 278-281, May.
    40. Sendhil Mullainathan & Jann Spiess, 2017. "Machine Learning: An Applied Econometric Approach," Journal of Economic Perspectives, American Economic Association, vol. 31(2), pages 87-106, Spring.
    41. Paul C. Tetlock, 2007. "Giving Content to Investor Sentiment: The Role of Media in the Stock Market," Journal of Finance, American Finance Association, vol. 62(3), pages 1139-1168, June.
    42. repec:bla:jfinan:v:59:y:2004:i:3:p:1259-1294 is not listed on IDEAS
    43. Manela, Asaf, 2014. "The value of diffusing information," Journal of Financial Economics, Elsevier, vol. 111(1), pages 181-199.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Philippe Goulet Coulombe & Maxime Leroux & Dalibor Stevanovic & Stéphane Surprenant, 2022. "How is machine learning useful for macroeconomic forecasting?," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 37(5), pages 920-964, August.
    2. Obaid, Khaled & Pukthuanthong, Kuntara, 2022. "A picture is worth a thousand words: Measuring investor sentiment by combining machine learning and photos from news," Journal of Financial Economics, Elsevier, vol. 144(1), pages 273-297.
    3. Dou, Winston Wei & Ji, Yan & Wu, Wei, 2021. "Competition, profitability, and discount rates," Journal of Financial Economics, Elsevier, vol. 140(2), pages 582-620.
    4. Manela, Asaf & Moreira, Alan, 2017. "News implied volatility and disaster concerns," Journal of Financial Economics, Elsevier, vol. 123(1), pages 137-162.
    5. Petropoulos, Fotios & Apiletti, Daniele & Assimakopoulos, Vassilios & Babai, Mohamed Zied & Barrow, Devon K. & Ben Taieb, Souhaib & Bergmeir, Christoph & Bessa, Ricardo J. & Bijak, Jakub & Boylan, Joh, 2022. "Forecasting: theory and practice," International Journal of Forecasting, Elsevier, vol. 38(3), pages 705-871.
      • Fotios Petropoulos & Daniele Apiletti & Vassilios Assimakopoulos & Mohamed Zied Babai & Devon K. Barrow & Souhaib Ben Taieb & Christoph Bergmeir & Ricardo J. Bessa & Jakub Bijak & John E. Boylan & Jet, 2020. "Forecasting: theory and practice," Papers 2012.03854, arXiv.org, revised Jan 2022.
    6. Anesti, Nikoleta & Kalamara, Eleni & Kapetanios, George, 2021. "Forecasting UK GDP growth with large survey panels," Bank of England working papers 923, Bank of England.
    7. Baur, Dirk G. & Smales, Lee A., 2020. "Hedging geopolitical risk with precious metals," Journal of Banking & Finance, Elsevier, vol. 117(C).
    8. Buffa, Andrea M. & Hodor, Idan, 2023. "Institutional investors, heterogeneous benchmarks and the comovement of asset prices," Journal of Financial Economics, Elsevier, vol. 147(2), pages 352-381.
    9. Ching Hsu & Tina Yu & Shu-Heng Chen, 2021. "Narrative economics using textual analysis of newspaper data: new insights into the U.S. Silver Purchase Act and Chinese price level in 1928–1936," Journal of Computational Social Science, Springer, vol. 4(2), pages 761-785, November.
    10. Kargar, Mahyar, 2021. "Heterogeneous intermediary asset pricing," Journal of Financial Economics, Elsevier, vol. 141(2), pages 505-532.
    11. Dylan Brewer & Alyssa Carlson, 2024. "Addressing sample selection bias for machine learning methods," Journal of Applied Econometrics, John Wiley & Sons, Ltd., vol. 39(3), pages 383-400, April.
    12. Pan, Zhiyuan & Zhong, Hao & Wang, Yudong & Huang, Juan, 2024. "Forecasting oil futures returns with news," Energy Economics, Elsevier, vol. 134(C).
    13. Hoang, Daniel & Wiegratz, Kevin, 2022. "Machine learning methods in finance: Recent applications and prospects," Working Paper Series in Economics 158, Karlsruhe Institute of Technology (KIT), Department of Economics and Management.
    14. García, Diego & Hu, Xiaowen & Rohrer, Maximilian, 2023. "The colour of finance words," Journal of Financial Economics, Elsevier, vol. 147(3), pages 525-549.
    15. Zongwu Cai & Pixiong Chen, 2022. "New Online Investor Sentiment and Asset Returns," WORKING PAPERS SERIES IN THEORETICAL AND APPLIED ECONOMICS 202216, University of Kansas, Department of Economics, revised Nov 2022.
    16. Alessandro Girardi & Roberto Golinelli & Carmine Pappalardo, 2017. "The role of indicator selection in nowcasting euro-area GDP in pseudo-real time," Empirical Economics, Springer, vol. 53(1), pages 79-99, August.
    17. Andres Algaba & David Ardia & Keven Bluteau & Samuel Borms & Kris Boudt, 2020. "Econometrics Meets Sentiment: An Overview Of Methodology And Applications," Journal of Economic Surveys, Wiley Blackwell, vol. 34(3), pages 512-547, July.
    18. Crocker H. Liu & Adam Nowak & Patrick S. Smith, 2018. "Does the Asset Pricing Premium Reflect Asymmetric or Incomplete Information?," Working Papers 18-06, Department of Economics, West Virginia University.
    19. Freire, Gustavo, 2021. "Tail risk and investors’ concerns: Evidence from Brazil," The North American Journal of Economics and Finance, Elsevier, vol. 58(C).
    20. Mr. Tobias Adrian & Peichu Xie, 2020. "The Non-U.S. Bank Demand for U.S. Dollar Assets," IMF Working Papers 2020/101, International Monetary Fund.

    More about this item

    JEL classification:

    • C1 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General
    • C4 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics
    • C55 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Large Data Sets: Modeling and Analysis
    • C58 - Mathematical and Quantitative Methods - - Econometric Modeling - - - Financial Econometrics
    • E17 - Macroeconomics and Monetary Economics - - General Aggregative Models - - - Forecasting and Simulation: Models and Applications
    • G12 - Financial Economics - - General Financial Markets - - - Asset Pricing; Trading Volume; Bond Interest Rates
    • G17 - Financial Economics - - General Financial Markets - - - Financial Forecasting and Simulation
