

How Well Do Automated Linking Methods Perform? Lessons from US Historical Data

Author

Listed:
  • Martha J. Bailey
  • Connor Cole
  • Morgan Henderson
  • Catherine Massey
Abstract
This paper reviews the literature in historical record linkage in the United States and examines the performance of widely used record-linking algorithms and common variations in their assumptions. We use two high-quality, hand-linked data sets and one synthetic ground truth to examine the direct effects of linking algorithms on data quality. We find that (i) no algorithm (including hand linking) consistently produces representative samples; (ii) 15 to 37 percent of links chosen by widely used algorithms are classified as errors by trained human reviewers; and (iii) false links are systematically related to baseline sample characteristics, showing that some algorithms may introduce systematic measurement error into analyses. A case study shows that the combined effects of (i)–(iii) attenuate estimates of the intergenerational income elasticity by up to 29 percent, and common variations in algorithm assumptions result in greater attenuation. As current practice moves to automate linking and increase link rates, these results highlight the important potential consequences of linking errors on inferences with linked data. We conclude with constructive suggestions for reducing linking errors and directions for future research.

Suggested Citation

  • Martha J. Bailey & Connor Cole & Morgan Henderson & Catherine Massey, 2020. "How Well Do Automated Linking Methods Perform? Lessons from US Historical Data," Journal of Economic Literature, American Economic Association, vol. 58(4), pages 997-1044, December.
  • Handle: RePEc:aea:jeclit:v:58:y:2020:i:4:p:997-1044
    DOI: 10.1257/jel.20191526

    Download full text from publisher

    File URL: https://www.aeaweb.org/doi/10.1257/jel.20191526
    Download Restriction: no

    File URL: https://doi.org/10.3886/E119932V1
    Download Restriction: no

    File URL: https://www.aeaweb.org/doi/10.1257/jel.20191526.appx
    Download Restriction: no

    File URL: https://www.aeaweb.org/doi/10.1257/jel.20191526.ds
    Download Restriction: Access to full text is restricted to AEA members and institutional subscribers.



    References listed on IDEAS

    1. Catherine G. Massey, 2017. "Playing with matches: An assessment of accuracy in linked historical data," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(3), pages 129-143, July.
    2. Leah Platt Boustan & Matthew E. Kahn & Paul W. Rhode, 2012. "Moving to Higher Ground: Migration Response to Natural Disasters in the Early Twentieth Century," American Economic Review, American Economic Association, vol. 102(3), pages 238-244, May.
    3. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2012. "Europe's Tired, Poor, Huddled Masses: Self-Selection and Economic Outcomes in the Age of Mass Migration," American Economic Review, American Economic Association, vol. 102(5), pages 1832-1856, August.
    4. Leah Platt Boustan & Carola Frydman & Robert A. Margo, 2014. "Human Capital in History: The American Record," NBER Books, National Bureau of Economic Research, Inc, number bous12-1.
    5. DiNardo, John & Fortin, Nicole M & Lemieux, Thomas, 1996. "Labor Market Institutions and the Distribution of Wages, 1973-1992: A Semiparametric Approach," Econometrica, Econometric Society, vol. 64(5), pages 1001-1044, September.
    6. Hoyt Bleakley & Joseph Ferrie, 2016. "Shocking Behavior: Random Wealth in Antebellum Georgia and Human Capital Across Generations," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 131(3), pages 1455-1495.
    7. Margo, Robert A., 2016. "Obama, Katrina, and the Persistence of Racial Inequality," The Journal of Economic History, Cambridge University Press, vol. 76(2), pages 301-341, June.
    8. Melvin Stephens & Takashi Unayama, 2019. "Estimating the Impacts of Program Benefits: Using Instrumental Variables with Underreported and Imputed Data," The Review of Economics and Statistics, MIT Press, vol. 101(3), pages 468-475, July.
    9. Solon, Gary, 1999. "Intergenerational mobility in the labor market," Handbook of Labor Economics, in: O. Ashenfelter & D. Card (ed.), Handbook of Labor Economics, edition 1, volume 3, chapter 29, pages 1761-1800, Elsevier.
    10. Bhashkar Mazumder, 2005. "Fortunate Sons: New Estimates of Intergenerational Mobility in the United States Using Social Security Earnings Data," The Review of Economics and Statistics, MIT Press, vol. 87(2), pages 235-255, May.
    11. Kasey S. Buckles & Daniel M. Hungerman, 2013. "Season of Birth and Later Outcomes: Old Questions, New Answers," The Review of Economics and Statistics, MIT Press, vol. 95(3), pages 711-724, July.
    12. Steven Haider & Gary Solon, 2006. "Life-Cycle Variation in the Association between Current and Lifetime Earnings," American Economic Review, American Economic Association, vol. 96(4), pages 1308-1320, September.
    13. Collins, William J. & Wanamaker, Marianne H., 2015. "The Great Migration in Black and White: New Evidence on the Selection and Sorting of Southern Migrants," The Journal of Economic History, Cambridge University Press, vol. 75(4), pages 947-992, December.
    14. Dora L. Costa & Heather DeSomer & Eric Hanss & Christopher Roudiez & Sven E. Wilson & Noelle Yetter, 2017. "Union Army veterans, all grown up," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 50(2), pages 79-95, April.
    15. P. Lahiri & Michael D. Larsen, 2005. "Regression Analysis With Linked Data," Journal of the American Statistical Association, American Statistical Association, vol. 100, pages 222-230, March.
    16. Horowitz, Joel L & Manski, Charles F, 1995. "Identification and Robustness with Contaminated and Corrupted Data," Econometrica, Econometric Society, vol. 63(2), pages 281-302, March.
    17. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez & Nicholas Turner, 2014. "Is the United States Still a Land of Opportunity? Recent Trends in Intergenerational Mobility," American Economic Review, American Economic Association, vol. 104(5), pages 141-147, May.
    18. Abowd, John M. & Vilhuber, Lars, 2005. "The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers," Journal of Business & Economic Statistics, American Statistical Association, vol. 23, pages 133-152, April.
    19. Maria J. Wisselgren & Sören Edvinsson & Mats Berggren & Maria Larsson, 2014. "Testing Methods of Record Linkage on Swedish Censuses," Historical Methods: A Journal of Quantitative and Interdisciplinary History, Taylor & Francis Journals, vol. 47(3), pages 138-151, September.
    20. Bhashkar Mazumder, 2015. "Estimating the Intergenerational Elasticity and Rank Association in the U.S.: Overcoming the Current Limitations of Tax Data," Working Paper Series WP-2015-4, Federal Reserve Bank of Chicago.
    21. Raj Chetty & Nathaniel Hendren & Patrick Kline & Emmanuel Saez, 2014. "Where is the land of Opportunity? The Geography of Intergenerational Mobility in the United States," The Quarterly Journal of Economics, President and Fellows of Harvard College, vol. 129(4), pages 1553-1623.
    22. Richard Hornbeck & Suresh Naidu, 2014. "When the Levee Breaks: Black Migration and Economic Development in the American South," American Economic Review, American Economic Association, vol. 104(3), pages 963-990, March.
    23. A'Hearn, Brian & Baten, Jörg & Crayen, Dorothee, 2009. "Quantifying Quantitative Literacy: Age Heaping and the History of Human Capital," The Journal of Economic History, Cambridge University Press, vol. 69(3), pages 783-808, September.
    24. Shari Eli & Laura Salisbury & Allison Shertzer, 2016. "Migration Responses to Conflict: Evidence from the Border of the American Civil War," NBER Working Papers 22591, National Bureau of Economic Research, Inc.
    25. Michael Hout & Avery M. Guest, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850: Comment," American Economic Review, American Economic Association, vol. 103(5), pages 2021-2040, August.
    26. Abramitzky, Ran & Boustan, Leah Platt & Eriksson, Katherine, 2013. "Have the poor always been less likely to migrate? Evidence from inheritance practices during the age of mass migration," Journal of Development Economics, Elsevier, vol. 102(C), pages 2-14.
    27. White, Halbert, 1980. "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, Econometric Society, vol. 48(4), pages 817-838, May.
    28. Ran Abramitzky & Leah Platt Boustan & Katherine Eriksson, 2014. "A Nation of Immigrants: Assimilation and Economic Outcomes in the Age of Mass Migration," Journal of Political Economy, University of Chicago Press, vol. 122(3), pages 467-506.
    29. Hoyt Bleakley & Joseph P. Ferrie, 2013. "Up from Poverty? The 1832 Cherokee Land Lottery and the Long-run Distribution of Wealth," NBER Working Papers 19175, National Bureau of Economic Research, Inc.
    30. Otis Duncan, 1968. "Patterns of occupational mobility among Negro men," Demography, Springer;Population Association of America (PAA), vol. 5(1), pages 11-22, March.
    31. Solon, Gary, 1992. "Intergenerational Income Mobility in the United States," American Economic Review, American Economic Association, vol. 82(3), pages 393-408, June.
    32. Jørgen Modalsli, 2017. "Intergenerational Mobility in Norway, 1865–2011," Scandinavian Journal of Economics, Wiley Blackwell, vol. 119(1), pages 34-71, January.
    33. James Heckman & Hidehiko Ichimura & Jeffrey Smith & Petra Todd, 1998. "Characterizing Selection Bias Using Experimental Data," Econometrica, Econometric Society, vol. 66(5), pages 1017-1098, September.
    34. Zimmerman, David J, 1992. "Regression toward Mediocrity in Economic Stature," American Economic Review, American Economic Association, vol. 82(3), pages 409-429, June.
    35. William J. Collins & Marianne H. Wanamaker, 2017. "African American Intergenerational Economic Mobility Since 1880," NBER Working Papers 23395, National Bureau of Economic Research, Inc.
    36. Yu Xie & Alexandra Killewald, 2013. "Intergenerational Occupational Mobility in Great Britain and the United States since 1850: Comment," American Economic Review, American Economic Association, vol. 103(5), pages 2003-2020, August.
    37. Leah Platt Boustan & Carola Frydman & Robert A. Margo, 2014. "Introduction to "Human Capital in History: The American Record"," NBER Chapters, in: Human Capital in History: The American Record, pages 1-14, National Bureau of Economic Research, Inc.
    38. Gunky Kim & Raymond Chambers, 2012. "Regression Analysis under Probabilistic Multi‐Linkage," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 66(1), pages 64-79, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zachary Ward, 2023. "Intergenerational Mobility in American History: Accounting for Race and Measurement Error," American Economic Review, American Economic Association, vol. 113(12), pages 3213-3248, December.
    2. Ran Abramitzky & Leah Boustan & Katherine Eriksson & James Feigenbaum & Santiago Pérez, 2021. "Automated Linking of Historical Data," Journal of Economic Literature, American Economic Association, vol. 59(3), pages 865-918, September.
    3. Brantly Callaway & Weige Huang, 2020. "Distributional Effects of a Continuous Treatment with an Application on Intergenerational Mobility," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 82(4), pages 808-842, August.
    4. Elisa Jácome & Ilyana Kuziemko & Suresh Naidu, 2021. "Mobility for All: Representative Intergenerational Mobility Estimates over the 20th Century," Working Papers 302, Princeton University, Department of Economics, Center for Economic Policy Studies.
    5. Martin Nybom & Jan Stuhler, 2017. "Biases in Standard Measures of Intergenerational Income Dependence," Journal of Human Resources, University of Wisconsin Press, vol. 52(3), pages 800-825.
    6. Markus Jäntti & Stephen P. Jenkins, 2013. "Income Mobility," SOEPpapers on Multidisciplinary Panel Data Research 607, DIW Berlin, The German Socio-Economic Panel (SOEP).
    7. Chelsea Murray & Robert Graham Clark & Silvia Mendolia & Peter Siminski, 2018. "Direct Measures of Intergenerational Income Mobility for Australia," The Economic Record, The Economic Society of Australia, vol. 94(307), pages 445-468, December.
    8. Florencia Torche, 2015. "Analyses of Intergenerational Mobility," The ANNALS of the American Academy of Political and Social Science, vol. 657(1), pages 37-62, January.
    9. Catherine G. Massey, 2016. "Playing with Matches: An Assessment of Accuracy in Linked Historical Data," CARRA Working Papers 2016-05, Center for Economic Studies, U.S. Census Bureau.
    10. Galassi, Gabriela & Koll, David & Mayr, Lukas, 2019. "The Intergenerational Correlation of Employment: Is There a Role for Work Culture?," IZA Discussion Papers 12595, Institute of Labor Economics (IZA).
    11. Chenhong Peng & Paul Siu Fai Yip & Yik Wa Law, 2019. "Intergenerational Earnings Mobility and Returns to Education in Hong Kong: A Developed Society with High Economic Inequality," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 143(1), pages 133-156, May.
    12. Gabriela Galassi & David Koll & Lukas Mayr, 2019. "The Intergenerational Correlation of Employment: Is There a Role for Work Culture?," Staff Working Papers 19-33, Bank of Canada.
    13. Bhashkar Mazumder, 2018. "Intergenerational Mobility in the United States: What We Have Learned from the PSID," The ANNALS of the American Academy of Political and Social Science, vol. 680(1), pages 213-234, November.
    14. Huang, Xiao & Huang, Shoujun & Shui, Ailun, 2021. "Government spending and intergenerational income mobility: Evidence from China," Journal of Economic Behavior & Organization, Elsevier, vol. 191(C), pages 387-414.
    15. Inwood, Kris & Minns, Chris & Summerfield, Fraser, 2019. "Occupational income scores and immigrant assimilation. Evidence from the Canadian census," Explorations in Economic History, Elsevier, vol. 72(C), pages 114-122.
    16. Chu, Luke Yu-Wei & Lin, Ming-Jen, 2016. "Economic development and intergenerational earnings mobility: Evidence from Taiwan," Working Paper Series 19495, Victoria University of Wellington, School of Economics and Finance.
    17. Tharcisio Leone, 2019. "The Geography of Intergenerational Mobility: Evidence of Educational Persistence and the “Great Gatsby Curve” in Brazil," Documentos de Trabajo 17526, The Latin American and Caribbean Economic Association (LACEA).
    18. Jaehyun Nam, 2021. "Does Economic Inequality Constrain Intergenerational Economic Mobility? The Association Between Income Inequality During Childhood and Intergenerational Income Persistence in the United States," Social Indicators Research: An International and Interdisciplinary Journal for Quality-of-Life Measurement, Springer, vol. 154(2), pages 469-488, April.
    19. Leone, Tharcisio, 2019. "The geography of intergenerational mobility: Evidence of educational persistence and the "Great Gatsby Curve" in Brazil," GIGA Working Papers 318, GIGA German Institute of Global and Area Studies.
    20. Michelle M. Miller & Frank McIntyre, 2020. "Does Money Matter for Intergenerational Income Transmission?," Southern Economic Journal, John Wiley & Sons, vol. 86(3), pages 941-970, January.

    More about this item

    JEL classification:

    • C45 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Neural Networks and Related Topics
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • J62 - Labor and Demographic Economics - - Mobility, Unemployment, Vacancies, and Immigrant Workers - - - Job, Occupational and Intergenerational Mobility; Promotion
    • N31 - Economic History - - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy - - - U.S.; Canada: Pre-1913
    • N32 - Economic History - - Labor and Consumers, Demography, Education, Health, Welfare, Income, Wealth, Religion, and Philanthropy - - - U.S.; Canada: 1913-
