
Printed from https://ideas.repec.org/p/bfi/wpaper/2020-152.html

A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers

Authors

  • Tengyuan Liang (University of Chicago - Booth School of Business)
  • Pragya Sur (Harvard University - Department of Statistics)

Abstract
This paper establishes a precise high-dimensional asymptotic theory for boosting on separable data, taking statistical and computational perspectives. We consider the setting where the number of features (weak learners) p scales with the sample size n, in an over-parametrized regime. Under a broad class of statistical models, we provide an exact analysis of the generalization error of boosting when the algorithm interpolates the training data and maximizes the empirical L1-margin. The relation between the boosting test error and the optimal Bayes error is pinned down explicitly. In turn, these precise characterizations resolve several open questions raised in [15, 81] surrounding boosting. On the computational front, we provide a sharp analysis of the stopping time when boosting approximately maximizes the empirical L1-margin. Furthermore, we discover that the larger the over-parametrization ratio p/n, the smaller the proportion of active features (with zero initialization), and the faster the optimization reaches interpolation. At the heart of our theory lies an in-depth study of the maximum L1-margin, which can be accurately described by a new system of non-linear equations; we analyze this margin and the properties of this system using Gaussian comparison techniques and a novel uniform deviation argument. Variants of AdaBoost corresponding to general Lq geometry, for q > 1, are also presented, together with an exact analysis of the high-dimensional generalization and optimization behavior of a class of these algorithms.
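The margin-maximization dynamic described in the abstract can be illustrated with a small, self-contained simulation. This is a hypothetical sketch, not the paper's algorithm or analysis: it runs greedy coordinate descent on the exponential loss (an AdaBoost-style update with a fixed step size and zero initialization), treats the columns of a Gaussian design as weak learners, and reports the fraction of active features together with the normalized empirical L1-margin min_i y_i⟨x_i, w⟩ / ||w||_1. All function names and parameter choices here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def boost(X, y, n_steps=2000, lr=0.1):
    """AdaBoost-style greedy coordinate descent on the exponential loss."""
    n, p = X.shape
    w = np.zeros(p)                      # zero initialization
    for _ in range(n_steps):
        m = y * (X @ w)                  # per-sample (unnormalized) margins
        d = np.exp(-m)                   # exponential-loss sample weights
        g = X.T @ (d * y)                # negative gradient per coordinate
        j = np.argmax(np.abs(g))         # greedy choice of weak learner
        w[j] += lr * np.sign(g[j])
    return w

def l1_margin(X, y, w):
    """Normalized empirical L1-margin of the linear classifier w."""
    return np.min(y * (X @ w)) / np.sum(np.abs(w))

# Separable toy data: labels follow the sign of a sparse linear signal,
# in an over-parametrized regime (p > n).
n, p = 50, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = np.sign(X @ beta)

w = boost(X, y)
print("proportion of active features:", np.mean(w != 0))
print("empirical L1-margin:", l1_margin(X, y, w))
```

Varying the ratio p/n in this sketch gives a rough feel for the phenomenon the paper analyzes exactly: with zero initialization, greedy boosting touches only a subset of coordinates, and that active proportion shrinks as over-parametrization grows.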

Suggested Citation

  • Tengyuan Liang & Pragya Sur, 2020. "A Precise High-Dimensional Asymptotic Theory for Boosting and Minimum-L1-Norm Interpolated Classifiers," Working Papers 2020-152, Becker Friedman Institute for Research In Economics.
  • Handle: RePEc:bfi:wpaper:2020-152

    Download full text from publisher

    File URL: https://repec.bfi.uchicago.edu/RePEc/pdfs/BFI_WP_2020152.pdf
    Download Restriction: no


    Citations



    Cited by:

    1. Kuanhao Jiang & Rajarshi Mukherjee & Subhabrata Sen & Pragya Sur, 2022. "A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance and Beyond," Papers 2205.10198, arXiv.org, revised Oct 2022.
    2. Tengyuan Liang, 2021. "Universal Prediction Band via Semi-Definite Programming," Papers 2103.17203, arXiv.org, revised Jan 2023.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Zhu Wang, 2022. "MM for penalized estimation," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 31(1), pages 54-75, March.
    2. Semyon Malamud & Andreas Schrimpf, 2021. "Persuasion by Dimension Reduction," Swiss Finance Institute Research Paper Series 21-69, Swiss Finance Institute.
    3. Claire Lazar Reich, 2021. "Affirmative Action vs. Affirmative Information," Papers 2102.10019, arXiv.org, revised Oct 2024.
    4. John W. Patty & Elizabeth Maggie Penn, 2022. "Algorithmic Fairness and Statistical Discrimination," Papers 2208.08341, arXiv.org.
    5. Ju, Xiaomeng & Salibián-Barrera, Matías, 2021. "Robust boosting for regression problems," Computational Statistics & Data Analysis, Elsevier, vol. 153(C).
    6. Piotr Skórka & Beata Grzywacz & Dawid Moroń & Magdalena Lenda, 2020. "The macroecology of the COVID-19 pandemic in the Anthropocene," PLOS ONE, Public Library of Science, vol. 15(7), pages 1-17, July.
    7. Ashesh Rambachan & Jon Kleinberg & Sendhil Mullainathan & Jens Ludwig, 2020. "An Economic Approach to Regulating Algorithms," NBER Working Papers 27111, National Bureau of Economic Research, Inc.
    8. Elizabeth Maggie Penn & John W. Patty, 2023. "Algorithms, Incentives, and Democracy," Papers 2307.02319, arXiv.org.
    9. Malamud, Semyon & Cieslak, Anna & Schrimpf, Paul, 2021. "Optimal Transport of Information," CEPR Discussion Papers 15859, C.E.P.R. Discussion Papers.
    10. Max H. Farrell & Tengyuan Liang & Sanjog Misra, 2020. "Deep Learning for Individual Heterogeneity: An Automatic Inference Framework," Papers 2010.14694, arXiv.org, revised Jul 2021.
    11. Runshan Fu & Manmohan Aseri & Param Vir Singh & Kannan Srinivasan, 2022. "“Un”Fair Machine Learning Algorithms," Management Science, INFORMS, vol. 68(6), pages 4173-4195, June.
    12. Heng Xu & Nan Zhang, 2022. "Implications of Data Anonymization on the Statistical Evidence of Disparity," Management Science, INFORMS, vol. 68(4), pages 2600-2618, April.

    More about this item

    NEP fields

    This paper has been announced in the following NEP Reports:

    Statistics

