
IDEAS home Printed from https://ideas.repec.org/p/arx/papers/2105.02344.html

Policy Learning with Adaptively Collected Data

Author

Listed:
  • Ruohan Zhan
  • Zhimei Ren
  • Susan Athey
  • Zhengyuan Zhou
Abstract
Learning optimal policies from historical data enables personalization in a wide variety of applications including healthcare, digital recommendations, and online education. The growing policy learning literature focuses on settings where the data collection rule stays fixed throughout the experiment. However, adaptive data collection is becoming more common in practice, from two primary sources: 1) data collected from adaptive experiments that are designed to improve inferential efficiency; 2) data collected from production systems that progressively evolve an operational policy to improve performance over time (e.g. contextual bandits). Yet adaptivity complicates the optimal policy identification ex post, since samples are dependent, and each treatment may not receive enough observations for each type of individual. In this paper, we make initial research inquiries into addressing the challenges of learning the optimal policy with adaptively collected data. We propose an algorithm based on generalized augmented inverse propensity weighted (AIPW) estimators, which non-uniformly reweight the elements of a standard AIPW estimator to control worst-case estimation variance. We establish a finite-sample regret upper bound for our algorithm and complement it with a regret lower bound that quantifies the fundamental difficulty of policy learning with adaptive data. When equipped with the best weighting scheme, our algorithm achieves minimax rate optimal regret guarantees even with diminishing exploration. Finally, we demonstrate our algorithm's effectiveness using both synthetic data and public benchmark datasets.
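
The generalized AIPW idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the array layout, and the per-sample weights are hypothetical, and the weighting scheme the paper recommends is not reproduced here. The sketch assumes that the assignment probabilities used during data collection were logged and that an outcome-model estimate `mu_hat` is already available. It forms the standard AIPW score for every action and then averages the scores of the actions a candidate policy would choose, with non-uniform weights.

```python
import numpy as np


def aipw_scores(actions, rewards, propensities, mu_hat):
    """Standard AIPW scores for every (sample, action) pair, shape (T, K).

    actions:      (T,) int array, action taken at each step
    rewards:      (T,) float array, observed outcome at each step
    propensities: (T, K) float array, logged assignment probabilities (all > 0)
    mu_hat:       (T, K) float array, outcome-model estimates (assumed given)
    """
    T, K = mu_hat.shape
    indicator = np.zeros((T, K))
    indicator[np.arange(T), actions] = 1.0
    # mu_hat(x, a) + 1{A_t = a} / e_t(x, a) * (Y_t - mu_hat(x, a))
    return mu_hat + indicator / propensities * (rewards[:, None] - mu_hat)


def weighted_policy_value(scores, policy_actions, weights):
    """Weighted average of the scores at the actions the policy picks.

    weights: (T,) non-negative array; uniform weights recover plain AIPW.
    The choice of weights is an assumption left to the caller.
    """
    picked = scores[np.arange(len(policy_actions)), policy_actions]
    return np.sum(weights * picked) / np.sum(weights)


# Toy data: two samples, two actions, outcome model set to zero.
actions = np.array([0, 1])
rewards = np.array([1.0, 2.0])
propensities = np.array([[0.5, 0.5], [0.25, 0.75]])
mu_hat = np.zeros((2, 2))

scores = aipw_scores(actions, rewards, propensities, mu_hat)
value = weighted_policy_value(scores, np.array([0, 1]), np.array([1.0, 3.0]))
```

Policy learning would then pick, from some policy class, the policy with the largest estimated value. Downweighting samples collected under small assignment probabilities is one way to limit the variance contributed by the inverse-propensity term.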

Suggested Citation

  • Ruohan Zhan & Zhimei Ren & Susan Athey & Zhengyuan Zhou, 2021. "Policy Learning with Adaptively Collected Data," Papers 2105.02344, arXiv.org, revised Nov 2022.
  • Handle: RePEc:arx:papers:2105.02344

    Download full text from publisher

    File URL: http://arxiv.org/pdf/2105.02344
    File Function: Latest version
    Download Restriction: no

    References listed on IDEAS

    1. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    2. Y. Q. Zhao & D. Zeng & E. B. Laber & R. Song & M. Yuan & M. R. Kosorok, 2015. "Doubly robust learning for estimating individualized treatment with censored data," Biometrika, Biometrika Trust, vol. 102(1), pages 151-168.
    3. Toru Kitagawa & Aleksey Tetenov, 2018. "Who Should Be Treated? Empirical Welfare Maximization Methods for Treatment Choice," Econometrica, Econometric Society, vol. 86(2), pages 591-616, March.
    4. Vivek F. Farias & Andrew A. L, 2019. "Learning Preferences with Side Information," Management Science, INFORMS, vol. 65(7), pages 3131-3149, July.
    5. S. A. Murphy, 2003. "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 65(2), pages 331-355, May.
    6. Hamsa Bastani & Mohsen Bayati, 2020. "Online Decision Making with High-Dimensional Covariates," Operations Research, INFORMS, vol. 68(1), pages 276-294, January.
    7. Zhan, Ruohan & Hadad, Vitor & Hirshberg, David A. & Athey, Susan, 2021. "Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits," Research Papers 3970, Stanford University, Graduate School of Business.
    8. Susan Athey & Stefan Wager, 2021. "Policy Learning With Observational Data," Econometrica, Econometric Society, vol. 89(1), pages 133-161, January.
    9. Xin Zhou & Nicole Mayer-Hamblett & Umer Khan & Michael R. Kosorok, 2017. "Residual Weighted Learning for Estimating Individualized Treatment Rules," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 112(517), pages 169-187, January.
    10. Andrew Bennett & Nathan Kallus, 2020. "Efficient Policy Learning from Surrogate-Loss Classification Reductions," Papers 2002.05153, arXiv.org.
    11. Guido W. Imbens, 2004. "Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review," The Review of Economics and Statistics, MIT Press, vol. 86(1), pages 4-29, February.
    12. Omar Besbes & Assaf Zeevi, 2009. "Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms," Operations Research, INFORMS, vol. 57(6), pages 1407-1420, December.
    13. Mert Demirer & Vasilis Syrgkanis & Greg Lewis & Victor Chernozhukov, 2019. "Semi-Parametric Efficient Policy Learning with Continuous Actions," Papers 1905.10116, arXiv.org, revised Jul 2019.
    14. Daniel Russo & Benjamin Van Roy, 2014. "Learning to Optimize via Posterior Sampling," Mathematics of Operations Research, INFORMS, vol. 39(4), pages 1221-1243, November.

    Citations

    Cited by:

    1. Shantanu Gupta & Zachary C. Lipton & David Childers, 2021. "Efficient Online Estimation of Causal Effects by Deciding What to Observe," Papers 2108.09265, arXiv.org, revised Oct 2021.
    2. Masahiro Kato & Kyohei Okumura & Takuya Ishihara & Toru Kitagawa, 2024. "Adaptive Experimental Design for Policy Learning," Papers 2401.03756, arXiv.org, revised Feb 2024.
    3. Keshav Agrawal & Susan Athey & Ayush Kanodia & Emil Palikot, 2022. "Personalized Recommendations in EdTech: Evidence from a Randomized Controlled Trial," Papers 2208.13940, arXiv.org, revised Dec 2022.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Weibin Mo & Yufeng Liu, 2022. "Efficient learning of optimal individualized treatment rules for heteroscedastic or misspecified treatment‐free effect models," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 84(2), pages 440-472, April.
    2. Zhengyuan Zhou & Susan Athey & Stefan Wager, 2023. "Offline Multi-Action Policy Learning: Generalization and Optimization," Operations Research, INFORMS, vol. 71(1), pages 148-183, January.
    3. Shosei Sakaguchi, 2021. "Estimation of Optimal Dynamic Treatment Assignment Rules under Policy Constraints," Papers 2106.05031, arXiv.org, revised Aug 2024.
    4. Huber, Martin, 2019. "An introduction to flexible methods for policy evaluation," FSES Working Papers 504, Faculty of Economics and Social Sciences, University of Freiburg/Fribourg Switzerland.
    5. Davide Viviano & Jelena Bradic, 2020. "Fair Policy Targeting," Papers 2005.12395, arXiv.org, revised Jun 2022.
    6. Davide Viviano, 2019. "Policy Targeting under Network Interference," Papers 1906.10258, arXiv.org, revised Apr 2024.
    7. Chunrong Ai & Yue Fang & Haitian Xie, 2024. "Data-driven Policy Learning for Continuous Treatments," Papers 2402.02535, arXiv.org, revised Nov 2024.
    8. Q. Clairon & R. Henderson & N. J. Young & E. D. Wilson & C. J. Taylor, 2021. "Adaptive treatment and robust control," Biometrics, The International Biometric Society, vol. 77(1), pages 223-236, March.
    9. Yi Zhang & Kosuke Imai, 2023. "Individualized Policy Evaluation and Learning under Clustered Network Interference," Papers 2311.02467, arXiv.org, revised Feb 2024.
    10. Garbero, Alessandra & Sakos, Grayson & Cerulli, Giovanni, 2023. "Towards data-driven project design: Providing optimal treatment rules for development projects," Socio-Economic Planning Sciences, Elsevier, vol. 89(C).
    11. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Machine-Learning Approach," Economics working papers 2021-08, Department of Economics, Johannes Kepler University Linz, Austria.
    12. Rahul Singh & Liyuan Xu & Arthur Gretton, 2020. "Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves," Papers 2010.04855, arXiv.org, revised Oct 2022.
    13. Ganesh Karapakula, 2023. "Stable Probability Weighting: Large-Sample and Finite-Sample Estimation and Inference Methods for Heterogeneous Causal Effects of Multivalued Treatments Under Limited Overlap," Papers 2301.05703, arXiv.org, revised Jan 2023.
    14. Giorgos Bakoyannis, 2023. "Estimating optimal individualized treatment rules with multistate processes," Biometrics, The International Biometric Society, vol. 79(4), pages 2830-2842, December.
    15. Andrew Bennett & Nathan Kallus, 2020. "Efficient Policy Learning from Surrogate-Loss Classification Reductions," Papers 2002.05153, arXiv.org.
    16. Jiacheng Wu & Nina Galanter & Susan M. Shortreed & Erica E.M. Moodie, 2022. "Ranking tailoring variables for constructing individualized treatment rules: An application to schizophrenia," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 71(2), pages 309-330, March.
    17. Davide Viviano & Jess Rudder, 2020. "Policy design in experiments with unknown interference," Papers 2011.08174, arXiv.org, revised May 2024.
    18. Kitagawa, Toru & Wang, Guanyi, 2023. "Who should get vaccinated? Individualized allocation of vaccines over SIR network," Journal of Econometrics, Elsevier, vol. 232(1), pages 109-131.
    19. Tobias Cagala & Ulrich Glogowsky & Johannes Rincke & Anthony Strittmatter, 2021. "Optimal Targeting in Fundraising: A Causal Machine-Learning Approach," Papers 2103.10251, arXiv.org, revised Sep 2021.
    20. Yunan Wu & Lan Wang, 2021. "Resampling‐based confidence intervals for model‐free robust inference on optimal treatment regimes," Biometrics, The International Biometric Society, vol. 77(2), pages 465-476, June.
