

Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation

Authors
  • Wenlong Mou

    (University of California, Berkeley, Berkeley, California 94720)

  • Ashwin Pananjady

    (Georgia Institute of Technology, Atlanta, Georgia 30332)

  • Martin J. Wainwright

    (University of California, Berkeley, Berkeley, California 94720; Massachusetts Institute of Technology, Cambridge, Massachusetts 02139)

Abstract
Linear fixed-point equations in Hilbert spaces arise in a variety of settings, including reinforcement learning and computational methods for solving differential and integral equations. We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space. First, we prove an instance-dependent upper bound on the mean-squared error for a linear stochastic approximation scheme that exploits Polyak–Ruppert averaging. This bound consists of two terms: an approximation error term with an instance-dependent approximation factor and a statistical error term that captures the instance-specific complexity of the noise when projected onto the low-dimensional subspace. Using information-theoretic methods, we also establish lower bounds showing that neither of these terms can be improved, again in an instance-dependent sense. A concrete consequence of our characterization is that the optimal approximation factor in this problem can be much larger than a universal constant. We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation, establishing their optimality.
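
For readers who want a concrete picture of the kind of procedure the abstract refers to, the following is a minimal Python sketch of temporal difference learning with linear function approximation and Polyak–Ruppert averaging. It is an illustration under stated assumptions, not the authors' algorithm or experimental setup: the 3-state Markov reward process, the features, the constant step size, the burn-in rule, and the function names `projected_fixed_point` and `averaged_td` are all hypothetical choices made here. Observations are drawn i.i.d. from the stationary distribution, and the averaged iterate is compared with the exact solution of the projected fixed-point equation for this toy instance.

```python
import random

# Hypothetical 3-state Markov reward process (an assumption, not from the paper).
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]                       # doubly stochastic: stationary law is uniform
REWARD = [0.0, 1.0, 2.0]
PHI = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # 2-dimensional feature vector per state
GAMMA = 0.9                                 # discount factor


def projected_fixed_point():
    """Solve A theta = b exactly, where
    A = E[phi(s) (phi(s) - GAMMA * phi(s'))^T] and b = E[r(s) phi(s)]."""
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    mu = 1.0 / 3.0                          # uniform stationary probability
    for s in range(3):
        exp_next = [sum(P[s][t] * PHI[t][j] for t in range(3)) for j in range(2)]
        for i in range(2):
            b[i] += mu * REWARD[s] * PHI[s][i]
            for j in range(2):
                a[i][j] += mu * PHI[s][i] * (PHI[s][j] - GAMMA * exp_next[j])
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    # Cramer's rule for the 2x2 system.
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def averaged_td(num_steps=200_000, step_size=0.05, seed=0):
    """Constant-step-size TD(0) on i.i.d. samples; returns the average of the
    iterates produced after the first half of the run (burn-in)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    avg = [0.0, 0.0]
    burn_in = num_steps // 2
    count = 0
    for k in range(num_steps):
        s = rng.randrange(3)                              # s ~ stationary (uniform)
        s_next = rng.choices(range(3), weights=P[s])[0]   # s' ~ P(s, .)
        value = PHI[s][0] * theta[0] + PHI[s][1] * theta[1]
        value_next = PHI[s_next][0] * theta[0] + PHI[s_next][1] * theta[1]
        td_error = REWARD[s] + GAMMA * value_next - value
        theta = [theta[i] + step_size * td_error * PHI[s][i] for i in range(2)]
        if k >= burn_in:
            count += 1
            avg = [avg[i] + (theta[i] - avg[i]) / count for i in range(2)]
    return avg


if __name__ == "__main__":
    print("projected fixed point:", projected_fixed_point())
    print("averaged TD estimate: ", averaged_td())
```

In this toy instance the rewards are chosen so that the true value function need not lie in the span of the features, so the projected fixed point carries an approximation error of the kind the first term of the bound describes, while the gap between the averaged iterate and the projected fixed point reflects the statistical error. The constant step size and the half-run burn-in are simplifications for readability.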

Suggested Citation

  • Wenlong Mou & Ashwin Pananjady & Martin J. Wainwright, 2023. "Optimal Oracle Inequalities for Projected Fixed-Point Equations, with Applications to Policy Evaluation," Mathematics of Operations Research, INFORMS, vol. 48(4), pages 2308-2336, November.
  • Handle: RePEc:inm:ormoor:v:48:y:2023:i:4:p:2308-2336
    DOI: 10.1287/moor.2022.1341

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/moor.2022.1341
    Download Restriction: no
