Federated Offline Policy Learning
Aldo Gael Carranza and
Susan Athey
Additional contact information
Aldo Gael Carranza: Stanford U
Research Papers from Stanford University, Graduate School of Business
Abstract:
We consider the problem of learning personalized decision policies from observational bandit feedback data collected across multiple heterogeneous data sources. We introduce a novel regret analysis that establishes finite-sample upper bounds on two distinct notions of regret: global regret over all data sources in aggregate, and local regret for any given data source. We characterize these regret bounds in terms of measures of source heterogeneity and distribution shift. Moreover, we examine the practical considerations of this problem in the federated setting, where a central server aims to train a policy on data distributed across the heterogeneous sources without collecting any of their raw data. We present a policy learning algorithm amenable to federation, based on aggregating local policies trained with doubly robust offline policy evaluation strategies. Our analysis and supporting experimental results provide insights into the tradeoffs of including heterogeneous data sources in offline policy learning.
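The aggregation-of-local-policies idea described in the abstract can be illustrated with a small, hedged sketch. The code below is not the paper's algorithm; it only shows the two ingredients the abstract names: doubly robust (DR) off-policy scores computed at each data source, and a server that combines locally fitted policies without seeing raw data. All function names, the linear policy class, and the regression-based fitting step are illustrative assumptions.

```python
import numpy as np

def dr_scores(A, R, propensities, mu_hat):
    # DR score matrix Gamma[i, a]: estimate of unit i's reward under action a.
    # mu_hat: outcome-model predictions, shape (n, n_actions);
    # propensities: logging probability of the observed action A[i].
    n = len(A)
    Gamma = mu_hat.copy()
    Gamma[np.arange(n), A] += (R - mu_hat[np.arange(n), A]) / propensities
    return Gamma

def fit_local_policy(X, Gamma):
    # Local policy as a linear scorer: regress each action's DR scores on
    # features; the induced policy picks the action with the largest fitted score.
    Xb = np.hstack([np.ones((len(X), 1)), X])          # add intercept
    W, *_ = np.linalg.lstsq(Xb, Gamma, rcond=None)     # shape (d + 1, n_actions)
    return W

def aggregate(local_weights, local_sizes):
    # Server step: sample-size-weighted average of the local policy parameters.
    sizes = np.asarray(local_sizes, dtype=float)
    return sum(w * s for w, s in zip(local_weights, sizes)) / sizes.sum()

def act(W, X):
    # Deploy the aggregated policy: choose the highest-scoring action per unit.
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return np.argmax(Xb @ W, axis=1)

if __name__ == "__main__":
    # Tiny synthetic demo with two sources of unequal size (illustrative only).
    rng = np.random.default_rng(0)
    sources = []
    for n in (500, 200):
        X = rng.normal(size=(n, 3))
        A = rng.integers(0, 2, size=n)         # logged actions from a uniform logger
        prop = np.full(n, 0.5)
        R = (A == (X[:, 0] > 0)).astype(float) + 0.1 * rng.normal(size=n)
        mu_hat = np.full((n, 2), R.mean())     # crude outcome model for the sketch
        sources.append((X, A, R, prop, mu_hat))

    Ws = [fit_local_policy(X, dr_scores(A, R, p, m)) for X, A, R, p, m in sources]
    W_global = aggregate(Ws, [len(s[0]) for s in sources])
    print(act(W_global, sources[0][0])[:10])
```

In this sketch each source shares only its fitted parameter matrix with the server, never its raw (X, A, R) data, mirroring the federated constraint described in the abstract; the regression-on-DR-scores step is a simple stand-in for the paper's actual policy optimization procedure.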
Date: 2024-10
Downloads: (external link)
https://www.gsb.stanford.edu/faculty-research/work ... line-policy-learning
Related works:
Working Paper: Federated Offline Policy Learning (2024)
Persistent link: https://EconPapers.repec.org/RePEc:ecl:stabus:4215
More papers in Research Papers from Stanford University, Graduate School of Business.