Enhancing Preference-based Linear Bandits via Human Response Time
Shen Li,
Yuyang Zhang,
Zhaolin Ren,
Claire Liang,
Na Li and
Julie A. Shah
Papers from arXiv.org
Abstract:
Interactive preference learning systems present humans with queries as pairs of options; humans then select their preferred choice, allowing the system to infer preferences from these binary choices. While binary choice feedback is simple and widely used, it offers limited information about preference strength. To address this, we leverage human response times, which inversely correlate with preference strength, as complementary information. We introduce a computationally efficient method based on the EZ-diffusion model, combining choices and response times to estimate the underlying human utility function. Theoretical and empirical comparisons with traditional choice-only estimators show that for queries where humans have strong preferences (i.e., "easy" queries), response times provide valuable complementary information and enhance utility estimates. We integrate this estimator into preference-based linear bandits for fixed-budget best-arm identification. Simulations on three real-world datasets demonstrate that incorporating response times significantly accelerates preference learning.
Date: 2024-09, Revised 2024-10
New Economics Papers: this item is included in nep-dcm, nep-ecm, nep-ipr and nep-upt
References: View references in EconPapers View complete reference list from CitEc
Citations:
Downloads: (external link)
http://arxiv.org/pdf/2409.05798 Latest version (application/pdf)
Related works:
This item may be available elsewhere in EconPapers: Search for items with the same title.
Export reference: BibTeX
RIS (EndNote, ProCite, RefMan)
HTML/Text
Persistent link: https://EconPapers.repec.org/RePEc:arx:papers:2409.05798
Access Statistics for this paper
More papers in Papers from arXiv.org
Bibliographic data for series maintained by arXiv administrators ().