Philip S. Thomas
Person information
- affiliation: University of Massachusetts Amherst, Department of Computer Science
2020 – today
- 2024
- [j3] Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White, Philip S. Thomas: The Cliff of Overcommitment with Policy Gradient Step Sizes. RLJ 2: 864-883 (2024)
- [j2] Kartik Choudhary, Dhawal Gupta, Philip S. Thomas: ICU-Sepsis: A Benchmark MDP Built from Real Medical Data. RLJ 4: 1546-1566 (2024)
- [c56] Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva: From Past to Future: Rethinking Eligibility Traces. AAAI 2024: 12253-12260
- [c55] Min-Hsuan Yeh, Blossom Metevier, Austin Hoag, Philip S. Thomas: Analyzing the Relationship Between Difference and Ratio-Based Fairness Metrics. FAccT 2024: 518-528
- [c54] Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas: Position: Benchmarking is Limited in Reinforcement Learning Research. ICML 2024
- [i43] Kartik Choudhary, Dhawal Gupta, Philip S. Thomas: ICU-Sepsis: A Benchmark MDP Built from Real Medical Data. CoRR abs/2406.05646 (2024)
- [i42] Scott M. Jordan, Adam White, Bruno Castro da Silva, Martha White, Philip S. Thomas: Position: Benchmarking is Limited in Reinforcement Learning Research. CoRR abs/2406.16241 (2024)
- [i41] Shreyas Chaudhari, Ameet Deshpande, Bruno Castro da Silva, Philip S. Thomas: Abstract Reward Processes: Leveraging State Abstraction for Consistent Off-Policy Evaluation. CoRR abs/2410.02172 (2024)
- 2023
- [c53] Vincent Liu, Yash Chandak, Philip S. Thomas, Martha White: Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments. AISTATS 2023: 5474-5492
- [c52] Austin Hoag, James E. Kostas, Bruno Castro da Silva, Philip S. Thomas, Yuriy Brun: Seldonian Toolkit: Building Software with Safe and Fair Machine Learning. ICSE Companion 2023: 107-111
- [c51] Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno C. da Silva: Behavior Alignment via Reward Function Optimization. NeurIPS 2023
- [i40] Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno Castro da Silva, Emma Brunskill, Philip S. Thomas: Off-Policy Evaluation for Action-Dependent Non-Stationary Environments. CoRR abs/2301.10330 (2023)
- [i39] Yash Chandak, Shiv Shankar, Venkata Gandikota, Philip S. Thomas, Arya Mazumdar: Optimization using Parallel Gradient Evaluations on Multiple Parameters. CoRR abs/2302.03161 (2023)
- [i38] Vincent Liu, Yash Chandak, Philip S. Thomas, Martha White: Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments. CoRR abs/2302.11725 (2023)
- [i37] James E. Kostas, Scott M. Jordan, Yash Chandak, Georgios Theocharous, Dhawal Gupta, Martha White, Bruno Castro da Silva, Philip S. Thomas: Coagent Networks: Generalized and Scaled. CoRR abs/2305.09838 (2023)
- [i36] Yuhong Luo, Austin Hoag, Philip S. Thomas: Learning Fair Representations with High-Confidence Guarantees. CoRR abs/2310.15358 (2023)
- [i35] Dhawal Gupta, Yash Chandak, Scott M. Jordan, Philip S. Thomas, Bruno Castro da Silva: Behavior Alignment via Reward Function Optimization. CoRR abs/2310.19007 (2023)
- [i34] Dhawal Gupta, Scott M. Jordan, Shreyas Chaudhari, Bo Liu, Philip S. Thomas, Bruno Castro da Silva: From Past to Future: Rethinking Eligibility Traces. CoRR abs/2312.12972 (2023)
- 2022
- [c50] Stephen Giguere, Blossom Metevier, Bruno Castro da Silva, Yuriy Brun, Philip S. Thomas, Scott Niekum: Fairness Guarantees under Demographic Shift. ICLR 2022
- [c49] Jared Yeager, J. Eliot B. Moss, Michael Norrish, Philip S. Thomas: Mechanizing Soundness of Off-Policy Evaluation. ITP 2022: 32:1-32:20
- [c48] Yash Chandak, Shiv Shankar, Nathaniel D. Bastian, Bruno C. da Silva, Emma Brunskill, Philip S. Thomas: Off-Policy Evaluation for Action-Dependent Non-stationary Environments. NeurIPS 2022
- [i33] Abhinav Bhatia, Philip S. Thomas, Shlomo Zilberstein: Adaptive Rollout Length for Model-Based RL Using Model-Free Deep RL. CoRR abs/2206.02380 (2022)
- [i32] Aline Weber, Blossom Metevier, Yuriy Brun, Philip S. Thomas, Bruno Castro da Silva: Enforcing Delayed-Impact Fairness Guarantees. CoRR abs/2208.11744 (2022)
- 2021
- [c47] Yash Chandak, Shiv Shankar, Philip S. Thomas: High-Confidence Off-Policy (or Counterfactual) Variance Estimation. AAAI 2021: 6939-6947
- [c46] James E. Kostas, Yash Chandak, Scott M. Jordan, Georgios Theocharous, Philip S. Thomas: High Confidence Generalization for Reinforcement Learning. ICML 2021: 5764-5773
- [c45] Chris Nota, Philip S. Thomas, Bruno C. da Silva: Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods. ICML 2021: 8238-8247
- [c44] My Phan, Philip S. Thomas, Erik G. Learned-Miller: Towards Practical Mean Bounds for Small Samples. ICML 2021: 8567-8576
- [c43] Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs. NeurIPS 2021: 2004-2017
- [c42] Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum: SOPE: Spectrum of Off-Policy Estimators. NeurIPS 2021: 18958-18969
- [c41] Yash Chandak, Scott Niekum, Bruno C. da Silva, Erik G. Learned-Miller, Emma Brunskill, Philip S. Thomas: Universal Off-Policy Evaluation. NeurIPS 2021: 27475-27490
- [c40] Dhawal Gupta, Gabor Mihucz, Matthew Schlegel, James E. Kostas, Philip S. Thomas, Martha White: Structural Credit Assignment in Neural Networks using Reinforcement Learning. NeurIPS 2021: 30257-30270
- [c39] Ali Montazeralghaem, James Allan, Philip S. Thomas: Large-scale Interactive Conversational Recommendation System using Actor-Critic Framework. RecSys 2021: 220-229
- [i31] Yash Chandak, Shiv Shankar, Philip S. Thomas: High-Confidence Off-Policy (or Counterfactual) Variance Estimation. CoRR abs/2101.09847 (2021)
- [i30] Yash Chandak, Scott Niekum, Bruno Castro da Silva, Erik G. Learned-Miller, Emma Brunskill, Philip S. Thomas: Universal Off-Policy Evaluation. CoRR abs/2104.12820 (2021)
- [i29] Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche: Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs. CoRR abs/2106.00099 (2021)
- [i28] Christina J. Yuan, Yash Chandak, Stephen Giguere, Philip S. Thomas, Scott Niekum: SOPE: Spectrum of Off-Policy Estimators. CoRR abs/2111.03936 (2021)
- [i27] James E. Kostas, Philip S. Thomas, Georgios Theocharous: Edge-Compatible Reinforcement Learning for Recommendations. CoRR abs/2112.05812 (2021)
- 2020
- [c38] Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas: Lifelong Learning with a Changing Action Set. AAAI 2020: 3373-3380
- [c37] Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas: Reinforcement Learning When All Actions Are Not Always Available. AAAI 2020: 3381-3388
- [c36] Chris Nota, Philip S. Thomas: Is the Policy Gradient a Gradient? AAMAS 2020: 939-947
- [c35] Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas: Optimizing for the Future in Non-Stationary MDPs. ICML 2020: 1414-1425
- [c34] Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas: Evaluating the Performance of Reinforcement Learning Algorithms. ICML 2020: 4962-4973
- [c33] James E. Kostas, Chris Nota, Philip S. Thomas: Asynchronous Coagent Networks. ICML 2020: 5426-5435
- [c32] Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas: Towards Safe Policy Improvement for Non-Stationary MDPs. NeurIPS 2020
- [c31] Pinar Ozisik, Philip S. Thomas: Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms. NeurIPS 2020
- [i26] Francisco M. Garcia, Chris Nota, Philip S. Thomas: Learning Reusable Options for Multi-Task Reinforcement Learning. CoRR abs/2001.01577 (2020)
- [i25] Yash Chandak, Georgios Theocharous, Shiv Shankar, Martha White, Sridhar Mahadevan, Philip S. Thomas: Optimizing for the Future in Non-Stationary MDPs. CoRR abs/2005.08158 (2020)
- [i24] Scott M. Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip S. Thomas: Evaluating the Performance of Reinforcement Learning Algorithms. CoRR abs/2006.16958 (2020)
- [i23] Georgios Theocharous, Yash Chandak, Philip S. Thomas, Frits de Nijs: Reinforcement Learning for Strategic Recommendations. CoRR abs/2009.07346 (2020)
- [i22] Yash Chandak, Scott M. Jordan, Georgios Theocharous, Martha White, Philip S. Thomas: Towards Safe Policy Improvement for Non-Stationary MDPs. CoRR abs/2010.12645 (2020)
2010 – 2019
- 2019
- [c30] Saket Tiwari, Philip S. Thomas: Natural Option Critic. AAAI 2019: 5175-5182
- [c29] Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas: A Compression-Inspired Framework for Macro Discovery. AAMAS 2019: 1973-1975
- [c28] Francisco M. Garcia, Philip S. Thomas: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning. AAMAS 2019: 1976-1978
- [c27] Yash Chandak, Georgios Theocharous, James E. Kostas, Scott M. Jordan, Philip S. Thomas: Learning Action Representations for Reinforcement Learning. ICML 2019: 941-950
- [c26] Philip S. Thomas, Erik G. Learned-Miller: Concentration Inequalities for Conditional Value at Risk. ICML 2019: 6225-6233
- [c25] Francisco M. Garcia, Philip S. Thomas: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning. NeurIPS 2019: 5692-5701
- [c24] Blossom Metevier, Stephen Giguere, Sarah Brockman, Ari Kobren, Yuriy Brun, Emma Brunskill, Philip S. Thomas: Offline Contextual Bandits with High Probability Fairness Guarantees. NeurIPS 2019: 14893-14904
- [i21] Tengyang Xie, Philip S. Thomas, Gerome Miklau: Privacy Preserving Off-Policy Evaluation. CoRR abs/1902.00174 (2019)
- [i20] Yash Chandak, Georgios Theocharous, James E. Kostas, Scott M. Jordan, Philip S. Thomas: Learning Action Representations for Reinforcement Learning. CoRR abs/1902.00183 (2019)
- [i19] Francisco M. Garcia, Philip S. Thomas: A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning. CoRR abs/1902.00843 (2019)
- [i18] James E. Kostas, Chris Nota, Philip S. Thomas: Asynchronous Coagent Networks: Stochastic Networks for Reinforcement Learning without Backpropagation or a Clock. CoRR abs/1902.05650 (2019)
- [i17] Erik G. Learned-Miller, Philip S. Thomas: A New Confidence Interval for the Mean of a Bounded Random Variable. CoRR abs/1905.06208 (2019)
- [i16] Yash Chandak, Georgios Theocharous, Chris Nota, Philip S. Thomas: Lifelong Learning with a Changing Action Set. CoRR abs/1906.01770 (2019)
- [i15] Yash Chandak, Georgios Theocharous, Blossom Metevier, Philip S. Thomas: Reinforcement Learning When All Actions are Not Always Available. CoRR abs/1906.01772 (2019)
- [i14] Philip S. Thomas, Scott M. Jordan, Yash Chandak, Chris Nota, James E. Kostas: Classical Policy Gradient: Preserving Bellman's Principle of Optimality. CoRR abs/1906.03063 (2019)
- [i13] Chris Nota, Philip S. Thomas: Is the Policy Gradient a Gradient? CoRR abs/1906.07073 (2019)
- [i12] Sneha Aenugu, Abhishek Sharma, Sasikiran Yelamarthi, Hananel Hazan, Philip S. Thomas, Robert Kozma: Reinforcement learning with spiking coagents. CoRR abs/1910.06489 (2019)
- 2018
- [c23] Philip S. Thomas, Christoph Dann, Emma Brunskill: Decoupling Gradient-Like Learning Rules from Representations. ICML 2018: 4924-4932
- [c22] Shayan Doroudi, Philip S. Thomas, Emma Brunskill: Importance Sampling for Fair Policy Selection. IJCAI 2018: 5239-5243
- [i11] Saket Tiwari, Philip S. Thomas: Natural Option Critic. CoRR abs/1812.01488 (2018)
- 2017
- [c21] Philip S. Thomas, Emma Brunskill: Importance Sampling with Unequal Support. AAAI 2017: 2646-2652
- [c20] Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh, Ishan Durugkar, Emma Brunskill: Predictive Off-Policy Policy Evaluation for Nonstationary Decision Problems, with Applications to Digital Marketing. AAAI 2017: 4740-4745
- [c19] Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum: Data-Efficient Policy Evaluation Through Behavior Policy Search. ICML 2017: 1394-1403
- [c18] Zhaohan Guo, Philip S. Thomas, Emma Brunskill: Using Options and Covariance Testing for Long Horizon Off-Policy Policy Evaluation. NIPS 2017: 2492-2501
- [c17] Shayan Doroudi, Philip S. Thomas, Emma Brunskill: Importance Sampling for Fair Policy Selection. UAI 2017
- [i10] Zhaohan Daniel Guo, Philip S. Thomas, Emma Brunskill: Using Options for Long-Horizon Off-Policy Evaluation. CoRR abs/1703.03453 (2017)
- [i9] Philip S. Thomas, Christoph Dann, Emma Brunskill: Decoupling Learning Rules from Representations. CoRR abs/1706.03100 (2017)
- [i8] Josiah P. Hanna, Philip S. Thomas, Peter Stone, Scott Niekum: Data-Efficient Policy Evaluation Through Behavior Policy Search. CoRR abs/1706.03469 (2017)
- [i7] Philip S. Thomas, Emma Brunskill: Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines. CoRR abs/1706.06643 (2017)
- [i6] Philip S. Thomas, Bruno Castro da Silva, Andrew G. Barto, Emma Brunskill: On Ensuring that Intelligent Machines Are Well-Behaved. CoRR abs/1708.05448 (2017)
- 2016
- [j1] Kathleen M. Jagodnik, Philip S. Thomas, Antonie J. van den Bogert, Michael S. Branicky, Robert F. Kirsch: Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement. IEEE Trans. Hum. Mach. Syst. 46(5): 723-733 (2016)
- [c16] Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos: Increasing the Action Gap: New Operators for Reinforcement Learning. AAAI 2016: 1476-1483
- [c15] Philip S. Thomas, Emma Brunskill: Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. ICML 2016: 2139-2148
- [c14] Philip S. Thomas, Bruno Castro da Silva, Christoph Dann, Emma Brunskill: Energetic Natural Gradient Descent. ICML 2016: 2887-2895
- [i5] Philip S. Thomas, Emma Brunskill: Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning. CoRR abs/1604.00923 (2016)
- [i4] Philip S. Thomas, Emma Brunskill: Importance Sampling with Unequal Support. CoRR abs/1611.03451 (2016)
- 2015
- [c13] Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh: High-Confidence Off-Policy Evaluation. AAAI 2015: 3000-3006
- [c12] Philip S. Thomas, Georgios Theocharous, Mohammad Ghavamzadeh: High Confidence Policy Improvement. ICML 2015: 2380-2388
- [c11] Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh: Personalized Ad Recommendation Systems for Life-Time Value Optimization with Guarantees. IJCAI 2015: 1806-1812
- [c10] Philip S. Thomas, Scott Niekum, Georgios Theocharous, George Dimitri Konidaris: Policy Evaluation Using the Ω-Return. NIPS 2015: 334-342
- [c9] Georgios Theocharous, Philip S. Thomas, Mohammad Ghavamzadeh: Ad Recommendation Systems for Life-Time Value Optimization. WWW (Companion Volume) 2015: 1305-1310
- [i3] Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos: Increasing the Action Gap: New Operators for Reinforcement Learning. CoRR abs/1512.04860 (2015)
- [i2] Philip S. Thomas: A Notation for Markov Decision Processes. CoRR abs/1512.09075 (2015)
- 2014
- [c8] William Dabney, Philip S. Thomas: Natural Temporal Difference Learning. AAAI 2014: 1767-1773
- [i1] Sridhar Mahadevan, Bo Liu, Philip S. Thomas, William Dabney, Stephen Giguere, Nicholas Jacek, Ian Gemp, Ji Liu: Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces. CoRR abs/1405.6757 (2014)
- 2013
- [c7] Philip S. Thomas, William Dabney, Stephen Giguere, Sridhar Mahadevan: Projected Natural Actor-Critic. NIPS 2013: 2337-2345
- 2012
- [c6] Philip S. Thomas, Andrew G. Barto: Motor primitive discovery. ICDL-EPIROB 2012: 1-8
- 2011
- [c5] George Dimitri Konidaris, Sarah Osentoski, Philip S. Thomas: Value Function Approximation in Reinforcement Learning Using the Fourier Basis. AAAI 2011: 380-385
- [c4] Philip S. Thomas, Andrew G. Barto: Conjugate Markov Decision Processes. ICML 2011: 137-144
- [c3] Philip S. Thomas: Policy Gradient Coagent Networks. NIPS 2011: 1944-1952
- [c2] George Dimitri Konidaris, Scott Niekum, Philip S. Thomas: TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning. NIPS 2011: 2402-2410
2000 – 2009
- 2009
- [c1] Philip S. Thomas, Antonie J. van den Bogert, Kathleen M. Jagodnik, Michael S. Branicky: Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm. IAAI 2009