Mohammad Gheshlaghi Azar
2020 – today
- 2024
- [j3]Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney:
An Analysis of Quantile Temporal-Difference Learning. J. Mach. Learn. Res. 25: 163:1-163:47 (2024)
- [c24]Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Rémi Munos, Mark Rowland, Michal Valko, Daniele Calandriello:
A General Theoretical Paradigm to Understand Learning from Human Preferences. AISTATS 2024: 4447-4455
- [c23]Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Bill Wu, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist:
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion. EMNLP 2024: 21353-21370
- [c22]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. ICML 2024
- [i32]Pierre Harvey Richemond, Yunhao Tang, Daniel Guo, Daniele Calandriello, Mohammad Gheshlaghi Azar, Rafael Rafailov, Bernardo Ávila Pires, Eugene Tarassov, Lucas Spangher, Will Ellsworth, Aliaksei Severyn, Jonathan Mallinson, Lior Shani, Gil Shamir, Rishabh Joshi, Tianqi Liu, Rémi Munos, Bilal Piot:
Offline Regularised Reinforcement Learning for Large Language Models Alignment. CoRR abs/2405.19107 (2024)
- [i31]Eugene Choi, Arash Ahmadian, Matthieu Geist, Olivier Pietquin, Mohammad Gheshlaghi Azar:
Self-Improving Robust Preference Optimization. CoRR abs/2406.01660 (2024)
- [i30]Yannis Flet-Berliac, Nathan Grinsztajn, Florian Strub, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Mohammad Gheshlaghi Azar, Olivier Pietquin, Matthieu Geist:
Contrastive Policy Gradient: Aligning LLMs on sequence-level scores in a supervised-friendly fashion. CoRR abs/2406.19185 (2024)
- [i29]Nathan Grinsztajn, Yannis Flet-Berliac, Mohammad Gheshlaghi Azar, Florian Strub, Bill Wu, Eugene Choi, Chris Cremer, Arash Ahmadian, Yash Chandak, Olivier Pietquin, Matthieu Geist:
Averaging log-likelihoods in direct alignment. CoRR abs/2406.19188 (2024)
- 2023
- [c21]Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo:
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. ICML 2023: 17135-17175
- [c20]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. ICML 2023: 33632-33656
- [i28]Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney:
An Analysis of Quantile Temporal-Difference Learning. CoRR abs/2301.04462 (2023)
- [i27]Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo:
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. CoRR abs/2305.13185 (2023)
- [i26]Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos:
A General Theoretical Paradigm to Understand Learning from Human Preferences. CoRR abs/2310.12036 (2023)
- [i25]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. CoRR abs/2312.00886 (2023)
- 2022
- [c19]Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Velickovic, Michal Valko:
Large-Scale Representation Learning on Graphs via Bootstrapping. ICLR 2022
- [c18]Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. NeurIPS 2022
- [i24]Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári:
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal. CoRR abs/2205.14211 (2022)
- [i23]Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. CoRR abs/2206.08332 (2022)
- [i22]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. CoRR abs/2212.03319 (2022)
- 2021
- [c17]Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer:
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity. NeurIPS 2021: 10587-10599
- [i21]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Ávila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos:
Geometric Entropic Exploration. CoRR abs/2101.02055 (2021)
- [i20]Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Rémi Munos, Petar Velickovic, Michal Valko:
Bootstrapped Representation Learning on Graphs. CoRR abs/2102.06514 (2021)
- [i19]Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Keith B. Hengen, William R. Gray Roncal, Michal Valko, Eva L. Dyer:
Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction. CoRR abs/2102.10106 (2021)
- [i18]Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer:
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity. CoRR abs/2111.02338 (2021)
- 2020
- [c16]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. ICML 2020: 3875-3886
- [c15]Rémi Munos, Julien Pérolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls:
Fast computation of Nash Equilibria in Imperfect Information Games. ICML 2020: 7119-7129
- [c14]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. NeurIPS 2020
- [i17]Zhaohan Daniel Guo, Bernardo Ávila Pires, Bilal Piot, Jean-Bastien Grill, Florent Altché, Rémi Munos, Mohammad Gheshlaghi Azar:
Bootstrap Latent-Predictive Representations for Multitask Reinforcement Learning. CoRR abs/2004.14646 (2020)
- [i16]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. CoRR abs/2006.07733 (2020)
- [i15]Audrunas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Pérolat, Dustin Morrill, Vinícius Flores Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls:
The Advantage Regret-Matching Actor-Critic. CoRR abs/2008.12234 (2020)
2010 – 2019
- 2019
- [c13]Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Gregory Wayne, Satinder Singh, Doina Precup, Rémi Munos:
Hindsight Credit Assignment. NeurIPS 2019: 12467-12476
- [i14]Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Jean-Bastien Grill, Florent Altché, Rémi Munos:
World Discovery Models. CoRR abs/1902.07685 (2019)
- [i13]Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alexander Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin J. Miller, Mohammad Gheshlaghi Azar, Ian Osband, Neil C. Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew M. Botvinick, Shane Legg:
Meta-learning of Sequential Strategies. CoRR abs/1905.03030 (2019)
- [i12]Anna Harutyunyan, Will Dabney, Thomas Mesnard, Mohammad Gheshlaghi Azar, Bilal Piot, Nicolas Heess, Hado van Hasselt, Greg Wayne, Satinder Singh, Doina Precup, Rémi Munos:
Hindsight Credit Assignment. CoRR abs/1912.02503 (2019)
- 2018
- [c12]Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Gheshlaghi Azar, David Silver:
Rainbow: Combining Improvements in Deep Reinforcement Learning. AAAI 2018: 3215-3222
- [c11]Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Matteo Hessel, Ian Osband, Alex Graves, Volodymyr Mnih, Rémi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg:
Noisy Networks For Exploration. ICLR (Poster) 2018
- [c10]Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, Bilal Piot, Marc G. Bellemare, Rémi Munos:
The Reactor: A fast and sample-efficient Actor-Critic agent for Reinforcement Learning. ICLR (Poster) 2018
- [i11]Tobias Pohlen, Bilal Piot, Todd Hester, Mohammad Gheshlaghi Azar, Dan Horgan, David Budden, Gabriel Barth-Maron, Hado van Hasselt, John Quan, Mel Vecerík, Matteo Hessel, Rémi Munos, Olivier Pietquin:
Observe and Look Further: Achieving Consistent Performance on Atari. CoRR abs/1805.11593 (2018)
- [i10]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Bernardo A. Pires, Toby Pohlen, Rémi Munos:
Neural Predictive Belief Representations. CoRR abs/1811.06407 (2018)
- 2017
- [c9]Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos:
Minimax Regret Bounds for Reinforcement Learning. ICML 2017: 263-272
- [i9]Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos:
Minimax Regret Bounds for Reinforcement Learning. CoRR abs/1703.05449 (2017)
- [i8]Audrunas Gruslys, Mohammad Gheshlaghi Azar, Marc G. Bellemare, Rémi Munos:
The Reactor: A Sample-Efficient Actor-Critic Architecture. CoRR abs/1704.04651 (2017)
- [i7]Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Rémi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg:
Noisy Networks for Exploration. CoRR abs/1706.10295 (2017)
- [i6]Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Daniel Horgan, Bilal Piot, Mohammad Gheshlaghi Azar, David Silver:
Rainbow: Combining Improvements in Deep Reinforcement Learning. CoRR abs/1710.02298 (2017)
- 2016
- [c8]Vicenç Gómez, Mohammad Gheshlaghi Azar, Hilbert J. Kappen:
Correcting Multivariate Auto-Regressive Models for the Influence of Unobserved Common Input. CCIA 2016: 177-186
- [c7]Mohammad Gheshlaghi Azar, Eva L. Dyer, Konrad P. Körding:
Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes. UAI 2016
- [i5]Mohammad Gheshlaghi Azar, Eva L. Dyer, Konrad P. Körding:
Convex Relaxation Regression: Black-Box Optimization of Smooth Functions by Learning Their Convex Envelopes. CoRR abs/1602.02191 (2016)
- 2014
- [c6]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Online Stochastic Optimization under Correlated Bandit Feedback. ICML 2014: 1557-1565
- [i4]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Stochastic Optimization of a Locally Smooth Function under Correlated Bandit Feedback. CoRR abs/1402.0562 (2014)
- 2013
- [j2]Mohammad Gheshlaghi Azar, Rémi Munos, Hilbert J. Kappen:
Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Mach. Learn. 91(3): 325-349 (2013)
- [c5]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Sequential Transfer in Multi-armed Bandit with Finite Set of Models. NIPS 2013: 2220-2228
- [c4]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Regret Bounds for Reinforcement Learning with Policy Advice. ECML/PKDD (1) 2013: 97-112
- [i3]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Regret Bounds for Reinforcement Learning with Policy Advice. CoRR abs/1305.1027 (2013)
- [i2]Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill:
Sequential Transfer in Multi-armed Bandit with Finite Set of Models. CoRR abs/1307.6887 (2013)
- 2012
- [j1]Mohammad Gheshlaghi Azar, Vicenç Gómez, Hilbert J. Kappen:
Dynamic policy programming. J. Mach. Learn. Res. 13: 3207-3245 (2012)
- [c3]Mohammad Gheshlaghi Azar, Rémi Munos, Bert Kappen:
On the Sample Complexity of Reinforcement Learning with a Generative Model. ICML 2012
- 2011
- [c2]Mohammad Gheshlaghi Azar, Rémi Munos, Mohammad Ghavamzadeh, Hilbert J. Kappen:
Speedy Q-Learning. NIPS 2011: 2411-2419
- [c1]Mohammad Gheshlaghi Azar, Vicenç Gómez, Bert Kappen:
Dynamic Policy Programming with Function Approximation. AISTATS 2011: 119-127
- 2010
- [i1]Mohammad Gheshlaghi Azar, Hilbert J. Kappen:
Dynamic Policy Programming. CoRR abs/1004.2027 (2010)
last updated on 2024-11-15 19:28 CET by the dblp team
all metadata released as open data under CC0 1.0 license