Reinforcement Learning and Sequential Decision-Making
A structured reference to the individuals shaping artificial intelligence across research, industry, governance, ethics, and public discourse. Curated as a resource for professionals navigating the AI landscape. For more, see the full list at AI People.
Richard Sutton | Reinforcement Learning Theory | Canada Co-authored the definitive textbook on reinforcement learning with Andrew Barto. Developed temporal-difference learning, the foundation of modern RL. His 2019 essay "The Bitter Lesson" argued that general methods leveraging computation outperform approaches encoding human knowledge, a principle vindicated by recent AI advances. Key works: "Reinforcement Learning: An Introduction" (2018), "The Bitter Lesson" (2019)
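To make the idea concrete, here is a minimal sketch of the tabular TD(0) update Sutton introduced; the state count, step size, and discount below are illustrative choices, not drawn from any particular source.

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular TD(0) step: nudge V[s] toward the bootstrapped target r + gamma * V[s_next]."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V

# Illustrative use: a value table over five states and one observed transition.
V = np.zeros(5)
V = td0_update(V, s=0, r=1.0, s_next=1)
```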
Andrew Barto | Reinforcement Learning | United States Co-developed actor-critic methods and co-authored the canonical RL textbook with Sutton. His foundational work established reinforcement learning as a rigorous discipline connecting machine learning, control theory, and neuroscience. Key works: "Reinforcement Learning: An Introduction" (2018), actor-critic methods
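A compact illustration of the actor-critic idea Barto helped develop: the critic's TD error serves as the learning signal for both the value estimate and the policy. Everything below (the tabular setting, the softmax policy, the step sizes) is an illustrative assumption, not a specific published implementation.

```python
import numpy as np

def actor_critic_step(theta, V, s, a, r, s_next, alpha_v=0.1, alpha_pi=0.01, gamma=0.99):
    """One tabular actor-critic step: the TD error updates the critic (V) and the actor (theta)."""
    td_error = r + gamma * V[s_next] - V[s]            # critic's evaluation of the transition
    V[s] += alpha_v * td_error                         # critic update
    probs = np.exp(theta[s]) / np.exp(theta[s]).sum()  # softmax policy over action preferences
    grad = -probs
    grad[a] += 1.0                                     # gradient of log pi(a|s) w.r.t. theta[s]
    theta[s] += alpha_pi * td_error * grad             # actor update: reinforce a if td_error > 0
    return theta, V
```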
Richard Bellman | Dynamic Programming | United States Created dynamic programming and the Bellman equation in the 1950s: mathematical foundations that underpin all modern reinforcement learning. His work on optimal sequential decision-making became essential fifty years later as RL matured. Key works: Dynamic programming, the Bellman equation
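For reference, the Bellman optimality equation at the centre of that work, written here in the notation standard in modern RL texts (a choice of convention, not a quotation from Bellman):

```latex
V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V^{*}(s') \,\bigr]
```

Read as: the optimal value of a state is the best action's expected immediate reward plus the discounted optimal value of the successor state.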
David Silver | Game AI, Deep RL | United Kingdom Led the teams that created AlphaGo and AlphaZero at DeepMind. AlphaGo's 2016 victory over Lee Sedol was a watershed moment for AI. His work demonstrated that deep reinforcement learning could master complex domains previously thought to require human intuition. Key works: AlphaGo, AlphaZero
John Schulman | Policy Optimisation | United States Created TRPO (Trust Region Policy Optimisation) and PPO (Proximal Policy Optimisation), algorithms that made reinforcement learning practical and stable. PPO became the standard for training RL systems and was central to the RLHF techniques used to align language models. Key works: TRPO, PPO algorithms
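A minimal sketch of PPO's clipped surrogate objective as described in the 2017 paper; the function name, tensor shapes, and epsilon value here are illustrative assumptions rather than code from any released implementation.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate: cap how far the updated policy can move from the old one."""
    ratio = torch.exp(logp_new - logp_old)                       # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()                 # minimise the negative objective
```

Taking the elementwise minimum is what makes the update conservative: the policy gains nothing by pushing the probability ratio outside the clip range.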
Pieter Abbeel | Robot Learning, Deep RL | United States Pioneered using deep reinforcement learning for robotics, enabling robots to learn complex manipulation tasks. Co-founded Covariant (warehouse robotics) and Berkeley's robot learning lab. Bridges academic research and practical robotics deployment. Key works: Robot learning, inverse reinforcement learning, apprenticeship learning
Sergey Levine | Robot Learning, Model-Based RL | United States Leads research on enabling robots to learn from real-world experience rather than simulation. His work on model-based reinforcement learning and offline RL addresses key challenges in deploying learning systems in physical environments. Key works: Robot RL, offline RL, soft actor-critic
Doina Precup | Hierarchical RL, Options Framework | Canada Developed the options framework for hierarchical reinforcement learning, enabling agents to learn and reuse skills. Leads DeepMind's Montreal lab while maintaining academic research on temporal abstraction in RL. Key works: Options framework, hierarchical RL
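The options framework formalises a reusable skill as a triple: an initiation set, an intra-option policy, and a termination condition. A minimal illustrative encoding follows; the class layout and the toy corridor task are entirely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option in the Sutton, Precup & Singh (1999) sense: where it starts, how it acts, when it stops."""
    initiation_set: Set[int]              # states from which the option may be invoked
    policy: Callable[[int], int]          # intra-option policy: state -> action
    termination: Callable[[int], float]   # beta(s): probability of terminating in state s

# Hypothetical skill on a 1-D corridor: "move right until reaching state 4".
walk_right = Option(
    initiation_set={0, 1, 2, 3},
    policy=lambda s: 1,                              # action 1 = step right
    termination=lambda s: 1.0 if s == 4 else 0.0,
)
```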
Shane Legg | Intelligence Measurement, RL | United Kingdom Co-founded DeepMind and developed formal definitions of machine intelligence. His thesis on universal intelligence measures provided theoretical grounding for comparing different AI systems' capabilities. Key works: "Machine Super Intelligence" thesis, universal intelligence
Noam Brown | Game Theory, Strategic AI | United States Created Libratus and Pluribus-AI systems that defeated top humans at poker, a game of imperfect information. Now at OpenAI working on reasoning and planning. His work demonstrates AI can master strategic deception and decision-making under uncertainty. Key works: Libratus, Pluribus poker AI
How to Use This Directory
For research: Each entry includes key works and affiliations for deeper investigation.
For event planning: Filter by geographic base, domain, or public engagement experience.
For understanding the field: The categorisation reveals how different communities, from technical researchers and ethicists to policymakers and industry leaders, shape AI development.
For identifying perspectives: Note whose voices are included and whose might be missing from any particular AI conversation.
This directory is maintained as a resource for the AI age. Last updated: 2026.
Curated by Rahim Hirji for thesuperskills.com.

