%0 Journal Article %J Proceedings of the National Academy of Sciences (PNAS) %D 2020 %T The logic of universalization guides moral judgment %A Levine, Sydney %A Kleiman-Weiner, Max %A Schulz, Laura %A Tenenbaum, Joshua B. %A Cushman, Fiery A. %X

To explain why an action is wrong, we sometimes say, “What if everybody did that?” In other words, even if a single person’s behavior is harmless, that behavior may be wrong if it would be harmful once universalized. We formalize the process of universalization in a computational model, test its quantitative predictions in studies of human moral judgment, and distinguish it from alternative models. We show that adults spontaneously make moral judgments consistent with the logic of universalization, and report comparable patterns of judgment in children. We conclude that, alongside other well-characterized mechanisms of moral judgment, such as outcome-based and rule-based thinking, the logic of universalization holds an important place in our moral minds.

%B Proceedings of the National Academy of Sciences (PNAS) %P 202014505 %8 Feb-10-2020 %G eng %U http://www.pnas.org/lookup/doi/10.1073/pnas.2014505117 %! Proc Natl Acad Sci USA %R 10.1073/pnas.2014505117 %0 Conference Paper %B Cognitive Science Society %D 2019 %T Hard choices: Children’s understanding of the cost of action selection. %A Liu, Shari %A Cushman, Fiery A. %A Gershman, Samuel J. %A Kool, Wouter %A Spelke, Elizabeth S. %G eng %0 Journal Article %J Journal of Cognitive Neuroscience %D 2018 %T Planning Complexity Registers as a Cost in Metacontrol %A Kool, Wouter %A Gershman, Samuel J. %A Cushman, Fiery A. %X

Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning. Intuitively, if two tasks offer the same rewards for accuracy but one of them is much more demanding, we might expect people to rely on habit more in the difficult task: Devoting significant computation to achieve slight marginal accuracy gains would not be "worth it." We test and verify this prediction in a sequential reinforcement learning task. Because our paradigm is amenable to formal analysis, it contributes to the development of a computational model of how people balance the costs and benefits of different decision-making processes in a task-specific manner; in other words, how we decide when hard thinking is worth it.

%B Journal of Cognitive Neuroscience %V 30 %P 1391-1404 %8 10/2018 %G eng %U https://www.mitpressjournals.org/doi/abs/10.1162/jocn_a_01263 %N 10 %! J Cogn Neurosci %R 10.1162/jocn_a_01263 %0 Journal Article %J Psychol Sci %D 2017 %T Cost-Benefit Arbitration Between Multiple Reinforcement-Learning Systems. %A Kool, Wouter %A Gershman, Samuel J. %A Cushman, Fiery A. %X

Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits. To investigate this proposal, we conducted two experiments showing that people increase model-based control when it achieves greater accuracy than model-free control, and especially when the rewards of accurate performance are amplified. In contrast, they are insensitive to reward amplification when model-based and model-free control yield equivalent accuracy. This suggests that humans adaptively balance habitual and planned action through on-line cost-benefit analysis.

%B Psychol Sci %V 28 %P 1321-1333 %8 2017 Sep %G eng %N 9 %R 10.1177/0956797617708288 %0 Generic %D 2017 %T Thinking fast or slow? A reinforcement-learning approach %A Kool, Wouter %A Gershman, Samuel J. %A Cushman, Fiery A. %B Society for Personality and Social Psychology %C San Antonio, TX %0 Journal Article %J PLoS Comput Biol %D 2016 %T When Does Model-Based Control Pay Off? %A Kool, Wouter %A Cushman, Fiery A. %A Gershman, Samuel J. %X

Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to "model-free" and "model-based" strategies in reinforcement learning. Model-free strategies are computationally cheap, because action values can simply be read out of a look-up table constructed through trial and error, but they are sometimes inaccurate. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding. It is assumed that this trade-off between accuracy and computational demand plays an important role in the arbitration between the two strategies, but we show that the hallmark task for dissociating model-free and model-based strategies, as well as several related variants, does not embody such a trade-off. We describe five factors that reduce the effectiveness of the model-based strategy on these tasks by reducing its accuracy in estimating reward outcomes and decreasing the importance of its choices. Based on these observations, we describe a version of the task that formally and empirically embodies an accuracy-demand trade-off between model-free and model-based strategies. Moreover, we show that human participants spontaneously increase their reliance on model-based control on this task, compared with the original paradigm. Our novel task and our computational analyses may prove important in subsequent empirical investigations of how humans balance accuracy and demand.

%B PLoS Comput Biol %V 12 %P e1005090 %8 2016 Aug %G eng %N 8 %R 10.1371/journal.pcbi.1005090
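The abstracts above describe arbitration between a cheap but inaccurate model-free controller and a costly but accurate model-based controller, with model-based control favored when reward amplification magnifies its accuracy advantage. A minimal illustrative sketch of that cost-benefit logic is below; it is not code from any of the cited papers, and the `arbitrate` function, its parameters, and the numbers used are hypothetical.

```python
# Illustrative sketch (not from the cited papers) of cost-benefit
# arbitration between a model-free (MF) and a model-based (MB)
# controller. Accuracies, costs, and stakes below are hypothetical.

def arbitrate(accuracy_mf, accuracy_mb, cost_mb, stakes):
    """Return 'MB' if the reward-weighted accuracy advantage of
    model-based control outweighs its extra computational cost,
    else 'MF'. `stakes` scales the reward for accurate performance."""
    benefit_mb = stakes * (accuracy_mb - accuracy_mf)
    return "MB" if benefit_mb > cost_mb else "MF"

# When MB is more accurate and stakes are amplified, planning wins:
print(arbitrate(accuracy_mf=0.6, accuracy_mb=0.9, cost_mb=1.0, stakes=5.0))   # MB

# When both controllers are equally accurate, amplifying stakes does
# not favor planning, mirroring the reported insensitivity to reward
# amplification under equivalent accuracy:
print(arbitrate(accuracy_mf=0.8, accuracy_mb=0.8, cost_mb=1.0, stakes=50.0))  # MF
```

The comparison is a single net-benefit threshold; richer accounts in the cited work fit this trade-off to trial-by-trial behavior, but the qualitative pattern is the same.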