The evaluative mind. Mind Design III. Forthcoming. [Draft]
I propose that the successes and contributions of reinforcement learning urge us to see the mind in a new light, namely, to recognise that the mind is fundamentally evaluative in nature.
I argue for the role of reinforcement learning in the philosophy of mind. To start, I make several assumptions about the nature of reinforcement learning and its instantiation in minds like ours. I then review some of the contributions that reinforcement learning methods have made across the so-called decision sciences. Finally, I show how principles from reinforcement learning can shape philosophical debates regarding the nature of perception and characterisations of desire.
Ethical and social risks of harm from language models. With Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P. S., ... & Gabriel, I. arXiv preprint arXiv:2112.04359. (2022) [Published]
This paper aims to help structure the risk landscape associated with large-scale Language Models (LMs). To foster advances in responsible innovation, an in-depth understanding of the potential risks posed by these models is needed. We analyse a wide range of established and anticipated risks in detail, drawing on multidisciplinary expertise and literature from computer science, linguistics, and the social sciences, and we outline six specific risk areas.
The neuroscience of moral judgment: empirical and philosophical developments. With J May, CI Workman, & H Han. in Neuroscience and Philosophy, Eds. Felipe de Brigard & Walter Sinnott-Armstrong. (2022). [Published]
We chart how neuroscience and philosophy have together advanced our understanding of moral judgment with implications for when it goes well or poorly. Combined with rigorous evidence from psychology and careful philosophical analysis, neuroscientific evidence can even help shed light on the extent of moral knowledge and on ways to promote healthy moral development.
Can Hierarchical Predictive Coding Explain Binocular Rivalry? Philosophical Psychology. (2021). [Draft]
Hohwy et al.’s (2008) model of binocular rivalry (BR) is taken as a classic illustration of predictive coding’s explanatory power. I revisit the account and show that it cannot explain the role of reward in BR. I then consider a more recent account based on Bayesian model averaging, which recasts the role of reward in BR in terms of optimism bias. If we accept this account, however, then we must reconsider our conception of perception. On this latter view, I argue, organisms engage in what amounts to policy-driven, motivated perception.
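For readers unfamiliar with the formal background, Bayesian model averaging combines the predictions of competing perceptual hypotheses weighted by their posterior probabilities; the reward-weighted prior in the second expression is my illustrative gloss on how an optimism bias might enter, not a formula from either account:

```latex
P(x \mid D) = \sum_{m} P(x \mid m, D)\, P(m \mid D),
\qquad
P(m) \propto \exp\bigl(\beta\, \mathbb{E}[r \mid m]\bigr)
```

If the prior over perceptual hypotheses m grows with their expected reward (with β setting the strength of the bias), then what the organism perceives is partly a function of what it would pay to perceive, which is the sense in which perception becomes policy-driven.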
I draw on computational models and empirical evidence from cognitive neuroscience to describe a naturalistic, multi-system model of the mind. On this model, synchronic self-control is impossible. Must we, then, give up on a meaningful conception of instrumental rationality? No. A multi-system view still permits something like synchronic self-control: an agent can control her very strong desires. Adopting a multi-system model of the mind thus places limitations on our conceptions of instrumental rationality, without requiring that we abandon the notion altogether.
I describe a suite of reinforcement learning environments in which artificial agents learn to value and respond to moral content and contexts. I illustrate the core principles of the framework by characterizing one such environment, or “gridworld,” in which an agent learns to trade off monetary profit against fair dealing, as applied in a standard behavioral economic paradigm. I then highlight the core technical and philosophical advantages of the learning approach for modeling moral cognition, and for addressing the so-called value alignment problem in AI.
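A minimal sketch of the trade-off such an environment poses, reduced to a one-state (bandit) version rather than a full gridworld; the names, the linear fairness penalty, and all parameter values are illustrative assumptions rather than the paper's implementation:

```python
import random

ENDOWMENT = 10          # units to split between agent and partner
FAIRNESS_WEIGHT = 0.8   # hypothetical trade-off weight (illustrative)
ALPHA, EPSILON, EPISODES = 0.1, 0.1, 5000

def reward(keep):
    """Monetary payoff minus a fairness penalty for unequal splits."""
    inequity = abs(keep - (ENDOWMENT - keep))
    return keep - FAIRNESS_WEIGHT * inequity

# Tabular action values over the eleven possible "keep" amounts (0..10).
q = [0.0] * (ENDOWMENT + 1)

for _ in range(EPISODES):
    if random.random() < EPSILON:
        action = random.randrange(ENDOWMENT + 1)               # explore
    else:
        action = max(range(ENDOWMENT + 1), key=q.__getitem__)  # exploit
    # One-step value update toward the observed reward.
    q[action] += ALPHA * (reward(action) - q[action])

best = max(range(ENDOWMENT + 1), key=q.__getitem__)
print(f"Learned split: keep {best}, give {ENDOWMENT - best}")
```

With FAIRNESS_WEIGHT above 0.5 the learned policy converges on the equal split; below 0.5, keeping everything maximizes reward, so a single weight determines where the agent lands on the profit-fairness trade-off.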
We argue that Lieder and Griffiths’ method for analyzing rational process models cannot capture an important constraint on resource allocation, namely, competition between different processes for shared resources (Klein 2018). We suggest that a complete resource-bounded explanation must take into account holistic interactions between processes on at least three different timescales: episodic, developmental, and evolutionary.
Revising and Expanding Cushman's Learning-Based Model of Moral Cognition. In Does Neuroscience Have Normative Implications? Eds. Geoffrey Holtzman and Elisabeth Hildt. (2022) [Draft]
Moral cognition refers to the human capacity to experience and respond to situations of moral significance. Recently, philosophers and cognitive scientists have turned to reinforcement learning, a branch of machine learning, to develop formal, mathematical models of normative cognition. I argue that moral cognition instead depends on three or more decision-making systems, with interactions between the systems producing its characteristic sociological, psychological, and phenomenological features.
May (2018) cites a body of evidence suggesting that participants take consequences, personal harm, and other factors into consideration when making moral judgments. This evidence is used to support the conclusion that moral cognition relies on rule-based inference. This commentary defends an alternative interpretation, namely, that the evidence can be explained in terms of domain-general valuation mechanisms.
This paper presents an empirical solution to the puzzle of weakness of will. Specifically, it develops a theory of action, grounded in contemporary cognitive neuroscientific accounts of decision making, that explains the phenomenon of weakness of will without giving rise to the traditional puzzle.
Recovering Spinoza's Theory of Akrasia. In Doing without Free Will: Spinoza and Contemporary Moral Problems. Eds. Ursula Goldenbaum and Christopher Kluz. Rowman and Littlefield. (2015). [Draft] [Published]
I show that Spinoza defends a causal psychological theory of akrasia, absent a concept of free will. I then challenge three contemporary discussions of Spinoza's view, as put forward by Jonathan Bennett (1984), Michael Della Rocca (1996), and Martin Lin (2006).
Paper on addiction and blameworthiness
Paper on prospective memory and weakness of will
Paper on the nature of valuation
Research in reinforcement learning asks how an agent can optimize its behavior by learning from interactions with its environment (Sutton and Barto 1998, 2018). The research program’s plurality of analyses, findings, and theories is, at a minimum, a sign of its scientific productivity (Kitcher 1982, 35-48). In this paper, I argue for something stronger: namely, that these models and findings target a sui generis cognitive capacity. This capacity is valuation, or the goal- and context-dependent subpersonal attribution of subjective reward and value to internal and external stimuli.
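To fix ideas, the core of the formal apparatus is the temporal-difference value update from Sutton and Barto; this is a textbook formulation, offered as an illustration of what the attribution of value to a stimulus comes to, not as the paper's preferred variant:

```latex
V(s_t) \leftarrow V(s_t) + \alpha \bigl[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr]
```

Here α is the learning rate, γ the discount factor, and r the reward signal; V(s) is the agent's running, revisable attribution of value to state s, which is goal- and context-dependent because it is learned relative to the agent's reward function and current policy.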
Paper on moral valuation
I defend moral valuationism as a theory of moral cognition. I argue that moral valuationism explains standard moral cognitive desiderata, including the fundamentally dynamic character of our evaluative attitudes; moral cognition in early childhood development; moral cognition in (bi-directional) cases of pathological dysfunction; and traditional philosophical puzzles, such as moral dumbfounding and trolley problems. In the process, I recast pieces of evidence traditionally appealed to by sentimentalist, rationalist, moral learning, and hybrid views in moral valuational terms. I then compare moral valuationism to these rival theories of moral cognition, highlighting features of the target system that my view can explain and, arguably, those it cannot. I devote the last portion of the paper to some of the implications of my view, including what might be called an economic conception of moral cognition; the resulting picture of indirect moral change; and the prospective convergence of empirical moral psychology and the design of artificial moral cognition.
Paper on valuation and desire
The reward-based theory of desire holds that “to have an intrinsic desire regarding it being the case that p is to constitute p as a reward or a punishment” (Schroeder, 2004; Schroeder and Arpaly, 2014). The theory thereby preserves the traditional, philosophical folk psychological notion of desire while specifying it in contemporary computational and empirical terms. In this paper, I defend two related theses. First, I argue that the traditional notion of desire is best expressed by the computational notions of reward and value, rather than by reward alone, in order to capture not only intrinsic but also instrumental desire. Second, I propose that in theoretical contexts, we can replace the philosophical folk psychological notion of desire with the technical notions of reward and value, allowing these notions to play an explicit role in resolving philosophical puzzles and debates.
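The reward/value distinction that the first thesis trades on is standard in reinforcement learning: reward is the immediate signal, while the value of a state is the expected discounted sum of future rewards under a policy π (again a textbook definition, not a formula from the paper):

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\middle|\, s_t = s \right]
```

On the proposed mapping, constituting p as a reward captures intrinsic desire, while attributing value to p, that is, expected reward that flows through p as a means, captures instrumental desire.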
Paper on cognitive control [with Colin Klein]
We begin by proposing that there are two previously undistinguished senses of cognitive control in the cognitive neuroscience literature. The first sense, which we call ‘psychological cognitive control,’ refers to a psychological capacity posited to explain our widespread but limited ability to multi-task. By contrast, the second sense, which we call ‘connectionist-neural cognitive control,’ refers to a resource allocation mechanism in complex, dynamic systems like the brain. We argue that psychological control is implemented by connectionist control mechanisms, and show how multiplexing, a structural feature of connectionist control in which systems reuse control representations across multiple domains, produces the signature limitations of psychological control, e.g., the inability to simultaneously do mental arithmetic and remember a three-digit number.
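A toy sketch of how representational overlap yields interference; the tasks, the four-unit control layer, and the overlap measure are illustrative assumptions rather than the paper's model:

```python
import numpy as np

# Hypothetical four-unit control layer; each task is a control vector over it.
# "Arithmetic" and "rehearsal" reuse units 1 and 2 (multiplexing); "tapping"
# draws on a disjoint unit.
arithmetic = np.array([1.0, 1.0, 0.0, 0.0])
rehearsal  = np.array([0.0, 1.0, 1.0, 0.0])
tapping    = np.array([0.0, 0.0, 0.0, 1.0])

def interference(a, b):
    """Overlap between control representations; shared units produce crosstalk."""
    return float(a @ b)

print(interference(arithmetic, rehearsal))  # 1.0 -> the tasks compete
print(interference(arithmetic, tapping))    # 0.0 -> the tasks can run in parallel
```

Tasks whose control vectors share units, here mental arithmetic and digit rehearsal, compete for the same representational resource and so resist being run in parallel; a task with a disjoint control vector does not.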
Paper on modeling moral problems [with Colin Klein]
The normative nature of much of moral philosophy suggests that for every moral problem there is a right thing to do, and that the job of moral deliberation is to find it. This framing obscures important differences in how we might approach moral decisions. We propose that there are in fact two general types of moral problems: pattern-matching moral problems, and adaptive moral problems, which are defined by constant, ethically loaded adjustment to an uncertain world. We argue that these two types of problems have fundamentally different structures, and so should be modeled using different machine learning approaches.
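One hedged way to cash out the contrast in code, pairing pattern-matching problems with supervised, precedent-based methods and adaptive problems with incremental, reinforcement-style updating; the features, labels, and feedback stream below are invented for illustration:

```python
import random

# Pattern-matching problem: a new case is judged against labeled precedents.
# Here, a nearest-precedent rule over two made-up features.
precedents = [((1.0, 0.0), "permissible"), ((0.0, 1.0), "impermissible")]

def judge(case):
    nearest = min(precedents,
                  key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], case)))
    return nearest[1]

# Adaptive problem: there is no fixed answer key, so the agent maintains a
# running estimate and keeps adjusting it as noisy feedback arrives.
estimate, alpha = 0.0, 0.1
for _ in range(1000):
    feedback = random.gauss(0.5, 1.0)   # uncertain, shifting moral feedback
    estimate += alpha * (feedback - estimate)

print(judge((0.9, 0.2)))   # settled by precedent
print(round(estimate, 2))  # settled only provisionally, by ongoing adjustment
```

The first problem has a fixed answer recoverable from past cases; the second never terminates, which is why the two types call for different modeling tools.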