Mechanisms of Reinforcement Learning and Decision Making in the Primate Dorsolateral Prefrontal Cortex
DAEYEOL LEE
Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut, USA
HYOJUNG SEO
Department of Neurobiology, Yale University School of Medicine, New Haven, Connecticut, USA
Abstract: To a first approximation, decision making is a process of optimization in which the decision maker tries to maximize the desirability of the outcomes resulting from chosen actions. Estimates of desirability are referred to as utilities or value functions, and they must be continually revised through experience according to the discrepancies between the predicted and obtained rewards. Reinforcement learning theory prescribes various algorithms for updating value functions and can parsimoniously account for the results of numerous behavioral, neurophysiological, and imaging studies in humans and other primates. In this article, we first discuss the relative merits of various decision-making tasks used in neurophysiological studies of nonhuman primates. We then focus on how reinforcement learning theory can shed new light on the function of the primate dorsolateral prefrontal cortex. As in other brain areas, such as the cingulate cortex and basal ganglia, activity in the dorsolateral prefrontal cortex often signals the value of the expected reward and the actual outcome. Thus, the dorsolateral prefrontal cortex is likely to be part of the broader network involved in adaptive decision making. In addition, reward-related activity in the dorsolateral prefrontal cortex is influenced by the animal's choices and other contextual information, and therefore may provide a neural substrate by which animals can flexibly modify their decision-making strategies according to the demands of specific tasks.
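The abstract describes value functions that are revised according to the discrepancy between predicted and obtained rewards, but it does not name a specific algorithm. As a minimal sketch of one common choice, the Python code below uses a simple delta rule (the value moves toward the obtained reward by a fraction of the prediction error) together with a softmax choice rule. The function names, the learning rate, the inverse temperature, and the two-option task with fixed reward probabilities are all illustrative assumptions, not details taken from the article.

```python
import math
import random


def update_value(value, reward, learning_rate=0.1):
    """Delta rule: shift the value estimate by a fraction of the prediction error.

    The learning rate of 0.1 is an assumed, illustrative setting.
    """
    prediction_error = reward - value  # obtained minus predicted reward
    return value + learning_rate * prediction_error


def softmax_probabilities(values, inverse_temperature=3.0):
    """Turn value estimates into choice probabilities (assumed softmax rule)."""
    highest = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(inverse_temperature * (v - highest)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(reward_probabilities, n_trials=2000, learning_rate=0.1, seed=0):
    """Simulate a hypothetical task in which each option pays 1 with a fixed probability."""
    rng = random.Random(seed)
    values = [0.0] * len(reward_probabilities)
    for _ in range(n_trials):
        probabilities = softmax_probabilities(values)
        choice = rng.choices(range(len(values)), weights=probabilities)[0]
        reward = 1.0 if rng.random() < reward_probabilities[choice] else 0.0
        # Only the value of the chosen option is updated on each trial.
        values[choice] = update_value(values[choice], reward, learning_rate)
    return values


if __name__ == "__main__":
    # Hypothetical two-option task: option 0 pays 20% of the time, option 1 pays 80%.
    print(simulate([0.2, 0.8]))
```

In this sketch a larger learning rate makes the estimates track recent outcomes more closely at the cost of noisier values, and a larger inverse temperature makes choices more strongly favor the option with the higher estimate.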