Reinforcement or Reward in Learning: Striatum
The role of rewards in the survival and well-being of biological agents ranges from the control of vegetative functions to the organization of voluntary, goal-directed behavior. The brain extracts the reward information from a large variety of stimuli and events for appropriate use in the control of behavior. Recent studies revealed that neurons in a limited number of brain structures carry specific signals about past and future rewards. The neurophysiological study of reward processing within the framework of goal-directed behavior may contribute to a basic understanding of mechanisms of drug abuse and could thus have a strong medical and social impact. This article concerns the reward signals in the striatum and describes how its neurons detect rewards, learn to predict future rewards from past experience, and use reward information for learning, choosing, preparing and executing goal-directed behavior.
Behavioral Functions of Rewards
Biological agents need to acquire nutritional substances from the environment and interact with sexual partners for reproduction. The brain controls the contact with foods, fluids, and sexual partners and mediates the adaptation of behavior to novel or changed situations. Neuronal mechanisms not only detect rewards but also predict them on the basis of representations formed by past experience. Through these mechanisms rewards serve as goals for voluntary and intentional forms of behavior.
Rewards have several basic functions. A popular view holds that rewards induce subjective feelings of pleasure and contribute to positive emotions. Rewards act as positive reinforcers by increasing the frequency and intensity of behavior leading to the acquisition of goal objects in classical and instrumental conditioning procedures. The learning function is based on the discrepancy between the prediction and occurrence of rewards (reward-prediction error).
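The learning function can be illustrated with a minimal sketch in Python. It assumes a simple error-driven update of the Rescorla-Wagner type, which the text does not name; the function name prediction_errors, the learning rate alpha, and the reward magnitudes of 1.0 and 0.0 are illustrative assumptions only.

```python
def prediction_errors(outcomes, alpha=0.2):
    """Return the reward-prediction error on each trial.

    outcomes: reward received on each trial (1.0 = reward, 0.0 = none).
    alpha: assumed learning rate; not a value taken from the literature.
    """
    prediction = 0.0  # the agent starts out expecting no reward
    errors = []
    for outcome in outcomes:
        # Prediction error: reward that occurred minus reward that was predicted.
        error = outcome - prediction
        errors.append(error)
        # The prediction moves a fraction alpha toward the actual outcome.
        prediction += alpha * error
    return errors


# Thirty rewarded trials followed by one trial on which the reward is omitted.
errors = prediction_errors([1.0] * 30 + [0.0])
print("first trial:", errors[0])
print("last rewarded trial:", errors[29])
print("omission trial:", errors[30])
```

In this sketch the error is large and positive when the reward is unpredicted, shrinks toward zero as the reward becomes predicted, and turns negative when a predicted reward fails to occur. Learning stops once the prediction matches the outcome.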
Rewards are goals that elicit approach and consummatory behavior. Reward objects have positive motivational value because they elicit effortful behavioral reactions. The values arise either through innate mechanisms or, in most cases, learning. In this way rewards help to establish value systems for behavior and serve as key references for behavioral decisions. Differences in perceived reward values in individual agents and situations may help to explain variations in behavioral choices.
A Case for the Striatum in Reward Mechanisms
The basal ganglia are composed of the striatum (caudate nucleus, putamen, and ventral striatum, including nucleus accumbens), dopamine neurons of the substantia nigra and ventral tegmental area (groups A8, A9, and A10), and a number of other structures. Current evidence suggests that the basal ganglia are important for control of voluntary, goal-directed behavior and the processing of rewarding outcomes. Human diseases involving these structures lead to deficits in voluntary behavior and movements (Parkinsonism, schizophrenia, and Huntington's chorea). Lesioning and psychopharmacological and electrical self-stimulation experiments strongly indicate that dopamine neurons and the ventral striatum serve prime motivational functions (Robbins and Everitt, 1996). Major addictive substances, such as cocaine and heroin, increase the dopamine concentration in the ventral striatum, and animals try to receive injections of dopamine or opiates into the ventral striatum (Wise, 1996). Single neurons in the basal ganglia prepare and initiate movements and process information about rewarding outcomes (Schultz, 2000).
Neurophysiology of Reward Mechanisms in the Striatum
Neurons in the striatum detect the delivery of rewards and discriminate among them. Other striatal neurons detect conditioned, reward-predicting visual stimuli and discriminate reward-predicting from nonpredicting stimuli. Many natural situations involve sequences of individual movements that lead to a final reward. Neurons in the ventral striatum respond differentially to sequential cues at different steps away from the reward. It appears that these neurons report the positions of individual stimuli within a behavioral sequence and thus signal the progress towards the reward.
A well-learned reward-predicting stimulus evokes a state of expectation. Striatal neurons are active during several seconds of expectation of predictable rewards. Their activity follows a reward-predicting stimulus and persists for several seconds until the reward is delivered. Such neurons discriminate between different future rewards and other, nonrewarding predictable task events, such as movement-eliciting stimuli or instruction cues. The activity may reflect a neuronal representation of reward established through previous experience. Reward-detecting and reward-expecting neurons are found about twice as often in the ventral striatum as in the caudate and putamen. These findings may provide neurophysiological correlates for the known motivational functions of the ventral striatum.
Expectations change with experience. When learning to discriminate rewarded from nonrewarded stimuli, animals initially expect reward on all trials. These adaptations are reflected in the activity of neurons in the striatum: the neurons show reward expectation activity during initial trials with novel stimuli, and this activity becomes progressively restricted to rewarded trials as learning advances.
Expected rewards may serve as goals for voluntary behavior if information about the reward is present while behavioral reactions toward the reward are being prepared and executed (Dickinson and Balleine, 1994). Neuronal mechanisms in the striatum may integrate reward information into processes mediating the behavior leading to the reward. Some neurons in the anterior striatum show sustained activity for a few seconds during the preparation of a movement. These activations occur much more commonly in rewarded than unrewarded trials and vary with the type of reward expected for the movement. Thus both the future reward and the movement toward the reward are represented by these neurons. The reward-dependent activity may be a way in which the expected reward is represented by neurons and can influence neuronal processes underlying the behavior toward that reward.
Reward Information Arriving at the Striatum
The striatum is a part of a limited network of brain structures involved in the processing of reward information. Although it is difficult at the moment to assess the different functions of each of these structures, we can describe the neuronal activities in some of the structures sending information to the striatum.
The orbitofrontal cortex, the ventral part of the frontal lobe, is a part of the limbic system involved in motivation and emotions. Some of its neurons project to the striatum, in particular its ventral part, including nucleus accumbens. Orbitofrontal neurons discriminate between different rewards on the basis of the subject's preferences (Tremblay and Schultz, 1999). For example, a neuron is more active when a preferred rather than a nonpreferred reward is expected. But when the initially nonpreferred reward occurs in trials alternating with an even less preferred reward, the initially nonpreferred reward becomes the preferred one and the neuron is activated predominantly by this reward. These neurons do not code rewards on the basis of their physical properties. Rather, they code the relative preference of the reward. Neurons coding the relative preference for rewards might provide important information to neuronal mechanisms in the frontal lobe underlying goal-directed behavioral choices.
Dopamine neurons in the substantia nigra and the ventral tegmental area project to the dorsal and ventral striatum, respectively. Most dopamine neurons show short, phasic activations following the presentation of liquid and food rewards and conditioned visual and auditory reward-predicting stimuli. However, dopamine neurons do not respond to rewards unconditionally but code them relative to what is predicted, suggesting that they code a reward-prediction error (Waelti et al., 2001). The dopamine response may be a global reinforcement signal sent in parallel to all neurons in the striatum. Learning theory suggests that prediction errors play a crucial role in learning and that the phasic dopamine response may be an ideal teaching signal for approach learning. This signal strongly resembles the teaching signal used by effective temporal-difference reinforcement models (Montague et al., 1996). Indeed, artificial neuronal networks that use this type of teaching signal learn to play world-class backgammon.
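How a temporal-difference teaching signal of this kind behaves can be sketched with a small Python simulation. It assumes a single cue followed by a reward at a fixed delay, with time in the trial divided into discrete steps and one learned prediction per step since cue onset. The function names make_weights and run_trial, the step counts, the learning rate alpha, and the discount factor gamma are all illustrative assumptions; this is a simplified TD(0) model, not the specific model of the cited studies.

```python
def make_weights(n_steps=10, cs_step=2):
    """One learned reward prediction for each time step from cue onset onward."""
    return [0.0] * (n_steps - cs_step)


def run_trial(w, cs_step=2, reward_step=7, n_steps=10,
              alpha=0.1, gamma=1.0, rewarded=True):
    """Run one trial of TD(0) learning; return the prediction error per step.

    The prediction before the cue is held at 0, on the assumption that
    the time of cue onset itself cannot be predicted.
    """
    def value(t):
        # No prediction before the cue or after the end of the trial.
        if t < cs_step or t >= n_steps:
            return 0.0
        return w[t - cs_step]

    deltas = [0.0] * n_steps
    for t in range(1, n_steps):
        r = 1.0 if (rewarded and t == reward_step) else 0.0
        # TD error: reward now, plus the new prediction, minus the old prediction.
        delta = r + gamma * value(t) - value(t - 1)
        deltas[t] = delta
        # Only predictions from cue onset onward are adjusted.
        if t - 1 >= cs_step:
            w[t - 1 - cs_step] += alpha * delta
    return deltas


weights = make_weights()
first = run_trial(weights)                     # reward is still unpredicted
for _ in range(300):
    trained = run_trial(weights)               # cue now predicts the reward
omitted = run_trial(weights, rewarded=False)   # predicted reward is withheld

print("first trial, error at reward:", first[7])
print("trained, error at cue:", round(trained[2], 3))
print("trained, error at reward:", round(trained[7], 3))
print("omission, error at reward time:", round(omitted[7], 3))
```

In this sketch the error signal occurs at the time of the reward on the first trial, transfers to the reward-predicting cue after training, and becomes negative at the expected time of reward when the reward is withheld. The discount factor is set to 1 only to keep the numbers simple.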
Less is known about reward processing in other brain structures. Neurons in the amygdala, which projects to the ventral striatum, react to the occurrence of rewards and discriminate between different liquid and food rewards. Neurons in the dorsolateral prefrontal cortex show reward-discriminating activity during the preparation of movements and may thus be involved in the organization of behavior directed towards rewarding goals (Watanabe, 1996).
Conclusion
The striatum processes reward information in different ways. Neurons in the striatum detect and discriminate among rewards and may play a role in assessing the nature and identity of individual rewards. However, the striatum not only detects and analyzes past events, but it also constructs and dynamically modifies predictions based on past experience. Striatal neurons respond to learned stimuli that predict rewards and show sustained activity during periods of reward expectation. They even take guesses about future rewards and adapt their activity to experience.
Once we understand more about how the brain treats rewards, we can investigate how reward information produces motivated behavior. Neurons in structures that control behavior seem to incorporate information about upcoming rewards when coding reward-seeking behavior. Future research may permit us to understand how such activity may lead to choices and decisions incorporating both the cost and the benefit of behavior. We should also investigate neuronal mechanisms behind higher, more cognitive rewards typical for human behavior. Such research would help us to understand further how brains control important characteristics of voluntary and motivated behavior of complex organisms, possibly to the extent that individual differences in brain function explain variations in basic traits of personality.
Bibliography
Dickinson, A., and Balleine, B. (1994). Motivational control of goal-directed action. Animal Learning and Behavior 22, 1-18.
Montague, P. R., Dayan, P., and Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience 16, 1936-1947.
Robbins, T. W., and Everitt, B. J. (1996). Neurobehavioral mechanisms of reward and motivation. Current Opinion in Neurobiology 6, 228-236.
Robinson, T. E., and Berridge, K. C. (1993). The neural basis for drug craving: An incentive-sensitization theory of addiction. Brain Research Reviews 18, 247-291.
Schultz, W. (2000). Multiple reward systems in the brain. Nature Reviews Neuroscience 1, 199-207.
Waelti, P., Dickinson, A., and Schultz, W. (2001). Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43-48.
Watanabe, M. (1996). Reward expectancy in primate prefrontal neurons. Nature 382, 629-632.
Wise, R. A. (1996). Neurobiology of addiction. Current Opinion in Neurobiology 6, 243-251.
Wolfram Schultz