Reinforcement or Reward in Learning: Anatomical Substrates

Anatomical Substrates

The relationship of reinforcement to behavior is a useful prologue to a discussion of the brain circuits that mediate reinforcement (see Figure 1). Reinforcement and reward are parts of a larger system that mediates adaptive behavior that ensures the survival of the animal and the species. Adaptive behaviors, often called goal-directed or appetitive behaviors, include feeding, drinking, and reproduction. Goal-directed behavior can be divided into three phases: initiation, procurement, and consummatory (Watts and Swanson, 2002). In the initiation phase, there is a need or desire for the goal and a decision to fulfill it. For example, when you realize you are hungry, you first make the decision to eat. The consummatory phase involves fulfillment of the goal—in this case, eating. In between is the procurement phase where the goal is sought out. In reinforcement, a part of the procurement phase, the goal or reward increases the probability of occurrence of the behavior used to obtain it.

There are many factors that influence reinforcement: learning, memory, motivation, emotion, and locomotion. There must be a memory of previous experience with the goal to adopt the best strategy to obtain it. Learning occurs when the animal (or person) adapts its previous experience to the current circumstances. This may involve a Pavlovian associative mechanism, through which environmental stimuli that signal reward availability become conditioned stimuli, or instrumental learning, to determine which actions are effective in obtaining the goal, or an interaction between the two (Parkinson, Cardinal, and Everitt, 2000). Motivation is how hard one is willing to work to obtain a goal. Put another way, motivation energizes the response needed to obtain the goal. Emotion is very difficult to define precisely. However, by introspection we know that events with a high emotional impact profoundly affect our ability to learn and remember. Once the associations have been formed and the strategies selected, any attempt to reach the goal requires the skeletomotor system. If the response requires locomotion, then it also requires spatial or contextual learning to navigate through the environment to a remembered location of the reward. However, the response may require other forms of motor output instead, such as reaching and grasping, or pressing a lever, as in the classic laboratory experiment.

Historical Perspective

The first evidence that reinforcement is mediated by dissociable brain pathways came from studies of intracranial self-stimulation. This research demonstrated that focal electrical stimulation of various sites in the brain could itself be reinforcing (Wise, 1998). Identifying sites involved in self-stimulation made it possible to ask questions about the systems with which these sites interact and how they ultimately interact with motor systems to influence behavior. Mappings of the most sensitive reinforcing sites implicated a variety of structures in the limbic system, including the nucleus accumbens, limbic cortex, amygdala, hippocampus, and dopaminergic pathways originating in the brain stem (Wise, 1998; Ikemoto and Panksepp, 1999). Subsequently, the emphasis shifted to the mesolimbic dopamine system that arises in the ventral tegmental area and projects densely to the nucleus accumbens. Research showed that systemic administration of dopamine antagonists, which block or inactivate dopamine receptors, decreases rates of spontaneous locomotion; rates of self-stimulation, regardless of the site of stimulation; and rates of instrumental responding for both natural reward (e.g., food or water) and drug reward (e.g., cocaine or heroin) (Salamone, Cousins, and Snyder, 1997; Ikemoto and Panksepp, 1999). The latter observation is important because it implies that drug abuse involves the same reinforcement mechanisms in the brain as natural rewards (Wise, 1997). While these observations are compelling, it should be noted that dopamine antagonists decrease responding but do not abolish it; therefore, the learned response is intact. In addition, dopamine-depleted rats tend to choose a less preferred but freely available reward, whereas intact animals tend to do the opposite. These observations suggest that dopamine is more involved in the arousal or motivational aspects of reward than in the informational aspects of stimulus-reward association formation (Salamone, Cousins, and Snyder, 1997; Parkinson, Cardinal, and Everitt, 2000).

The nucleus accumbens is a major site at which dopamine exerts its effects, in part because dopamine levels in the nucleus accumbens increase in response to the presentation of natural reward, the self-administration of most drugs of abuse, or in response to conditioned stimuli that predict their availability (Di Chiara, 1998). Moreover, microinjection of dopamine antagonists directly into this structure had much the same effect on the above responses as did systemic administration (Salamone, Cousins, and Snyder, 1997; Ikemoto and Panksepp, 1999).

The Nucleus Accumbens

A major advance in understanding the circuits that mediate reinforcement came with the realization that the nucleus accumbens is the ventral component of a larger striatal system. The dorsal striatum is part of the basal ganglia or extrapyramidal motor system and has been studied extensively for its role in initiating or coordinating movements of individual joints and limbs (Mink, 1996). The advantage of viewing the nucleus accumbens as part of this system is that it generates predictions about its anatomical organization and provides insights into the functional organization of behavior. Indeed, the close relationship between the nucleus accumbens, or ventral striatum, and motor systems was fully appreciated only after recognition of its striatal nature.

Anatomically, this organization has several consequences. One is the appreciation of the predominance of inputs from the cortical mantle. Whereas the cortical inputs to the dorsal striatum arise primarily from somatosensory and motor cortex and other isocortical (i.e., neocortex, or six-layered cortex) regions, the inputs to the ventral striatum originate in allocortical (a general term for cortex with fewer than six layers) regions such as the limbic cortex, amygdala, and hippocampus (Groenewegen, Wright, and Beijer, 1996). A second consequence is the realization that, like the dorsal striatum, most of the output of the nucleus accumbens is to nigral- and pallidal-like structures (Groenewegen, Wright, and Beijer, 1996). Projections to nigral-like structures are inputs to the ventral tegmental area and to the substantia nigra, which provide the dopaminergic input to the ventral and dorsal striata, respectively (Redgrave, Prescott, and Gurney, 1999). The nucleus accumbens also projects to a ventral pallidum, whose output is organized in a manner similar to that of the globus pallidus of the dorsal basal ganglia; its major projections are to the thalamus and brain stem (Groenewegen, Wright, and Beijer, 1996).

The thalamic projection of the ventral pallidum is to the mediodorsal nucleus. This nucleus is reciprocally connected with the limbic cortex. Thus, the same cortical regions that project to the nucleus accumbens receive feedback related to nucleus accumbens output. This pallidothalamocortical loop may have a cognitive function through its influence on the planning of subsequent actions (Floresco, Braaksma, and Phillips, 1999). These limbic cortical regions also project to the (pre)motor cortex, providing yet another avenue through which the ventral striatal system can influence the extrapyramidal motor system (Groenewegen and Uylings, 2000).

The projection from the ventral pallidum to the brain stem is primarily to the pedunculopontine tegmental nucleus, an interdigitated population of neurons with ascending and descending projections (Winn, Brown, and Inglis, 1997). The region of descending projections is sometimes called the mesencephalic locomotor region because of its projections to the spinal cord and to reticulospinal neurons and because electrical stimulation of this region elicits locomotion that varies with the intensity of stimulation. Thus, low-intensity stimulation elicits walking, whereas increasing levels elicit first a trot and then a gallop in four-legged animals. The region of ascending projections is sometimes called the midbrain extrapyramidal area because of its projections to the dorsal basal ganglia, which include an input to the dopaminergic neurons of the substantia nigra.

Thus, the ventral striatal circuit has two major routes to gain access to the motor system and influence behavioral output: directly, via projections to the brain stem, and indirectly, via interactions with the basal ganglia. Influences on the dorsal basal ganglia primarily involve modulation of its dopaminergic input but also include those mediated by the more extensive projections of the midbrain extrapyramidal area and those transmitted via the thalamocortical loop.

Cortical Contributions to Reinforcement

The major inputs to the nucleus accumbens (other than dopamine) originate in the limbic cortex, amygdala, and hippocampus. In addition to sending dense, convergent inputs to the nucleus accumbens, each of these structures is reciprocally connected with the others (Groenewegen and Uylings, 2000). Although a selective role in reinforcement has been shown for each, these interconnections illustrate the interdependent nature of the various aspects of reinforcement. For example, learning about the contingency of reward depends to some degree on the context in which it is learned, and contextual learning depends in turn on what has been learned about reward.

The functions attributed to each of these regions have been based primarily on lesion studies, wherein a nucleus or a region is inactivated to allow an assessment of its contribution to behavior. From these studies, the limbic cortex has been implicated in forming the neural representations that underlie assessment of contingency (determining the relation between response and outcome) and goal status (the current value of the reward) (Balleine and Dickinson, 1998). The amygdala has been implicated in the process whereby affective value is attributed to events. The amygdala is actually composed of many nuclei, each implicated in its own set of emotional responses (Aggleton, 1992). In terms of appetitive behavior, the basolateral nucleus of the amygdala has received the most extensive study. The basolateral nucleus mediates the ability of conditioned stimuli to influence behavior (Parkinson, Cardinal, and Everitt, 2000). This is especially relevant in the context of drug abuse, wherein conditioned stimuli that predict drug availability can elicit craving and precipitate relapse, even after long periods of abstinence (Shalev, Grimm, and Shaham, 2002). The hippocampus has long been implicated in spatial and contextual learning. Lesions of this structure decrease exploratory locomotion and impair the ability of animals to navigate through the environment while searching for food and other rewards (Mogenson et al., 1993).

Thus, the allocortical inputs to nucleus accumbens are collectively involved in assessing the relevance of primary and conditioned stimuli and their relation to the required response. These are essential contributions to Pavlovian, instrumental, and contextual learning. Therefore, these inputs to the nucleus accumbens transmit the informational content relevant to reinforcement learning (Parkinson, Cardinal, and Everitt, 2000).

Information Flow Through the Nucleus Accumbens

Information flow through the nucleus accumbens appears to operate on the principle of disinhibition. The GABAergic neurons in the ventral pallidum fire spontaneously at high rates and so tonically inhibit their downstream targets in the mediodorsal nucleus and brain stem (Mogenson et al., 1993). The projection neurons in the nucleus accumbens are also GABAergic but have very low rates of spontaneous activity. Thus, glutamatergic inputs from the allocortex excite nucleus accumbens neurons, causing them to fire and inhibit the pallidal neurons, which, in turn, disinhibits the projections from the thalamus and brain stem. An important aspect of this disinhibitory process is that it permits or releases behavior rather than initiating it directly. This process is consistent with the idea that the command is for a general direction or plan of behavior, rather than dictating the individual components.
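The logic of disinhibition can be illustrated with a toy firing-rate sketch. This is not a model taken from the cited literature: the function name, the unit weights, and the rectified-linear rate rule are all assumptions chosen only to make the sign of each connection explicit.

```python
def relu(x):
    """Firing rates cannot be negative."""
    return max(0.0, x)


def disinhibition_chain(cortical_drive, pallidal_baseline=1.0, target_drive=1.0):
    """Toy three-stage chain: accumbens -> ventral pallidum -> downstream target.

    All parameters are hypothetical, in arbitrary rate units.
    """
    # Accumbens projection neurons are nearly silent unless excited by allocortex.
    accumbens = relu(cortical_drive)
    # Pallidal neurons fire tonically and are inhibited by accumbens output.
    pallidum = relu(pallidal_baseline - accumbens)
    # The target expresses its own drive only to the extent the pallidum is quiet.
    target = relu(target_drive - pallidum)
    return accumbens, pallidum, target


if __name__ == "__main__":
    for drive in (0.0, 0.5, 1.0):
        print(drive, disinhibition_chain(drive))
```

In this sketch, cortical drive never excites the target directly. With `target_drive=0.0` the target stays silent even when the pallidum is fully inhibited, which is the sense in which disinhibition releases behavior rather than initiating it.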

Accordingly, the nucleus accumbens may serve as a selection filter, first matching the information about the nature of the stimulus and the response requirements from the allocortex with the motivational, or arousing, effects mediated by dopaminergic projections from the brain stem and then selecting the appropriate response strategy (Redgrave, Prescott, and Gurney, 1999). The ventral striatal circuit is well-suited to this task. The projection neurons in the nucleus accumbens are bistable—they exhibit two relatively stable states: a hyperpolarized "down" state that results in low levels of spontaneous activity and a more depolarized "up" state in which synaptic inputs easily trigger bursts of action potentials (O'Donnell et al., 1999).

The cellular function of dopamine varies with the state of the nucleus accumbens neuron, such that dopamine may act as an excitatory, or facilitatory, neurotransmitter when the neuron is in the up state or may exert an inhibitory effect when it is in the down state (O'Donnell et al., 1999). Such a mechanism would increase the output of the most active nucleus accumbens neurons and suppress activity in the remaining neurons. The effect would create a "winner-take-all" system where only one functionally related ensemble of neurons is active at a time. This can be conceptualized as different patterns of allocortical activity selecting or disinhibiting different channels that are maintained as they pass through the nucleus accumbens circuit on their way to the motor system to influence specific sets of behaviors.
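The contrast-enhancing effect of state-dependent dopamine action can be sketched in a few lines. The rule below is an illustrative assumption, not a published model: a hypothetical threshold stands in for the up/down state boundary, and dopamine is treated as a simple gain that multiplies up-state activity and divides down-state activity.

```python
def modulate(inputs, dopamine, up_threshold=0.5):
    """Apply state-dependent dopamine gain to a list of input strengths.

    inputs:       allocortical drive onto each accumbens ensemble (arbitrary units)
    dopamine:     hypothetical gain parameter between 0.0 and 1.0
    up_threshold: assumed drive needed to hold an ensemble in the up state
    """
    outputs = []
    for drive in inputs:
        if drive >= up_threshold:
            # Up state: dopamine facilitates the response.
            outputs.append(drive * (1.0 + dopamine))
        else:
            # Down state: dopamine suppresses the response.
            outputs.append(drive * max(0.0, 1.0 - dopamine))
    return outputs


if __name__ == "__main__":
    drives = [0.25, 1.0]
    for da in (0.0, 0.5, 1.0):
        print(da, modulate(drives, da))
```

As the dopamine parameter rises, the strongly driven ensemble is amplified while the weakly driven one is pushed toward silence. A full winner-take-all circuit would also need competition among the up-state ensembles, which this sketch leaves out.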

Conclusion

We can now more precisely define reinforcement as a process that selectively potentiates goal-directed behavior, the form and direction of which depend upon learned associations relating the significance of predictive stimuli and environmental conditions. In terms of the neural circuits mediating reinforcement (see Figure 1), information about stimulus-reward associations is generated in the allocortex, where each part mediates a specific set of functions: the limbic cortex evaluates goal status and its relation to predictive cues, the amygdala generates the emotional value attached to primary reward and conditioned stimuli, and the hippocampus generates contextual representations of the environment. In the nucleus accumbens, these inputs are integrated with the arousing or motivational input subserved by dopaminergic neurons in the midbrain. The nucleus accumbens then selects the appropriate channel before transmitting its output to motor systems. Thus, reinforcement occurs when the nucleus accumbens integrates informational aspects of stimulus-reward associations with motivational arousal to select and initiate the behavioral strategy best suited to obtaining the goal.

See also: REINFORCEMENT

Bibliography

Aggleton, J. P. (1992). The amygdala: Neurobiological aspects of emotion, memory, and mental dysfunction. New York: Wiley-Liss.

Balleine, B. W., and Dickinson, A. (1998). Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407-419.

Di Chiara, G. (1998). A motivational learning hypothesis of the role of mesolimbic dopamine in compulsive drug use. Journal of Psychopharmacology 12, 54-67.

Floresco, S. B., Braaksma, D. N., and Phillips, A. G. (1999). Thalamic-cortical-striatal circuitry subserves working memory during delayed responding on a radial arm maze. Journal of Neuroscience 19, 11061-11071.

Groenewegen, H. J., and Uylings, H. B. (2000). The prefrontal cortex and the integration of sensory, limbic and autonomic information. Progress in Brain Research 126, 3-28.

Groenewegen, H. J., Wright, C. I., and Beijer, A. V. (1996). The nucleus accumbens: Gateway for limbic structures to reach the motor system? Progress in Brain Research 107, 485-511.

Ikemoto, S., and Panksepp, J. (1999). The role of nucleus accumbens dopamine in motivated behavior: A unifying interpretation with special reference to reward-seeking. Brain Research Reviews 31, 6-41.

Mink, J. W. (1996). The basal ganglia: Focused selection and inhibition of competing motor programs. Progress in Neurobiology 50, 381-425.

Mogenson, G. J., Brudzynski, S. M., Wu, M., Yang, C. R., and Yim, C. C. Y. (1993). From motivation to action: A review of dopaminergic regulation of limbic-nucleus accumbens-ventral pallidum-pedunculopontine nucleus circuitries involved in limbic motor integration. In P. W. Kalivas and C. D. Barnes, eds., Limbic motor circuits and neuropsychiatry. Boca Raton, FL: CRC Press.

O'Donnell, P., Greene, J., Pabello, N., Lewis, B. L., and Grace, A. A. (1999). Modulation of cell firing in the nucleus accumbens. Annals of the New York Academy of Sciences 877, 157-175.

Parkinson, J. A., Cardinal, R. N., and Everitt, B. J. (2000). Limbic cortical-ventral striatal systems underlying appetitive conditioning. Progress in Brain Research 126, 263-285.

Redgrave, P., Prescott, T. J., and Gurney, K. (1999). The basal ganglia: A vertebrate solution to the selection problem? Neuroscience 89, 1009-1023.

Salamone, J. D., Cousins, M. S., and Snyder, B. J. (1997). Behavioral functions of nucleus accumbens dopamine: Empirical and conceptual problems with the anhedonia hypothesis. Neuroscience and Biobehavioral Reviews 21, 341-359.

Shalev, U., Grimm, J. W., and Shaham, Y. (2002). Neurobiology of relapse to heroin and cocaine seeking: A review. Pharmacological Reviews 54, 1-42.

Watts, A. G., and Swanson, L. W. (2002). Anatomy of motivational systems. In H. Pashler and R. Gallistel, eds., Stevens's handbook of experimental psychology. New York: John Wiley.

Winn, P., Brown, V. J., and Inglis, W. L. (1997). On the relationships between the striatum and the pedunculopontine tegmental nucleus. Critical Reviews in Neurobiology 11, 241-261.

Wise, R. A. (1997). Drug self-administration viewed as ingestive behaviour. Appetite 28, 1-5.

Wise, R. A. (1998). Drug-activation of brain reward pathways. Drug and Alcohol Dependence 51, 13-22.

Richard H. Thompson

Learning and Memory