Read by QxMD

Reinforcement Learning

Shared collection · 24 papers · 0 to 25 followers
By Abraham Nunes, a psychiatry resident interested in computational neuroscience, forensic psychiatry, and neuropsychiatry.
https://www.readbyqxmd.com/read/25122479/optimal-behavioral-hierarchy
#1: Optimal Behavioral Hierarchy
Alec Solway, Carlos Diuk, Natalia Córdova, Debbie Yee, Andrew G Barto, Yael Niv, Matthew M Botvinick
Human behavior has long been recognized to display hierarchical structure: actions fit together into subtasks, which cohere into extended goal-directed activities. Arranging actions hierarchically has well established benefits, allowing behaviors to be represented efficiently by the brain, and allowing solutions to new tasks to be discovered easily. However, these payoffs depend on the particular way in which actions are organized into a hierarchy, the specific way in which tasks are carved up into subtasks...
August 2014: PLoS Computational Biology
https://www.readbyqxmd.com/read/27589489/learning-reward-uncertainty-in-the-basal-ganglia
#2: Learning Reward Uncertainty in the Basal Ganglia
John G Mikhael, Rafal Bogacz
Learning the reliability of different sources of rewards is critical for making optimal choices. However, despite the existence of detailed theory describing how the expected reward is learned in the basal ganglia, it is not known how reward uncertainty is estimated in these circuits. This paper presents a class of models that encode both the mean reward and the spread of the rewards, the former in the difference between the synaptic weights of D1 and D2 neurons, and the latter in their sum. In the models, the tendency to seek (or avoid) options with variable reward can be controlled by increasing (or decreasing) the tonic level of dopamine...
September 2016: PLoS Computational Biology
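The mean-and-spread coding scheme this abstract describes can be illustrated with a toy update rule (a simplified sketch under my own assumptions, not the paper's actual model): two weights stand in for D1 ("Go") and D2 ("NoGo") pathways, each learning from prediction errors of opposite sign, so that their difference tracks the mean reward and their sum grows with reward variability.

```python
import random

def learn_d1_d2(rewards, alpha=0.05, epsilon=0.02, n_steps=5000, seed=0):
    """Toy sketch: G (D1-like) learns from positive prediction errors,
    N (D2-like) from negative ones; a small decay (epsilon) keeps the
    weights bounded.  Mean reward ~ G - N, reward spread ~ G + N."""
    rng = random.Random(seed)
    G = N = 0.0
    for _ in range(n_steps):
        r = rng.choice(rewards)
        delta = r - (G - N)                        # error vs. expected value
        G += alpha * max(delta, 0) - epsilon * G   # D1: positive errors only
        N += alpha * max(-delta, 0) - epsilon * N  # D2: negative errors only
    return G - N, G + N   # (mean estimate, spread-related quantity)

# A certain option (always 1.0) vs. a risky one (0.0 or 2.0): similar means,
# but the risky option should yield a larger G + N.
mean_safe, spread_safe = learn_d1_d2([1.0])
mean_risky, spread_risky = learn_d1_d2([0.0, 2.0])
```

Scaling either weight's contribution at choice time (the abstract's tonic-dopamine knob) would then bias the agent toward or away from the variable option.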
https://www.readbyqxmd.com/read/25267822/model-based-hierarchical-reinforcement-learning-and-human-action-control
#3: Model-Based Hierarchical Reinforcement Learning and Human Action Control
Matthew Botvinick, Ari Weinstein
Recent work has reawakened interest in goal-directed or 'model-based' choice, where decisions are based on prospective evaluation of potential action outcomes. Concurrently, there has been growing attention to the role of hierarchy in decision-making and action control. We focus here on the intersection between these two areas of interest, considering the topic of hierarchical model-based control. To characterize this form of action control, we draw on the computational framework of hierarchical reinforcement learning, using this to interpret recent empirical findings...
November 5, 2014: Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences
https://www.readbyqxmd.com/read/26851575/measuring-wanting-and-liking-from-animals-to-humans-a-systematic-review
#4: Measuring Wanting and Liking From Animals to Humans: A Systematic Review
REVIEW
Eva Pool, Vanessa Sennwald, Sylvain Delplanque, Tobias Brosch, David Sander
Animal research has shown that it is possible to want a reward that is not liked once obtained. Although these findings have elicited interest, human experiments have produced contradictory results, raising doubts about the existence of separate wanting and liking influences in human reward processing. This discrepancy could be due to inconsistencies in the operationalization of these concepts. We systematically reviewed the methodologies used to assess human wanting and/or liking and found that most studies operationalized these concepts in a manner congruent with the animal literature...
April 2016: Neuroscience and Biobehavioral Reviews
https://www.readbyqxmd.com/read/25734662/a-spiking-neural-network-model-of-model-free-reinforcement-learning-with-high-dimensional-sensory-input-and-perceptual-ambiguity
#5: A Spiking Neural Network Model of Model-Free Reinforcement Learning With High-Dimensional Sensory Input and Perceptual Ambiguity
Takashi Nakano, Makoto Otsuka, Junichiro Yoshimoto, Kenji Doya
A theoretical framework of reinforcement learning plays an important role in understanding action selection in animals. Spiking neural networks provide a theoretically grounded means to test computational hypotheses on neurally plausible algorithms of reinforcement learning through numerical simulation. However, most of these models cannot handle observations that are noisy or that occurred in the past, even though such observations are inevitable and constraining features of learning in real environments. These problems are formally known as partially observable reinforcement learning (PORL) problems...
2015: PLoS ONE
https://www.readbyqxmd.com/read/26394333/convergence-of-eeg-and-fmri-measures-of-reward-anticipation
#6: Convergence of EEG and fMRI Measures of Reward Anticipation
Stephanie M Gorka, K Luan Phan, Stewart A Shankman
Deficits in reward anticipation are putative mechanisms for multiple psychopathologies. Research indicates that these deficits are characterized by reduced left (relative to right) frontal electroencephalogram (EEG) activity and blood oxygenation level-dependent (BOLD) signal abnormalities in mesolimbic and prefrontal neural regions during reward anticipation. Although it is often assumed that these two measures capture similar mechanisms, no study to our knowledge has directly examined the convergence between frontal EEG alpha asymmetry and functional magnetic resonance imaging (fMRI) during reward anticipation in the same sample...
December 2015: Biological Psychology
https://www.readbyqxmd.com/read/26095906/computing-reward-prediction-error-an-integrated-account-of-cortical-timing-and-basal-ganglia-pathways-for-appetitive-and-aversive-learning
#7: Computing Reward Prediction Error: An Integrated Account of Cortical Timing and Basal Ganglia Pathways for Appetitive and Aversive Learning
Kenji Morita, Yasuo Kawaguchi
There are two prevailing notions regarding the involvement of the corticobasal ganglia system in value-based learning: (i) the direct and indirect pathways of the basal ganglia are crucial for appetitive and aversive learning, respectively, and (ii) the activity of midbrain dopamine neurons represents reward-prediction error. Although (ii) constitutes a critical assumption of (i), it remains elusive how (ii) holds given (i), with the basal-ganglia influence on the dopamine neurons. Here we present a computational neural-circuit model that potentially resolves this issue...
August 2015: European Journal of Neuroscience
https://www.readbyqxmd.com/read/26220740/differential-contributions-of-the-globus-pallidus-and-ventral-thalamus-to-stimulus-response-learning-in-humans
#8: Differential Contributions of the Globus Pallidus and Ventral Thalamus to Stimulus-Response Learning in Humans
Henning Schroll, Andreas Horn, Christine Gröschel, Christof Brücke, Götz Lütjens, Gerd-Helge Schneider, Joachim K Krauss, Andrea A Kühn, Fred H Hamker
The ability to learn associations between stimuli, responses and rewards is a prerequisite for survival. Models of reinforcement learning suggest that the striatum, a basal ganglia input nucleus, vitally contributes to these learning processes. Our recently presented computational model predicts, first, that not only the striatum, but also the globus pallidus contributes to the learning (i.e., exploration) of stimulus-response associations based on rewards. Secondly, it predicts that the stable execution (i...
November 15, 2015: NeuroImage
https://www.readbyqxmd.com/read/26379600/reduction-in-ventral-striatal-activity-when-anticipating-a-reward-in-depression-and-schizophrenia-a-replicated-cross-diagnostic-finding
#9: Reduction in Ventral Striatal Activity When Anticipating a Reward in Depression and Schizophrenia: A Replicated Cross-Diagnostic Finding
Gonzalo Arrondo, Nuria Segarra, Antonio Metastasio, Hisham Ziauddeen, Jennifer Spencer, Niels R Reinders, Robert B Dudas, Trevor W Robbins, Paul C Fletcher, Graham K Murray
In the research domain framework (RDoC), dysfunctional reward expectation has been proposed to be a cross-diagnostic domain in psychiatry, which may contribute to symptoms common to various neuropsychiatric conditions, such as anhedonia or apathy/avolition. We used a modified version of the Monetary Incentive Delay (MID) paradigm to obtain functional MRI images from 22 patients with schizophrenia, 24 with depression and 21 controls. Anhedonia and other symptoms of depression, and overall positive and negative symptomatology were also measured...
2015: Frontiers in Psychology
https://www.readbyqxmd.com/read/26379239/model-based-reasoning-in-humans-becomes-automatic-with-training
#10: Model-Based Reasoning in Humans Becomes Automatic With Training
Marcos Economides, Zeb Kurth-Nelson, Annika Lübbert, Marc Guitart-Masip, Raymond J Dolan
Model-based and model-free reinforcement learning (RL) have been suggested as algorithmic realizations of goal-directed and habitual action strategies. Model-based RL is more flexible than model-free but requires sophisticated calculations using a learnt model of the world. This has led model-based RL to be identified with slow, deliberative processing, and model-free RL with fast, automatic processing. In support of this distinction, it has recently been shown that model-based reasoning is impaired by placing subjects under cognitive load--a hallmark of non-automaticity...
September 2015: PLoS Computational Biology
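The model-based/model-free distinction drawn in this abstract can be made concrete with a minimal two-arm example (an illustrative sketch with invented numbers, not the paper's task): a model-free learner caches values incrementally, while a model-based learner re-plans from a learned model, so only the latter adapts immediately when contingencies reverse.

```python
alpha = 0.1
rewards = {0: 1.0, 1: 0.0}        # hypothetical two-arm task: arm -> reward
Q = {0: 0.0, 1: 0.0}              # model-free: cached action values
R = {}                            # model-based: learned reward model

def observe(a, r):
    Q[a] += alpha * (r - Q[a])    # slow incremental (model-free) update
    R[a] = r                      # the model stores the outcome directly

for _ in range(50):               # training: both arms sampled often
    for a in (0, 1):
        observe(a, rewards[a])

rewards = {0: 0.0, 1: 1.0}        # contingencies reverse...
for a in (0, 1):                  # ...and each arm is observed once
    observe(a, rewards[a])

mb_choice = max(R, key=R.get)     # model-based re-plans and switches to arm 1
mf_choice = max(Q, key=Q.get)     # model-free cache still favors arm 0
```

The flexibility of `mb_choice` comes at the cost of consulting the model on every decision, which is the computational expense the abstract identifies with deliberative processing.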
https://www.readbyqxmd.com/read/26379518/modeling-choice-and-reaction-time-during-arbitrary-visuomotor-learning-through-the-coordination-of-adaptive-working-memory-and-reinforcement-learning
#11: Modeling Choice and Reaction Time During Arbitrary Visuomotor Learning Through the Coordination of Adaptive Working Memory and Reinforcement Learning
Guillaume Viejo, Mehdi Khamassi, Andrea Brovelli, Benoît Girard
Current learning theory provides a comprehensive description of how humans and other animals learn, and places behavioral flexibility and automaticity at the heart of adaptive behavior. However, the computations supporting the interactions between goal-directed and habitual decision-making systems are still poorly understood. Previous functional magnetic resonance imaging (fMRI) results suggest that the brain hosts complementary computations that may differentially support goal-directed and habitual processes in the form of a dynamical interplay rather than a serial recruitment of strategies...
2015: Frontiers in Behavioral Neuroscience
https://www.readbyqxmd.com/read/26339919/a-new-computational-account-of-cognitive-control-over-reinforcement-based-decision-making-modeling-of-a-probabilistic-learning-task
#12: A New Computational Account of Cognitive Control Over Reinforcement-Based Decision Making: Modeling of a Probabilistic Learning Task
Sareh Zendehrouh
Recent work in the decision-making field offers a dual-system account of the decision-making process. This theory holds that the process is governed by two main controllers: a goal-directed system and a habitual system. In the reinforcement learning (RL) domain, habitual behaviors are associated with model-free methods, in which appropriate actions are learned through trial-and-error experience. Goal-directed behaviors, in contrast, are associated with model-based methods of RL, in which actions are selected using a model of the environment...
November 2015: Neural Networks: the Official Journal of the International Neural Network Society
https://www.readbyqxmd.com/read/26361052/psychology-of-habit
#13: Psychology of Habit
REVIEW
Wendy Wood, Dennis Rünger
As the proverbial creatures of habit, people tend to repeat the same behaviors in recurring contexts. This review characterizes habits in terms of their cognitive, motivational, and neurobiological properties. In so doing, we identify three ways that habits interface with deliberate goal pursuit: First, habits form as people pursue goals by repeating the same responses in a given context. Second, as outlined in computational models, habits and deliberate goal pursuit guide actions synergistically, although habits are the efficient, default mode of response...
2016: Annual Review of Psychology
https://www.readbyqxmd.com/read/26322583/arithmetic-and-local-circuitry-underlying-dopamine-prediction-errors
#14: Arithmetic and Local Circuitry Underlying Dopamine Prediction Errors
Neir Eshel, Michael Bukwich, Vinod Rao, Vivian Hemmelder, Ju Tian, Naoshige Uchida
Dopamine neurons are thought to facilitate learning by comparing actual and expected reward. Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain...
September 10, 2015: Nature
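The subtractive computation reported here is exactly the prediction-error term used in standard reinforcement-learning models. A minimal Rescorla-Wagner-style sketch (my illustration, not the paper's analysis) shows why subtraction is so useful: the error signal vanishes once the reward is fully predicted.

```python
# delta = actual reward minus expected reward: a subtractive prediction error.
alpha, V, r = 0.2, 0.0, 1.0   # learning rate, initial expectation, reward
deltas = []
for _ in range(40):           # repeated reward-predicting trials
    delta = r - V             # dopamine-like signal: subtraction, not division
    V += alpha * delta        # the expectation absorbs the error
    deltas.append(delta)
# Early trials produce a large error; as the reward becomes predicted,
# the error shrinks toward zero.
```

A divisive comparison (r / V) would not behave this way: it converges to 1 rather than 0 for a predicted reward, which is one reason subtraction is considered ideal for this kind of learning.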
https://www.readbyqxmd.com/read/26321934/anticipatory-pleasure-predicts-effective-connectivity-in-the-mesolimbic-system
#15: Anticipatory Pleasure Predicts Effective Connectivity in the Mesolimbic System
Zhi Li, Chao Yan, Wei-Zhen Xie, Ke Li, Ya-Wei Zeng, Zhen Jin, Eric F C Cheung, Raymond C K Chan
Convergent evidence suggests the important role of the mesolimbic pathway in anticipating monetary rewards. However, the underlying mechanism of how the sub-regions interact with each other is still not clearly understood. Using dynamic causal modeling, we constructed a reward-related network for anticipating monetary reward using the Monetary Incentive Delay Task. Twenty-six healthy adolescents (Female/Male = 11/15; age = 18.69 ± 1.35 years; education = 12 ± 1.58 years) participated in the present study...
2015: Frontiers in Behavioral Neuroscience
https://www.readbyqxmd.com/read/26317249/exploration-exploitation-a-cognitive-dilemma-still-unresolved
#16: Exploration-Exploitation: A Cognitive Dilemma Still Unresolved
COMMENT
Russell N James
The solution to the exploration-exploitation dilemma presented essentially subsumes exploitation into an information-maximizing model. Such a single-maximization model is shown to be (1) more tractable than the initial dual-maximization dilemma, (2) useful in modeling information-maximizing subsystems, and (3) profitably applied in artificial simulations where exploration is costless. However, the model fails to resolve the dilemma in ethological or practical circumstances with objective outcomes, such as inclusive fitness, rather than information outcomes, such as lack of surprise...
2015: Cognitive Neuroscience
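The dual-maximization dilemma at issue here is the classic reinforcement-learning trade-off between gathering information and harvesting reward. A standard (if crude) resolution is epsilon-greedy action selection; the sketch below uses made-up reward probabilities and is only meant to make the two objectives concrete.

```python
import random

def eps_greedy_bandit(probs, eps=0.1, n=2000, alpha=0.1, seed=1):
    """Two-armed bandit: with probability eps the agent explores
    (random arm); otherwise it exploits its current value estimates."""
    rng = random.Random(seed)
    Q = [0.0] * len(probs)
    total = 0.0
    for _ in range(n):
        if rng.random() < eps:
            a = rng.randrange(len(probs))                   # explore
        else:
            a = max(range(len(probs)), key=lambda i: Q[i])  # exploit
        r = 1.0 if rng.random() < probs[a] else 0.0
        Q[a] += alpha * (r - Q[a])
        total += r
    return Q, total / n

Q, avg_reward = eps_greedy_bandit([0.2, 0.8])   # hypothetical arm payoffs
```

Note that epsilon-greedy simply interleaves the two objectives rather than unifying them, which is precisely the kind of unresolved compromise the comment highlights.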
https://www.readbyqxmd.com/read/26276036/neurophysiology-of-reward-guided-behavior-correlates-related-to-predictions-value-motivation-errors-attention-and-action
#17: Neurophysiology of Reward-Guided Behavior: Correlates Related to Predictions, Value, Motivation, Errors, Attention, and Action
REVIEW
Gregory B Bissonette, Matthew R Roesch
Many brain areas are activated by the possibility and receipt of reward. Are all of these brain areas reporting the same information about reward? Or are these signals related to other functions that accompany reward-guided learning and decision-making? Through carefully controlled behavioral studies, it has been shown that reward-related activity can represent reward expectations related to future outcomes, errors in those expectations, motivation, and signals related to goal- and habit-driven behaviors. These dissociations have been accomplished by manipulating the predictability of positively and negatively valued events...
2016: Current Topics in Behavioral Neurosciences
https://www.readbyqxmd.com/read/26257618/avoidance-learning-a-review-of-theoretical-models-and-recent-developments
#18: Avoidance Learning: A Review of Theoretical Models and Recent Developments
REVIEW
Angelos-Miltiadis Krypotos, Marieke Effting, Merel Kindt, Tom Beckers
Avoidance is a key characteristic of adaptive and maladaptive fear. Here, we review past and contemporary theories of avoidance learning. Based on the theories, experimental findings and clinical observations reviewed, we distill key principles of how adaptive and maladaptive avoidance behavior is acquired and maintained. We highlight clinical implications of avoidance learning theories and describe intervention strategies that could reduce maladaptive avoidance and prevent its return. We end with a brief overview of recent developments and avenues for further research...
2015: Frontiers in Behavioral Neuroscience
https://www.readbyqxmd.com/read/26237363/instrumental-learning-of-traits-versus-rewards-dissociable-neural-correlates-and-effects-on-choice
#19: Instrumental Learning of Traits Versus Rewards: Dissociable Neural Correlates and Effects on Choice
COMPARATIVE STUDY
Leor M Hackel, Bradley B Doll, David M Amodio
Humans learn about people and objects through positive and negative experiences, yet they can also look beyond the immediate reward of an interaction to encode trait-level attributes. We found that perceivers encoded both reward and trait-level information through feedback in an instrumental learning task, but relied more heavily on trait representations in cross-context decisions. Both learning types implicated ventral striatum, but trait learning also recruited a network associated with social impression formation...
September 2015: Nature Neuroscience
https://www.readbyqxmd.com/read/26192748/automatic-integration-of-confidence-in-the-brain-valuation-signal
#20: Automatic Integration of Confidence in the Brain Valuation Signal
Maël Lebreton, Raphaëlle Abitbol, Jean Daunizeau, Mathias Pessiglione
A key process in decision-making is estimating the value of possible outcomes. Growing evidence suggests that different types of values are automatically encoded in the ventromedial prefrontal cortex (VMPFC). Here we extend this idea by suggesting that any overt judgment is accompanied by a second-order valuation (a confidence estimate), which is also automatically incorporated in VMPFC activity. In accordance with the predictions of our normative model of rating tasks, two behavioral experiments showed that confidence levels were quadratically related to first-order judgments (age, value or probability ratings)...
August 2015: Nature Neuroscience