Read by QxMD icon Read

model based model free reinforcement learning

Angela J Langdon, Melissa J Sharpe, Geoffrey Schoenbaum, Yael Niv
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning...
October 30, 2017: Current Opinion in Neurobiology
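The prediction-error signal referred to in the abstract above is commonly formalized as a temporal-difference (TD) error. The following is a minimal textbook-style sketch, not code from the reviewed paper; the function names, the discount factor `gamma`, and the learning rate `alpha` are illustrative assumptions.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """Temporal-difference prediction error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_current


def td_update(value_current, delta, alpha=0.1):
    """Move the current value estimate a fraction alpha along the error."""
    return value_current + alpha * delta


# An unexpected reward yields a positive error; a fully predicted one yields zero.
delta_unexpected = td_error(reward=1.0, value_next=0.0, value_current=0.0)
delta_predicted = td_error(reward=1.0, value_next=0.0, value_current=1.0)
```

The error here is a single scalar, which is the property that the model-based findings discussed in the abstract call into question.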
Katherine E Twomey, Gert Westermann
Infants are curious learners who drive their own cognitive development by imposing structure on their learning environment as they explore. Understanding the mechanisms by which infants structure their own learning is therefore critical to our understanding of development. Here we propose an explicit mechanism for intrinsically motivated information selection that maximizes learning. We first present a neurocomputational model of infant visual category learning, capturing existing empirical data on the role of environmental complexity on learning...
October 26, 2017: Developmental Science
Jaron T Colas, Wolfgang M Pauli, Tobias Larsen, J Michael Tyszka, John P O'Doherty
Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models, namely "actor/critic" models and action-value-learning models (e...
October 2017: PLoS Computational Biology
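The two classes of reward-prediction error named in the abstract above can be contrasted in a few lines. This is a generic sketch under simplifying assumptions (a one-step task with no successor state, hypothetical function names and values), not the authors' paradigm or fitted model: the critic in an actor/critic model compares the reward with a state value, whereas action-value learning compares it with the value of the chosen action.

```python
def state_value_rpe(reward, v_state):
    """Actor/critic-style error: reward minus the value of the current state."""
    return reward - v_state


def action_value_rpe(reward, q_values, action):
    """Action-value-style error: reward minus the value of the chosen action."""
    return reward - q_values[action]


# Hypothetical values: the state is worth 0.5 overall, but the chosen
# action ("right") is individually expected to pay only 0.2.
q_values = {"left": 0.8, "right": 0.2}
critic_error = state_value_rpe(reward=1.0, v_state=0.5)
chosen_action_error = action_value_rpe(reward=1.0, q_values=q_values, action="right")
```

With the same reward, the two errors differ whenever the chosen action's value departs from the state's value, which is the kind of divergence a dissociation paradigm would need to exploit.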
Julie J Lee, Mehdi Keramati
Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter...
September 2017: PLoS Computational Biology
Evan M Russek, Ida Momennejad, Matthew M Botvinick, Samuel J Gershman, Nathaniel D Daw
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning...
September 2017: PLoS Computational Biology
Melissa J Sharpe, Hannah M Batchelor, Geoffrey Schoenbaum
Sensory preconditioning has been used to implicate midbrain dopamine in model-based learning, contradicting the view that dopamine transients reflect model-free value. However, it has been suggested that model-free value might accrue directly to the preconditioned cue through mediated learning. Here, building on previous work (Sadacca et al., 2016), we address this question by testing whether a preconditioned cue will support conditioned reinforcement in rats. We found that while both directly conditioned and second-order conditioned cues supported robust conditioned reinforcement, a preconditioned cue did not...
September 19, 2017: eLife
Nils Kolling, Thomas Akam
Foraging effectively is critical to the survival of all animals and this imperative is thought to have profoundly shaped brain evolution. Decisions made by foraging animals often approximate optimal strategies, but the learning and decision mechanisms generating these choices remain poorly understood. Recent work with laboratory foraging tasks in humans suggests that their behaviour is poorly explained by model-free reinforcement learning, with simple heuristic strategies better describing behaviour in some tasks, and in others evidence of prospective prediction of the future state of the environment...
October 2017: Current Opinion in Neurobiology
Regina Padmanabhan, Nader Meskin, Wassim M Haddad
The increasing threat of cancer to human life and the improvement in survival rate of this disease due to effective treatment has promoted research in various related fields. This research has shaped clinical trials and emphasized the necessity to properly schedule cancer chemotherapy to ensure effective and safe treatment. Most of the control methodologies proposed for cancer chemotherapy scheduling treatment are model-based. In this paper, a reinforcement learning (RL)-based, model-free method is proposed for the closed-loop control of cancer chemotherapy drug dosing...
August 16, 2017: Mathematical Biosciences
Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto
We propose a new value function approach for model-free reinforcement learning in Markov decision processes involving high-dimensional states. The approach addresses the issues of brittleness and intractable computational complexity, thereby making value-function-based reinforcement learning algorithms applicable to high-dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback-Leibler divergence between current and updated policies...
June 29, 2017: Neural Networks: the Official Journal of the International Neural Network Society
Wouter Kool, Samuel J Gershman, Fiery A Cushman
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits...
July 1, 2017: Psychological Science
Heyeon Park, Daeyeol Lee, Jeanyung Chey
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i...
2017: PLoS One
Eva Friedel, Miriam Sebold, Sören Kuitunen-Paul, Stephan Nebe, Ilya M Veer, Ulrich S Zimmermann, Florian Schlagenhauf, Michael N Smolka, Michael Rapp, Henrik Walter, Andreas Heinz
Rationale: Advances in neurocomputational modeling suggest that valuation systems for goal-directed (deliberative) decision-making on the one hand and habitual (automatic) decision-making on the other may rely on distinct computational strategies for reinforcement learning, namely model-based vs. model-free learning. As a key theoretical difference, the model-based system strongly demands cognitive functions to plan actions prospectively based on an internal cognitive model of the environment, whereas valuation in the model-free system relies on rather simple learning rules from operant conditioning to retrospectively associate actions with their outcomes and is thus cognitively less demanding...
2017: Frontiers in Human Neuroscience
Valerie Voon, Andrea Reiter, Miriam Sebold, Stephanie Groman
We use parallel interacting goal-directed and habitual strategies to make our daily decisions. The arbitration between these strategies is relevant to inflexible repetitive behaviors in psychiatric disorders. Goal-directed control, also known as model-based control, is based on an affective outcome relying on a learned internal model to prospectively make decisions. In contrast, habit control, also known as model-free control, is based on an integration of previous reinforced learning autonomous of the current outcome value and is implicit and more efficient but at the cost of greater inflexibility...
April 23, 2017: Biological Psychiatry
Aaron M Bornstein, Kenneth A Norman
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event...
July 2017: Nature Neuroscience
Asako Toyama, Kentaro Katahira, Hideki Ohira
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process...
June 1, 2017: Cognitive, Affective & Behavioral Neuroscience
Huaguang Zhang, Xiaohong Cui, Yanhong Luo, He Jiang
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon H∞ optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed with the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve due to its time-dependent property and nonlinearity...
March 1, 2017: IEEE Transactions on Neural Networks and Learning Systems
Elisa M Tartaglia, Aaron M Clarke, Michael H Herzog
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models) there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs...
2017: Frontiers in Psychology
Alexandre Pitti, Philippe Gaussier, Mathias Quoy
The intra-parietal lobe coupled with the basal ganglia forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models, however, often fail to control long-range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to see the problem of spikes' synchrony as an optimization problem of the neurons' sub-threshold activity for the generation of long neuronal chains...
2017: PLoS One
Hamidreza Modares, Frank L Lewis, Zhong-Ping Jiang
A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem...
November 2016: IEEE Transactions on Cybernetics
Zsolt Turi, Matthias Mittner, Walter Paulus, Andrea Antal
According to the placebo-reward hypothesis, placebo is a reward-anticipation process that increases midbrain dopamine (DA) levels. Reward-based learning processes, such as reinforcement learning, involve a large part of the DA-ergic network that is also activated by the placebo intervention. Given the neurochemical overlap between placebo and reward learning, we investigated whether verbal instructions in conjunction with a placebo intervention are capable of enhancing reward learning in healthy individuals by using a monetary reward-based reinforcement-learning task...
January 23, 2017: Scientific Reports
