Read by QxMD icon Read

model based model free reinforcement learning

Jane X Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Demis Hassabis, Matthew Botvinick
Over the past 20 years, neuroscience research on reward-based learning has converged on a canonical model, under which the neurotransmitter dopamine 'stamps in' associations between situations, actions and rewards by modulating the strength of synaptic connections between neurons. However, a growing number of recent findings have placed this standard model under strain. We now draw on recent advances in artificial intelligence to introduce a new theory of reward-based learning. Here, the dopamine system trains another part of the brain, the prefrontal cortex, to operate as its own free-standing learning system...
May 14, 2018: Nature Neuroscience
Thomas D Sambrook, Ben Hardwick, Andy J Wills, Jeremy Goslin
Learning theorists posit two reinforcement learning systems: model-free and model-based. Model-based learning incorporates knowledge about structure and contingencies in the world to assign candidate actions with an expected value. Model-free learning is ignorant of the world's structure; instead, actions hold a value based on prior reinforcement, with this value updated by expectancy violation in the form of a reward prediction error. Because they use such different learning mechanisms, it has been previously assumed that model-based and model-free learning are computationally dissociated in the brain...
May 11, 2018: NeuroImage
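As a rough illustration of the distinction drawn in the abstract above, the sketch below contrasts a model-free value update driven by a reward prediction error with a model-based value computed from known outcome probabilities. The function names, the learning rate `alpha`, and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
def model_free_update(value, reward, alpha=0.1):
    """Move a cached action value toward the received reward.

    The reward prediction error is the gap between the reward received and
    the value expected; alpha (an assumed learning rate) scales the correction.
    """
    prediction_error = reward - value
    return value + alpha * prediction_error


def model_based_value(outcome_probs, outcome_rewards):
    """Compute an action's expected value from a known model of the world.

    outcome_probs[i] is the (assumed known) probability of outcome i and
    outcome_rewards[i] is the reward attached to that outcome.
    """
    return sum(p * r for p, r in zip(outcome_probs, outcome_rewards))


# Model-free: the cached value moves toward 1.0 over three rewarded trials.
v = 0.0
for _ in range(3):
    v = model_free_update(v, reward=1.0, alpha=0.5)

# Model-based: the value follows directly from the assumed contingencies.
mb = model_based_value([0.7, 0.3], [1.0, 0.0])
```

The model-free learner needs repeated experience to revise its estimate, whereas the model-based computation changes as soon as the assumed probabilities or rewards change.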
Wouter Kool, Samuel J Gershman, Fiery A Cushman
Decision-making algorithms face a basic tradeoff between accuracy and effort (i.e., computational demands). It is widely agreed that humans can choose between multiple decision-making processes that embody different solutions to this tradeoff: Some are computationally cheap but inaccurate, whereas others are computationally expensive but accurate. Recent progress in understanding this tradeoff has been catalyzed by formalizing it in terms of model-free (i.e., habitual) versus model-based (i.e., planning) approaches to reinforcement learning...
April 18, 2018: Journal of Cognitive Neuroscience
Dave McKenney, Tony White
Background: Complex networks are found in many domains and the control of these networks is a research topic that continues to draw increasing attention. This paper proposes a method of network control that attempts to maintain a specified target distribution of the network state. In contrast to many existing network control research works, which focus exclusively on structural analysis of the network, this paper also accounts for user actions/behaviours within the network control problem...
2018: Computational Social Networks
Benedicte M Babayan, Aurélie Watilliaux, Guillaume Viejo, Anne-Lise Paradis, Benoît Girard, Laure Rondi-Reig
How do we translate self-motion into goal-directed actions? Here we investigate the cognitive architecture underlying self-motion processing during exploration and goal-directed behaviour. The task, performed in an environment with limited and ambiguous external landmarks, constrained mice to use self-motion based information for sequence-based navigation. The post-behavioural analysis combined brain network characterization based on c-Fos imaging and graph theory analysis as well as computational modelling of the learning process...
December 19, 2017: Scientific Reports
Angela J Langdon, Melissa J Sharpe, Geoffrey Schoenbaum, Yael Niv
Phasic dopamine responses are thought to encode a prediction-error signal consistent with model-free reinforcement learning theories. However, a number of recent findings highlight the influence of model-based computations on dopamine responses, and suggest that dopamine prediction errors reflect more dimensions of an expected outcome than scalar reward value. Here, we review a selection of these recent results and discuss the implications and complications of model-based predictions for computational theories of dopamine and learning...
April 2018: Current Opinion in Neurobiology
Katherine E Twomey, Gert Westermann
Infants are curious learners who drive their own cognitive development by imposing structure on their learning environment as they explore. Understanding the mechanisms by which infants structure their own learning is therefore critical to our understanding of development. Here we propose an explicit mechanism for intrinsically motivated information selection that maximizes learning. We first present a neurocomputational model of infant visual category learning, capturing existing empirical data on the role of environmental complexity on learning...
October 26, 2017: Developmental Science
Jaron T Colas, Wolfgang M Pauli, Tobias Larsen, J Michael Tyszka, John P O'Doherty
Prediction-error signals consistent with formal models of "reinforcement learning" (RL) have repeatedly been found within dopaminergic nuclei of the midbrain and dopaminoceptive areas of the striatum. However, the precise form of the RL algorithms implemented in the human brain is not yet well determined. Here, we created a novel paradigm optimized to dissociate the subtypes of reward-prediction errors that function as the key computational signatures of two distinct classes of RL models, namely "actor/critic" models and action-value-learning models (e...
October 2017: PLoS Computational Biology
Julie J Lee, Mehdi Keramati
Decision-making in the real world presents the challenge of requiring flexible yet prompt behavior, a balance that has been characterized in terms of a trade-off between a slower, prospective goal-directed model-based (MB) strategy and a fast, retrospective habitual model-free (MF) strategy. Theory predicts that flexibility to changes in both reward values and transition contingencies can determine the relative influence of the two systems in reinforcement learning, but few studies have manipulated the latter...
September 2017: PLoS Computational Biology
Evan M Russek, Ida Momennejad, Matthew M Botvinick, Samuel J Gershman, Nathaniel D Daw
Humans and animals are capable of evaluating actions by considering their long-run future rewards through a process described using model-based reinforcement learning (RL) algorithms. The mechanisms by which neural circuits perform the computations prescribed by model-based RL remain largely unknown; however, multiple lines of evidence suggest that neural circuits supporting model-based behavior are structurally homologous to and overlapping with those thought to carry out model-free temporal difference (TD) learning...
September 2017: PLoS Computational Biology
Melissa J Sharpe, Hannah M Batchelor, Geoffrey Schoenbaum
Sensory preconditioning has been used to implicate midbrain dopamine in model-based learning, contradicting the view that dopamine transients reflect model-free value. However, it has been suggested that model-free value might accrue directly to the preconditioned cue through mediated learning. Here, building on previous work (Sadacca et al., 2016), we address this question by testing whether a preconditioned cue will support conditioned reinforcement in rats. We found that while both directly conditioned and second-order conditioned cues supported robust conditioned reinforcement, a preconditioned cue did not...
September 19, 2017: eLife
Nils Kolling, Thomas Akam
Foraging effectively is critical to the survival of all animals and this imperative is thought to have profoundly shaped brain evolution. Decisions made by foraging animals often approximate optimal strategies, but the learning and decision mechanisms generating these choices remain poorly understood. Recent work with laboratory foraging tasks in humans suggests their behaviour is poorly explained by model-free reinforcement learning, with simple heuristic strategies better describing behaviour in some tasks, and in others evidence of prospective prediction of the future state of the environment...
October 2017: Current Opinion in Neurobiology
Regina Padmanabhan, Nader Meskin, Wassim M Haddad
The increasing threat of cancer to human life and the improvement in survival rate of this disease due to effective treatment has promoted research in various related fields. This research has shaped clinical trials and emphasized the necessity to properly schedule cancer chemotherapy to ensure effective and safe treatment. Most of the control methodologies proposed for cancer chemotherapy scheduling treatment are model-based. In this paper, a reinforcement learning (RL)-based, model-free method is proposed for the closed-loop control of cancer chemotherapy drug dosing...
November 2017: Mathematical Biosciences
Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto
We propose a new value function approach for model-free reinforcement learning in Markov decision processes involving high dimensional states that addresses the issues of brittleness and intractable computational complexity, thereby rendering value function based reinforcement learning algorithms applicable to high dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback-Leibler divergence between current and updated policies...
October 2017: Neural Networks: the Official Journal of the International Neural Network Society
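For readers unfamiliar with the quantity named in the abstract above, here is a minimal sketch of the Kullback-Leibler divergence between two discrete action distributions. It shows only the divergence measure, not the KDPP algorithm itself; the function name and the example policies are assumptions for illustration.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions over the same actions.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


current_policy = [0.5, 0.5]   # hypothetical current action probabilities
updated_policy = [0.9, 0.1]   # hypothetical updated action probabilities

same = kl_divergence(current_policy, current_policy)
shift = kl_divergence(current_policy, updated_policy)
```

A divergence of zero means the two policies are identical; larger values indicate a bigger change between the current and updated policies.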
Wouter Kool, Samuel J Gershman, Fiery A Cushman
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits...
July 1, 2017: Psychological Science
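The cost-benefit comparison described in the abstract above can be sketched, under strong simplifying assumptions, as choosing whichever system has the larger net benefit. The function name, the scalar cost terms, and the numbers are hypothetical and are not the authors' model.

```python
def choose_controller(mb_reward, mf_reward, mb_cost, mf_cost=0.0):
    """Pick the system with the larger net benefit.

    Net benefit is taken here as expected reward minus an assumed scalar
    cost of running that system; ties go to the cheaper model-free system.
    """
    mb_net = mb_reward - mb_cost
    mf_net = mf_reward - mf_cost
    return "model-based" if mb_net > mf_net else "model-free"


# Planning pays off when its accuracy advantage exceeds its cost.
high_stakes = choose_controller(mb_reward=1.0, mf_reward=0.6, mb_cost=0.2)
# With a smaller accuracy advantage, the cheap system wins.
low_stakes = choose_controller(mb_reward=0.7, mf_reward=0.6, mb_cost=0.2)
```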
Heyeon Park, Daeyeol Lee, Jeanyung Chey
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i...
2017: PLoS One
Eva Friedel, Miriam Sebold, Sören Kuitunen-Paul, Stephan Nebe, Ilya M Veer, Ulrich S Zimmermann, Florian Schlagenhauf, Michael N Smolka, Michael Rapp, Henrik Walter, Andreas Heinz
Rationale: Advances in neurocomputational modeling suggest that valuation systems for goal-directed (deliberative) on one side, and habitual (automatic) decision-making on the other side may rely on distinct computational strategies for reinforcement learning, namely model-based vs. model-free learning. As a key theoretical difference, the model-based system strongly demands cognitive functions to plan actions prospectively based on an internal cognitive model of the environment, whereas valuation in the model-free system relies on rather simple learning rules from operant conditioning to retrospectively associate actions with their outcomes and is thus cognitively less demanding...
2017: Frontiers in Human Neuroscience
Valerie Voon, Andrea Reiter, Miriam Sebold, Stephanie Groman
We use parallel interacting goal-directed and habitual strategies to make our daily decisions. The arbitration between these strategies is relevant to inflexible repetitive behaviors in psychiatric disorders. Goal-directed control, also known as model-based control, is based on an affective outcome relying on a learned internal model to prospectively make decisions. In contrast, habit control, also known as model-free control, is based on an integration of previous reinforced learning autonomous of the current outcome value and is implicit and more efficient but at the cost of greater inflexibility...
September 15, 2017: Biological Psychiatry
Aaron M Bornstein, Kenneth A Norman
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event...
July 2017: Nature Neuroscience
Asako Toyama, Kentaro Katahira, Hideki Ohira
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process...
August 2017: Cognitive, Affective & Behavioral Neuroscience
