Read by QxMD

model based model free reinforcement learning

Eva Friedel, Miriam Sebold, Sören Kuitunen-Paul, Stephan Nebe, Ilya M Veer, Ulrich S Zimmermann, Florian Schlagenhauf, Michael N Smolka, Michael Rapp, Henrik Walter, Andreas Heinz
Rationale: Advances in neurocomputational modeling suggest that valuation systems for goal-directed (deliberative) decision-making on the one hand and habitual (automatic) decision-making on the other may rely on distinct computational strategies for reinforcement learning, namely model-based vs. model-free learning. As a key theoretical difference, the model-based system strongly demands cognitive functions to plan actions prospectively based on an internal cognitive model of the environment, whereas valuation in the model-free system relies on rather simple learning rules from operant conditioning to retrospectively associate actions with their outcomes and is thus cognitively less demanding...
2017: Frontiers in Human Neuroscience
Valerie Voon, Andrea Reiter, Miriam Sebold, Stephanie Groman
We use parallel interacting goal-directed and habitual strategies to make our daily decisions. The arbitration between these strategies is relevant to inflexible repetitive behaviors in psychiatric disorders. Goal-directed control, also known as model-based control, is based on an affective outcome relying on a learned internal model to prospectively make decisions. In contrast, habit control, also known as model-free control, is based on an integration of previous reinforced learning autonomous of the current outcome value and is implicit and more efficient but at the cost of greater inflexibility...
April 23, 2017: Biological Psychiatry
Aaron M Bornstein, Kenneth A Norman
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event...
June 5, 2017: Nature Neuroscience
Asako Toyama, Kentaro Katahira, Hideki Ohira
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, through which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, and our proposed models, which assume a sequential learning process...
June 1, 2017: Cognitive, Affective & Behavioral Neuroscience
Huaguang Zhang, Xiaohong Cui, Yanhong Luo, He Jiang
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon H∞ optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed with the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve due to its time-dependent property and nonlinearity...
March 1, 2017: IEEE Transactions on Neural Networks and Learning Systems
Elisa M Tartaglia, Aaron M Clarke, Michael H Herzog
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models) there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs...
2017: Frontiers in Psychology
Alexandre Pitti, Philippe Gaussier, Mathias Quoy
The intra-parietal lobe coupled with the Basal Ganglia forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models, however, often fail to control long-range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to see the problem of spikes' synchrony as an optimization problem of the neurons' sub-threshold activity for the generation of long neuronal chains...
2017: PloS One
Hamidreza Modares, Frank L Lewis, Zhong-Ping Jiang
A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem...
September 22, 2016: IEEE Transactions on Cybernetics
Zsolt Turi, Matthias Mittner, Walter Paulus, Andrea Antal
According to the placebo-reward hypothesis, placebo is a reward-anticipation process that increases midbrain dopamine (DA) levels. Reward-based learning processes, such as reinforcement learning, involve a large part of the DA-ergic network that is also activated by the placebo intervention. Given the neurochemical overlap between placebo and reward learning, we investigated whether verbal instructions in conjunction with a placebo intervention are capable of enhancing reward learning in healthy individuals by using a monetary reward-based reinforcement-learning task...
January 23, 2017: Scientific Reports
Sebastian Gluth, Jared M Hotaling, Jörg Rieskamp
Classical economic theory contends that the utility of a choice option should be independent of other options. This view is challenged by the attraction effect, in which the relative preference between two options is altered by the addition of a third, asymmetrically dominated option. Here, we leveraged the attraction effect in the context of intertemporal choices to test whether both decisions and reward prediction errors (RPE) in the absence of choice violate the independence of irrelevant alternatives principle...
January 11, 2017: Journal of Neuroscience: the Official Journal of the Society for Neuroscience
I El Naqa, R Ten
PURPOSE: There is tremendous excitement in radiotherapy about applying data-driven methods to develop personalized clinical decisions for real-time response-based adaptation. However, classical statistical learning methods lack in terms of efficiency and ability to predict outcomes under conditions of uncertainty and incomplete information. Therefore, we are investigating physics-inspired machine learning approaches by utilizing quantum principles for developing a robust framework to dynamically adapt treatments to individual patient's characteristics and optimize outcomes...
June 2016: Medical Physics
Sebastian Gluth, Jared M Hotaling, Jörg Rieskamp
Classical economic theory contends that the utility of a choice option should be independent of other options. This view is challenged by the attraction effect, in which the relative preference between two options is altered by the addition of a third, asymmetrically dominated option. Here, we leveraged the attraction effect in the context of intertemporal choices to test whether both decisions and reward prediction errors (RPE), in the absence of choice, violate the independence of irrelevant alternatives principle...
December 1, 2016: Journal of Neuroscience: the Official Journal of the Society for Neuroscience
Tracey C S Potter, Nessa V Bryce, Catherine A Hartley
Reinforcement learning theory distinguishes "model-free" learning, which fosters reflexive repetition of previously rewarded actions, from "model-based" learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice...
June 2017: Developmental Cognitive Neuroscience
Judit Zsuga, Klara Biro, Gabor Tajti, Magdolna Emma Szilasi, Csaba Papp, Bela Juhasz, Rudolf Gesztelyi
BACKGROUND: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of immediate reward and of the discounted value of future states. Thus, the value of a state is determined by agent-related attributes (action set, policy, discount factor) and the agent's knowledge of the environment embodied by the reward function and hidden environmental factors given by the transition probability...
October 28, 2016: BMC Neuroscience
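For orientation, the recursion described in the abstract above can be written in standard textbook notation. The symbols are assumptions chosen for illustration and are not taken from the paper itself: \pi is the policy, P the transition probability, R the reward function, and \gamma the discount factor.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```

Read term by term: the value of state s is the expected immediate reward plus the discounted value of the successor state s', averaged over the actions the policy selects and the transitions the environment produces.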
Jason M Scimeca, Perri L Katzman, David Badre
Adaptive memory requires context-dependent control over how information is retrieved, evaluated and used to guide action, yet the signals that drive adjustments to memory decisions remain unknown. Here we show that prediction errors (PEs) coded by the striatum support control over memory decisions. Human participants completed a recognition memory test that incorporated biased feedback to influence participants' recognition criterion. Using model-based fMRI, we find that PEs, defined as the deviation between the outcome and expected value of a memory decision, correlate with striatal activity and predict individuals' final criterion...
October 7, 2016: Nature Communications
John P O'Doherty, Jeffrey Cockburn, Wolfgang M Pauli
In this review, we summarize findings supporting the existence of multiple behavioral strategies for controlling reward-related behavior, including a dichotomy between the goal-directed or model-based system and the habitual or model-free system in the domain of instrumental conditioning and a similar dichotomy in the realm of Pavlovian conditioning. We evaluate evidence from neuroscience supporting the existence of at least partly distinct neuronal substrates contributing to the key computations necessary for the function of these different control systems...
January 3, 2017: Annual Review of Psychology
Wouter Kool, Fiery A Cushman, Samuel J Gershman
Many accounts of decision making and reinforcement learning posit the existence of two distinct systems that control choice: a fast, automatic system and a slow, deliberative system. Recent research formalizes this distinction by mapping these systems to "model-free" and "model-based" strategies in reinforcement learning. Model-free strategies are computationally cheap, but sometimes inaccurate, because action values can be accessed by inspecting a look-up table constructed through trial-and-error. In contrast, model-based strategies compute action values through planning in a causal model of the environment, which is more accurate but also more cognitively demanding...
August 2016: PLoS Computational Biology
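The contrast drawn in the abstract above (a cached look-up table updated by trial and error versus planning in a model of the environment) can be illustrated with a minimal sketch. Everything here is a hypothetical toy setup, not the authors' task or model: the two actions, the outcome states `A` and `B`, the probabilities, and the learning rate `alpha` are all assumed for illustration.

```python
# Hypothetical toy problem: one choice state, two actions, two outcome states.
# All names and numbers are illustrative assumptions, not values from the paper.

# Transition model: P[action][outcome_state] = probability of reaching that state
P = {"left": {"A": 0.7, "B": 0.3}, "right": {"A": 0.3, "B": 0.7}}
# Expected reward delivered in each outcome state
R = {"A": 1.0, "B": 0.0}


def model_based_values(P, R):
    """Plan: derive each action value as the expected reward under the model."""
    return {a: sum(p * R[s] for s, p in P[a].items()) for a in P}


def model_free_update(q, action, reward, alpha=0.1):
    """Learn: move the cached value of the chosen action toward the observed reward."""
    q = dict(q)  # copy so the caller's table is not mutated
    q[action] += alpha * (reward - q[action])
    return q


mb = model_based_values(P, R)                  # computed from the model, no experience needed
q = {"left": 0.0, "right": 0.0}                # look-up table starts empty
q = model_free_update(q, "left", reward=1.0)   # one rewarded trial on "left"
```

In this sketch the model-based values are available immediately but require knowing `P` and `R`, whereas the model-free table only changes for the action that was actually tried and needs many trials to approach the same values.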
Arkady Konovalov, Ian Krajbich
Organisms appear to learn and make decisions using different strategies known as model-free and model-based learning; the former is mere reinforcement of previously rewarded actions and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Here using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset...
August 11, 2016: Nature Communications
Gautam Reddy, Antonio Celani, Terrence J Sejnowski, Massimo Vergassola
Birds and gliders exploit warm, rising atmospheric currents (thermals) to reach heights comparable to low-lying clouds with a reduced expenditure of energy. This strategy of flight (thermal soaring) is frequently used by migratory birds. Soaring provides a remarkable instance of complex decision making in biology and requires a long-term strategy to effectively use the ascending thermals. Furthermore, the problem is technologically relevant to extend the flying range of autonomous gliders. Thermal soaring is commonly observed in the atmospheric convective boundary layer on warm, sunny days...
August 16, 2016: Proceedings of the National Academy of Sciences of the United States of America
Martin V Butz
This paper proposes how various disciplinary theories of cognition may be combined into a unifying, sub-symbolic, computational theory of cognition. The following theories are considered for integration: psychological theories, including the theory of event coding, event segmentation theory, the theory of anticipatory behavioral control, and concept development; artificial intelligence and machine learning theories, including reinforcement learning and generative artificial neural networks; and theories from theoretical and computational neuroscience, including predictive coding and free energy-based inference...
2016: Frontiers in Psychology
