Search: model-based and model-free reinforcement learning

https://www.readbyqxmd.com/read/28918312/-reinforcement-learning-to-forage-optimally
#1
REVIEW
Nils Kolling, Thomas Akam
Foraging effectively is critical to the survival of all animals and this imperative is thought to have profoundly shaped brain evolution. Decisions made by foraging animals often approximate optimal strategies, but the learning and decision mechanisms generating these choices remain poorly understood. Recent work with laboratory foraging tasks in humans suggests that their behaviour is poorly explained by model-free reinforcement learning, with simple heuristic strategies better describing behaviour in some tasks, and, in others, evidence of prospective prediction of the future state of the environment...
September 14, 2017: Current Opinion in Neurobiology
https://www.readbyqxmd.com/read/28822813/reinforcement-learning-based-control-of-drug-dosing-for-cancer-chemotherapy-treatment
#2
Regina Padmanabhan, Nader Meskin, Wassim M Haddad
The increasing threat of cancer to human life and the improvement in the survival rate of this disease due to effective treatment have promoted research in various related fields. This research has shaped clinical trials and emphasized the necessity of properly scheduling cancer chemotherapy to ensure effective and safe treatment. Most of the control methodologies proposed for scheduling cancer chemotherapy treatment are model-based. In this paper, a reinforcement learning (RL)-based, model-free method is proposed for the closed-loop control of cancer chemotherapy drug dosing...
August 16, 2017: Mathematical Biosciences
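The entry above proposes closed-loop, model-free control of drug dosing. As a rough illustration of that idea, the sketch below runs tabular Q-learning against a toy, made-up tumour simulator; the state discretisation, dose levels, dynamics and reward are assumptions for illustration only, not the paper's model.

    import random

    # Hypothetical, highly simplified dynamics: the state is a discretised
    # tumour burden in {0..9}, the action is a dose level in {0, 1, 2}.
    def step(state, dose):
        growth = 1 if random.random() < 0.6 else 0
        kill = dose if random.random() < 0.8 else 0
        next_state = min(9, max(0, state + growth - kill))
        reward = -next_state - 0.5 * dose   # trade off tumour burden against toxicity
        return next_state, reward

    Q = {(s, a): 0.0 for s in range(10) for a in range(3)}
    alpha, gamma, eps = 0.1, 0.95, 0.1

    for episode in range(2000):
        s = 9                                # start from a large tumour burden
        for t in range(50):
            if random.random() < eps:
                a = random.randrange(3)      # occasional exploratory dose
            else:
                a = max(range(3), key=lambda x: Q[(s, x)])
            s2, r = step(s, a)
            # Model-free temporal-difference update: no growth model is ever learned.
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in range(3)) - Q[(s, a)])
            s = s2

The point of the model-free formulation is that the controller only needs observed state transitions and rewards, not an explicit pharmacokinetic model.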
https://www.readbyqxmd.com/read/28732231/kernel-dynamic-policy-programming-applicable-reinforcement-learning-to-robot-systems-with-high-dimensional-states
#3
Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto
We propose a new value function approach for model-free reinforcement learning in Markov decision processes involving high-dimensional states that addresses the issues of brittleness and intractable computational complexity, thereby rendering value-function-based reinforcement learning algorithms applicable to high-dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback-Leibler divergence between the current and updated policies...
June 29, 2017: Neural Networks: the Official Journal of the International Neural Network Society
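The algorithm above hinges on smoothing value updates via the Kullback-Leibler divergence between successive policies. The snippet below is not KDPP itself (in particular, it is not kernelised); it only illustrates, under assumed toy advantages and an assumed step size eta, how a KL-regularised multiplicative update changes a policy gradually.

    import numpy as np

    def kl_regularised_update(pi_old, advantages, eta):
        # pi_new is proportional to pi_old * exp(eta * advantage); for small eta
        # the KL divergence from pi_old stays small, so the policy moves smoothly.
        logits = np.log(pi_old) + eta * advantages
        pi_new = np.exp(logits - logits.max())
        return pi_new / pi_new.sum()

    pi = np.ones(4) / 4                       # uniform policy over 4 actions
    adv = np.array([0.2, -0.1, 0.5, 0.0])     # toy advantage estimates
    for _ in range(10):
        pi = kl_regularised_update(pi, adv, eta=0.5)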
https://www.readbyqxmd.com/read/28731839/cost-benefit-arbitration-between-multiple-reinforcement-learning-systems
#4
Wouter Kool, Samuel J Gershman, Fiery A Cushman
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits...
July 1, 2017: Psychological Science
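The proposal above is that control is allocated by comparing each system's task-specific costs and benefits. A schematic of such an arbitration rule is sketched below; the linear benefit-minus-cost trade-off and the numbers are illustrative assumptions, not the authors' fitted model.

    def arbitrate(value_mb, value_mf, cost_mb, cost_mf):
        # Choose the controller whose expected task value, net of its
        # computational cost, is higher.
        return "model-based" if value_mb - cost_mb > value_mf - cost_mf else "model-free"

    # Planning is expensive; it is only selected when it buys enough extra reward.
    print(arbitrate(value_mb=0.62, value_mf=0.60, cost_mb=0.10, cost_mf=0.01))  # model-free
    print(arbitrate(value_mb=0.90, value_mf=0.60, cost_mb=0.10, cost_mf=0.01))  # model-based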
https://www.readbyqxmd.com/read/28723943/stress-enhances-model-free-reinforcement-learning-only-after-negative-outcome
#5
Heyeon Park, Daeyeol Lee, Jeanyung Chey
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i...
2017: PloS One
https://www.readbyqxmd.com/read/28642696/how-accumulated-real-life-stress-experience-and-cognitive-speed-interact-on-decision-making-processes
#6
Eva Friedel, Miriam Sebold, Sören Kuitunen-Paul, Stephan Nebe, Ilya M Veer, Ulrich S Zimmermann, Florian Schlagenhauf, Michael N Smolka, Michael Rapp, Henrik Walter, Andreas Heinz
Rationale: Advances in neurocomputational modeling suggest that the valuation systems for goal-directed (deliberative) decision-making on the one hand, and habitual (automatic) decision-making on the other, may rely on distinct computational strategies for reinforcement learning, namely model-free vs. model-based learning. As a key theoretical difference, the model-based system strongly demands cognitive functions to plan actions prospectively, based on an internal cognitive model of the environment, whereas valuation in the model-free system relies on rather simple learning rules from operant conditioning to retrospectively associate actions with their outcomes, and is thus cognitively less demanding...
2017: Frontiers in Human Neuroscience
https://www.readbyqxmd.com/read/28599832/model-based-control-in-dimensional-psychiatry
#7
REVIEW
Valerie Voon, Andrea Reiter, Miriam Sebold, Stephanie Groman
We use parallel interacting goal-directed and habitual strategies to make our daily decisions. The arbitration between these strategies is relevant to inflexible repetitive behaviors in psychiatric disorders. Goal-directed control, also known as model-based control, is based on the affective value of the outcome and relies on a learned internal model to make decisions prospectively. In contrast, habit control, also known as model-free control, is based on an integration of previous reinforcement learning, independent of the current outcome value; it is implicit and more efficient, but at the cost of greater inflexibility...
April 23, 2017: Biological Psychiatry
https://www.readbyqxmd.com/read/28581478/reinstated-episodic-context-guides-sampling-based-decisions-for-reward
#8
Aaron M Bornstein, Kenneth A Norman
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event...
July 2017: Nature Neuroscience
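The contrast drawn above is between incremental model-free values and decisions guided by a handful of sampled episodes. The toy sketch below shows why sampling produces trial-to-trial variability; the episode store and sample size are invented for illustration.

    import random

    # Each stored episode: (option chosen, reward received, context tag).
    episodes = [("A", 1, "ctx1"), ("A", 0, "ctx2"), ("B", 1, "ctx2"),
                ("B", 1, "ctx1"), ("A", 1, "ctx1"), ("B", 0, "ctx2")]

    def episodic_value(option, n_samples=3):
        # Estimate an option's value from a few randomly sampled past episodes
        # rather than from a running average, so the estimate varies across trials.
        rewards = [r for opt, r, _ in episodes if opt == option]
        sample = random.sample(rewards, min(n_samples, len(rewards)))
        return sum(sample) / len(sample)

    choice = max(["A", "B"], key=episodic_value)

In the abstract's account, the context retrieved with a sampled memory additionally brings other decisions from that context to bear on the current choice; the sketch omits that step.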
https://www.readbyqxmd.com/read/28573384/a-simple-computational-algorithm-of-model-based-choice-preference
#9
Asako Toyama, Kentaro Katahira, Hideki Ohira
A broadly used computational framework posits that two learning systems operate in parallel during the learning of choice preferences, namely the model-free and model-based reinforcement-learning systems. In this study, we examined another possibility, in which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, with our proposed models, which assume a sequential learning process...
June 1, 2017: Cognitive, Affective & Behavioral Neuroscience
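The models compared above start from the standard hybrid account of the two-stage task, in which first-stage action values are a weighted mixture of model-based and model-free estimates. A minimal sketch of that mixture follows; the transition probabilities, values and weight are placeholders, and the sequential variants proposed in the paper are not reproduced here.

    import numpy as np

    # Two first-stage actions, each leading to one of two second-stage states
    # via a common (0.7) or rare (0.3) transition.
    P = np.array([[0.7, 0.3],
                  [0.3, 0.7]])          # P[action, second-stage state]
    Q_stage2 = np.array([0.6, 0.3])     # learned values of the second-stage states
    Q_mf = np.array([0.5, 0.4])         # model-free first-stage values (from TD updates)

    Q_mb = P @ Q_stage2                 # model-based values: expectation under the transition model
    w = 0.6                             # mixture weight, typically fit per participant
    Q_hybrid = w * Q_mb + (1 - w) * Q_mf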
https://www.readbyqxmd.com/read/28362620/finite-horizon-h%C3%A2-tracking-control-for-unknown-nonlinear-systems-with-saturating-actuators
#10
Huaguang Zhang, Xiaohong Cui, Yanhong Luo, He Jiang
In this paper, a neural network (NN)-based online model-free integral reinforcement learning algorithm is developed to solve the finite-horizon H∞ optimal tracking control problem for completely unknown nonlinear continuous-time systems with disturbance and saturating actuators (constrained control input). An augmented system is constructed with the tracking error system and the command generator system. A time-varying Hamilton-Jacobi-Isaacs (HJI) equation is formulated for the augmented problem, which is extremely difficult or impossible to solve due to its time-dependent property and nonlinearity...
March 1, 2017: IEEE Transactions on Neural Networks and Learning Systems
https://www.readbyqxmd.com/read/28326050/what-to-choose-next-a-paradigm-for-testing-human-sequential-decision-making
#11
Elisa M Tartaglia, Aaron M Clarke, Michael H Herzog
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs...
2017: Frontiers in Psychology
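Among the components listed above, the eligibility trace is what lets a sparse reward at the end of a sequence be credited to earlier steps. The fragment below shows that mechanism in a TD(lambda)-style update on a made-up five-state chain; the parameters are arbitrary.

    import numpy as np

    n_states = 5
    V = np.zeros(n_states)          # state values
    trace = np.zeros(n_states)      # eligibility trace: decaying memory of recent states
    alpha, gamma, lam = 0.1, 0.95, 0.8

    # One episode walking left to right; only the final transition is rewarded.
    path = [0, 1, 2, 3, 4]
    for i in range(len(path) - 1):
        s, s_next = path[i], path[i + 1]
        r = 1.0 if s_next == n_states - 1 else 0.0
        delta = r + gamma * V[s_next] - V[s]
        trace *= gamma * lam        # older states become less eligible
        trace[s] += 1.0
        V += alpha * delta * trace  # credit from the sparse reward flows back along the trace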
https://www.readbyqxmd.com/read/28282439/iterative-free-energy-optimization-for-recurrent-neural-networks-inferno
#12
Alexandre Pitti, Philippe Gaussier, Mathias Quoy
The intra-parietal lobe, coupled with the basal ganglia, forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models, however, often fail to control long-range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to treat spike synchrony as an optimization problem over the neurons' sub-threshold activity for the generation of long neuronal chains...
2017: PloS One
https://www.readbyqxmd.com/read/28113995/optimal-output-feedback-control-of-unknown-continuous-time-linear-systems-using-off-policy-reinforcement-learning
#13
Hamidreza Modares, Frank L Lewis, Zhong-Ping Jiang
A model-free off-policy reinforcement learning algorithm is developed to learn the optimal output-feedback (OPFB) solution for linear continuous-time systems. The proposed algorithm has the important feature of being applicable to the design of optimal OPFB controllers for both regulation and tracking problems. To provide a unified framework for both optimal regulation and tracking, a discounted performance function is employed and a discounted algebraic Riccati equation (ARE) is derived which gives the solution to the problem...
November 2016: IEEE Transactions on Cybernetics
https://www.readbyqxmd.com/read/28112207/placebo-intervention-enhances-reward-learning-in-healthy-individuals
#14
Zsolt Turi, Matthias Mittner, Walter Paulus, Andrea Antal
According to the placebo-reward hypothesis, placebo is a reward-anticipation process that increases midbrain dopamine (DA) levels. Reward-based learning processes, such as reinforcement learning, involve a large part of the DA-ergic network that is also activated by the placebo intervention. Given the neurochemical overlap between placebo and reward learning, we investigated whether verbal instructions in conjunction with a placebo intervention are capable of enhancing reward learning in healthy individuals by using a monetary reward-based reinforcement-learning task...
January 23, 2017: Scientific Reports
https://www.readbyqxmd.com/read/28077716/the-attraction-effect-modulates-reward-prediction-errors-and-intertemporal-choices
#15
Sebastian Gluth, Jared M Hotaling, Jörg Rieskamp
Classical economic theory contends that the utility of a choice option should be independent of other options. This view is challenged by the attraction effect, in which the relative preference between two options is altered by the addition of a third, asymmetrically dominated option. Here, we leveraged the attraction effect in the context of intertemporal choices to test whether both decisions and reward prediction errors (RPE), in the absence of choice, violate the independence of irrelevant alternatives principle...
January 11, 2017: Journal of Neuroscience: the Official Journal of the Society for Neuroscience
https://www.readbyqxmd.com/read/28047608/su-d-brb-05-quantum-learning-for-knowledge-based-response-adaptive-radiotherapy
#16
I El Naqa, R Ten
PURPOSE: There is tremendous excitement in radiotherapy about applying data-driven methods to develop personalized clinical decisions for real-time response-based adaptation. However, classical statistical learning methods lack the efficiency and the ability to predict outcomes under conditions of uncertainty and incomplete information. Therefore, we are investigating physics-inspired machine learning approaches by utilizing quantum principles for developing a robust framework to dynamically adapt treatments to individual patients' characteristics and optimize outcomes...
June 2016: Medical Physics
https://www.readbyqxmd.com/read/27825732/cognitive-components-underpinning-the-development-of-model-based-learning
#17
Tracey C S Potter, Nessa V Bryce, Catherine A Hartley
Reinforcement learning theory distinguishes "model-free" learning, which fosters reflexive repetition of previously rewarded actions, from "model-based" learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice...
June 2017: Developmental Cognitive Neuroscience
https://www.readbyqxmd.com/read/27793098/-proactive-use-of-cue-context-congruence-for-building-reinforcement-learning-s-reward-function
#18
Judit Zsuga, Klara Biro, Gabor Tajti, Magdolna Emma Szilasi, Csaba Papp, Bela Juhasz, Rudolf Gesztelyi
BACKGROUND: Reinforcement learning is a fundamental form of learning that may be formalized using the Bellman equation. Accordingly, an agent determines the state value as the sum of the immediate reward and the discounted value of future states. Thus the value of a state is determined by agent-related attributes (action set, policy, discount factor) and by the agent's knowledge of the environment, embodied by the reward function and by hidden environmental factors given by the transition probability...
October 28, 2016: BMC Neuroscience
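The background above states the Bellman relation in words: a state's value is the immediate reward plus the discounted, transition-weighted value of successor states. A tiny worked version of that relation is given below; the two-state chain, rewards and discount factor are arbitrary choices for illustration.

    import numpy as np

    # Fixed policy over two states; P[s, s'] is the transition probability under it.
    P = np.array([[0.8, 0.2],
                  [0.1, 0.9]])
    R = np.array([1.0, 0.0])        # expected immediate reward in each state
    gamma = 0.9

    # Bellman equation in matrix form, V = R + gamma * P V, solved directly.
    V = np.linalg.solve(np.eye(2) - gamma * P, R)
    print(V)                        # value of each state under the policy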
https://www.readbyqxmd.com/read/27713407/striatal-prediction-errors-support-dynamic-control-of-declarative-memory-decisions
#19
Jason M Scimeca, Perri L Katzman, David Badre
Adaptive memory requires context-dependent control over how information is retrieved, evaluated and used to guide action, yet the signals that drive adjustments to memory decisions remain unknown. Here we show that prediction errors (PEs) coded by the striatum support control over memory decisions. Human participants completed a recognition memory test that incorporated biased feedback to influence participants' recognition criterion. Using model-based fMRI, we find that PEs (the deviation between the outcome and the expected value of a memory decision) correlate with striatal activity and predict individuals' final criterion...
October 7, 2016: Nature Communications
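The study above defines the prediction error as the deviation between the outcome and the expected value of a memory decision, and argues that it drives adjustments of the recognition criterion. The delta-rule fragment below is only a schematic of that adjustment; the feedback coding, learning rate and direction of the criterion shift are assumptions for illustration.

    expected_value = 0.5     # expected payoff of calling an item "old"
    criterion = 0.0          # recognition criterion; higher means more conservative
    alpha = 0.2

    for outcome in [1.0, 0.0, 0.0, 1.0]:      # 1 = feedback "correct", 0 = "incorrect"
        pe = outcome - expected_value          # prediction error
        expected_value += alpha * pe           # update the expectation
        criterion -= alpha * pe                # positive PEs relax the criterion slightly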