Read by QxMD: search results for "model-based reinforcement learning"

https://www.readbyqxmd.com/read/28732231/kernel-dynamic-policy-programming-applicable-reinforcement-learning-to-robot-systems-with-high-dimensional-states
#1
Yunduan Cui, Takamitsu Matsubara, Kenji Sugimoto
We propose a new value function approach for model-free reinforcement learning in Markov decision processes involving high-dimensional states. It addresses the issues of brittleness and intractable computational complexity, thereby making value-function-based reinforcement learning algorithms applicable to high-dimensional systems. Our new algorithm, Kernel Dynamic Policy Programming (KDPP), smoothly updates the value function in accordance with the Kullback-Leibler divergence between current and updated policies...
June 29, 2017: Neural Networks: the Official Journal of the International Neural Network Society
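The abstract describes a value update that is smoothed by the Kullback-Leibler divergence between the current and updated policies. The sketch below is not KDPP (no kernels, no dynamic policy programming recursion); it only illustrates the generic KL-regularized policy update that this idea builds on, assuming tabular action values `q` and a strictly positive current policy `pi_old`. The function name and `eta` parameter are illustrative assumptions.

```python
import numpy as np

def kl_regularized_update(pi_old, q, eta):
    """Closed-form argmax over pi of  E_pi[q] - (1/eta) * KL(pi || pi_old), per state.

    pi_old: (n_states, n_actions) current policy, rows sum to 1, entries > 0.
    q:      (n_states, n_actions) action-value estimates.
    eta:    step size; small eta keeps the new policy close to pi_old.
    """
    # Solution: pi_new(a|s) is proportional to pi_old(a|s) * exp(eta * q(s, a)).
    logits = np.log(pi_old) + eta * q
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# Usage (hypothetical values): a uniform policy over 3 actions in 2 states.
pi = np.full((2, 3), 1 / 3)
q = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0]])
pi_next = kl_regularized_update(pi, q, eta=0.1)
```

With a small `eta` the policy shifts only slightly toward high-value actions, which is the "smooth" behaviour the abstract refers to; `eta = 0` leaves the policy unchanged.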
https://www.readbyqxmd.com/read/28731839/cost-benefit-arbitration-between-multiple-reinforcement-learning-systems
#2
Wouter Kool, Samuel J Gershman, Fiery A Cushman
Human behavior is sometimes determined by habit and other times by goal-directed planning. Modern reinforcement-learning theories formalize this distinction as a competition between a computationally cheap but inaccurate model-free system that gives rise to habits and a computationally expensive but accurate model-based system that implements planning. It is unclear, however, how people choose to allocate control between these systems. Here, we propose that arbitration occurs by comparing each system's task-specific costs and benefits...
July 1, 2017: Psychological Science
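The proposal that arbitration compares each system's task-specific costs and benefits can be illustrated with a toy rule. In this hypothetical sketch (not the authors' model), the net benefit of planning is the model-based reward advantage minus a planning cost, and a logistic function with an assumed sensitivity `beta` turns it into a probability of using model-based control.

```python
import math

def p_model_based(reward_mb, reward_mf, cost, beta=5.0):
    """Hypothetical cost-benefit arbitration rule.

    reward_mb, reward_mf: expected reward under model-based / model-free control.
    cost:                 assumed cognitive cost of model-based planning.
    beta:                 assumed sensitivity to the net benefit.
    Returns the probability of deploying model-based control.
    """
    net_benefit = (reward_mb - reward_mf) - cost
    return 1.0 / (1.0 + math.exp(-beta * net_benefit))
```

When the reward advantage of planning exactly equals its cost, the rule is indifferent (probability 0.5); raising the cost pushes control toward the cheaper model-free system.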
https://www.readbyqxmd.com/read/28723943/stress-enhances-model-free-reinforcement-learning-only-after-negative-outcome
#3
Heyeon Park, Daeyeol Lee, Jeanyung Chey
Previous studies found that stress shifts behavioral control by promoting habits while decreasing goal-directed behaviors during reward-based decision-making. It is, however, unclear how stress disrupts the relative contribution of the two systems controlling reward-seeking behavior, i.e. model-free (or habit) and model-based (or goal-directed). Here, we investigated whether stress biases the contribution of model-free and model-based reinforcement learning processes differently depending on the valence of outcome, and whether stress alters the learning rate, i...
2017: PloS One
https://www.readbyqxmd.com/read/28719247/telehealth-in-schools-using-a-systematic-educational-model-based-on-fiction-screenplays-interactive-documentaries-and-three-dimensional-computer-graphics
#4
Diogo Julien Miranda, Chao Lung Wen
BACKGROUND: Preliminary studies suggest the need for a global vision in academic reform, leading to the re-invention of education. This would include problem-based education using transversal topics and the development of thinking skills, social interaction, and information-processing skills. We aimed to develop a new educational model in health with modular components to be broadcast and applied as a tele-education course. MATERIALS AND METHODS: We developed a systematic model based on a "Skills and Goals Matrix" to adapt scientific content to fictional screenplays, three-dimensional (3D) computer graphics of the human body, and interactive documentaries...
July 18, 2017: Telemedicine Journal and E-health: the Official Journal of the American Telemedicine Association
https://www.readbyqxmd.com/read/28706499/valence-dependent-belief-updating-computational-validation
#5
Bojana Kuzmanovic, Lionel Rigoux
People tend to update beliefs about their future outcomes in a valence-dependent way: they are likely to incorporate good news and to neglect bad news. However, belief formation is a complex process which depends not only on motivational factors such as the desire for favorable conclusions, but also on multiple cognitive variables such as prior beliefs, knowledge about personal vulnerabilities and resources, and the size of the probabilities and estimation errors. Thus, we applied computational modeling in order to test for valence-induced biases in updating while formally controlling for relevant cognitive factors...
2017: Frontiers in Psychology
https://www.readbyqxmd.com/read/28696337/memristive-device-based-learning-for-navigation-in-robots
#6
Mohammad Sarim, Manish Kumar, Rashmi Jha, Ali A Minai
Biomimetic robots have gained attention recently for applications ranging from resource hunting to search-and-rescue operations during disasters. Biological species are known to learn intuitively from the environment, gather and process data, and make appropriate decisions. Such sophisticated computing capabilities are difficult to achieve in robots, especially in real time with ultra-low energy consumption. Here, we present a novel memristive-device-based learning architecture for robots. Two-terminal memristive devices with resistive switching of the oxide layer are modeled in a crossbar array to develop a neuromorphic platform that can impart active real-time learning capabilities to a robot...
July 11, 2017: Bioinspiration & Biomimetics
https://www.readbyqxmd.com/read/28680395/a-biologically-plausible-architecture-of-the-striatum-to-solve-context-dependent-reinforcement-learning-tasks
#7
Sabyasachi Shivkumar, Vignesh Muralidharan, V Srinivasa Chakravarthy
The basal ganglia circuit is an important subcortical system of the brain thought to be responsible for reward-based learning. The striatum, the largest nucleus of the basal ganglia, serves as an input port that maps cortical information. Microanatomical studies show that the striatum is a mosaic of specialized input-output structures called striosomes and regions of the surrounding matrix called matrisomes. We have developed a computational model of the striatum using layered self-organizing maps to capture the center-surround structure seen experimentally and explain its functional significance...
2017: Frontiers in Neural Circuits
https://www.readbyqxmd.com/read/28678984/association-of-neural-and-emotional-impacts-of-reward-prediction-errors-with-major-depression
#8
Robb B Rutledge, Michael Moutoussis, Peter Smittenaar, Peter Zeidman, Tanja Taylor, Louise Hrynkiewicz, Jordan Lam, Nikolina Skandali, Jenifer Z Siegel, Olga T Ousdal, Gita Prabhu, Peter Dayan, Peter Fonagy, Raymond J Dolan
Importance: Major depressive disorder (MDD) is associated with deficits in representing reward prediction errors (RPEs), which are the difference between experienced and predicted reward. Reward prediction errors underlie learning of values in reinforcement learning models, are represented by phasic dopamine release, and are known to affect momentary mood. Objective: To combine functional neuroimaging, computational modeling, and smartphone-based large-scale data collection to test, in the absence of learning-related concerns, the hypothesis that depression attenuates the impact of RPEs...
July 5, 2017: JAMA Psychiatry
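The abstract defines a reward prediction error (RPE) as the difference between experienced and predicted reward, and notes that RPEs drive value learning in reinforcement learning models. A minimal delta-rule (Rescorla-Wagner-style) sketch of that idea follows; the learning rate `alpha` and the reward sequence are illustrative assumptions, and the study's mood model is not reproduced here.

```python
def rpe_update(value, reward, alpha=0.1):
    """One delta-rule step.

    delta = reward - value      (reward prediction error)
    value <- value + alpha * delta
    Returns the updated value and the RPE.
    """
    delta = reward - value
    return value + alpha * delta, delta

# Usage: the predicted value tracks a sequence of hypothetical rewards.
v = 0.0
for r in [1, 1, 0, 1]:
    v, delta = rpe_update(v, r, alpha=0.5)
```

Positive RPEs (better than expected) raise the prediction and negative RPEs lower it; the hypothesis tested in the study concerns how strongly such RPEs are represented and how much they affect mood in depression.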
https://www.readbyqxmd.com/read/28671967/implementation-of-real-time-energy-management-strategy-based-on-reinforcement-learning-for-hybrid-electric-vehicles-and-simulation-validation
#9
Zehui Kong, Yuan Zou, Teng Liu
To further improve the fuel economy of series hybrid electric tracked vehicles, this paper develops a reinforcement learning (RL)-based real-time energy management strategy. To exploit the statistical characteristics of the online driving schedule effectively, a recursive algorithm for the transition probability matrix (TPM) of the power request is derived. RL is applied to calculate and update the control policy at regular intervals, adapting to varying driving conditions...
2017: PloS One
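The abstract mentions a recursive algorithm for the TPM of the power request but does not give it. As a hedged illustration only, the sketch below assumes the power request is discretised into `n_states` levels and uses a standard recursive frequency estimate: after observing a transition i -> j, row i moves toward the one-hot vector for j with step size 1/N_i, which reproduces the running count ratio N_ij / N_i. The class name and the uniform initial rows are assumptions, not the paper's method.

```python
import numpy as np

class RecursiveTPM:
    """Recursively estimated transition probability matrix over discretised
    power-request levels (illustrative sketch)."""

    def __init__(self, n_states):
        self.counts = np.zeros(n_states)                         # N_i: transitions observed from state i
        self.P = np.full((n_states, n_states), 1.0 / n_states)   # uniform rows until state i is visited

    def update(self, i, j):
        """Incorporate one observed transition i -> j."""
        self.counts[i] += 1
        target = np.zeros(self.P.shape[1])
        target[j] = 1.0
        # Running mean of one-hot vectors: equals N_ij / N_i after every update.
        self.P[i] += (target - self.P[i]) / self.counts[i]

# Usage with a short hypothetical sequence of power-request transitions.
tpm = RecursiveTPM(3)
for i, j in [(0, 1), (0, 1), (0, 2), (1, 0)]:
    tpm.update(i, j)
```

Replacing the 1/N_i step with a constant step size would instead weight recent driving more heavily, which is one way an online estimate could adapt to changing driving conditions.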
https://www.readbyqxmd.com/read/28667892/normal-aging-and-parkinson-s-disease-are-associated-with-the-functional-decline-of-distinct-frontal-striatal-circuits
#10
Aleksandra Gruszka, Adam Hampshire, Roger A Barker, Adrian M Owen
Impaired ability to shift attention between stimuli (i.e. shifting attentional 'set') is a well-established part of the dysexecutive syndrome in Parkinson's disease (PD); nevertheless, the cognitive and neural bases of this deficit remain unclear. In this study, an fMRI-optimised variant of a classic paradigm for assessing attentional control (Hampshire and Owen 2006) was used to contrast activity in dissociable executive circuits in early-stage PD patients and controls. The results demonstrated that the executive performance impairments in PD are accompanied by hypoactivation within the striatum, anterior cingulate cortex (vACC), and inferior frontal sulcus (IFS) regions...
June 3, 2017: Cortex; a Journal Devoted to the Study of the Nervous System and Behavior
https://www.readbyqxmd.com/read/28651789/interactions-among-working-memory-reinforcement-learning-and-effort-in-value-based-choice-a-new-paradigm-and-selective-deficits-in-schizophrenia
#11
Anne G E Collins, Matthew A Albrecht, James A Waltz, James M Gold, Michael J Frank
BACKGROUND: When studying learning, researchers directly observe only the participants' choices, which are often assumed to arise from a unitary learning process. However, a number of separable systems, such as working memory (WM) and reinforcement learning (RL), contribute simultaneously to human learning. Identifying each system's contributions is essential for mapping the neural substrates contributing in parallel to behavior; computational modeling can help to design tasks that allow such a separable identification of processes and infer their contributions in individuals...
May 31, 2017: Biological Psychiatry
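The abstract separates working memory (WM) from reinforcement learning (RL) as parallel contributors to learning. One common way to formalise this, sketched here as an assumption rather than as this paper's exact model, mixes a WM policy and an RL policy, with the WM weight shrinking once the number of stimuli to remember exceeds an assumed capacity. The parameters `capacity`, `rho` and `beta` are illustrative, and effort is not modelled.

```python
import numpy as np

def softmax(x, beta):
    """Softmax with inverse temperature beta (shifted by the max for stability)."""
    z = beta * (x - np.max(x))
    e = np.exp(z)
    return e / e.sum()

def rlwm_policy(q_rl, w_wm, set_size, capacity=3.0, rho=0.9, beta=8.0):
    """Mixture of a WM policy and an RL policy for one stimulus.

    q_rl:     incrementally learned RL action values.
    w_wm:     WM action values (e.g. 1 for the remembered correct action).
    set_size: number of stimuli that must be held in mind.
    WM weight = rho * min(1, capacity / set_size).
    """
    weight = rho * min(1.0, capacity / set_size)
    return weight * softmax(w_wm, beta) + (1 - weight) * softmax(q_rl, beta)

# Usage: RL has not learned yet (flat values) but WM remembers action 0.
q_rl = np.array([0.5, 0.5, 0.5])
w_wm = np.array([1.0, 0.0, 0.0])
p_small = rlwm_policy(q_rl, w_wm, set_size=2)
p_large = rlwm_policy(q_rl, w_wm, set_size=6)
```

Because the WM weight falls with set size, the same memory yields more accurate choices in small set sizes, which is the kind of signature a task can exploit to separate the two systems' contributions.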
https://www.readbyqxmd.com/read/28642696/how-accumulated-real-life-stress-experience-and-cognitive-speed-interact-on-decision-making-processes
#12
Eva Friedel, Miriam Sebold, Sören Kuitunen-Paul, Stephan Nebe, Ilya M Veer, Ulrich S Zimmermann, Florian Schlagenhauf, Michael N Smolka, Michael Rapp, Henrik Walter, Andreas Heinz
Rationale: Advances in neurocomputational modeling suggest that valuation systems for goal-directed (deliberative) decision-making on the one hand and habitual (automatic) decision-making on the other may rely on distinct computational strategies for reinforcement learning, namely model-based vs. model-free learning. A key theoretical difference is that the model-based system places strong demands on cognitive functions in order to plan actions prospectively based on an internal cognitive model of the environment, whereas valuation in the model-free system relies on rather simple learning rules from operant conditioning to retrospectively associate actions with their outcomes and is thus cognitively less demanding...
2017: Frontiers in Human Neuroscience
https://www.readbyqxmd.com/read/28641601/the-impact-of-traumatic-stress-on-pavlovian-biases
#13
O T Ousdal, Q J Huys, A M Milde, A R Craven, L Ersland, T Endestad, A Melinder, K Hugdahl, R J Dolan
BACKGROUND: Disturbances in Pavlovian valuation systems are reported to follow traumatic stress exposure. However, motivated decisions are also guided by instrumental mechanisms, yet the effect of traumatic stress on these instrumental systems remains poorly investigated. Here, we examine whether a single episode of severe traumatic stress influences flexible instrumental decisions through an impact on a Pavlovian system. METHODS: Twenty-six survivors of the 2011 Norwegian terror attack and 30 matched control subjects performed an instrumental learning task in which Pavlovian and instrumental associations promoted congruent or conflicting responses...
June 23, 2017: Psychological Medicine
https://www.readbyqxmd.com/read/28626011/effects-of-ventral-striatum-lesions-on-stimulus-based-versus-action-based-reinforcement-learning
#14
Kathryn M Rothenhoefer, Vincent D Costa, Ramón Bartolo, Raquel Vicario-Feliciano, Elisabeth A Murray, Bruno B Averbeck
Learning the values of actions versus stimuli may depend on separable neural circuits. In the current study, we evaluated the performance of rhesus macaques with ventral striatum (VS) lesions on a two-arm bandit task that had randomly interleaved blocks of stimulus-based and action-based reinforcement learning (RL). Compared with controls, monkeys with VS lesions had deficits in learning to select rewarding images but not rewarding actions. We used an RL model to quantify learning and choice consistency and found that, in stimulus-based RL, the VS lesion monkeys were more influenced by negative feedback and had lower choice consistency than controls...
July 19, 2017: Journal of Neuroscience: the Official Journal of the Society for Neuroscience
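The abstract reports that lesioned monkeys were more influenced by negative feedback and showed lower choice consistency. A typical way to quantify those two properties, assumed here rather than taken from the paper, is Q-learning with separate learning rates for positive and negative prediction errors plus a softmax choice rule whose inverse temperature `beta` indexes choice consistency. Parameter names are illustrative.

```python
import numpy as np

def update_q(q, choice, reward, alpha_pos, alpha_neg):
    """Q-learning step with asymmetric learning rates.

    A larger alpha_neg means stronger updating after negative prediction errors.
    Returns a new array and leaves the input unchanged.
    """
    q = q.copy()
    delta = reward - q[choice]
    alpha = alpha_pos if delta >= 0 else alpha_neg
    q[choice] += alpha * delta
    return q

def choice_probs(q, beta):
    """Softmax choice rule; higher beta gives more consistent (deterministic) choices."""
    z = beta * (q - q.max())
    p = np.exp(z)
    return p / p.sum()

# Usage: an unrewarded choice of option 0 with a high negative learning rate.
q0 = np.array([0.5, 0.5])
q1 = update_q(q0, choice=0, reward=0.0, alpha_pos=0.2, alpha_neg=0.6)
```

Fitting `alpha_pos`, `alpha_neg` and `beta` separately per group and block type is one way such a model can express "more influenced by negative feedback" and "lower choice consistency".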
https://www.readbyqxmd.com/read/28612515/the-benefits-of-a-peer-assisted-mock-paces
#15
Sarim Siddiqui, Samee Siddiqui, Qamar Mustafa, Abeer F Rizvi, Ibtesham T Hossain
BACKGROUND: Peer-assisted learning (PAL) and mock examinations have been credited as effective teaching tools; however, there is a lack of research into their effectiveness in PACES (Practical Assessment of Clinical Examination Skills). This study demonstrates an effective model and the benefits of PAL after its implementation in a mock PACES at Imperial College London. METHODS: A mock PACES was designed for fifth-year medical students...
June 14, 2017: Clinical Teacher
https://www.readbyqxmd.com/read/28599832/model-based-control-in-dimensional-psychiatry
#16
REVIEW
Valerie Voon, Andrea Reiter, Miriam Sebold, Stephanie Groman
We use parallel, interacting goal-directed and habitual strategies to make our daily decisions. The arbitration between these strategies is relevant to inflexible repetitive behaviors in psychiatric disorders. Goal-directed control, also known as model-based control, is based on an affective outcome and relies on a learned internal model to make decisions prospectively. In contrast, habit control, also known as model-free control, is based on an integration of previously reinforced learning, independent of the current outcome value, and is implicit and more efficient but at the cost of greater inflexibility...
April 23, 2017: Biological Psychiatry
https://www.readbyqxmd.com/read/28585051/the-role-of-the-putamen-in-language-a-meta-analytic-connectivity-modeling-study
#17
Nestor Viñas-Guasch, Yan Jing Wu
The putamen is a subcortical structure that forms part of the dorsal striatum of the basal ganglia and has traditionally been associated with reinforcement learning and motor control, including speech articulation. However, recent studies have shown involvement of the left putamen in other language functions such as bilingual language processing (Abutalebi et al. 2012) and production, with some authors arguing for functional segregation of the anterior and posterior putamen (Oberhuber et al. 2013). A further step in exploring the role of the putamen in language would involve identifying the network of coactivations of not only the left but also the right putamen, given the involvement of the right hemisphere in higher-order language functions (Vigneau et al...
June 5, 2017: Brain Structure & Function
https://www.readbyqxmd.com/read/28581478/reinstated-episodic-context-guides-sampling-based-decisions-for-reward
#18
Aaron M Bornstein, Kenneth A Norman
How does experience inform decisions? In episodic sampling, decisions are guided by a few episodic memories of past choices. This process can yield choice patterns similar to model-free reinforcement learning; however, samples can vary from trial to trial, causing decisions to vary. Here we show that context retrieved during episodic sampling can cause choice behavior to deviate sharply from the predictions of reinforcement learning. Specifically, we show that, when a given memory is sampled, choices (in the present) are influenced by the properties of other decisions made in the same context as the sampled event...
July 2017: Nature Neuroscience
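In episodic sampling, as the abstract puts it, decisions are guided by a few episodic memories of past choices, so estimates vary from trial to trial. The hypothetical sketch below estimates an option's value by averaging `k` randomly retrieved past outcomes of that option. It deliberately omits context reinstatement, which is the paper's actual finding, and the memory format is an assumption.

```python
import random

def episodic_sample_value(memories, option, k, rng):
    """Estimate an option's value from k randomly retrieved episodes.

    memories: list of (option, reward) pairs from past trials (assumed format).
    k:        number of episodes sampled (with replacement) on this trial.
    rng:      a random.Random instance, so trial-to-trial variability is explicit.
    Returns 0.0 if the option has never been experienced.
    """
    outcomes = [r for (o, r) in memories if o == option]
    if not outcomes:
        return 0.0
    sample = rng.choices(outcomes, k=k)
    return sum(sample) / k

# Usage with hypothetical memories.
rng = random.Random(0)
memories = [("A", 1.0), ("A", 0.0), ("A", 1.0), ("B", 0.0)]
value_a = episodic_sample_value(memories, "A", k=2, rng=rng)
```

With small `k`, repeated calls give different estimates for the same option, illustrating how sampling a few memories can make choices vary even when the underlying experience is identical.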
https://www.readbyqxmd.com/read/28575424/association-between-habenula-dysfunction-and-motivational-symptoms-in-unmedicated-major-depressive-disorder
#19
Wen-Hua Liu, Vincent Valton, Ling-Zhi Wang, Yu-Hua Zhu, Jonathan P Roiser
The lateral habenula plays a central role in reward and punishment processing and has been suggested to drive the cardinal symptom of anhedonia in depression. This hypothesis is largely based on observations of habenula hypermetabolism in animal models of depression, but the activity of the habenula and its relationship with clinical symptoms in patients with depression remain unclear. High-resolution functional magnetic resonance imaging (fMRI) and computational modelling were used to investigate the activity of the habenula during a probabilistic reinforcement learning task with rewarding and punishing outcomes in 21 unmedicated patients with major depression and 17 healthy participants...
May 29, 2017: Social Cognitive and Affective Neuroscience
https://www.readbyqxmd.com/read/28573384/a-simple-computational-algorithm-of-model-based-choice-preference
#20
Asako Toyama, Kentaro Katahira, Hideki Ohira
A broadly used computational framework posits that two learning systems, namely the model-free and model-based reinforcement-learning systems, operate in parallel during the learning of choice preferences. In this study, we examined another possibility, in which model-free learning is the basic system and model-based information is its modulator. Accordingly, we proposed several modified versions of a temporal-difference learning model to explain the choice-learning process. Using the two-stage decision task developed by Daw, Gershman, Seymour, Dayan, and Dolan (2011), we compared their original computational model, which assumes a parallel learning process, with our proposed models, which assume a sequential learning process...
June 1, 2017: Cognitive, Affective & Behavioral Neuroscience
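The parallel model referred to in this abstract combines model-free and model-based values at the first stage of the two-stage task. The sketch below shows only that combination step, assuming the task's usual 0.7/0.3 common/rare transition structure and a mixing weight `w`; eligibility traces, perseveration and the authors' proposed sequential models are not included.

```python
import numpy as np

# Assumed transition structure: each first-stage action leads to its "common"
# second-stage state with probability 0.7 and to the other state with 0.3.
TRANSITIONS = np.array([[0.7, 0.3],
                        [0.3, 0.7]])

def hybrid_first_stage_values(q_mf_stage1, q_stage2, w):
    """Parallel (hybrid) valuation of first-stage actions.

    Q_MB(a) = sum_s P(s | a) * max_b Q2(s, b)
    Q_net   = w * Q_MB + (1 - w) * Q_MF
    q_mf_stage1: (2,) cached model-free first-stage values.
    q_stage2:    (2, 2) second-stage values, rows = states, columns = actions.
    w:           weight on model-based control, between 0 and 1.
    """
    q_mb = TRANSITIONS @ q_stage2.max(axis=1)
    return w * q_mb + (1 - w) * q_mf_stage1

# Usage with hypothetical values: only state 0 currently looks rewarding.
q_stage2 = np.array([[1.0, 0.0],
                     [0.0, 0.0]])
q_mf = np.array([0.2, 0.4])
q_net = hybrid_first_stage_values(q_mf, q_stage2, w=0.5)
```

Setting `w = 1` gives a purely model-based agent and `w = 0` a purely model-free one; estimating `w` per participant is how this framework quantifies the balance between the two systems.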