model-based reinforcement learning

https://www.readbyqxmd.com/read/28326050/what-to-choose-next-a-paradigm-for-testing-human-sequential-decision-making
#1
Elisa M Tartaglia, Aaron M Clarke, Michael H Herzog
Many of the decisions we make in our everyday lives are sequential and entail sparse rewards. While sequential decision-making has been extensively investigated in theory (e.g., by reinforcement learning models), there is no systematic experimental paradigm to test it. Here, we developed such a paradigm and investigated key components of reinforcement learning models: the eligibility trace (i.e., the memory trace of previous decision steps), the external reward, and the ability to exploit the statistics of the environment's structure (model-free vs...
2017: Frontiers in Psychology
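The eligibility trace the authors test is a standard ingredient of temporal-difference learning. A minimal TD(λ) sketch (parameter values and episode shape are illustrative assumptions, not the paper's model) shows how a sparse reward at the end of a sequence still credits earlier decision steps:

```python
import numpy as np

def td_lambda_episode(rewards, states, n_states, V, alpha=0.1, gamma=0.9, lam=0.8):
    """One episode of TD(lambda): each visited state keeps a decaying
    eligibility trace, so a sparse reward at the final step still
    credits the earlier decisions that led to it."""
    e = np.zeros(n_states)  # eligibility traces, one per state
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        e[s] += 1.0  # mark the current state as eligible
        delta = rewards[t] + gamma * V[s_next] - V[s]  # TD error
        V += alpha * delta * e  # all eligible states share the credit
        e *= gamma * lam  # traces decay with each step
    return V
```

After one pass through a four-state chain whose only reward arrives at the last transition, even the first state's value becomes positive, which is exactly the memory-of-previous-steps effect the abstract describes.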
https://www.readbyqxmd.com/read/28320846/working-memory-load-strengthens-reward-prediction-errors
#2
Anne G E Collins, Brittany Ciullo, Michael J Frank, David Badre
Reinforcement learning in simple instrumental tasks is usually modeled as a monolithic process in which reward prediction errors are used to update expected values of choice options. This modeling ignores the distinct contributions of the multiple memory and decision-making systems thought to be involved even in simple learning. In an fMRI experiment, we asked how working memory and incremental reinforcement learning processes interact to guide human learning. Working memory load was manipulated by varying the number of stimuli to be learned across blocks...
March 20, 2017: Journal of Neuroscience: the Official Journal of the Society for Neuroscience
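The incremental process described here is typically modeled with a delta rule: the reward prediction error (RPE) is the observed reward minus the expected value, and it shrinks as the value estimate converges. A minimal sketch (the learning rate and stimulus/action keys are illustrative, not the study's fitted model):

```python
from collections import defaultdict

def rl_update(Q, stimulus, action, reward, alpha=0.1):
    """Incremental RL: the reward prediction error (observed reward
    minus expected value) nudges the stored value toward the reward."""
    rpe = reward - Q[(stimulus, action)]
    Q[(stimulus, action)] += alpha * rpe
    return rpe

Q = defaultdict(float)  # expected values start at zero
first = rl_update(Q, "s1", "a1", 1.0)   # large surprise on first reward
second = rl_update(Q, "s1", "a1", 1.0)  # smaller surprise as value grows
```

The paper's point is that this RPE signal is not generated in isolation: working memory contributes to the expectation, so load changes the size of the measured prediction error.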
https://www.readbyqxmd.com/read/28316564/functional-circuitry-effect-of-ventral-tegmental-area-deep-brain-stimulation-imaging-and-neurochemical-evidence-of-mesocortical-and-mesolimbic-pathway-modulation
#3
Megan L Settell, Paola Testini, Shinho Cho, Jannifer H Lee, Charles D Blaha, Hang J Jo, Kendall H Lee, Hoon-Ki Min
Background: The ventral tegmental area (VTA), containing mesolimbic and mesocortical dopaminergic neurons, is implicated in processes involving reward, addiction, reinforcement, and learning, which are associated with a variety of neuropsychiatric disorders. Electrical stimulation of the VTA or the medial forebrain bundle and its projection target, the nucleus accumbens (NAc), is reported to improve depressive symptoms in patients affected by severe, treatment-resistant major depressive disorder (MDD) and depressive-like symptoms in animal models of depression...
2017: Frontiers in Neuroscience
https://www.readbyqxmd.com/read/28298887/automated-operant-conditioning-in-the-mouse-home-cage
#4
Nikolas A Francis, Patrick O Kanold
Recent advances in neuroimaging and genetics have made mice an advantageous animal model for studying the neurophysiology of sensation, cognition, and locomotion. A key benefit of mice is that they provide a large population of test subjects for behavioral screening. Reflex-based assays of hearing in mice, such as the widely used acoustic startle response, are less accurate than operant conditioning in measuring auditory processing. To date, however, there are few cost-effective options for scalable operant conditioning systems...
2017: Frontiers in Neural Circuits
https://www.readbyqxmd.com/read/28293206/five-year-olds-systematic-errors-in-second-order-false-belief-tasks-are-due-to-first-order-theory-of-mind-strategy-selection-a-computational-modeling-study
#5
Burcu Arslan, Niels A Taatgen, Rineke Verbrugge
The focus of studies on second-order false belief reasoning has generally been on investigating the roles of executive functions and language with correlational studies. Different from those studies, we focus on the question of how 5-year-olds select and revise reasoning strategies in second-order false belief tasks by constructing two computational cognitive models of this process: an instance-based learning model and a reinforcement learning model. Unlike the reinforcement learning model, the instance-based learning model predicted that children who fail second-order false belief tasks would give answers based on first-order theory of mind (ToM) reasoning as opposed to zero-order reasoning...
2017: Frontiers in Psychology
https://www.readbyqxmd.com/read/28286265/vicarious-extinction-learning-during-reconsolidation-neutralizes-fear-memory
#6
Armita Golkar, Cathelijn Tjaden, Merel Kindt
BACKGROUND: Previous studies have suggested that fear memories can be updated when recalled, a process referred to as reconsolidation. Given the beneficial effects of model-based safety learning (i.e. vicarious extinction) in preventing the recovery of short-term fear memory, we examined whether consolidated long-term fear memories could be updated with safety learning accomplished through vicarious extinction learning initiated within the reconsolidation time-window. We assessed this in a final sample of 19 participants who underwent a three-day within-subject fear-conditioning design, using fear-potentiated startle as our primary index of fear learning...
February 22, 2017: Behaviour Research and Therapy
https://www.readbyqxmd.com/read/28282439/iterative-free-energy-optimization-for-recurrent-neural-networks-inferno
#7
Alexandre Pitti, Philippe Gaussier, Mathias Quoy
The intra-parietal lobe coupled with the basal ganglia forms a working memory that demonstrates strong planning capabilities for generating robust yet flexible neuronal sequences. Neurocomputational models, however, often fail to control long-range neural synchrony in recurrent spiking networks due to spontaneous activity. As a novel framework based on the free-energy principle, we propose to treat the problem of spike synchrony as an optimization problem over the neurons' sub-threshold activity for the generation of long neuronal chains...
2017: PloS One
https://www.readbyqxmd.com/read/28274725/consolidation-of-vocabulary-during-sleep-the-rich-get-richer
#8
REVIEW
Emma James, M Gareth Gaskell, Anna Weighall, Lisa Henderson
Sleep plays a role in strengthening new words and integrating them with existing vocabulary knowledge, consistent with neural models of learning in which sleep supports hippocampal transfer to neocortical memory. Such models are based on adult research, yet neural maturation may mean that the mechanisms supporting word learning vary across development. Here, we propose a model in which children may capitalise on larger amounts of slow-wave sleep to support a greater demand on learning and neural reorganisation, whereas adults may benefit from a richer knowledge base to support consolidation...
March 6, 2017: Neuroscience and Biobehavioral Reviews
https://www.readbyqxmd.com/read/28268267/reward-gain-model-describes-cortical-use-dependent-plasticity
#9
Firas Mawase, Nicholas Wymbs, Shintaro Uehara, Pablo Celnik
Consistent repetitions of an action lead to plastic change in the motor cortex and cause a shift in the direction of future movements. This process, known as use-dependent plasticity (UDP), is one of the basic forms of motor memory. We have recently demonstrated in a physiological study that success-related reinforcement signals can modulate the strength of UDP. We tested this idea by developing a computational approach that modeled the shift in the direction of future action as a change in the preferred direction of population activity of neurons in the primary motor cortex...
August 2016: Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
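The shift-in-preferred-direction idea can be sketched as a toy update rule (this is an illustrative assumption, not the authors' fitted model): each neuron's preferred direction rotates toward the repeatedly trained direction, and a success-related reward signal scales the size of the shift.

```python
import numpy as np

def update_preferred_directions(pd, trained_dir, reward, eta=0.05):
    """Toy use-dependent plasticity rule: preferred directions (radians)
    rotate toward the trained direction; a reward gain in [0, 1] scales
    the shift, so unrewarded repetition produces no change here."""
    # shortest signed angular distance from each PD to the trained direction
    err = np.angle(np.exp(1j * (trained_dir - pd)))
    return pd + eta * reward * err
```

With reward set to zero the population is unchanged, while a rewarded repetition pulls the population's mean preferred direction toward the trained one, mirroring the reward-gain modulation of UDP the abstract describes.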
https://www.readbyqxmd.com/read/28265866/unpacking-buyer-seller-differences-in-valuation-from-experience-a-cognitive-modeling-approach
#10
Thorsten Pachur, Benjamin Scheibehenne
People often indicate a higher price for an object when they own it (i.e., as sellers) than when they do not (i.e., as buyers), a phenomenon known as the endowment effect. We develop a cognitive modeling approach to formalize, disentangle, and compare alternative psychological accounts (e.g., loss aversion, loss attention, strategic misrepresentation) of such buyer-seller differences in pricing decisions of monetary lotteries. To also be able to test possible buyer-seller differences in memory and learning, we study pricing decisions from experience, obtained with the sampling paradigm, where people learn about a lottery's payoff distribution from sequential sampling...
March 6, 2017: Psychonomic Bulletin & Review
https://www.readbyqxmd.com/read/28254083/classification-techniques-on-computerized-systems-to-predict-and-or-to-detect-apnea-a-systematic-review
#11
REVIEW
Nuno Pombo, Nuno Garcia, Kouamana Bousson
BACKGROUND AND OBJECTIVE: Sleep apnea syndrome (SAS), which can significantly decrease the quality of life, is associated with major health risks such as increased cardiovascular disease, sudden death, depression, irritability, hypertension, and learning difficulties. Thus, it is relevant and timely to present a systematic review describing significant applications of computational intelligence to SAS, including performance, beneficial and challenging effects, and modeling for decision-making across multiple scenarios...
March 2017: Computer Methods and Programs in Biomedicine
https://www.readbyqxmd.com/read/28248958/fidelity-of-the-representation-of-value-in-decision-making
#12
Paul M Bays, Ben A Dowding
The ability to make optimal decisions depends on evaluating the expected rewards associated with different potential actions. This process is critically dependent on the fidelity with which reward value information can be maintained in the nervous system. Here we directly probe the fidelity of value representation following a standard reinforcement learning task. The results demonstrate a previously unrecognized bias in the representation of value: extreme reward values, both low and high, are stored significantly more accurately and precisely than intermediate rewards...
March 2017: PLoS Computational Biology
https://www.readbyqxmd.com/read/28240598/feedback-for-reinforcement-learning-based-brain-machine-interfaces-using-confidence-metrics
#13
Noeline Prins, Justin Sanchez, Abhishek Prasad
OBJECTIVE: For Brain-Machine Interfaces (BMIs) to be used in activities of daily living by paralyzed individuals, the BMI should be as autonomous as possible. One of the challenges is how feedback is extracted and utilized in the BMI. Our long-term goal is to create autonomous BMIs that can utilize evaluative feedback from the brain to update the decoding algorithm and use it intelligently to adapt the decoder. In this study, we show how to extract the necessary evaluative feedback from a biologically realistic (synthetic) source, use both the quantity and the quality of the feedback, and incorporate that feedback information into a reinforcement learning (RL) controller architecture to maximize its performance...
February 27, 2017: Journal of Neural Engineering
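One simple way to use both the quantity and the quality of evaluative feedback, sketched here as an assumption rather than the authors' actual decoder architecture, is to scale each value update by a confidence weight attached to the feedback signal:

```python
def confidence_weighted_update(Q, key, reward, confidence, alpha=0.2):
    """Scale the learning rate by the confidence (0..1) attached to the
    evaluative feedback, so unreliable feedback moves the stored value
    estimate proportionally less."""
    q = Q.get(key, 0.0)
    Q[key] = q + alpha * confidence * (reward - q)
    return Q[key]

Q_hi, Q_lo = {}, {}
v_hi = confidence_weighted_update(Q_hi, "move_left", 1.0, confidence=1.0)
v_lo = confidence_weighted_update(Q_lo, "move_left", 1.0, confidence=0.2)
```

Under this rule, high-confidence feedback adapts the controller quickly while low-confidence feedback is effectively discounted, which is the intuition behind confidence metrics for RL-based BMIs.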
https://www.readbyqxmd.com/read/28226424/reward-gain-model-describes-cortical-use-dependent-plasticity
#14
Firas Mawase, Nicholas Wymbs, Shintaro Uehara, Pablo Celnik
Consistent repetitions of an action lead to plastic change in the motor cortex and cause a shift in the direction of future movements. This process, known as use-dependent plasticity (UDP), is one of the basic forms of motor memory. We have recently demonstrated in a physiological study that success-related reinforcement signals can modulate the strength of UDP. We tested this idea by developing a computational approach that modeled the shift in the direction of future action as a change in the preferred direction of population activity of neurons in the primary motor cortex...
August 2016: Conference Proceedings: Annual International Conference of the IEEE Engineering in Medicine and Biology Society
https://www.readbyqxmd.com/read/28225034/simulating-future-value-in-intertemporal-choice
#15
Alec Solway, Terry Lohrenz, P Read Montague
The laboratory study of how humans and other animals trade-off value and time has a long and storied history, and is the subject of a vast literature. However, despite a long history of study, there is no agreed upon mechanistic explanation of how intertemporal choice preferences arise. Several theorists have recently proposed model-based reinforcement learning as a candidate framework. This framework describes a suite of algorithms by which a model of the environment, in the form of a state transition function and reward function, can be converted on-line into a decision...
February 22, 2017: Scientific Reports
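The framework described, converting a model of the environment (a state transition function and a reward function) into decisions, can be sketched with textbook value iteration; the two-state MDP below is purely illustrative:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Model-based RL core: given a transition model P[a] (rows: current
    state, columns: next state) and a reward function R[s, a], back up
    values until the Bellman equation converges, then act greedily."""
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = R[s, a] + gamma * E[V(s') | s, a]
        Q = R + gamma * np.stack([P[a] @ V for a in range(n_actions)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
```

For a deterministic two-state world where action 1 leads to a state paying reward 1, the greedy policy chooses action 1 everywhere and the rewarding state's value converges to 1/(1-gamma).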
https://www.readbyqxmd.com/read/28183973/solving-the-quantum-many-body-problem-with-artificial-neural-networks
#16
Giuseppe Carleo, Matthias Troyer
The challenge posed by the many-body problem in quantum physics originates from the difficulty of describing the nontrivial correlations encoded in the exponential complexity of the many-body wave function. Here we demonstrate that systematic machine learning of the wave function can reduce this complexity to a tractable computational form for some notable cases of physical interest. We introduce a variational representation of quantum states based on artificial neural networks with a variable number of hidden neurons...
February 10, 2017: Science
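The variational representation the authors introduce is a restricted Boltzmann machine over spin configurations. A minimal sketch with real-valued weights (the paper uses complex parameters, and the sizes here are arbitrary) computes the unnormalized amplitude with the hidden units summed out analytically:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 4, 8  # illustrative sizes; the paper varies hidden-unit density
a = rng.normal(scale=0.01, size=n_visible)              # visible (spin) biases
b = rng.normal(scale=0.01, size=n_hidden)               # hidden-unit biases
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))  # spin-hidden couplings

def psi(sigma):
    """Unnormalized RBM amplitude for a spin configuration sigma in
    {-1, +1}^n: psi(sigma) = exp(a.sigma) * prod_j 2 cosh(b_j + (sigma W)_j),
    where the product comes from tracing out the binary hidden units."""
    return np.exp(a @ sigma) * np.prod(2.0 * np.cosh(b + sigma @ W))
```

Variational Monte Carlo then optimizes these parameters to minimize the energy, which is how the exponential complexity of the wave function is reduced to a tractable parametric form.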
https://www.readbyqxmd.com/read/28170057/animal-models-of-drug-addiction
#17
María Pilar García Pardo, Concepción Roger Sánchez, José Enrique De la Rubia Ortí, María Asunción Aguilar Calpe
The development of animal models of drug reward and addiction is an essential factor for progress in understanding the biological basis of this disorder and for the identification of new therapeutic targets. Depending on the component of reward to be studied, one type of animal model or another may be used. There are models of reinforcement based on the primary hedonic effect produced by the consumption of the addictive substance, such as the self-administration (SA) and intracranial self-stimulation (ICSS) paradigms, and there are models based on the component of reward related to associative learning and cognitive ability to make predictions about obtaining reward in the future, such as the conditioned place preference (CPP) paradigm...
January 12, 2017: Adicciones
https://www.readbyqxmd.com/read/28166826/socio-ecological-dynamics-and-challenges-to-the-governance-of-neglected-tropical-disease-control
#18
REVIEW
Edwin Michael, Shirin Madon
The current global attempts to control the so-called "Neglected Tropical Diseases (NTDs)" have the potential to significantly reduce the morbidity suffered by some of the world's poorest communities. However, the governance of these control programmes is driven by a managerial rationality that assumes predictability of proposed interventions, and which thus primarily seeks to improve the cost-effectiveness of implementation by measuring performance in terms of pre-determined outputs. Here, we argue that this approach has reinforced the narrow normal-science model for controlling parasitic diseases, and in doing so fails to address the complex dynamics, uncertainty and socio-ecological context-specificity that invariably underlie parasite transmission...
February 6, 2017: Infectious Diseases of Poverty
https://www.readbyqxmd.com/read/28159314/fuzzy-lyapunov-reinforcement-learning-for-non-linear-systems
#19
Abhishek Kumar, Rajneesh Sharma
We propose a fuzzy reinforcement learning (RL) based controller that generates a stable control action by Lyapunov-constraining the fuzzy linguistic rules. In particular, we attempt to Lyapunov-constrain the consequent part of the fuzzy rules in a fuzzy RL setup. Ours is a first attempt at designing a linguistic RL controller with Lyapunov-constrained fuzzy consequents to progressively learn a stable optimal policy. The proposed controller needs neither a system model nor a desired response and can effectively handle disturbances in continuous state-action space problems...
January 31, 2017: ISA Transactions
https://www.readbyqxmd.com/read/28158309/learning-by-stimulation-avoidance-a-principle-to-control-spiking-neural-networks-dynamics
#20
Lana Sinapayen, Atsushi Masumori, Takashi Ikegami
Learning based on networks of real neurons, and learning based on biologically inspired models of neural networks, have yet to yield general learning rules leading to widespread applications. In this paper, we argue for the existence of a principle that allows the dynamics of a biologically inspired neural network to be steered. Using carefully timed external stimulation, the network can be driven towards a desired dynamical state. We term this principle "Learning by Stimulation Avoidance" (LSA). We demonstrate through simulation that the minimal sufficient conditions leading to LSA in artificial networks are also sufficient to reproduce learning results similar to those obtained in biological neurons by Shahaf and Marom, and in addition explain synaptic pruning...
2017: PloS One
