We have located links that may give you full text access.
Actor-Critic Off-Policy Learning for Optimal Control of Multiple-Model Discrete-Time Systems.
IEEE Transactions on Cybernetics 2018 January
In this paper, motivated by human neurocognitive experiments, a model-free off-policy reinforcement learning algorithm is developed to solve the optimal tracking control of multiple-model linear discrete-time systems. First, an adaptive self-organizing map neural network is used to determine the system behavior from measured data and to assign a responsibility signal to each of system possible behaviors. A new model is added if a sudden change of system behavior is detected from the measured data and the behavior has not been previously detected. A value function is represented by partially weighted value functions. Then, the off-policy iteration algorithm is generalized to multiple-model learning to find a solution without any knowledge about the system dynamics or reference trajectory dynamics. The off-policy approach helps to increase data efficiency and speed of tuning since a stream of experiences obtained from executing a behavior policy is reused to update several value functions corresponding to different learning policies sequentially. Two numerical examples serve as a demonstration of the off-policy algorithm performance.
Full text links
Related Resources
Get seemless 1-tap access through your institution/university
For the best experience, use the Read mobile app
All material on this website is protected by copyright, Copyright © 1994-2024 by WebMD LLC.
This website also contains material copyrighted by 3rd parties.
By using this service, you agree to our terms of use and privacy policy.
Your Privacy Choices
You can now claim free CME credits for this literature searchClaim now
Get seemless 1-tap access through your institution/university
For the best experience, use the Read mobile app