Dynamic update of the reinforcement function during learning
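Main authors: | Santos, J.M.; Touzet, C. |
---|---|
Format: | JOUR |
Subjects: | Autonomous robot; Behaviour-based approach; Reinforcement function; Reinforcement learning; Robot learning |
Online access: | http://hdl.handle.net/20.500.12110/paper_09540091_v11_n3-4_p267_Santos |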
id |
todo:paper_09540091_v11_n3-4_p267_Santos |
record_format |
dspace |
spelling |
todo:paper_09540091_v11_n3-4_p267_Santos 2023-10-03T15:51:34Z Affiliation: Santos, J.M., Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. info:eu-repo/semantics/openAccess http://creativecommons.org/licenses/by/2.5/ar |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
topic |
Autonomous robot Behaviour-based approach Reinforcement function Reinforcement learning Robot learning |
description |
During the last decade, numerous contributions have been made to the use of reinforcement learning in the robot learning field. They have focused mainly on the generalization, memorization and exploration issues - mandatory for dealing with real robots. However, it is our opinion that the most difficult task today is to obtain the definition of the reinforcement function (RF). A first attempt in this direction was made by introducing a method - the update parameters algorithm (UPA) - for tuning a RF in such a way that it would be optimal during the exploration phase. The only requirement is to conform to a particular expression of RF. In this article, we propose Dynamic-UPA, an algorithm able to tune the RF parameters during the whole learning phase (exploration and exploitation). It allows one to undertake the so-called exploration versus exploitation dilemma through careful computation of the RF parameter values by controlling the ratio between positive and negative reinforcement during learning. Experiments with the mobile robot Khepera in tasks of synthesis of obstacle avoidance and wall-following behaviors validate our proposals. |
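The description says only that Dynamic-UPA tunes the RF parameters by controlling the ratio between positive and negative reinforcement during learning; it does not give the required RF expression or the update rule. As a rough illustration of that general idea, and explicitly not the published UPA or Dynamic-UPA, the sketch below assumes a hypothetical threshold-based RF (`reinforcement`, parameter `theta`) and a hypothetical controller (`tune_threshold`) that nudges the threshold with a simple stochastic-approximation rule so that the long-run fraction of positive reinforcements approaches a chosen target. The sensor model, the target ratio and the step size `eta` are all assumptions.

```python
import random


def reinforcement(sensor: float, theta: float) -> int:
    """Hypothetical RF (assumption): +1 if the normalised proximity reading
    is below the threshold theta (robot clear of obstacles), otherwise -1."""
    return 1 if sensor < theta else -1


def tune_threshold(readings, target_positive=0.7, theta=0.5, eta=0.01):
    """Hypothetical ratio controller, NOT the published Dynamic-UPA.

    After each reinforcement, theta moves by eta * (target - indicator),
    where indicator is 1 for a positive reinforcement and 0 otherwise.
    Too few positives raise theta (a more lenient RF); too many lower it.
    The fixed point is the theta at which P(sensor < theta) == target.
    Returns the final theta and the list of reinforcements issued.
    """
    rewards = []
    for s in readings:
        r = reinforcement(s, theta)
        rewards.append(r)
        positive = 1.0 if r > 0 else 0.0
        theta += eta * (target_positive - positive)
    return theta, rewards


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed sensor model: proximity readings uniform on [0, 1).
    readings = [rng.random() for _ in range(20000)]
    theta, rewards = tune_threshold(readings, target_positive=0.7)
    tail = rewards[-5000:]
    ratio = sum(1 for r in tail if r > 0) / len(tail)
    print(f"theta={theta:.3f}  positive ratio (last 5000 steps)={ratio:.3f}")
```

With uniformly distributed readings the threshold settles near the 0.7 quantile of the sensor distribution, so roughly 70% of the reinforcements are positive; choosing a lower target makes the RF stricter. A constant `eta` keeps the controller responsive if the reading distribution shifts as the behaviour improves, at the cost of some jitter in `theta`.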
author |
Santos, J.M. Touzet, C. |