Dynamic update of the reinforcement function during learning

During the last decade, numerous contributions have been made to the use of reinforcement learning in the robot learning field. They have focused mainly on the generalization, memorization and exploration issues - mandatory for dealing with real robots. However, it is our opinion that the most diffi...

Bibliographic Details
Main author:	Santos, Juan Miguel
Published:	1999
Subjects:	Autonomous robot; Behaviour-based approach; Reinforcement function; Reinforcement learning; Robot learning
Online access:	https://bibliotecadigital.exactas.uba.ar/collection/paper/document/paper_09540091_v11_n3-4_p267_Santos http://hdl.handle.net/20.500.12110/paper_09540091_v11_n3-4_p267_Santos
Contributed by:	Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA), Universidad de Buenos Aires

affiliation	Santos, J.M. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina.
description	During the last decade, numerous contributions have been made to the use of reinforcement learning in the robot learning field. They have focused mainly on the generalization, memorization and exploration issues - mandatory for dealing with real robots. However, it is our opinion that the most difficult task today is to obtain the definition of the reinforcement function (RF). A first attempt in this direction was made by introducing a method - the update parameters algorithm (UPA) - for tuning an RF in such a way that it would be optimal during the exploration phase. The only requirement is to conform to a particular expression of the RF. In this article, we propose Dynamic-UPA, an algorithm able to tune the RF parameters during the whole learning phase (exploration and exploitation). It allows one to address the so-called exploration versus exploitation dilemma through careful computation of the RF parameter values by controlling the ratio between positive and negative reinforcement during learning. Experiments with the mobile robot Khepera in tasks of synthesis of obstacle avoidance and wall-following behaviors validate our proposals.
