Contribución al estudio y el diseño de funciones de refuerzo (Contribution to the study and the design of reinforcement functions)
We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the funct...
Saved in:
Main Author: | Santos, Juan Miguel |
---|---|
Other Authors: | Scolnik, Hugo Daniel; Giambiasi, Norbert |
Format: | Doctoral thesis publishedVersion |
Language: | Spanish |
Published: | Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales, 1999 |
Online Access: | https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
Contributed by: | Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
id |
tesis:tesis_n3129_Santos |
---|---|
record_format |
dspace |
spelling |
tesis:tesis_n3129_Santos2023-10-02T19:45:08Z Contribución al estudio y el diseño de funciones de refuerzo Contribution to the study and the design of reinforcement functions Santos, Juan Miguel Scolnik, Hugo Daniel Giambiasi, Norbert We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards. Thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we emphasize the main conclusions and present some future directions of research. Fil: Santos, Juan Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales 1999 info:eu-repo/semantics/doctoralThesis info:ar-repo/semantics/tesis doctoral info:eu-repo/semantics/publishedVersion application/pdf spa info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
language |
Spanish |
orig_language_str_mv |
spa |
description |
We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards. Thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we emphasize the main conclusions and present some future directions of research. |
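The abstract describes UPA only at the level of its goal: adjust the RF parameters until the proportion of positive and negative reinforcements observed during exploration matches a desired value. The thesis full text is not part of this record, so the following Python sketch is an illustration under stated assumptions, not the algorithm from the thesis: the RF is reduced to a single hypothetical threshold `theta` on a position-error estimate, and a bisection search exploits the monotonic relation the abstract mentions. All names here (`upa_tune_threshold`, `error_samples`) are invented for the example.

```python
import random

def upa_tune_threshold(error_samples, target_pos_fraction,
                       lo, hi, tol=0.01, max_iters=50):
    """Tune a reinforcement-function threshold `theta` so that the
    fraction of positive reinforcements over a batch of exploratory
    position-error samples approaches `target_pos_fraction`.

    Assumed (hypothetical) reinforcement rule: +1 if error < theta,
    -1 otherwise, so the positive fraction grows monotonically with
    theta; bisection then stands in for the thesis' update rule."""
    theta = 0.5 * (lo + hi)
    for _ in range(max_iters):
        theta = 0.5 * (lo + hi)
        positives = sum(1 for e in error_samples if e < theta)
        frac = positives / len(error_samples)
        if abs(frac - target_pos_fraction) <= tol:
            break
        if frac < target_pos_fraction:
            lo = theta   # too few positive rewards: loosen the constraint
        else:
            hi = theta   # too many positive rewards: tighten it
    return theta

# Toy usage: exploratory position errors in [0, 1]; ask for a
# threshold yielding roughly 30% positive reinforcements.
random.seed(0)
errors = [random.random() for _ in range(1000)]
theta = upa_tune_threshold(errors, target_pos_fraction=0.30, lo=0.0, hi=1.0)
print(f"tuned threshold: {theta:.3f}")  # ~0.30 for uniform errors
```

Per the abstract, Dynamic-UPA would repeat such an adjustment throughout learning rather than once after the exploratory phase, keeping the positive/negative ratio near the target as the policy, and hence the distribution of visited states, changes.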
author2 |
Scolnik, Hugo Daniel |
author_facet |
Scolnik, Hugo Daniel; Santos, Juan Miguel |
format |
Doctoral thesis publishedVersion |
author |
Santos, Juan Miguel |
spellingShingle |
Santos, Juan Miguel Contribución al estudio y el diseño de funciones de refuerzo |
author_sort |
Santos, Juan Miguel |
title |
Contribución al estudio y el diseño de funciones de refuerzo |
title_short |
Contribución al estudio y el diseño de funciones de refuerzo |
title_full |
Contribución al estudio y el diseño de funciones de refuerzo |
title_fullStr |
Contribución al estudio y el diseño de funciones de refuerzo |
title_full_unstemmed |
Contribución al estudio y el diseño de funciones de refuerzo |
title_sort |
contribución al estudio y el diseño de funciones de refuerzo |
publisher |
Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales |
publishDate |
1999 |
url |
https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
work_keys_str_mv |
AT santosjuanmiguel contribucionalestudioyeldisenodefuncionesderefuerzo AT santosjuanmiguel contributiontothestudyandthedesignofreinforcementfunctions |
_version_ |
1782022170942111744 |