Contribución al estudio y el diseño de funciones de refuerzo

We propose a Reinforcement Function Design Process in two steps. The first step translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the Sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards; thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we summarize the main conclusions and present some future directions of research.
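
The abstract does not give the actual UPA update rule, so the following is only a minimal illustrative sketch, assuming a single threshold parameter of the reinforcement function and a monotonic relation between that parameter and the proportion of positive reinforcements observed during exploration. The names run_exploration and upa_like_tuning, and the specific update rule, are hypothetical and not taken from the thesis.

import random

def run_exploration(threshold, n_steps=1000):
    """Hypothetical stand-in for an exploratory episode: returns the
    fraction of positive reinforcements obtained with the current RF
    parameter. Here the "state error" is random; a real system would
    use position/velocity estimators."""
    positives = sum(1 for _ in range(n_steps)
                    if abs(random.gauss(0.0, 1.0)) < threshold)
    return positives / n_steps

def upa_like_tuning(target_positive_ratio=0.3, threshold=0.1,
                    step=0.05, iterations=50, tolerance=0.02):
    """Iteratively nudge an RF threshold parameter until the observed
    proportion of positive reinforcements matches the target ratio,
    relying on the monotonic parameter/proportion relation assumed
    above for UPA-style tuning."""
    for _ in range(iterations):
        ratio = run_exploration(threshold)
        error = target_positive_ratio - ratio
        if abs(error) <= tolerance:
            break
        # Monotonic relation: a larger threshold yields more positive
        # rewards, so move the parameter in the error-reducing direction.
        threshold = max(threshold + step * error, 1e-6)
    return threshold

if __name__ == "__main__":
    print(f"tuned threshold: {upa_like_tuning():.4f}")

In the Dynamic-UPA variant described above, a similar adjustment would continue throughout learning, rather than only during exploration, to keep the ratio of positive to negative rewards near the desired value.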

Bibliographic Details
Main author: Santos, Juan Miguel
Other authors: Scolnik, Hugo Daniel
Format: Doctoral thesis (publishedVersion)
Language: Spanish
Published: Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales, 1999
Online access: https://hdl.handle.net/20.500.12110/tesis_n3129_Santos
id tesis:tesis_n3129_Santos
record_format dspace
spelling tesis:tesis_n3129_Santos2023-10-02T19:45:08Z Contribución al estudio y el diseño de funciones de refuerzo Contribution to the study and the design of reinforcement functions Santos, Juan Miguel Scolnik, Hugo Daniel Giambiasi, Norbert We propose a Reinforcement Function Design Process in two steps. The first step translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the Sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards; thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we summarize the main conclusions and present some future directions of research. Fil: Santos, Juan Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales 1999 info:eu-repo/semantics/doctoralThesis info:ar-repo/semantics/tesis doctoral info:eu-repo/semantics/publishedVersion application/pdf spa info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar https://hdl.handle.net/20.500.12110/tesis_n3129_Santos
institution Universidad de Buenos Aires
institution_str I-28
repository_str R-134
collection Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA)
language Español
orig_language_str_mv spa
description We propose a Reinforcement Function Design Process in two steps. The first step translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the Sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards; thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we summarize the main conclusions and present some future directions of research.
author2 Scolnik, Hugo Daniel
author_facet Scolnik, Hugo Daniel
Santos, Juan Miguel
format Tesis doctoral
Tesis doctoral
publishedVersion
author Santos, Juan Miguel
spellingShingle Santos, Juan Miguel
Contribución al estudio y el diseño de funciones de refuerzo
author_sort Santos, Juan Miguel
title Contribución al estudio y el diseño de funciones de refuerzo
title_short Contribución al estudio y el diseño de funciones de refuerzo
title_full Contribución al estudio y el diseño de funciones de refuerzo
title_fullStr Contribución al estudio y el diseño de funciones de refuerzo
title_full_unstemmed Contribución al estudio y el diseño de funciones de refuerzo
title_sort contribución al estudio y el diseño de funciones de refuerzo
publisher Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales
publishDate 1999
url https://hdl.handle.net/20.500.12110/tesis_n3129_Santos
work_keys_str_mv AT santosjuanmiguel contribucionalestudioyeldisenodefuncionesderefuerzo
AT santosjuanmiguel contributiontothestudyandthedesignofreinforcementfunctions
_version_ 1782022170942111744