Contribución al estudio y el diseño de funciones de refuerzo (Contribution to the study and the design of reinforcement functions)
We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the funct...
Saved in:
Main Author: | Santos, Juan Miguel |
---|---|
Other Authors: | Scolnik, Hugo Daniel; Giambiasi, Norbert |
Format: | Doctoral thesis publishedVersion |
Language: | Spanish |
Published: | Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales, 1999 |
Online Access: | https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
Contributed by: | Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
id |
tesis:tesis_n3129_Santos |
---|---|
record_format |
dspace |
spelling |
tesis:tesis_n3129_Santos2023-10-02T19:45:08Z Contribución al estudio y el diseño de funciones de refuerzo Contribution to the study and the design of reinforcement functions Santos, Juan Miguel Scolnik, Hugo Daniel Giambiasi, Norbert We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards. Thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we emphasize the main conclusions and present some future directions of research. Fil: Santos, Juan Miguel. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales 1999 info:eu-repo/semantics/doctoralThesis info:ar-repo/semantics/tesis doctoral info:eu-repo/semantics/publishedVersion application/pdf spa info:eu-repo/semantics/openAccess https://creativecommons.org/licenses/by-nc-sa/2.5/ar https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
institution |
Universidad de Buenos Aires |
institution_str |
I-28 |
repository_str |
R-134 |
collection |
Biblioteca Digital - Facultad de Ciencias Exactas y Naturales (UBA) |
language |
Spanish |
orig_language_str_mv |
spa |
description |
We propose a Reinforcement Function Design Process in two steps. The first one translates a natural language description into an instance of the Reinforcement Function General Expression. The second tunes the parameters of the constraints in this expression so as to obtain the optimal definition of the function (relative to exploration). We separate the constraints according to the type of state-variable estimator on which they act, in particular position and velocity. Using a particular but representative Reinforcement Function (RF) expression, we study the relation between the sum of each reinforcement type and the RF parameters during the exploration phase of learning. For linear relations, we propose an analytic method to obtain the RF parameter values (no experimentation required). For non-linear but monotonic relations, we propose the Update Parameter Algorithm (UPA) and show that UPA can efficiently adjust the proportion of negative and positive reinforcements received during the exploratory phase of learning. Additionally, we study the feasibility and consequences of adapting the RF during the learning process so as to improve the learning convergence of the system. Dynamic-UPA allows the whole learning process to maintain a desired ratio of positive and negative rewards. Thus, we introduce an approach to solving the exploration-exploitation dilemma, a necessary step for efficient Reinforcement Learning. We illustrate, with several experiments involving robots (mobile and arm), the performance of the proposed design methods. Finally, we emphasize the main conclusions and present some future directions of research. |
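The abstract describes UPA only at the level of its goal: adjust the RF parameters until the proportion of positive and negative reinforcements observed during exploration matches a desired value. The thesis full text is not part of this record, so the following Python sketch is an illustration under stated assumptions, not the algorithm from the thesis: the RF is reduced to a single hypothetical threshold `theta` on a position-error estimate, and a bisection search exploits the monotonic relation the abstract mentions. All names here (`upa_tune_threshold`, `error_samples`) are invented for the example.

```python
import random

def upa_tune_threshold(error_samples, target_pos_fraction,
                       lo, hi, tol=0.01, max_iters=50):
    """Tune a reinforcement-function threshold `theta` so that the
    fraction of positive reinforcements over a batch of exploratory
    position-error samples approaches `target_pos_fraction`.

    Assumed (hypothetical) reinforcement rule: +1 if error < theta,
    -1 otherwise, so the positive fraction grows monotonically with
    theta; bisection then stands in for the thesis' update rule."""
    theta = 0.5 * (lo + hi)
    for _ in range(max_iters):
        theta = 0.5 * (lo + hi)
        positives = sum(1 for e in error_samples if e < theta)
        frac = positives / len(error_samples)
        if abs(frac - target_pos_fraction) <= tol:
            break
        if frac < target_pos_fraction:
            lo = theta   # too few positive rewards: loosen the constraint
        else:
            hi = theta   # too many positive rewards: tighten it
    return theta

# Toy usage: exploratory position errors in [0, 1]; ask for a
# threshold yielding roughly 30% positive reinforcements.
random.seed(0)
errors = [random.random() for _ in range(1000)]
theta = upa_tune_threshold(errors, target_pos_fraction=0.30, lo=0.0, hi=1.0)
print(f"tuned threshold: {theta:.3f}")  # ~0.30 for uniform errors
```

Per the abstract, Dynamic-UPA would repeat such an adjustment throughout learning rather than once after the exploratory phase, keeping the positive/negative ratio near the target as the policy, and hence the distribution of visited states, changes.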
author2 |
Scolnik, Hugo Daniel |
author_facet |
Scolnik, Hugo Daniel; Santos, Juan Miguel |
format |
Doctoral thesis publishedVersion |
author |
Santos, Juan Miguel |
spellingShingle |
Santos, Juan Miguel Contribución al estudio y el diseño de funciones de refuerzo |
author_sort |
Santos, Juan Miguel |
title |
Contribución al estudio y el diseño de funciones de refuerzo |
title_short |
Contribución al estudio y el diseño de funciones de refuerzo |
title_full |
Contribución al estudio y el diseño de funciones de refuerzo |
title_fullStr |
Contribución al estudio y el diseño de funciones de refuerzo |
title_full_unstemmed |
Contribución al estudio y el diseño de funciones de refuerzo |
title_sort |
contribución al estudio y el diseño de funciones de refuerzo |
publisher |
Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales |
publishDate |
1999 |
url |
https://hdl.handle.net/20.500.12110/tesis_n3129_Santos |
work_keys_str_mv |
AT santosjuanmiguel contribucionalestudioyeldisenodefuncionesderefuerzo AT santosjuanmiguel contributiontothestudyandthedesignofreinforcementfunctions |
_version_ |
1782022170942111744 |