Black box meta-learning intrinsic rewards for sparse-reward environments
Despite the successes and progress of deep reinforcement learning over the last decade, several challenges remain that hinder its broader application. Some fundamental aspects to improve include data efficiency, generalization capability, and ability to learn in sparse-reward environments, which oft...
| Main authors: | Pappalardo, Octavio; Santos, Juan M.; Ramele, Rodrigo |
|---|---|
| Format: | Conference object |
| Language: | English |
| Published: | 2024 |
| Subjects: | Computer Science |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/176190 |
| Contributed by: | SEDICI (UNLP) |

| id | I19-R120-10915-176190 |
|---|---|
| record_format | dspace |
| spelling | I19-R120-10915-176190; 2025-02-06T20:05:46Z; http://sedici.unlp.edu.ar/handle/10915/176190; Black box meta-learning intrinsic rewards for sparse-reward environments; Pappalardo, Octavio; Santos, Juan M.; Ramele, Rodrigo; 2024-10; 2024; 2025-02-06T12:26:25Z; en; Computer Science; Despite the successes and progress of deep reinforcement learning over the last decade, several challenges remain that hinder its broader application. Some fundamental aspects to improve include data efficiency, generalization capability, and ability to learn in sparse-reward environments, which often require human-designed dense rewards. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. The focus is on meta-learning intrinsic rewards under a framework that doesn’t rely on the use of meta-gradients. We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function. The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations, and with only sparse rewards accessible for the evaluation tasks.; Red de Universidades con Carreras en Informática; Conference object; http://creativecommons.org/licenses/by-nc-sa/4.0/; Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0); application/pdf; 24-33 |
| institution | Universidad Nacional de La Plata |
| institution_str | I-19 |
| repository_str | R-120 |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Computer Science |
| spellingShingle | Computer Science; Pappalardo, Octavio; Santos, Juan M.; Ramele, Rodrigo; Black box meta-learning intrinsic rewards for sparse-reward environments |
| topic_facet | Computer Science |
| description | Despite the successes and progress of deep reinforcement learning over the last decade, several challenges remain that hinder its broader application. Some fundamental aspects to improve include data efficiency, generalization capability, and ability to learn in sparse-reward environments, which often require human-designed dense rewards. Meta-learning has emerged as a promising approach to address these issues by optimizing components of the learning algorithm to meet desired characteristics. Additionally, a different line of work has extensively studied the use of intrinsic rewards to enhance the exploration capabilities of algorithms. This work investigates how meta-learning can improve the training signal received by RL agents. The focus is on meta-learning intrinsic rewards under a framework that doesn’t rely on the use of meta-gradients. We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function. The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations, and with only sparse rewards accessible for the evaluation tasks. |
| format | Conference object |
| author | Pappalardo, Octavio; Santos, Juan M.; Ramele, Rodrigo |
| author_facet | Pappalardo, Octavio; Santos, Juan M.; Ramele, Rodrigo |
| author_sort | Pappalardo, Octavio |
| title | Black box meta-learning intrinsic rewards for sparse-reward environments |
| title_short | Black box meta-learning intrinsic rewards for sparse-reward environments |
| title_full | Black box meta-learning intrinsic rewards for sparse-reward environments |
| title_fullStr | Black box meta-learning intrinsic rewards for sparse-reward environments |
| title_full_unstemmed | Black box meta-learning intrinsic rewards for sparse-reward environments |
| title_sort | black box meta-learning intrinsic rewards for sparse-reward environments |
| publishDate | 2024 |
| url | http://sedici.unlp.edu.ar/handle/10915/176190 |
| work_keys_str_mv | AT pappalardooctavio blackboxmetalearningintrinsicrewardsforsparserewardenvironments AT santosjuanm blackboxmetalearningintrinsicrewardsforsparserewardenvironments AT ramelerodrigo blackboxmetalearningintrinsicrewardsforsparserewardenvironments |
| _version_ | 1845116771686678528 |