A parallel implementation of Q-learning based on communication with cache
Saved in:
| Main authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | 2002 |
| Subjects: | |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/9432 http://journal.info.unlp.edu.ar/wp-content/uploads/p41.pdf |
| Contributed by: | |
| Summary: | Q-Learning is a Reinforcement Learning method for solving sequential decision problems, where the utility of actions depends on a sequence of decisions and there is uncertainty about the dynamics of the environment the agent is situated in. This general framework has allowed Q-Learning and other Reinforcement Learning methods to be applied to a broad spectrum of complex real-world problems such as robotics, industrial manufacturing, and games. Despite its interesting properties, Q-Learning is a very slow method that requires a long training period to learn an acceptable policy. In order to solve, or at least reduce, this problem, we propose a parallel implementation model of Q-Learning using a tabular representation and a cache-based communication scheme. This model is applied to a particular problem, and the results obtained with different processor configurations are reported. A brief discussion of the properties and current limitations of our approach is finally presented. |
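
The abstract names two ingredients, tabular Q-Learning and cache-based communication between processors, but does not spell out the benchmark problem or the exact sharing protocol. The sketch below is therefore only an illustration of the general idea, not the paper's implementation: the toy corridor task, the sync interval, and the max-merge policy are all assumptions. Each simulated worker applies the standard update Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)) to a private cached copy of the table and reconciles with a shared table only periodically, which keeps communication coarse-grained.

```python
import random
from collections import defaultdict

# Hypothetical toy task standing in for the paper's unspecified benchmark:
# a 1-D corridor of N_CELLS cells; the agent starts in cell 0 and receives
# a reward of +1 for reaching the last cell. Actions: 0 = left, 1 = right.
N_CELLS = 10
ACTIONS = (0, 1)

def step(state, action):
    """Environment transition: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_CELLS - 1, state + 1)
    done = nxt == N_CELLS - 1
    return nxt, (1.0 if done else 0.0), done

def run_episode(q, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=200):
    """One episode of standard tabular Q-learning against the table q."""
    state, done, steps = 0, False, 0
    while not done and steps < max_steps:
        if random.random() < epsilon:            # epsilon-greedy exploration
            action = random.choice(ACTIONS)
        else:                                    # greedy, ties broken at random
            best = max(q[(state, a)] for a in ACTIONS)
            action = random.choice([a for a in ACTIONS if q[(state, a)] == best])
        nxt, reward, done = step(state, action)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (target - q[(state, action)])
        state, steps = nxt, steps + 1

# Cache-based sharing (assumed merge policy; the paper's protocol may differ):
# workers learn on private cached copies and reconcile with the shared table
# only every SYNC_EVERY episodes.
SYNC_EVERY = 20
shared_q = defaultdict(float)

def worker(episodes):
    cache = defaultdict(float, shared_q)         # pull a snapshot into the cache
    for ep in range(episodes):
        run_episode(cache)
        if (ep + 1) % SYNC_EVERY == 0:
            for key, val in cache.items():       # merge: keep the larger estimate
                shared_q[key] = max(shared_q[key], val)
            cache = defaultdict(float, shared_q) # refresh cache from shared table

for _ in range(4):        # 4 simulated workers, run in turn here for clarity;
    worker(episodes=100)  # a real implementation would run them concurrently

policy = [max(ACTIONS, key=lambda a: shared_q[(s, a)]) for s in range(N_CELLS)]
print("greedy policy (1 = move right):", policy)
```

The point of the cache in this sketch is that table accesses during an episode are purely local; communication cost is paid only at the periodic merge, which is the kind of trade-off a parallel tabular scheme has to make between freshness of the shared estimates and synchronization overhead.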