LEADER 01809nam a22005177a 4500
003 AR-BaUEN
005 20220913135132.0
008 210730s2018 xxud|||f |||| 001 0 eng d
020 __ |a 9780262039246
040 __ |a AR-BaUEN |b spa |c AR-BaUEN
044 __ |a xxu
080 __ |a 004.85
100 1_ |a Sutton, Richard S.
245 10 |a Reinforcement learning : |b an introduction / |c Richard S. Sutton and Andrew G. Barto
250 __ |a 2nd ed.
260 __ |a Cambridge, MA : |b MIT Press, |c c2018
300 __ |a xxii, 526 p. : |b ill., graphs.
490 0_ |a Adaptive computation and machine learning series
504 __ |a Bibliographical references pp. 481-518.
504 __ |a Analytical subject index.
505 00 |t Introduction |g 1.
505 00 |t Tabular solution methods |g I
505 00 |t Multi-armed bandits |g 2.
505 00 |t Finite Markov decision processes |g 3.
505 00 |t Dynamic programming |g 4.
505 00 |t Monte Carlo methods |g 5.
505 00 |t Temporal-difference learning |g 6.
505 00 |t n-step bootstrapping |g 7.
505 00 |t Planning and learning with tabular methods |g 8.
505 00 |t Approximate solution methods |g II
505 00 |t On-policy prediction with approximation |g 9.
505 00 |t On-policy control with approximation |g 10.
505 00 |t *Off-policy methods with approximation |g 11.
505 00 |t Eligibility traces |g 12.
505 00 |t Policy gradient methods |g 13.
505 00 |t Looking deeper |g III
505 00 |t Psychology |g 14.
505 00 |t Neuroscience |g 15.
505 00 |t Applications and case studies |g 16.
505 00 |t Frontiers |g 17.
650 12 |2 spines |a INTELIGENCIA ARTIFICIAL
653 10 |a APRENDIZAJE POR REFUERZO
700 1_ |a Barto, Andrew G.
962 __ |a info:eu-repo/semantics/book |b info:eu-repo/semantics/publishedVersion
999 __ |c 89673