Soft errors detection and automatic recovery based on replication combined with different levels of checkpointing

Handling faults is a growing concern in HPC. In future exascale systems, it is projected that silent undetected errors will occur several times a day, increasing the occurrence of corrupted results. In this article, we propose SEDAR, which is a methodology that improves system reliability against tr...

Bibliographic Details
Main authors: Montezanti, Diego Miguel; Rucci, Enzo; De Giusti, Armando Eduardo; Naiouf, Marcelo; Rexachs del Rosario, Dolores; Luque Fadón, Emilio
Format: Article, Preprint
Language: English
Published: 2020
Online access: http://sedici.unlp.edu.ar/handle/10915/124463