Soft errors detection and automatic recovery based on replication combined with different levels of checkpointing
Handling faults is a growing concern in HPC. In future exascale systems, it is projected that silent undetected errors will occur several times a day, increasing the occurrence of corrupted results. In this article, we propose SEDAR, which is a methodology that improves system reliability against tr...
| Main authors: | Montezanti, Diego Miguel; Rucci, Enzo; De Giusti, Armando Eduardo; Naiouf, Marcelo; Rexachs del Rosario, Dolores; Luque Fadón, Emilio |
|---|---|
| Format: | Article Preprint |
| Language: | English |
| Published: | 2020 |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/124463 |
Similar items
- A methodology for soft errors detection and automatic recovery
  by: Montezanti, Diego Miguel, et al.
  Published: (2017)
- SEDAR: Soft Error Detection and Automatic Recovery in High Performance Computing Systems
  by: Montezanti, Diego Miguel
  Published: (2020)
- Factores que afectan el consumo energético de operaciones de checkpoint y restart en clusters
  by: Morán, Marina, et al.
  Published: (2018)
- Metodología para predecir el consumo energético de checkpoints en sistemas de HPC
  by: Balladini, Javier, et al.
  Published: (2014)
- IaaS Cloud as a Virtual Environment for Experimentation in Checkpoint Analysis
  by: León, Betzabeth, et al.
  Published: (2019)