High availability for parallel computers

Bibliographic details
Main authors: Rexachs del Rosario, Dolores; Luque Fadón, Emilio
Format: Article
Language: English
Published: 2010
Subjects: Computer Science; Fault tolerance; Reliability, availability, and serviceability
Online access: http://sedici.unlp.edu.ar/handle/10915/9677
http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct10-1.pdf
Contributed by: SEDICI (UNLP), Universidad Nacional de La Plata

Description: Fault tolerance has become an important issue for parallel applications in the last few years. Users of parallel systems want them to be reliable along two main dimensions: availability and data consistency. Availability can be provided with solutions such as RADIC, a fault-tolerant architecture with different protection levels, offering high availability with transparency, decentralization, flexibility and scalability for message-passing systems. Transient faults may cause an application running in a computer system to be removed from execution; however, the biggest risk of transient faults is that they provoke undetected data corruption that changes the final result of the application without anyone knowing. To evaluate the effects of transient faults on the robustness of applications and to validate new fault detection mechanisms and strategies, we have developed a full-system simulation fault injection environment.
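The abstract's central concern, that a transient fault can silently change an application's final result, can be illustrated with a minimal Python sketch. It assumes a transient fault is modeled as a single bit flip in the IEEE 754 binary64 representation of an intermediate value, and it uses duplicated execution with result comparison as a generic detection strategy. The names `flip_bit`, `dot` and `dot_with_detection` are hypothetical. This is not the authors' full-system simulation fault injection environment, nor part of RADIC.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Model a transient fault: flip one bit (0..63) of a float's binary64 encoding."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    bits ^= 1 << bit
    (faulty,) = struct.unpack("<d", struct.pack("<Q", bits))
    return faulty


def dot(xs, ys, fault_at=None, bit=0):
    """Dot product; if fault_at is set, flip `bit` of the running sum after that step."""
    acc = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc += x * y
        if i == fault_at:
            acc = flip_bit(acc, bit)  # the corrupted value propagates silently
    return acc


def dot_with_detection(xs, ys, fault_at=None, bit=0):
    """Duplicated execution: the fault hits only the first replica, then results are compared.

    A fault whose effect is masked (e.g. rounded away) is not reported,
    because the two final results still agree.
    """
    first = dot(xs, ys, fault_at, bit)
    second = dot(xs, ys)
    if first != second:
        raise RuntimeError("silent data corruption detected: replicas disagree")
    return first
```

With `xs = [1.0, 2.0, 3.0]` and `ys = [4.0, 5.0, 6.0]` the fault-free result is `32.0`. Flipping bit 51 (the most significant mantissa bit) of the partial sum `14.0` after the second step turns it into `10.0`, so `dot(xs, ys, fault_at=1, bit=51)` returns `28.0` without any error, while `dot_with_detection` with the same arguments raises `RuntimeError`. Duplication roughly doubles the computation, which is the cost this simple strategy pays for detection.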