High availability for parallel computers
Fault tolerance has become an important issue for parallel applications in the last few years. Users of parallel systems expect them to be reliable along two main dimensions: availability and data consistency. Availability can be provided with solutions such as RADIC, a fault tolerant arc...
Main authors: | Rexachs del Rosario, Dolores; Luque Fadón, Emilio |
---|---|
Format: | Article |
Language: | English |
Published: | 2010 |
Subjects: | Computer Science; Fault tolerance; Reliability, availability, and serviceability |
Online access: | http://sedici.unlp.edu.ar/handle/10915/9677 http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct10-1.pdf |
Contributed by: | |

id | I19-R120-10915-9677 |
---|---|
record_format | dspace |
institution | Universidad Nacional de La Plata |
institution_str | I-19 |
repository_str | R-120 |
collection | SEDICI (UNLP) |
language | English |
topic | Computer Science; Fault tolerance; Reliability, availability, and serviceability |
spellingShingle | Computer Science; Fault tolerance; Reliability, availability, and serviceability; Rexachs del Rosario, Dolores; Luque Fadón, Emilio; High availability for parallel computers |
topic_facet | Computer Science; Fault tolerance; Reliability, availability, and serviceability |
description | Fault tolerance has become an important issue for parallel applications in the last few years. Users of parallel systems expect them to be reliable along two main dimensions: availability and data consistency. Availability can be provided with solutions such as RADIC, a fault-tolerant architecture with different protection levels that offers high availability with transparency, decentralization, flexibility and scalability for message-passing systems. Transient faults may cause an application running in a computer system to be removed from execution; however, the biggest risk of transient faults is that they provoke undetected data corruption that changes the final result of the application without anyone knowing. To evaluate the effects of transient faults on the robustness of applications and to validate new fault detection mechanisms and strategies, we have developed a full-system simulation fault injection environment. |
format | Article |
author | Rexachs del Rosario, Dolores; Luque Fadón, Emilio |
author_facet | Rexachs del Rosario, Dolores; Luque Fadón, Emilio |
author_sort | Rexachs del Rosario, Dolores |
title | High availability for parallel computers |
title_short | High availability for parallel computers |
title_full | High availability for parallel computers |
title_fullStr | High availability for parallel computers |
title_full_unstemmed | High availability for parallel computers |
title_sort | high availability for parallel computers |
publishDate | 2010 |
url | http://sedici.unlp.edu.ar/handle/10915/9677 http://journal.info.unlp.edu.ar/wp-content/uploads/JCST-Oct10-1.pdf |
work_keys_str_mv | AT rexachsdelrosariodolores highavailabilityforparallelcomputers AT luquefadonemilio highavailabilityforparallelcomputers |
bdutipo_str | Repositories |
_version_ | 1764820492438470657 |