H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments

Even though cloud platforms promise to be reliable, several availability incidents prove that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture, that ex...


Bibliographic Details
Main authors: Royo, Ambrosio, Villamayor, Jorge, Castro-León, Marcela, Rexachs del Rosario, Dolores, Luque Fadón, Emilio
Format: Conference object
Language: English
Published: 2018
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/69674
Contributed by:
id I19-R120-10915-69674
record_format dspace
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
cloud computing
cloud, fault-tolerance, high-performance computing, RADIC
spellingShingle Computer Science
cloud computing
cloud, fault-tolerance, high-performance computing, RADIC
Royo, Ambrosio
Villamayor, Jorge
Castro-León, Marcela
Rexachs del Rosario, Dolores
Luque Fadón, Emilio
H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
topic_facet Computer Science
cloud computing
cloud, fault-tolerance, high-performance computing, RADIC
description Even though cloud platforms promise to be reliable, several availability incidents prove that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture, that executes a parallel application in at least 3 different virtual clusters or sites. The execution state of each site is saved periodically in another site and is recovered in case of failure. The paper details the configuration of the architecture and the experimental results using 3 virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpointing tool. Our experiments show that the execution time increased by 5% to 36% without failures and by 27% to 66% in case of failures.
format Conference object
Conference object
author Royo, Ambrosio
Villamayor, Jorge
Castro-León, Marcela
Rexachs del Rosario, Dolores
Luque Fadón, Emilio
author_facet Royo, Ambrosio
Villamayor, Jorge
Castro-León, Marcela
Rexachs del Rosario, Dolores
Luque Fadón, Emilio
author_sort Royo, Ambrosio
title H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
title_short H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
title_full H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
title_fullStr H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
title_full_unstemmed H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
title_sort h-radic: the fault tolerance framework for virtual clusters on multi-cloud environments
publishDate 2018
url http://sedici.unlp.edu.ar/handle/10915/69674
work_keys_str_mv AT royoambrosio hradicthefaulttoleranceframeworkforvirtualclustersonmulticloudenvironments
AT villamayorjorge hradicthefaulttoleranceframeworkforvirtualclustersonmulticloudenvironments
AT castroleonmarcela hradicthefaulttoleranceframeworkforvirtualclustersonmulticloudenvironments
AT rexachsdelrosariodolores hradicthefaulttoleranceframeworkforvirtualclustersonmulticloudenvironments
AT luquefadonemilio hradicthefaulttoleranceframeworkforvirtualclustersonmulticloudenvironments
bdutipo_str Repositories
_version_ 1764820481611923458