Checkpoint and Restart: An Energy Consumption Characterization in Clusters
The fault tolerance method currently used in High Performance Computing (HPC) is rollback-recovery using checkpoints. This, like any other fault tolerance method, adds energy consumption on top of that of the application's execution. The objective of this work is to...
| Main authors: | Morán, Marina; Balladini, Javier; Rexachs, Dolores; Luque, Emilio |
|---|---|
| Format: | Article (accepted version) |
| Language: | English |
| Published: | arXiv, 2024 |
| Subjects: | |
| Online access: | https://rdi.uncoma.edu.ar/handle/uncomaid/19173 |
| Contributed by: | |
| id | I22-R178-uncomaid-19173 |
|---|---|
| record_format | dspace |
| spelling | I22-R178-uncomaid-19173 2025-12-23T14:11:07Z Checkpoint and Restart: An Energy Consumption Characterization in Clusters Morán, Marina Balladini, Javier Rexachs, Dolores Luque, Emilio Checkpoint Restart Energy consumption Power Fault tolerance methods Computer and Information Sciences Articles The fault tolerance method currently used in High Performance Computing (HPC) is rollback-recovery using checkpoints. This, like any other fault tolerance method, adds energy consumption on top of that of the application's execution. The objective of this work is to determine the factors that affect the energy consumption of the compute nodes of a homogeneous cluster when performing checkpoint and restart operations on SPMD (Single Program Multiple Data) applications. We have focused on the energy study of compute nodes, considering different configurations of hardware and software parameters. We studied the effect of processor performance states (P-states) and power states (C-states), application problem size, checkpoint software (DMTCP), and distributed file system (NFS) configuration. The analysis of the results allowed us to identify opportunities to reduce the energy consumption of checkpoint and restart operations. Affiliation: Morán, Marina. Universidad Nacional del Comahue. Facultad de Informática; Argentina. Affiliation: Balladini, Javier. Universidad Nacional del Comahue. Facultad de Informática; Argentina. Affiliation: Rexachs, Dolores. Universitat Autónoma de Barcelona. Departamento de Arquitectura de Computadores y Sistemas Operativos; Spain. Affiliation: Luque, Emilio. Universitat Autónoma de Barcelona. Departamento de Arquitectura de Computadores y Sistemas Operativos; Spain. 2024 2025-12-17T14:54:56Z 2025-12-17T14:54:56Z Article (accepted version) https://rdi.uncoma.edu.ar/handle/uncomaid/19173 eng https://arxiv.org/abs/2409.02214 Attribution-NonCommercial-ShareAlike 4.0 https://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf 15 p. application/pdf arXiv |
| institution | Universidad Nacional del Comahue |
| institution_str | I-22 |
| repository_str | R-178 |
| collection | Repositorio Institucional UNCo |
| language | English |
| topic | Checkpoint; Restart; Energy consumption; Power; Fault tolerance methods; Computer and Information Sciences; Articles |
| description | The fault tolerance method currently used in High Performance Computing (HPC) is rollback-recovery using checkpoints. This, like any other fault tolerance method, adds energy consumption on top of that of the application's execution. The objective of this work is to determine the factors that affect the energy consumption of the compute nodes of a homogeneous cluster when performing checkpoint and restart operations on SPMD (Single Program Multiple Data) applications. We have focused on the energy study of compute nodes, considering different configurations of hardware and software parameters. We studied the effect of processor performance states (P-states) and power states (C-states), application problem size, checkpoint software (DMTCP), and distributed file system (NFS) configuration. The analysis of the results allowed us to identify opportunities to reduce the energy consumption of checkpoint and restart operations. |
| format | Article (accepted version) |
| author | Morán, Marina; Balladini, Javier; Rexachs, Dolores; Luque, Emilio |
| author_sort | Morán, Marina |
| title | Checkpoint and Restart: An Energy Consumption Characterization in Clusters |
| publisher | arXiv |
| publishDate | 2024 |
| url | https://rdi.uncoma.edu.ar/handle/uncomaid/19173 |