Towards Management of Energy Consumption in HPC Systems with Fault Tolerance
| Main authors: | Morán, Marina; Balladini, Javier; Rexachs del Rosario, Dolores; Rucci, Enzo |
|---|---|
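| Format: | Conference object |
| Language: | English |
| Published: | 2020 |
| Subjects: | Computer science; Energy consumption; energy saving; Power management; Fault tolerance; uncoordinated checkpoint; HPC; Distributed memory; MPI; DVFS; ACPI |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/139146 |
| Contributed by: | |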

| id | I19-R120-10915-139146 |
|---|---|
| record_format | dspace |
| institution | Universidad Nacional de La Plata |
| institution_str | I-19 |
| repository_str | R-120 |
| collection | SEDICI (UNLP) |
| language | English |
| topic | Computer science; Energy consumption; energy saving; Power management; Fault tolerance; uncoordinated checkpoint; HPC; Distributed memory; MPI; DVFS; ACPI |
| description | High-performance computing continues to increase its computing power and energy efficiency. However, energy consumption continues to rise, and finding ways to limit or reduce it is a crucial concern in current research. For high-performance MPI applications, there are fault tolerance methods based on rollback recovery, such as uncoordinated checkpointing. These methods roll back only some processes after a failure, while the remaining processes continue to run. In this article, we focus on the processes that continue execution and propose a series of strategies to manage energy consumption when a failure occurs and uncoordinated checkpoints are used. We present an energy model to evaluate these strategies, and through simulation we analyze the behavior of an application under different configurations and failure times. The results show that it is feasible to improve energy efficiency in HPC systems in the presence of a failure. |
| format | Conference object |
| author | Morán, Marina; Balladini, Javier; Rexachs del Rosario, Dolores; Rucci, Enzo |
| title | Towards Management of Energy Consumption in HPC Systems with Fault Tolerance |
| publishDate | 2020 |
| url | http://sedici.unlp.edu.ar/handle/10915/139146 |
| bdutipo_str | Repositories |
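The description mentions an energy model for comparing strategies but does not give it. As a rough illustration only, the Python sketch below assumes a simple power × time model for a surviving node that has blocked while waiting for the failed process to recover. It compares three hypothetical ways of spending that wait: busy-waiting at full frequency, lowering the CPU frequency with DVFS, or suspending the node to an ACPI sleep state. The names `NodeModel`, `wait_energy` and `best_strategy`, the strategy labels, and every power and timing value are illustrative assumptions, not the authors' model or measurements.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NodeModel:
    """Hypothetical figures for one surviving node (illustrative, not measured)."""
    p_busy: float        # W: full frequency, polling inside MPI while blocked
    p_low: float         # W: CPU at its lowest DVFS frequency while blocked
    p_sleep: float       # W: node suspended to an ACPI sleep state (e.g. S3)
    p_transition: float  # W: average draw while suspending or resuming
    t_suspend: float     # s: time needed to enter the sleep state
    t_resume: float      # s: time needed to wake up again


def wait_energy(strategy: str, t_wait: float, m: NodeModel) -> Optional[float]:
    """Energy in joules a node spends blocked for t_wait seconds.

    Returns None when the strategy does not fit inside the wait.
    """
    if strategy == "baseline":
        return m.p_busy * t_wait
    if strategy == "dvfs":
        # Assumes the wait is dictated by the failed process's recovery,
        # so running this node slower does not make the wait longer.
        return m.p_low * t_wait
    if strategy == "suspend":
        overhead = m.t_suspend + m.t_resume
        if t_wait < overhead:
            return None  # cannot sleep and wake up before recovery finishes
        return m.p_transition * overhead + m.p_sleep * (t_wait - overhead)
    raise ValueError(f"unknown strategy: {strategy}")


def best_strategy(t_wait: float, m: NodeModel) -> Tuple[str, float]:
    """Return the feasible strategy with the lowest waiting energy."""
    options: Dict[str, float] = {}
    for name in ("baseline", "dvfs", "suspend"):
        energy = wait_energy(name, t_wait, m)
        if energy is not None:
            options[name] = energy
    name = min(options, key=options.__getitem__)
    return name, options[name]


if __name__ == "__main__":
    node = NodeModel(p_busy=200.0, p_low=120.0, p_sleep=10.0,
                     p_transition=180.0, t_suspend=5.0, t_resume=10.0)
    print(best_strategy(10.0, node))   # ('dvfs', 1200.0)
    print(best_strategy(300.0, node))  # ('suspend', 5550.0)
```

With these made-up figures, a 10 s wait is too short to suspend and resume, so DVFS gives the lowest energy (1200 J against 2000 J for busy-waiting), while a 300 s wait favors suspending (5550 J). Because the crossover depends on the transition overhead and on how long the wait lasts, a model of this kind can give different answers for different configurations and failure times.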