Abstract: In recent years, energy consumption has become a limiting factor in the evolution of high-performance computing (HPC) clusters, in terms of both environmental concern and maintenance cost. The computing power of these clusters is increasing, together with the demands of the workloads they execute. A key component in HPC systems is the workload manager, whose operation has a substantial impact on the performance and energy consumption of the clusters. Recent research has employed machine learning techniques to optimise the operation of this component. However, these attempts have focused on homogeneous clusters where all the cores are pooled together and considered equal, disregarding the fact that they are contained in nodes and that they can differ in performance. This work presents an intelligent job scheduler based on deep reinforcement learning that focuses on reducing the energy consumption of heterogeneous HPC clusters. To this end, it leverages information provided by the users as well as the power consumption specifications of the compute resources of the cluster. The scheduler is evaluated against a set of heuristic algorithms, showing that it has the potential to give similar results, even in the face of the extra complexity of the heterogeneous cluster.
Source: Journal of Supercomputing, 2025, 81(2), 427
Publisher: Kluwer Academic Publishers
Publication date: 01/01/2025
No. of pages: 23
Publication type: Article
DOI: 10.1007/s11227-024-06907-y
ISSN: 0920-8542,1573-0484
Spanish project: PID2022-136454NB-C21
Publication Url: https://doi.org/10.1007/s11227-024-06907-y
LÓPEZ, MARTA
ESTEBAN STAFFORD FERNANDEZ
JOSÉ LUIS BOSQUE ORERO