Journal
PARALLEL COMPUTING
Volume 57, Pages 108-124
Publisher
ELSEVIER
DOI: 10.1016/j.parco.2015.11.002
Keywords
High performance computing; Power simulation; Trapped power capacity; Power capping
The power supplied to machine rooms tends to be over-provisioned because, in practice, it is specified not by workload demands but by high-energy LINPACK runs or nameplate power estimates. This results in a considerable amount of trapped power capacity, that is, excess power infrastructure. Instead of being wasted, this trapped power capacity should be reclaimed to accommodate more compute nodes in the machine room and thereby increase system throughput. Doing so, however, requires the ability to enforce a system-wide power cap. In this paper, we present TracSim, a full-system simulator that enables users to measure trapped power capacity and evaluate the performance of different policies for scheduling parallel tasks under a power cap. TracSim simulates the execution environment of a production HPC cluster at Los Alamos National Laboratory (LANL). It lets users specify the system topology, hardware configuration, power cap, and task workload, and develop resource configuration and task scheduling policies that aim to maximize machine-room throughput while keeping power consumption under the cap by exploiting CPU throttling techniques. We use real measurements from the LANL cluster to set TracSim's configuration parameters. We leverage TracSim to implement and evaluate four resource scheduling policies. Simulation results indicate the performance of those policies and quantify the amount of trapped capacity that can effectively be reclaimed. (C) 2015 Elsevier B.V. All rights reserved.
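The two ideas in the abstract, trapped power capacity and task scheduling under a system-wide power cap, can be illustrated with a minimal sketch. This is not TracSim's code or interface: the names `Task`, `trapped_capacity`, and `schedule_under_cap`, the greedy first-fit policy, and the `throttle_factor` value are all hypothetical assumptions chosen for illustration, and the power figures are made up.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    nodes: int             # number of compute nodes requested
    watts_per_node: float  # assumed per-node draw at full CPU frequency


def trapped_capacity(provisioned_watts, observed_peak_watts):
    """Provisioned power that the workload never actually draws."""
    return max(0.0, provisioned_watts - observed_peak_watts)


def schedule_under_cap(tasks, power_cap_watts, throttle_factor=0.8):
    """Greedy first-fit admission under a power cap (illustrative policy only).

    Each task is admitted at full power if it fits under the cap; otherwise
    it is tried at a throttled power level (full power * throttle_factor);
    otherwise it stays queued.
    """
    running, queued, used = [], [], 0.0
    for task in tasks:
        full = task.nodes * task.watts_per_node
        throttled = full * throttle_factor
        if used + full <= power_cap_watts:
            running.append((task.name, "full"))
            used += full
        elif used + throttled <= power_cap_watts:
            running.append((task.name, "throttled"))
            used += throttled
        else:
            queued.append(task.name)
    return running, queued, used


if __name__ == "__main__":
    workload = [Task("a", 10, 300.0), Task("b", 10, 300.0), Task("c", 5, 300.0)]
    print(trapped_capacity(10_000.0, 7_000.0))
    print(schedule_under_cap(workload, power_cap_watts=5_500.0))
```

In this toy run the second task fits only after throttling and the third stays queued, which is the kind of trade-off between throughput and power budget that a scheduling policy under a cap has to manage.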