Article

Peak Power Management to Meet Thermal Design Power in Fault-Tolerant Embedded Systems

Journal

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2018.2858816

Keywords

Peak power consumption; fault tolerance; embedded systems; multicore platforms; thermal design power

Funding

  1. Sharif University of Technology [G930827]

Multicore platforms provide a great opportunity for implementing fault-tolerance techniques to achieve high reliability in real-time embedded systems. Passive redundancy is a well-established technique for tolerating transient and permanent faults, and it is well suited to multicore platforms. However, it incurs significant power overheads, which are wasted in fault-free execution scenarios. Meanwhile, due to the Thermal Design Power (TDP) constraint, it is not always feasible to power on all cores of a multicore platform simultaneously. Since TDP is the maximum sustainable power that a chip can consume, violating TDP causes some cores to restart automatically or to significantly reduce their performance in order to prevent permanent damage. This may affect the timeliness of the system, and hence designers face a challenge in deciding how to use multicore platforms in real-time embedded systems. In this paper, we first study how the use of passive redundancy (especially Triple Modular Redundancy, TMR) can violate TDP on multicore platforms. Since an efficient way to meet the TDP constraint in multicore embedded systems is to reduce peak power consumption, we then propose a scheme for scheduling real-time tasks in multicore systems that addresses the peak power problem in N-Modular Redundancy (NMR) systems. The proposed scheme tries to remove overlaps between the power peaks of concurrently executing tasks so as to keep the maximum power consumption below the chip TDP. Within this scheme, we devise a policy called PPA-LTF to manage peak power consumption. Based on the tasks' power traces, this policy prevents the execution of tasks that consume higher power. Our experimental results show that our scheme provides up to 50 percent (39 percent on average) peak power reduction compared to state-of-the-art schemes.
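The overlap-removal idea above can be illustrated with a toy model. The sketch below is not the paper's PPA-LTF policy; it is a simple greedy start-offset search written only for illustration. It assumes discrete-time power traces, one task per core, no idle or static power, and hypothetical numbers (a 12 W TDP and a five-step trace). It shows how three TMR replicas started together can exceed the TDP, while staggered replicas that still meet the same deadline stay below it.

```python
from typing import List, Sequence

# Hypothetical model: each task runs on its own core, and its power trace gives
# its power draw (W) in each discrete time step. Idle/static power is ignored.
Trace = Sequence[float]


def chip_power(traces: Sequence[Trace], starts: Sequence[int]) -> List[float]:
    """Total chip power at every time step when task i starts at starts[i]."""
    horizon = max(s + len(t) for t, s in zip(traces, starts))
    total = [0.0] * horizon
    for trace, start in zip(traces, starts):
        for offset, watts in enumerate(trace):
            total[start + offset] += watts
    return total


def peak_power(traces: Sequence[Trace], starts: Sequence[int]) -> float:
    """Maximum of the summed chip power over the whole schedule."""
    return max(chip_power(traces, starts))


def stagger_starts(traces: Sequence[Trace], deadline: int) -> List[int]:
    """Greedy sketch (not PPA-LTF): place tasks one by one at the earliest start
    offset that minimizes the peak of the tasks placed so far, while each task
    still finishes by the deadline."""
    starts: List[int] = []
    for i, trace in enumerate(traces):
        latest = deadline - len(trace)
        if latest < 0:
            raise ValueError(f"task {i} cannot meet the deadline")
        # min() returns the first minimizer, so ties go to the earliest start.
        best = min(
            range(latest + 1),
            key=lambda s: peak_power(traces[: i + 1], starts + [s]),
        )
        starts.append(best)
    return starts


if __name__ == "__main__":
    TDP = 12.0                          # hypothetical chip TDP in watts
    task = [2.0, 2.0, 6.0, 6.0, 2.0]    # hypothetical power trace (W per step)
    replicas = [task, task, task]       # TMR: three copies on three cores

    together = peak_power(replicas, [0, 0, 0])
    print(together, together <= TDP)    # 18.0 False -> violates TDP

    staggered = stagger_starts(replicas, deadline=10)
    print(staggered, peak_power(replicas, staggered))  # [0, 4, 2] 10.0
```

Ties are broken toward the earliest start to preserve slack for later tasks. A real scheduler would also have to respect task periods, precedence, and the point at which all replicas must finish so their outputs can be voted on.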

Recommended

Article Computer Science, Information Systems

A Survey of Fault-Tolerance Techniques for Embedded Systems From the Perspective of Power, Energy, and Thermal Issues

Sepideh Safari, Mohsen Ansari, Heba Khdr, Pourya Gohari-Nazari, Sina Yari-Karin, Amir Yeganeh-Khaksar, Shaahin Hessabi, Alireza Ejlali, Jorg Henkel

Summary: This paper provides an in-depth survey of task mapping/scheduling policies for fault-tolerant real-time embedded systems. It reviews and classifies these policies according to their goals and constraints, considering factors such as application models and hardware models. The survey analyzes the achievements and shortcomings of existing approaches and highlights the most promising ones.

IEEE ACCESS (2022)

Article Computer Science, Theory & Methods

TherMa-MiCs: Thermal-Aware Scheduling for Fault-Tolerant Mixed-Criticality Systems

Sepideh Safari, Heba Khdr, Pourya Gohari-Nazari, Mohsen Ansari, Shaahin Hessabi, Joerg Henkel

Summary: This paper presents a thermal-aware scheduling scheme named TherMa-MiCs for fault-tolerant mixed-criticality systems (MCSs). The scheme meets the temperature constraint while satisfying the timing constraints of high-criticality tasks and maximizing the QoS of low-criticality tasks.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Information Systems

ReLIEF: A Reinforcement-Learning-Based Real-Time Task Assignment Strategy in Emerging Fault-Tolerant Fog Computing

Roozbeh Siyadatzadeh, Fatemeh Mehrafrooz, Mohsen Ansari, Bardia Safaei, Muhammad Shafique, Jorg Henkel, Alireza Ejlali

Summary: Due to the real-time requirements in IoT applications, fog computing has emerged to overcome the constraints of cloud computing. However, the reliability of executing real-time tasks in fog computing is a significant challenge. This article proposes a novel task assignment strategy based on machine learning to improve the reliability of fog-based IoT systems. The proposed technique reduces task dropping rate by up to 84% and increases system reliability by nearly 72% compared to state-of-the-art methods.

IEEE INTERNET OF THINGS JOURNAL (2023)

Proceedings Paper Computer Science, Hardware & Architecture

ATLAS: Aging-Aware Task Replication for Multicore Safety-Critical Systems

Mohsen Ansari, Sepideh Safari, Amir Yeganeh-Khaksar, Roozbeh Siyadatzadeh, Pourya Gohari-Nazari, Heba Khdr, Muhammad Shafique, Joerg Henkel, Alireza Ejlali

Summary: In this paper, an aging-aware task replication method called ATLAS is proposed for multicore safety-critical systems. The method updates the required number of replicas for each task to meet the reliability target, and reduces the temperature to decelerate aging effects. Experimental results show that the proposed method improves schedulability by 16.1% on average and reduces the temperature by 7.4 °C.

2023 IEEE 29TH REAL-TIME AND EMBEDDED TECHNOLOGY AND APPLICATIONS SYMPOSIUM, RTAS (2023)

Article Computer Science, Hardware & Architecture

Passive Primary/Backup-Based Scheduling for Simultaneous Power and Reliability Management on Heterogeneous Embedded Systems

Sina Yari-Karin, Roozbeh Siyadatzadeh, Mohsen Ansari, Alireza Ejlali

Summary: In addition to real-time constraints, power/energy efficiency and high reliability are important objectives for real-time embedded systems. Heterogeneous multicore systems have been considered a suitable solution for achieving joint power/energy efficiency and high reliability. However, power/energy and reliability are conflicting requirements due to fault-tolerance techniques. The method proposed in this article uses a passive primary/backup technique to maintain system reliability while reducing power/energy consumption in heterogeneous multicore systems. It maps primary and backup tasks in a mixed manner to take advantage of different core types, and it schedules backup tasks after primary tasks to avoid overlap. Experimental results demonstrate the power efficiency of the proposed method and its effectiveness in terms of scheduling.

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING (2023)

Article Computer Science, Information Systems

Thermal-Aware Standby-Sparing Technique on Heterogeneous Real-Time Embedded Systems

Mohsen Ansari, Sepideh Safari, Sina Yari-Karin, Pourya Gohari-Nazari, Heba Khdr, Muhammad Shafique, Joerg Henkel, Alireza Ejlali

Summary: This paper proposes a thermal-aware standby-sparing technique that aims to maximize the Quality of Service (QoS) of real-time tasks while meeting power constraints and preventing thermal emergencies. The technique tolerates faults and reduces power consumption by removing overlaps between main and backup tasks. By employing a heterogeneous platform, the main tasks are executed on high-performance cores while the backup tasks are executed on low-power cores. Experiments show significant improvements in QoS, power consumption, and temperature reduction.

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (2022)

Article Computer Science, Hardware & Architecture

MASTER: Reclamation of Hybrid Scratchpad Memory to Maximize Energy Saving in Multi-Core Edge Systems

Mohsen Shekarisaz, Ali Hoseinghorban, Mostafa Bazzaz, Mohammad Salehi, Alireza Ejlali

Summary: This paper addresses the issue of energy consumption in the memory subsystem of edge devices by proposing a task mapping, scheduling, and dynamic allocation scheme based on hybrid Scratchpad Memories (SPM). By formulating the hybrid SPM allocation using integer linear programming, the energy consumption of the memory subsystem is minimized. Experimental results demonstrate that the proposed scheme outperforms the existing heuristic dynamic data allocation algorithm, achieving up to 34% energy savings in the memory subsystem.

IEEE TRANSACTIONS ON SUSTAINABLE COMPUTING (2022)

Article Computer Science, Theory & Methods

Power-Aware Checkpointing for Multicore Embedded Systems

Mohsen Ansari, Sepideh Safari, Heba Khdr, Pourya Gohari-Nazari, Joerg Henkel, Alireza Ejlali, Shaahin Hessabi

Summary: This article introduces a peak-power-aware checkpointing (PPAC) technique that tolerates faults and meets power constraints in hard real-time embedded systems. By adjusting the timing of checkpoints and utilizing the available slack times on the cores, the technique reduces peak power and saves energy.

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS (2022)

Article Computer Science, Information Systems

PVMC: Task Mapping and Scheduling Under Process Variation Heterogeneity in Mixed-Criticality Systems

Fahimeh Bahrami, Behnaz Ranjbar, Nezam Rohbani, Alireza Ejlali

Summary: Embedded systems have transitioned from special-purpose hardware to commodity hardware and have tended toward mixed-criticality implementations. Multicore platforms bring new challenges because process variation affects the predictability of embedded systems. This work explores variation-aware techniques to improve reliability, scheduling, and energy saving in mixed-criticality systems.

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (2022)

Article Computer Science, Information Systems

ReMap: Reliability Management of Peak-Power-Aware Real-Time Embedded Systems Through Task Replication

Amir Yeganeh-Khaksar, Mohsen Ansari, Alireza Ejlali

Summary: This article proposes a method for mapping and scheduling periodic soft real-time tasks in multicore embedded systems to achieve a given reliability target while keeping the total power consumption under the chip TDP. Experimental results show that the proposed method can significantly reduce peak power consumption.

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (2022)
