Article
Computer Science, Hardware & Architecture
Yunki Han, Kangkyu Park, Youngbeom Jung, Lee-Sup Kim
Summary: In this paper, the authors propose an efficient GCN accelerator called EGCN, which reduces off-chip memory access by optimizing from both in-tile and out-of-tile perspectives. Compared with state-of-the-art accelerators, EGCN reduces off-chip DRAM access and improves both speed and energy efficiency.
IEEE TRANSACTIONS ON COMPUTERS
(2022)
Article
Computer Science, Hardware & Architecture
Weiwen Jiang, Qiuwen Lou, Zheyu Yan, Lei Yang, Jingtong Hu, Xiaobo Sharon Hu, Yiyu Shi
Summary: The article discusses the co-exploration of neural architectures and hardware design, proposing the NACIM framework, which can find robust neural networks and achieve high energy efficiency while considering device variation.
IEEE TRANSACTIONS ON COMPUTERS
(2021)
Article
Computer Science, Hardware & Architecture
Jingya Wu, Wenyan Lu, Guihai Yan, Xiaowei Li
Summary: Accelerators are widely used in various domains, but bandwidth contention and hardware hazards in CPU-accelerator heterogeneous systems significantly bottleneck performance. To address this problem, a holistic profiling system called Portrait is proposed to model computation and bandwidth resources accurately and improve task scheduling efficiency.
SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS
(2022)
Article
Computer Science, Hardware & Architecture
Daniel Casini, Paolo Pazzaglia, Alessandro Biondi, Marco Di Natale
Summary: This paper proposes a holistic framework for partitioning real-time applications on heterogeneous platforms with hardware accelerators. The model is inspired by a realistic setup of an advanced driving assistance system and can be applied to a broader range of heterogeneous architectures. The resulting analysis solves timing constraints, task-to-core mapping, task prioritization, and selection of computations to accelerate to find the most suitable trade-off between the smaller worst-case execution time provided by accelerators and synchronization and queuing delays.
JOURNAL OF SYSTEMS ARCHITECTURE
(2022)
Article
Engineering, Civil
Ki-In Na, Sunglok Choi, Jong-Hwan Kim
Summary: In this paper, an IMM-based adaptive target tracking method is proposed, which can adapt to various motion patterns and object types. Experimental results on synthetic and real datasets demonstrate the effectiveness of the proposed method.
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
(2022)
Article
Computer Science, Hardware & Architecture
Marcelo Brandalero, Luigi Carro, Antonio Carlos Schneider Beck, Muhammad Shafique
Summary: The article proposes extending a single-ISA heterogeneous CMP with CGRA and DBT modules for accelerating applications in different scenarios. It introduces an additional voltage rail for low-energy operation and leverages the structural features of the CGRA to address implementation challenges of NTV computing. Performance and energy consumption are improved with less than 35% area overhead compared to the baseline CMP.
IEEE TRANSACTIONS ON COMPUTERS
(2021)
Article
Computer Science, Information Systems
Zhongqin Wang, J. Andrew Zhang, Min Xu, Y. Jay Guo
Summary: In this work, a scheme named WiDFS is proposed, which achieves single-target real-time passive tracking using channel state information collected from commercial-off-the-shelf WiFi devices. The scheme accurately separates dynamic human components and estimates the Doppler frequency shift for tracking.
IEEE TRANSACTIONS ON MOBILE COMPUTING
(2023)
Article
Computer Science, Information Systems
Alessandro Gabrielli, Fabrizio Alfonsi, Alberto Annovi, Alessandra Camplani, Alessandro Cerri
Summary: FPGA technology has shown high computational performance in various applications and is particularly suitable for those with strict real-time requirements, such as medical image analysis and particle trajectory recognition in high-energy physics. This paper demonstrates the hardware implementation of complex algorithms on an FPGA, showing that such algorithms can be accelerated not only through software-based systems but also through consumer hardware devices. The Xilinx UltraScale+ FPGA, one of the leading-edge device families on the market, features high clock frequencies and acceptable energy consumption.
Article
Computer Science, Information Systems
Junkun Yan, Jinhui Dai, Wenqiang Pu, Shenghua Zhou, Hongwei Liu, Zheng Bao
Summary: In this article, a quality of service constrained-resource allocation (QoSC-RA) scheme is proposed for multiple target tracking (MTT) in a radar sensor network. The scheme divides radar sensors into groups and optimizes transmit resources to minimize total resource consumption while achieving the desired MTT accuracy. Simulation results show that the QoSC-RA process can achieve predetermined MTT performance with smaller resource consumption compared to the uniform allocation scheme.
IEEE SYSTEMS JOURNAL
(2021)
Article
Computer Science, Information Systems
Marcello Barbirotta, Abdallah Cheikh, Antonio Mastrandrea, Francesco Menichelli, Marco Angioli, Saeid Jamili, Mauro Olivieri
Summary: High-performance embedded systems are driving the growth of the IoT through powerful processors, specialized hardware accelerators, and advanced software techniques. By combining hardware and software techniques, it is possible to design embedded architectures that can continue to function correctly even in the event of a failure or malfunction, thus increasing overall reliability and safety.
Article
Nuclear Science & Technology
An-Kang Hu, Rui Qiu, Huan Liu, Zhen Wu, Chun-Yan Li, Hui Zhang, Jun-Li Li, Rui-Jie Yang
Summary: The study developed a fast Monte Carlo tool, THUBrachy, which can be accelerated by different hardware accelerators. The GPU-accelerated THUBrachy is the fastest version, being 200 times faster than the serial version and around 500 times faster than Geant4. The proposed tool has great potential for fast and accurate dose calculations in clinical applications.
NUCLEAR SCIENCE AND TECHNIQUES
(2021)
Article
Computer Science, Information Systems
Wei Zhao, Shing-Chow Chan, Jian-Qiang Lin
Summary: This paper introduces a new variant of the PAST algorithm with multiple deflation (MD) and presents an efficient hardware architecture. By performing multiple deflations at each step and utilizing a variable forgetting factor and variable regularization, the algorithm improves overall convergence rate and numerical properties. The proposed algorithm also includes methods for estimating eigenvalues and the signal subspace dimension.
Article
Engineering, Aerospace
Peter John-Baptiste, Kristine L. Bell, Joel Tidmore Johnson, Graeme Edward Smith
Summary: This work introduces a fully adaptive radar multiple target tracking (FAR-MTT) model, which outperforms static parameter selections. By developing a multiple target Fisher information matrix and an optimization scheme, the model achieves accurate tracking of each target and reduces resource usage in multiple target environments.
IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS
(2022)
Article
Computer Science, Hardware & Architecture
Shubham Jain, Hsinyu Tsai, Ching-Tzu Chen, Ramachandran Muralidhar, Irem Boybat, Martin M. Frank, Stanislaw Wozniak, Milos Stanisavljevic, Praneet Adusumilli, Pritish Narayanan, Kohji Hosokawa, Masatoshi Ishii, Arvind Kumar, Vijay Narayanan, Geoffrey W. Burr
Summary: This paper introduces a highly heterogeneous and programmable compute-in-memory (CIM) accelerator architecture for deep neural network (DNN) inference. The architecture combines CIM memory array tiles for energy-efficient multiply-accumulate operations with special-function compute cores for auxiliary digital computation. The paper discusses the design of the analog fabric, the efficiency in mapping DNNs onto the hardware, and the efficiency in pipelining various DNN workloads across different batch sizes. The experimental results show competitive throughput and significantly higher energy efficiency compared to NVIDIA A100.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS
(2023)
Article
Chemistry, Multidisciplinary
Run Yan, Libo Huang, Hui Guo, Yashuai Lu, Ling Yang, Nong Xiao, Yongwen Wang, Li Shen, Mengqiao Lan
Summary: This article introduces a novel architecture, the RT engine, that accelerates ray tracing by utilizing strategies such as multiple stacks, a three-phase break method, and an approximation method.
APPLIED SCIENCES-BASEL
(2022)
Editorial Material
Computer Science, Hardware & Architecture
Peter Marwedel, Tulika Mitra, Martin Edin Grimheden, Hugo A. Andrade
IEEE DESIGN & TEST
(2020)
Editorial Material
Computer Science, Hardware & Architecture
Tulika Mitra, Andreas Gerstlauer
IEEE DESIGN & TEST
(2021)
Article
Computer Science, Hardware & Architecture
Sami Salamin, Martin Rapp, Anuj Pathania, Arka Maity, Joerg Henkel, Tulika Mitra, Hussam Amrouch
Summary: The article explores the system- and application-level benefits of NCFET-based multi-/many-core designs compared to state-of-the-art FinFET-based designs in terms of performance and power efficiency. It shows that a novel type of technology-based heterogeneity, in which cores with the same microarchitecture but different ferroelectric (FE) thickness are combined, can significantly increase power efficiency.
IEEE TRANSACTIONS ON COMPUTERS
(2021)
Article
Computer Science, Hardware & Architecture
Zhaoying Li, Dhananjaya Wijerathne, Xianzhang Chen, Anuj Pathania, Tulika Mitra
Summary: This article introduces a CGRA mapper called ChordMap, which automatically generates a high-quality mapping of streaming applications represented as SDF onto CGRAs. By using optimized spatio-temporal mapping and modulo-scheduling, ChordMap achieves higher throughput compared to existing technologies.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
(2022)
Article
Computer Science, Software Engineering
Guanhua Wang, Sudipta Chattopadhyay, Ivan Gotovchits, Tulika Mitra, Abhik Roychoudhury
Summary: The Spectre vulnerability in modern processors has been widely reported. The static analysis approach oo7 is proposed to mitigate Spectre attacks by detecting and patching potentially vulnerable code snippets in program binaries. This method detects various Spectre-vulnerable code patterns and inserts fences at vulnerable conditional branches to prevent speculative execution, with an observed performance overhead of around 5.9% on SPECint benchmarks.
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
(2021)
Article
Computer Science, Hardware & Architecture
Dhananjaya Wijerathne, Zhaoying Li, Anuj Pathania, Tulika Mitra, Lothar Thiele
Summary: The article introduces a fast and scalable CGRA mapping method called HiMap, which can generate close-to-optimal solutions, improve performance and energy efficiency, and has a short compilation time.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
(2022)
Article
Computer Science, Hardware & Architecture
Vanchinathan Venkataramani, Bruno Bodin, Aditi Kulkarni Mohite, Tulika Mitra, Li-Shiuan Peh
Summary: This article proposes ASCENT, an application-specific, non-TDM communication scheduling mechanism for bufferless software-defined NoCs. By utilizing the SDF model together with task interaction and timing information, ASCENT achieves high performance and predictability.
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Zhaoying Li, Dan Wu, Dhananjaya Wijerathne, Tulika Mitra
Summary: This paper presents a portable compilation framework called LISA, which automatically adjusts to generate high-quality mappings for various spatial accelerators. By using graph neural networks to analyze graph attributes and considering the impact of dataflow graph structure on node placement and dependency routing, an optimized mapping strategy is achieved.
2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022)
(2022)
Proceedings Paper
Automation & Control Systems
Jinho Lee, Burin Amornpaisannon, Tulika Mitra, Trevor E. Carlson
Summary: This research improves the performance and efficiency of graph accelerators by maximizing parallelism and optimizing interconnect structure.
PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022)
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Kuluhan Binici, Nam Trung Pham, Tulika Mitra, Karianto Leman
Summary: With the increasing popularity of deep learning on edge devices, compressing large neural networks to meet the hardware requirements of resource-constrained devices has become a significant research direction. This paper addresses the problem of catastrophic forgetting in existing data-free distillation methods and proposes a data-free KD framework that maintains a dynamic collection of generated samples over time. The experiments demonstrate that the proposed framework improves the accuracy of student models obtained via KD on various datasets.
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022)
(2022)
Proceedings Paper
Computer Science, Hardware & Architecture
Thilini Kaushalya Bandara, Dhananjaya Wijerathne, Tulika Mitra, Li-Shiuan Peh
Summary: This research aims to improve energy efficiency in CGRAs by introducing heterogeneity and corresponding compiler support. The study proposes an automated design space exploration framework, REVAMP, which converts homogeneous CGRAs into irregular architectures through optimizing compute, network, and memory heterogeneity. The research showcases REVAMP on three homogeneous CGRAs, demonstrating its effectiveness.
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS
(2022)
Proceedings Paper
Computer Science, Artificial Intelligence
Dhananjaya Wijerathne, Zhaoying Li, Anuj Pathania, Tulika Mitra, Lothar Thiele
Summary: CGRAs are promising hardware accelerators that rely on high-quality compilers for optimal performance. HiMap offers a fast and scalable mapping approach that significantly improves performance and energy efficiency while reducing compilation time.
PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021)
(2021)
Article
Computer Science, Hardware & Architecture
Martin Rapp, Anuj Pathania, Tulika Mitra, Joerg Henkel
Summary: The performance of a task on a many-core processor with a distributed shared LLC depends on the power budget and LLC latency. Task migrations can help maintain peak performance. The relative impacts of power budget and LLC latency on task performance may change in different execution phases.
IEEE TRANSACTIONS ON COMPUTERS
(2021)
Article
Engineering, Electrical & Electronic
Arka Maity, Anuj Pathania, Tulika Mitra
JOURNAL OF LOW POWER ELECTRONICS AND APPLICATIONS
(2020)
Proceedings Paper
Computer Science, Software Engineering
Alexander Hoffman, Anuj Pathania, Philipp H. Kindt, Samarjit Chakraborty, Tulika Mitra
PROCEEDINGS OF THE 2020 57TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC)
(2020)