Article

Devices and architectures for photonic chip-scale integration

Journal

APPLIED PHYSICS A-MATERIALS SCIENCE & PROCESSING
Volume 95, Issue 4, Pages 989-997

Publisher

SPRINGER HEIDELBERG
DOI: 10.1007/s00339-009-5109-2

Silicon nanophotonics holds the promise of dramatically advancing the state of the art in computing by enabling parallel architectures that combine unprecedented performance and ease of use with affordable power consumption. This paper presents a design study for a many-core architecture called Corona, which uses dense wavelength division multiplexing (DWDM) for on- and off-chip communication, together with the devices needed to implement such a communication infrastructure.

Recommended

Article Computer Science, Hardware & Architecture

TRiM: Tensor Reduction in Memory

Byeongho Kim, Jaehyun Park, Eojin Lee, Minsoo Rhu, Jung Ho Ahn

Summary: Personalized recommendation systems are important in industry and the embedding layers within them are memory-intensive. To address performance bottlenecks, a fine-grained near-data processing architecture has been proposed for DRAM, with in-DRAM reduction units at different levels achieving significant performance improvements. Hot embedding-vector replication is also introduced to alleviate load imbalances across reduction units.

IEEE COMPUTER ARCHITECTURE LETTERS (2021)

Article Computer Science, Hardware & Architecture

MVP: An Efficient CNN Accelerator with Matrix, Vector, and Processing-Near-Memory Units

Sunjung Lee, Jaewan Choi, Wonkyung Jung, Byeongho Kim, Jaehyun Park, Hweesoo Kim, Jung Ho Ahn

Summary: Mobile and edge devices are commonly used for CNN inference, but existing accelerators are not optimal for the latest CNN models, especially those using depthwise convolution (DW-CONV) and squeeze-and-excitation (SE) layers. This paper proposes a CNN acceleration architecture called MVP, which efficiently processes both compute- and memory-intensive operations with a small area overhead on top of the baseline systolic-array-based architecture.

ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS (2022)

Article Computer Science, Hardware & Architecture

GraNDe: Near-Data Processing Architecture With Adaptive Matrix Mapping for Graph Convolutional Networks

Sungmin Yun, Byeongho Kim, Jaehyun Park, Hwayong Nam, Jung Ho Ahn, Eojin Lee

Summary: Graph Convolutional Network (GCN) models have high accuracy in interpreting graph data, with one of the key components being the aggregation operation. A proposed new architecture, GraNDe, accelerates memory-intensive aggregation operations and achieves a speedup of up to 4.3x on open-graph benchmark datasets compared to baseline systems.

IEEE COMPUTER ARCHITECTURE LETTERS (2022)

Article Computer Science, Hardware & Architecture

ADT: Aggressive Demotion and Promotion for Tiered Memory

Yaebin Moon, Wanju Doh, Kwanhee Kyung, Eojin Lee, Jung Ho Ahn

Summary: Tiered memory using DRAM as fast memory and slower-but-larger byte-addressable memory as slow memory is a promising approach to expand main-memory capacity. Proactive demotion schemes are used to demote cold pages to slow memory, even when there is sufficient free space in fast memory. The proposed ADT scheme performs aggressive demotion and promotion by extending the unit of demotion/promotion, reducing fast-memory usage by 29% with only a 2.3% performance drop and outperforming state-of-the-art schemes for tiered memory management.

IEEE COMPUTER ARCHITECTURE LETTERS (2023)

Article Computer Science, Hardware & Architecture

Unleashing the Potential of PIM: Accelerating Large Batched Inference of Transformer-Based Generative Models

Jaewan Choi, Jaehyun Park, Kwanhee Kyung, Nam Sung Kim, Jung Ho Ahn

Summary: Transformer-based generative models utilize attention to summarize input sequences and generate output sequences. However, conventional computing platforms are inefficient in handling attention. To address this issue, we propose AttAcc, which takes advantage of the reuse of KV matrices during summarization and reduces external bandwidth and energy consumption by processing in memory.

IEEE COMPUTER ARCHITECTURE LETTERS (2023)

Article Computer Science, Hardware & Architecture

X-ray: Discovering DRAM Internal Structure and Error Characteristics by Issuing Memory Commands

Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

Summary: The demand for accurate information about the internal structure and characteristics of DRAM is increasing. This paper presents reliable findings on the internal structure and characteristics of DRAM using activate-induced bitflips (AIBs), retention-time tests, and row-copy operations.

IEEE COMPUTER ARCHITECTURE LETTERS (2023)

Article Computer Science, Hardware & Architecture

A Hardware-Friendly Tiled Singular-Value Decomposition-Based Matrix Multiplication for Transformer-Based Models

Hailong Li, Jaewan Choi, Yongsuk Kwon, Jung Ho Ahn

Summary: Transformer-based models are widely used in NLP tasks, but matrix multiplication can be time-consuming. This paper introduces a hardware-friendly approach called tiled singular-value decomposition (TSVD) for matrix multiplication, which leverages GPU resources more efficiently and mitigates the loss of important information. The experimental results show that TSVD-matmul achieves a significant speedup compared to the SVD approach.

IEEE COMPUTER ARCHITECTURE LETTERS (2023)

Proceedings Paper Computer Science, Hardware & Architecture

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse

Jongmin Kim, Gwangho Lee, Sangpyo Kim, Gina Sohn, Minsoo Rhu, John Kim, Jung Ho Ahn

Summary: In this paper, we propose an accelerator called ARK for FHE, which accelerates the bootstrapping operation through runtime data generation and inter-operation key reuse, enabling practical FHE workloads. This approach reduces the size of the working set, maximizes on-chip memory utilization, and effectively handles the heavy computation and data movement overheads of FHE.

2022 55TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO) (2022)

Article Computer Science, Hardware & Architecture

Future Scaling of Memory Hierarchy for Tensor Cores and Eliminating Redundant Shared Memory Traffic Using Inter-Warp Multicasting

Sunjung Lee, Seunghwan Hwang, Michael Jaemin Kim, Jaewan Choi, Jung Ho Ahn

Summary: This paper investigates the differences in computational performance and memory bandwidth between the CUDA core and the Tensor core in NVIDIA GPUs. Through comparisons and analysis of different generations of Tensor cores, a new method to reduce shared memory traffic is proposed. The experimental results show that inter-warp multicasting significantly improves the performance of deep neural networks.

IEEE TRANSACTIONS ON COMPUTERS (2022)

Proceedings Paper Computer Science, Hardware & Architecture

BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption

Sangpyo Kim, Jongmin Kim, Michael Jaemin Kim, Wonkyung Jung, John Kim, Minsoo Rhu, Jung Ho Ahn

Summary: Homomorphic encryption enables secure cloud computation by performing computations on encrypted data. However, bootstrapping, the technique that permits an unlimited number of operations and thus makes the encryption fully homomorphic, requires significant additional computation and memory bandwidth. This paper proposes BTS, a hardware accelerator that supports bootstrapping as a first-class citizen, achieving improved execution time through parallel processing elements and deterministic communication patterns.

PROCEEDINGS OF THE 2022 THE 49TH ANNUAL INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE (ISCA '22) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

Mithril: Cooperative Row Hammer Protection on Commodity DRAM Leveraging Managed Refresh

Michael Jaemin Kim, Jaehyun Park, Yeonhong Park, Wanju Doh, Namhoon Kim, Tae Jun Ham, Jae W. Lee, Jung Ho Ahn

Summary: The Row Hammer (RH) phenomenon has attracted significant attention from the research community due to its security implications. Existing RH-protection schemes have various shortcomings and limitations. This paper introduces Mithril, the first RFM interface-compatible, DRAM-MC cooperative RH-protection scheme, and its optional extension Mithril+. The proposed schemes aim to address the challenges and provide deterministic protection guarantees.

2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022) (2022)

Proceedings Paper Computer Science, Hardware & Architecture

MaPHeA: A Lightweight Memory Hierarchy-Aware Profile-Guided Heap Allocation Framework

Deok-Jae Oh, Yaebin Moon, Eojin Lee, Tae Jun Ham, Yongjun Park, Jae W. Lee, Jung Ho Ahn

Summary: MaPHeA is a lightweight memory hierarchy-aware profile-guided heap allocation framework applicable to both HPC and embedded systems. It improves application performance by optimizing the allocation of dynamically allocated heap objects with low profiling overhead and without additional user intervention. By identifying frequently accessed heap objects and allocating them to fast DRAM regions, MaPHeA can significantly improve the performance of memory-intensive workloads.

LCTES '21: PROCEEDINGS OF THE 22ND ACM SIGPLAN/SIGBED INTERNATIONAL CONFERENCE ON LANGUAGES, COMPILERS, AND TOOLS FOR EMBEDDED SYSTEMS (2021)

Proceedings Paper Computer Science, Hardware & Architecture

Accelerating Fully Homomorphic Encryption Through Microarchitecture-Aware Analysis and Optimization

Wonkyung Jung, Eojin Lee, Sangpyo Kim, Namhoon Kim, Keewoo Lee, Chohong Min, Jung Hee Cheon, Jung Ho Ahn

2021 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2021) (2021)

Article Computer Science, Information Systems

Accelerating Fully Homomorphic Encryption Through Architecture-Centric Analysis and Optimization

Wonkyung Jung, Eojin Lee, Sangpyo Kim, Jongmin Kim, Namhoon Kim, Keewoo Lee, Chohong Min, Jung Hee Cheon, Jung Ho Ahn

Summary: Homomorphic Encryption (HE) is a popular privacy-preserving approach for cloud computing, with schemes like HE for Arithmetic of Approximate Numbers (HEAAN) gaining popularity due to their support for approximate computations and unlimited arithmetic operations. However, the high computation complexity of HE, especially in ciphertext arithmetic like HE multiplication (HE Mul), has led to a lack of rigorous analysis in accelerating HE and optimizing performance for different parallel processing platforms.

IEEE ACCESS (2021)

Article Computer Science, Hardware & Architecture

Row-Streaming Dataflow Using a Chaining Buffer and Systolic Array plus Structure

Hweesoo Kim, Sunjung Lee, Jaewan Choi, Jung Ho Ahn

Summary: The SysAr+ structure proposed in this letter enhances data reuse in the CONV layer without the need for im2col pre-processing, resulting in significant energy consumption reduction and improved performance in ResNet and DenseNet models.

IEEE COMPUTER ARCHITECTURE LETTERS (2021)
