Article
Computer Science, Hardware & Architecture
Weilin Cai, Heng Chen, Ziheng Wang, Xingjun Zhang
Summary: Data is among the most valuable assets of enterprises and research institutions, so protecting its confidentiality is crucial. To improve the efficiency of large-scale data encryption and decryption, the authors implemented a parallel version of the ChaCha20 stream cipher optimized for the SW26010 heterogeneous multi-core processor of the Sunway TaihuLight supercomputer. Multiple optimization methods were combined to reach a maximum throughput of 32.43 GB/s on a single SW26010 processor, and the implementation scaled well to 8296.43 GB/s on 1024 core groups.
JOURNAL OF SUPERCOMPUTING
(2022)
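The summary does not describe the kernel, but the property that makes ChaCha20 easy to parallelize is standard: each 64-byte keystream block depends only on the key, the nonce and the block counter, so blocks can be generated and XORed independently. The Python sketch below follows the ChaCha20 definition in RFC 8439 (32-byte key, 12-byte nonce, 32-bit counter) and hands blocks to a thread pool. It is not the authors' SW26010 code; the function names are illustrative, and under CPython's GIL the threads only demonstrate the block-level decomposition rather than deliver a speedup.

```python
import struct
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF
# "expand 32-byte k" as four little-endian 32-bit words (RFC 8439 constants)
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)


def _rotl(v, n):
    """Rotate a 32-bit word left by n bits."""
    return ((v << n) & MASK) | (v >> (32 - n))


def _quarter_round(s, a, b, c, d):
    """ChaCha quarter round applied in place to state words a, b, c, d."""
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK; s[d] = _rotl(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK; s[b] = _rotl(s[b] ^ s[c], 7)


def chacha20_block(key, counter, nonce):
    """Return one 64-byte keystream block for a 32-byte key and 12-byte nonce."""
    state = (list(CONSTANTS) + list(struct.unpack("<8I", key))
             + [counter & MASK] + list(struct.unpack("<3I", nonce)))
    working = state[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        # column rounds
        _quarter_round(working, 0, 4, 8, 12)
        _quarter_round(working, 1, 5, 9, 13)
        _quarter_round(working, 2, 6, 10, 14)
        _quarter_round(working, 3, 7, 11, 15)
        # diagonal rounds
        _quarter_round(working, 0, 5, 10, 15)
        _quarter_round(working, 1, 6, 11, 12)
        _quarter_round(working, 2, 7, 8, 13)
        _quarter_round(working, 3, 4, 9, 14)
    # add the input state back in and serialize little-endian
    return struct.pack("<16I", *((w + s) & MASK for w, s in zip(working, state)))


def _xor_block(key, nonce, counter, data, index):
    """XOR the index-th 64-byte chunk of data with its own keystream block."""
    chunk = data[index * 64:(index + 1) * 64]
    stream = chacha20_block(key, counter + index, nonce)
    return bytes(x ^ y for x, y in zip(chunk, stream))


def chacha20_xor(key, nonce, data, counter=1, workers=1):
    """Encrypt or decrypt data; every block is independent, so blocks are farmed out."""
    n_blocks = (len(data) + 63) // 64
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda i: _xor_block(key, nonce, counter, data, i), range(n_blocks))
        return b"".join(parts)
```

Because encryption and decryption are the same XOR, calling `chacha20_xor` twice with the same key, nonce and counter restores the plaintext, and the result does not depend on the `workers` value.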
Article
Computer Science, Theory & Methods
Guoqing Xiao, Kenli Li, Yuedan Chen, Wangquan He, Albert Y. Zomaya, Tao Li
Summary: This paper introduces CASpMV, a customized and accelerative framework for sparse matrix-vector multiplication (SpMV) on the Sunway TaihuLight that addresses the performance limitations of SpMV on this platform. CASpMV significantly outperforms generic parallel SpMV on the Sunway and scales well across multiple core groups (CGs).
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
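For context on the generic parallel SpMV baseline that CASpMV is compared against, here is a minimal Python sketch of SpMV over a matrix in compressed sparse row (CSR) format, with rows split into contiguous partitions as one might assign them to core groups. It is not CASpMV; the function names and the even row split (rather than, say, a nonzero-balanced split) are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def csr_spmv_rows(indptr, indices, values, x, start, stop):
    """Compute y[start:stop] = A[start:stop, :] @ x for a CSR matrix A."""
    out = []
    for row in range(start, stop):
        acc = 0.0
        # nonzeros of this row live in values[indptr[row]:indptr[row + 1]]
        for k in range(indptr[row], indptr[row + 1]):
            acc += values[k] * x[indices[k]]
        out.append(acc)
    return out


def csr_spmv(indptr, indices, values, x, n_parts=4):
    """y = A @ x with rows split into n_parts contiguous partitions, one per worker."""
    n_rows = len(indptr) - 1
    bounds = [n_rows * p // n_parts for p in range(n_parts + 1)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        chunks = pool.map(
            lambda p: csr_spmv_rows(indptr, indices, values, x, bounds[p], bounds[p + 1]),
            range(n_parts),
        )
        # partitions come back in order, so concatenating them rebuilds y
        return [v for chunk in chunks for v in chunk]
```

With this even row split, a few dense rows can leave one partition with most of the work; load imbalance of this kind is one of the limitations a customized framework would need to address.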
Article
Computer Science, Information Systems
Mingfan Li, Han Lin, Junshi Chen, Jose Monsalve Diaz, Qian Xiao, Rongfen Lin, Fei Wang, Guang R. Gao, Hong An
Summary: This paper introduces a large-scale distributed framework swFLOW for deep learning tasks on Sunway TaihuLight. By analyzing and optimizing the performance of convolutional neural networks, the processing speed has been significantly improved. The use of Elastic Averaging Stochastic Gradient Descent (EASGD) algorithm reduces communication overhead and achieves high parallel efficiency.
INFORMATION SCIENCES
(2021)
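EASGD (Elastic Averaging SGD) couples each worker's parameters x_i to a shared center variable through an elastic penalty rho * (x_i - center), so workers can explore locally while the center averages them. The sketch below implements the synchronous update rules on toy one-dimensional quadratic losses f_i(x) = 0.5 * (x - c_i)^2; the losses, step size eta, elasticity rho and step count are assumptions chosen for illustration, not swFLOW's settings. The original EASGD formulation also lets workers synchronize with the center only every few steps, which is where the communication savings come from; this sketch synchronizes every step for brevity.

```python
def easgd(targets, eta=0.1, rho=1.0, steps=500):
    """Synchronous EASGD on toy per-worker losses f_i(x) = 0.5 * (x - c_i) ** 2.

    Returns (list of local parameters x_i, center variable).
    """
    workers = [0.0] * len(targets)   # local parameters x_i, one per worker
    center = 0.0                     # shared center variable
    for _ in range(steps):
        grads = [x - c for x, c in zip(workers, targets)]   # gradient of f_i at x_i
        elastic = [x - center for x in workers]             # x_i - center, using the old center
        # each worker descends its own loss plus the elastic pull toward the center
        workers = [x - eta * (g + rho * d) for x, g, d in zip(workers, grads, elastic)]
        # the center moves toward the workers by the summed elastic differences
        center += eta * rho * sum(elastic)
    return workers, center
```

At the fixed point the center settles at the mean of the c_i, while each worker sits between its own optimum and the center, at (c_i + rho * center) / (1 + rho).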
Article
Multidisciplinary Sciences
Zhimin Wang, Zhaoyun Chen, Shengbin Wang, Wendong Li, Yongjian Gu, Guoping Guo, Zhiqiang Wei
Summary: The new quantum circuit simulator developed on the Sunway TaihuLight supercomputer is versatile and efficient, capable of simulating different quantum state amplitudes and supporting various types of quantum operations. This simulator has the potential to be widely applied in developing quantum algorithms in various fields.
SCIENTIFIC REPORTS
(2021)
Article
Computer Science, Hardware & Architecture
Yuxuan Li, Xiaohui Duan, Lin Gan, Wubing Wan, Yuhu Chen, Kai Xu, Jinzhe Yang, Weiguo Liu, Wei Xue, Haohuan Fu, Guangwen Yang
Summary: The Community Atmosphere Model (CAM) has been successfully ported and optimized on the Sunway TaihuLight system, achieving high-performance climate modeling capabilities.
IEEE TRANSACTIONS ON COMPUTERS
(2022)
Article
Computer Science, Theory & Methods
Xin Liu, Jun Sun, Lin Zheng, Su Wang, Yao Liu, Tongquan Wei
Summary: This article introduces an optimized parallel NSGA-II algorithm on the Sunway TaihuLight system, which effectively overcomes the challenges of low memory bandwidth and capacity through an improved model and various optimization techniques. Experimental results demonstrate significant speedups achieved by the algorithm in path planning and ZDT1 cases.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
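The core of NSGA-II is the fast non-dominated sort, which ranks candidate solutions into Pareto fronts; its pairwise comparisons make it a natural target for many-core acceleration. The Python sketch below is the textbook procedure for minimization problems, not the paper's improved model, and the function names are illustrative.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return Pareto fronts as lists of indices into points, best front first."""
    dominated_by = [[] for _ in points]   # S_p: indices of solutions that p dominates
    counts = [0] * len(points)            # n_p: number of solutions that dominate p
    fronts = [[]]
    for p, a in enumerate(points):
        for q, b in enumerate(points):
            if dominates(a, b):
                dominated_by[p].append(q)
            elif dominates(b, a):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)           # nobody dominates p: first front
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1            # remove p's domination of q
                if counts[q] == 0:
                    nxt.append(q)         # q is dominated only by earlier fronts
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                    # drop the trailing empty front
```

For two objectives, `fast_non_dominated_sort([(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)])` places the three mutually non-dominating points in the first front, (3, 3) in the second, and (5, 5) in the third.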
Article
Computer Science, Information Systems
Xiaoyan Liu, Yi Liu, Bohong Yin, Hailong Yang, Zhongzhi Luan, Depei Qian
Summary: This paper introduces an optimized algorithm swSpAMM for accelerating large-scale decay matrix multiplication on supercomputers. Intra-node optimizations, including algorithm parallelization and block-major data layout, are utilized to better exploit the architecture advantage of the Sunway processor. Inter-node optimizations, such as a matrix organization strategy and a dynamic scheduling strategy, are proposed to achieve improved load balance. Experimental results demonstrate that swSpAMM achieves speedups of up to 14.5x and 2.2x compared to existing libraries and methods on a single node and multiple nodes, respectively.
FRONTIERS OF COMPUTER SCIENCE
(2023)
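The idea behind SpAMM-style multiplication of decay matrices is to skip block products that are provably small: by submultiplicativity, ||A_ik B_kj||_F <= ||A_ik||_F * ||B_kj||_F, so any block pair whose norm product falls below a threshold tau can be dropped with bounded error. The Python sketch below uses a single level of square blocks instead of the usual recursive quadtree and has none of swSpAMM's block-major layout or scheduling; the block size, threshold and function names are assumptions for illustration.

```python
import math


def sub_block(m, r0, c0, size):
    """Return the size x size sub-block of m starting at (r0, c0), truncated at the edges."""
    return [row[c0:c0 + size] for row in m[r0:r0 + size]]


def frob(m):
    """Frobenius norm of a list-of-lists matrix."""
    return math.sqrt(sum(v * v for row in m for v in row))


def spamm(a, b, bs, tau):
    """Blocked SpAMM sketch for n x n matrices: skip block products with norm bound < tau.

    Returns (approximate product, number of skipped block products).
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    skipped = 0
    for i in range(0, n, bs):
        for j in range(0, n, bs):
            for k in range(0, n, bs):
                ablk, bblk = sub_block(a, i, k, bs), sub_block(b, k, j, bs)
                if frob(ablk) * frob(bblk) < tau:
                    skipped += 1          # contribution provably smaller than tau
                    continue
                # dense multiply-accumulate of the retained block pair
                for ii, arow in enumerate(ablk):
                    for kk, av in enumerate(arow):
                        for jj, bv in enumerate(bblk[kk]):
                            c[i + ii][j + jj] += av * bv
    return c, skipped
```

With tau = 0 nothing is skipped and the result is the exact product; for a matrix whose entries decay away from the diagonal, such as a[i][j] = 0.5 ** abs(i - j), a small positive tau drops far off-diagonal block pairs while each entry's error stays below (n / bs) * tau.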
Article
Computer Science, Interdisciplinary Applications
Genshen Chu, Yang Li, Runchu Zhao, Shuai Ren, Wen Yang, Xinfu He, Changjun Hu, Jue Wang
Summary: MISA-MD simulates cascade collisions in reactor pressure vessel steel using the embedded-atom method (EAM) potential, a hash algorithm, and acceleration strategies designed for the SW26010 processor. Experimental results demonstrate good accuracy and scalability, with higher efficiency and lower memory usage than LAMMPS.
COMPUTER PHYSICS COMMUNICATIONS
(2021)
Article
Physics, Multidisciplinary
Quanyong Xu, Hu Ren, Hanfeng Gu, Jie Wu, Jingyuan Wang, Zhifeng Xie, Guangwen Yang
Summary: This paper presents "sprayDyMFoam," an efficient implicit solver developed on the Sunway TaihuLight supercomputer for whole-engine numerical simulation of aeroengines. The solver improves the PIMPLE algorithm for solving aerodynamic forces and adjusts the droplet atomization model of the combustion process to ensure matching between the components and the combustion chamber. The parallel communication mechanism for AMI boundary processing is also optimized. sprayDyMFoam shows good computational capacity and efficiency when simulating a typical double-rotor turbofan engine.
Article
Computer Science, Hardware & Architecture
Renjiang Chen, Tao Liu, Zhaoyuan Liu, Li Wang, Min Tian, Ying Guo, Jingshan Pan, Xiaoming Wu, Meihong Yang
Summary: As nuclear energy technology develops, reactor simulation increasingly requires high-performance computers. The method of characteristics (MOC) is the preferred method for simulating neutron transport in the nuclear reactor core. This paper proposes a fine-grained and universal two-level parallelization method based on the architecture of the Sunway many-core processor and the Sunway Bluelight II supercomputer. The parallelization achieved significant speedup, up to 18.6x, and good scalability on Sunway Bluelight II.
JOURNAL OF SUPERCOMPUTING
(2023)
Article
Computer Science, Information Systems
Ming Dun, Yunchun Li, Qingxiao Sun, Hailong Yang, Wei Li, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian
Summary: swCPD is an efficient CPD implementation that accelerates optimization algorithms with a hierarchical partitioning mechanism, achieving better performance on emerging processor architectures.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Theory & Methods
Lei Xu, Honghui Shang, Xin Chen, Yunquan Zhang, Lifang Wang, Xingyu Gao, Haifeng Song
Summary: The atomic kinetic Monte Carlo (KMC) method connects microscale mechanisms with macroscale evolution and is important in materials simulation. However, long-time simulation of multi-component materials is challenging because it requires significant computing resources, which exascale computing can now provide. In this paper, OpenKMC is optimized for the new-generation Sunway supercomputer, yielding a 37x performance improvement and enabling trillion-atom simulations with 85% parallel efficiency on 10 million cores.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2023)
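A common way to advance an atomic KMC simulation is the rejection-free residence-time (BKL, or n-fold way) step: choose event i with probability r_i / R, where R is the sum of all rates, then advance simulated time by an exponentially distributed increment -ln(u) / R. The Python sketch below shows one such step; it assumes this rejection-free scheme for illustration, and OpenKMC's actual event selection and rate bookkeeping may differ. The function name is hypothetical.

```python
import math
import random


def kmc_step(rates, rng):
    """One residence-time KMC step: pick event i with probability rates[i] / R, advance time.

    Returns (index of the chosen event, time increment).
    """
    total = sum(rates)
    target = rng.random() * total        # uniform point in [0, R)
    acc = 0.0
    chosen = len(rates) - 1              # fallback guards against float round-off
    for i, r in enumerate(rates):
        acc += r
        if target < acc:                 # first cumulative rate exceeding the target
            chosen = i
            break
    # exponential waiting time with mean 1 / R; 1 - random() lies in (0, 1]
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt
```

Over many steps with rates [1.0, 0.0, 3.0], the third event is chosen about 75% of the time, the zero-rate event never occurs, and the mean time increment approaches 1 / 4.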
Article
Computer Science, Information Systems
Jie Tan, Jianmin Pang, Cong Liu
Summary: This study investigates the impact of software migration between different platforms on software performance and identifies code smell density, cyclomatic complexity, and complex functions as key factors affecting the performance change.
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS
(2022)
Article
Computer Science, Theory & Methods
Zhiyong Xiao, Xu Liu, Jingheng Xu, Qingxiao Sun, Lin Gan
Summary: The study introduces a highly scalable hybrid parallel genetic algorithm based on Sunway TaihuLight Supercomputer, utilizing Cellular and Island models to achieve impressive performance for large-scale problems.
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE
(2021)
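In the island model, subpopulations evolve separately and periodically exchange individuals, which keeps communication infrequent and maps well onto distributed nodes. The Python sketch below shows one common migration policy, a ring topology in which each island's k best individuals replace the k worst of the next island. The policy, the parameter k and the function name are assumptions for illustration; the paper's hybrid design also uses a cellular model, which this sketch omits.

```python
def ring_migrate(islands, fitness, k=1):
    """Ring migration: island i receives the k best individuals of island i - 1.

    Each island keeps its own best len(pop) - k individuals, so sizes are preserved.
    """
    # select emigrants from every island before any island is modified
    emigrants = [sorted(pop, key=fitness, reverse=True)[:k] for pop in islands]
    result = []
    for idx, pop in enumerate(islands):
        incoming = emigrants[idx - 1]    # idx - 1 == -1 wraps island 0 to the last island
        survivors = sorted(pop, key=fitness, reverse=True)[:len(pop) - k]
        result.append(survivors + incoming)
    return result
```

For example, with integer individuals whose fitness is their own value, `ring_migrate([[1, 2, 3], [4, 5, 6], [7, 8, 9]], fitness=lambda x: x)` moves 9 into the first island, 3 into the second and 6 into the third, each replacing that island's weakest member.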
Article
Computer Science, Theory & Methods
Yilian Zhang, Yao Liu, Penglong Jiao, Yiping Zhou, Tongquan Wei
Summary: This article proposes an automatic multi-parameter performance modeling method for HPC applications on the new Sunway supercomputer. The method achieves low-overhead profiling through a lightweight performance profiling technique and builds performance models based on the Fourier neural operator, which offer high prediction accuracy and generalization ability.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2023)
Article
Computer Science, Artificial Intelligence
Jie Liang, Kenli Li, Chubo Liu, Keqin Li
Summary: Mobile edge computing (MEC) is a promising technology for supporting computation-intensive tasks on mobile devices. A new algorithm called joint re-ordering and frequency scaling (JRFS) is proposed to minimize makespan when offloading tasks with precedence constraints in an MEC environment with multiple servers. Extensive experiments show that JRFS outperforms several other methods in terms of makespan.
Article
Computer Science, Artificial Intelligence
Lian Chen, Wangdong Yang, Kenli Li, Keqin Li
Summary: In this study, the UIWMF and DUIWMF recommendation algorithms based on matrix factorization are proposed to handle large-scale implicit feedback data effectively and improve recommendation accuracy. By using weighting strategies and parallel learning algorithms, they address the problems of retrieving negative feedback information and of single-machine resource constraints.
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS
(2021)
Article
Computer Science, Hardware & Architecture
Fan Tang, Chubo Liu, Kenli Li, Zhuo Tang, Keqin Li
Summary: This paper focuses on optimizing task migration considering user mobility in mobile edge computing environment to maximize the number of tasks meeting deadlines. By designing a group migration algorithm, significant performance improvements are achieved compared to other common heuristics.
JOURNAL OF SYSTEMS ARCHITECTURE
(2021)
Article
Computer Science, Artificial Intelligence
Zeshan Hu, Lin Xiao, Kenli Li, Keqin Li, Jichun Li
Summary: By utilizing simplified nonlinear activation functions, two new SFTZNN models were designed to efficiently solve the time-varying matrix pseudoinversion problem. Theoretical analysis provided maximum convergence time and upper bounds of steady-state residual error in ideal conditions and with external perturbations. Comparative simulations and an engineering application confirmed the feasibility and superiority of the new SFTZNN models.
APPLIED SOFT COMPUTING
(2021)
Article
Computer Science, Information Systems
Yibi Chen, Xiaofeng Zou, Kenli Li, Keqin Li, Xulei Yang, Cen Chen
Summary: This paper introduces a deep learning-based method for region-based prediction in smart cities, utilizing multiple local 3D CNN spatial-temporal residual networks (LMST3D-ResNet) to extract various temporal dependencies for predicting future citywide activities.
INFORMATION SCIENCES
(2021)
Article
Computer Science, Theory & Methods
Saiqin Long, Weifan Long, Zhetao Li, Kenli Li, Yuanqing Xia, Zhuo Tang
Summary: This study addresses task assignment in collaborative edge and cloud environments using a distributed, non-cooperative approach. By establishing queuing models and applying game theory to minimize task costs while meeting QoS constraints, the authors propose a Greedy Energy-aware Algorithm and a Best Response Algorithm (BRA). The convergence of the algorithms is discussed, and results demonstrate that BRA quickly reaches a solution close to the Nash equilibrium.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
Article
Computer Science, Theory & Methods
Chubo Liu, Fan Tang, Yikun Hu, Kenli Li, Zhuo Tang, Keqin Li
Summary: Mobile edge computing (MEC) provides cloud-like capabilities to mobile users, with research focusing on task migration using reinforcement learning algorithms and distributed approaches. Experimental results show that the distributed task migration algorithm can significantly reduce the average completion time of tasks.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)
Article
Computer Science, Information Systems
Wensheng Luo, Xu Zhou, Jianye Yang, Peng Peng, Guoqing Xiao, Yunjun Gao
Summary: This article aims to efficiently compute the top-r influential communities in a network where nodes may have arbitrary weights by improving existing techniques and developing efficient algorithms to judge the connectivity of nodes with the same weight. Performance studies on real data sets demonstrate the effectiveness and efficiency of the proposed approaches.
IEEE INTERNET OF THINGS JOURNAL
(2021)
Article
Computer Science, Artificial Intelligence
Mincan Li, Zidong Wang, Kenli Li, Xiangke Liao, Kate Hone, Xiaohui Liu
Summary: This article introduces a novel layered MAS model that addresses the multitask multiagent allocation problem using deep Q-learning and MSDE methods. An MSDE-SPEA2-based method is proposed to tackle the resulting many-objective optimization problem, whose objectives include task allocation, completion time, and agent satisfaction.
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION
(2021)
Article
Computer Science, Theory & Methods
Yuedan Chen, Guoqing Xiao, Kenli Li, Francesco Piccialli, Albert Y. Zomaya
Summary: Sparse matrix-sparse vector multiplication is a fundamental and important operation in high-performance scientific and engineering applications. This paper proposes a fine-grained parallel framework to overcome the challenges of scalability in high-performance computing systems. The framework utilizes multi-stage and hybrid parallelism, adaptive parallel execution, and optimization techniques to accelerate computation and utilize computing resources. Experimental results show significant performance improvements with different input sparsity.
ACM TRANSACTIONS ON PARALLEL COMPUTING
(2022)
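A useful reference point for sparse matrix-sparse vector multiplication (SpMSpV) is the column-driven formulation: with A stored in compressed sparse column (CSC) form and x stored as a map from index to nonzero value, only the columns matching x's nonzeros are visited, so the work scales with the nonzeros in those columns rather than with the size of the matrix. The Python sketch below is sequential and illustrative only; it does not reproduce the paper's multi-stage hybrid parallel framework, and the function name and dictionary representation of sparse vectors are assumptions.

```python
def spmspv(col_ptr, row_idx, vals, x):
    """y = A @ x for a CSC matrix A and a sparse vector x given as {index: value}.

    Returns y as {row index: value}, sorted by row index.
    """
    acc = {}
    for j, xj in x.items():                       # visit only columns where x is nonzero
        # nonzeros of column j live in vals[col_ptr[j]:col_ptr[j + 1]]
        for k in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[k]
            acc[i] = acc.get(i, 0.0) + vals[k] * xj   # scatter-accumulate into row i
    return dict(sorted(acc.items()))
```

The scatter-accumulate into `acc` is the step that becomes contended when several workers process different columns at once, which is one reason parallel SpMSpV designs need care.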
Article
Computer Science, Hardware & Architecture
Juan Liu, Guoqing Xiao, Fan Wu, Xiangke Liao, Kenli Li
Summary: This paper presents AAPP, an accelerative and adaptive path planner based on RRT* algorithm, to overcome the limitations in bandwidth, load imbalance, high computing complexity, and the choice of parameters. AAPP compresses large-scale map data using simplified compressed sparse rows (SCSR), and proposes a two-layer parallel framework named TLRRT* to exploit GPU computing performance. The paper also designs a two-stage parallel framework named TSRRT* to address load imbalance and computing complexity, and presents optimizations to adaptively select execution schemes and parameters. Experimental results show that AAPP achieves a speedup of up to 22.72x over RRT* algorithm, and can handle large-scale datasets with shorter trajectory lengths.
IEEE TRANSACTIONS ON COMPUTERS
(2023)
Article
Automation & Control Systems
Xian Zhang, Guoqing Xiao, Mingxing Duan, Yuedan Chen, Kenli Li
Summary: Subgraph matching, a fundamental problem in many applications, is becoming increasingly challenging due to its NP-hardness, the explosive growth of graph data, and the high energy consumption and overhead of CPU and GPU platforms. To address this issue, we propose PH-CF, a phased hybrid algorithm for the CPU-FPGA heterogeneous platform that exploits the FPGA's pipeline and data-flow mechanisms, low power consumption, and configurability. Experimental results demonstrate that PH-CF outperforms state-of-the-art methods, with average performance improvements of up to 16.07x, 38.61x, and 11.46x over CFL, CECI, and DP-iso, respectively, while showing good stability and robustness across datasets.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
(2023)
Article
Mathematics
Walaa H. El-Ashmawi, Ahmad Salah, Mahmoud Bekhit, Guoqing Xiao, Khalil Al Ruqeishi, Ahmed Fathalla
Summary: The bin packing problem with conflicts (BPPC) is a less-studied variant of the classic bin packing problem. This work proposes an improved jellyfish metaheuristic algorithm that solves the BPPC by defining jellyfish operations. The proposed method outperforms the compared methods in terms of the number of bins and the average bin utilization.
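In the bin packing problem with conflicts (BPPC), items must be packed into as few bins of fixed capacity as possible, with the extra rule that certain pairs of items may not share a bin. To make the objective concrete, the Python sketch below implements a simple greedy first-fit-decreasing heuristic with a conflict check. It is a generic baseline, not the jellyfish metaheuristic proposed in the paper, and the function name and input format are assumptions.

```python
def first_fit_with_conflicts(sizes, capacity, conflicts):
    """Greedy first-fit-decreasing for bin packing with conflicts.

    sizes: item sizes; conflicts: iterable of (i, j) item pairs that may not share a bin.
    Returns a list of bins, each a list of item indices.
    """
    conflict_set = {frozenset(p) for p in conflicts}
    bins = []  # each bin is [current load, [item indices]]
    # place larger items first; ties keep their original order
    for item in sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True):
        for b in bins:
            fits = b[0] + sizes[item] <= capacity
            clash = any(frozenset((item, other)) in conflict_set for other in b[1])
            if fits and not clash:
                b[0] += sizes[item]
                b[1].append(item)
                break
        else:
            bins.append([sizes[item], [item]])  # no feasible bin: open a new one
    return [items for _, items in bins]
```

With sizes [4, 3, 3, 2] and capacity 6, the heuristic uses two bins when there are no conflicts but needs a third when items 0 and 3 conflict. The average bin utilization mentioned in the summary can then be computed as the total item size divided by (number of bins x capacity).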
Article
Computer Science, Theory & Methods
Hao Li, Zixuan Li, Kenli Li, Jan S. Rellermeyer, Lydia Y. Chen, Keqin Li
Summary: Sparse Tucker decomposition (STD) aims to obtain an optimal low-rank feature representation of sparse tensors but suffers from an explosion of intermediate variables. To address this problem, the authors propose SGD_Tucker, a novel stochastic optimization strategy that handles high-dimensional intermediate variables effectively and achieves faster computation in experiments.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
(2021)