
Archive

    Words from the chief editor
    Song JIANG
    2024, 41(1): 1-1. 
    Performance Optimization Techniques and Parallel Numerical Algorithms for Supercomputing
    Preface
    Zeyao MO
    2024, 41(1): 2-2. 
    Meeting minutes of panel session of HPCMid22
    Zeyao MO, Long WANG, Jie LIU, Guangming TAN, Weifeng LIU, Zhibin YU, Jidong ZHAI, Hailong YANG, Xiaowen XU
    2024, 41(1): 3-8.  DOI: 10.19596/j.cnki.1001-246x.8818

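    On December 12, 2022, the 8th Workshop on High Performance Computing Middleware Technology (HPCMid22) was successfully held. HPCMid (website: http://www.caep-scns.ac.cn/HPCMid.php) takes place once a year. It addresses the challenges that numerical simulation applications in scientific and engineering computing face on current and next-generation supercomputers, centers on the key technologies of high performance computing middleware, and invites researchers to report their latest progress and discuss future trends. The theme of the 8th workshop was "Performance optimization techniques adapted to new architectures". It focused on the opportunities and challenges that new architectures of the post-Moore era bring to scientific and engineering computing, and explored the development trends of portable performance optimization techniques on those architectures. The panel session was co-chaired by research fellows Zeyao MO and Xiaowen XU. Five experts from universities, research institutes, and industry (Long WANG, Jie LIU, Guangming TAN, Weifeng LIU, and Zhibin YU) held an in-depth discussion on the theme "Performance optimization: specific vs. common", and several other experts, including Jidong ZHAI and Hailong YANG, also took part. The experts offered many insightful views on the current state and development trends of performance optimization techniques, the problems and challenges they face, and talent development. The editorial office of the Chinese Journal of Computational Physics has compiled the discussion and publishes it here for readers. It has been slightly abridged for reasons of space.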

    Parallel Algorithm Libraries for Tianhe Supercomputers
    Jie LIU, Yongzhen SHI, Bo YANG, Xiang ZHANG, Xinhai CHEN, Huajian ZHANG, Xiaowei GUO, Shengguo LI, Runhua LI, Jintao PENG, Tiaojie XIAO, Xuguang CHEN, Qingyang ZHANG, Biao LI, Can LENG, Yushui LI, Qinglin WANG
    2024, 41(1): 9-21.  DOI: 10.19596/j.cnki.1001-246x.8784

    Tianhe supercomputers developed by the National University of Defense Technology won first place in the world's supercomputing TOP500 seven times. To exploit the high efficiency of those systems, the Tianhe team extracted the common key computing methods in large-scale scientific and engineering computing, designed and implemented scalable parallel algorithms for those methods according to the characteristics of the Tianhe supercomputers, and developed the Tianhe parallel algorithm libraries which are an important part of the Tianhe application-support environment. This paper first reviews the development history and system structures of Tianhe supercomputing systems. Subsequently, the architecture, functions, and performance of common parallel libraries such as grid processing libraries, partial differential equation discrete solving libraries, matrix computing libraries, particle transport libraries, collective communication libraries, and deep learning libraries are highlighted. Finally, a summary of typical application software on Tianhe supercomputers shows that the parallel algorithm libraries can effectively support the rapid development and performance optimization of typical application software.

    Sparse General Matrix-matrix Multiplication for Sunway Manycore Architecture
    Kan LIU, Lei YANG, Wei XUE, Wenguang CHEN
    2024, 41(1): 22-32.  DOI: 10.19596/j.cnki.1001-246x.8766

    A parallel algorithm for sparse general matrix-matrix multiplication (SpGEMM), swSpGEMM, targeting the new-generation Sunway many-core architecture is proposed. The algorithm addresses the load imbalance caused by the distribution of nonzeros in the input matrices by using lightweight parallel task partitioning. To handle the irregular memory access and inefficient instruction pipelining that arise when accumulating products, a hierarchical sparse accumulator is proposed. It maximizes the utilization of local memory across different input matrix features and relieves the instruction dependency in integer searching, resulting in more efficient use of the computing capability of the hardware. On large matrices from the SuiteSparse sparse matrix collection, the algorithm outperforms MKL on two Intel Xeon GOLD 6132 processors by 21.1% and cuSPARSE on NVIDIA A100 by 95.3%.
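For readers unfamiliar with the role of a sparse accumulator in SpGEMM, the following is a minimal serial sketch of the textbook row-wise (Gustavson) formulation on CSR inputs. It is not swSpGEMM: the function name `spgemm_csr` and the use of a Python dictionary as the accumulator are assumptions made for illustration, and the hierarchical accumulator and task partitioning described in the abstract are not reproduced here.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Row-wise (Gustavson) SpGEMM: C = A @ B with A, B, C in CSR form.

    Illustrative only: a dict plays the role of the sparse accumulator.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator: column index -> partial sum for row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and merge it into the accumulator.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        # Flush the accumulator into row i of C, sorted by column index.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]]
A = ([0, 1, 3], [0, 0, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2], [1, 0], [4.0, 5.0])
C = spgemm_csr(*A, *B)
```

The amount of work per output row depends on how many nonzeros the matching rows of A and B contain, which is why the nonzero distribution drives both load balance and accumulator size.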

    Auto-tuning for Sparse Matrix-vector Multiplication
    Zhen DU, Guangming TAN
    2024, 41(1): 33-39.  DOI: 10.19596/j.cnki.1001-246x.8763

    SpMV (sparse matrix-vector multiplication) is a widely used kernel in scientific computing. Since the performance of a specific SpMV program is closely related to the distribution of nonzero elements in the sparse matrix, no universal SpMV program design achieves high performance on all matrices. Auto-tuning has therefore become a popular way to obtain high SpMV performance. This paper analyzes the difficulties in optimizing SpMV and introduces two representative auto-tuning works: SMAT, which is based on pre-implemented templates, and AlphaSparse, which designs SpMV programs from scratch. Their designs, implementations, test results, advantages, and disadvantages are described. Finally, the trend of SpMV auto-tuning is analyzed and predicted.
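As a point of reference for the kernel being tuned, here is a minimal sketch of SpMV on a matrix stored in CSR format. The function name `spmv_csr` and the small test matrix are assumptions for illustration; the sketch shows only the baseline kernel, not the SMAT or AlphaSparse tuning procedures.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        s = 0.0
        # The inner trip count equals the number of nonzeros in row i,
        # so the nonzero distribution determines the work per row.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            s += values[p] * x[col_idx[p]]
        y[i] = s
    return y


# A = [[1, 0, 2],
#      [0, 0, 0],
#      [0, 3, 0]]
row_ptr = [0, 2, 2, 3]
col_idx = [0, 2, 1]
values = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, [1.0, 2.0, 3.0])  # [7.0, 0.0, 6.0]
```

The indirect access `x[col_idx[p]]` and the varying row lengths are the two features that make a single fixed implementation hard to optimize for every matrix.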

    Efficient Asynchronous Performance Prediction for Heterogeneous Systems
    Yuyang JIN, Zixuan MA, Jidong ZHAI
    2024, 41(1): 40-51.  DOI: 10.19596/j.cnki.1001-246x.8759

    An efficient asynchronous performance prediction method is proposed to guide the design of asynchronous strategies. This method decomposes the performance behavior of synchronous and asynchronous execution and achieves fast and accurate prediction through hierarchical modeling, graph-based simulation and other techniques. Based on this method, the performance of HPL on the Sunway TaihuLight supercomputer is predicted. The experimental results show that the method achieves an accuracy of 96.61% on average for 4 million cores, with a prediction cost as low as milliseconds.

    SEMD: A Cross-platform Automatic Performance Optimization Programming Tool for Real Numerical Simulation Software
    Peng ZHANG, Aiqing ZHANG, Zeyao MO, Jingtao WANG
    2024, 41(1): 52-63.  DOI: 10.19596/j.cnki.1001-246x.8777

    To address the lack of reusability and portability in manual software optimization, we propose and implement SEMD, a cross-platform automatic performance optimization programming tool for numerical simulation software. It uses high-level semantics to abstract the numerical computing loops that are prevalent in numerical simulation, completely shielding underlying hardware features and performance optimization implementations. As a result, any numerical subroutine written with SEMD attains automatic cross-platform performance portability. Our tests demonstrate that SEMD's performance optimization effects exceed those of comparable products on three different processor architectures: X86, ARM, and GPU. Furthermore, SEMD has been successfully applied in the development of four real numerical simulation software programs in the fields of structure, fluid, and electromagnetics, resulting in an average performance improvement of 164% on hotspot subroutines.

    Feature-modified Algorithm Framework for Parallel Preconditioning in Sparse Linear Solvers
    Xiaowen XU, Zeyao MO, Shaoliang HU, Hengbin AN
    2024, 41(1): 64-74.  DOI: 10.19596/j.cnki.1001-246x.8787

    To address the high computational complexity of sparse linear solvers caused by complex physical characteristics in practical applications, this paper presents a unified framework for feature-modified preconditioning algorithms. By refining the algebraic features affecting the efficiency from physical characteristics and combining multilevel feature analysis, we construct feature-modified components. The effectiveness of this framework is demonstrated through several typical feature-modified preconditioning algorithms and their application results.

    Nonlinear Iterative Methods for Radiation Diffusion Equations
    Hengbin AN, Zeyao MO
    2024, 41(1): 75-86.  DOI: 10.19596/j.cnki.1001-246x.8765

    To improve the robustness and convergence speed of the Newton and Picard methods for solving radiation diffusion equations, several techniques developed for the three-temperature radiation diffusion equation system are introduced, including the selection of the initial iterate, the treatment of physical constraints during the iteration, the combination of Picard iteration with Anderson acceleration, and improvements to the Anderson acceleration method. With these application-driven treatments and improvements, the two methods can be used to solve the nonlinear radiation diffusion equations.
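To illustrate what combining Picard iteration with Anderson acceleration means, the sketch below applies both to a scalar toy fixed-point problem x = cos(x). Everything here is an assumption chosen for illustration: the function names `picard` and `anderson1`, the mixing depth of 1, and the test problem. It is not the three-temperature solver or the improved Anderson scheme discussed in the paper.

```python
import math


def picard(g, x0, tol=1e-12, max_iter=200):
    """Plain Picard (fixed-point) iteration x <- g(x); returns (x, evaluations)."""
    x = x0
    for k in range(1, max_iter + 1):
        x_new = g(x)
        if abs(x_new - x) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


def anderson1(g, x0, tol=1e-12, max_iter=200):
    """Picard iteration with depth-1 Anderson acceleration (scalar case)."""
    g_prev = g(x0)
    f_prev = g_prev - x0          # residual of the previous iterate
    x = g_prev                    # the first step is a plain Picard step
    for k in range(2, max_iter + 1):
        gx = g(x)
        f = gx - x                # residual of the current iterate
        if abs(f) < tol:
            return gx, k
        denom = f - f_prev
        if denom == 0.0:
            x_next = gx           # fall back to a Picard step
        else:
            gamma = f / denom     # minimizes |f - gamma * (f - f_prev)|
            x_next = gx - gamma * (gx - g_prev)
        g_prev, f_prev = gx, f
        x = x_next
    return x, max_iter


x_p, n_p = picard(math.cos, 1.0)
x_a, n_a = anderson1(math.cos, 1.0)
print(f"Picard:   x = {x_p:.12f} after {n_p} evaluations of g")
print(f"Anderson: x = {x_a:.12f} after {n_a} evaluations of g")
```

In this toy case the accelerated iteration needs far fewer evaluations of g than plain Picard iteration, which is the motivation for using the combination on expensive nonlinear systems.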

    Feature-driven Parallel Algebraic Multigrid Methods for Multi-group Radiation Diffusion Problems
    Shi SHU, Xiaoqiang YUE, Jianmeng HE, Xiaowen XU, Zeyao MO
    2024, 41(1): 87-97.  DOI: 10.19596/j.cnki.1001-246x.8768

    Firstly, a review is given by classifying the existing fast algorithms for solving large-scale discrete linear systems arising from the Multi-Group Radiation Diffusion (MGRD) equations. Secondly, based on our recent work on parallel algebraic multigrid (AMG), two preconditioning algorithms and related theoretical frameworks are developed on a higher level. One is the approximate Schur complement type based on physical quantities and the other is the combined type based on physical and algebraic features, and the relevant components of these works are portrayed within these frameworks. Based on the above framework, an approximate Schur complement preconditioner with fundamental approximation property and low computational complexity is designed, and the corresponding spectral equivalence theory is established. Numerical experiments show that the new preconditioner has better robustness and computational efficiency. Finally, several issues that need to be further addressed are presented.

    Application-oriented Preconditioning of Seepage Mechanics
    Chunsheng FENG, Shizhe LI, Shenghao LIU, Chensong ZHANG, Li ZHAO
    2024, 41(1): 98-109.  DOI: 10.19596/j.cnki.1001-246x.8791

    The seepage mechanics model comprises multiple nonlinearly coupled partial differential equations. In various applications, seepage mechanics problems exhibit distinct characteristics and the corresponding solution methods are also very different. This paper focuses on the representative mathematical models used in oil and gas reservoir development. It introduces the mathematical formulation and application characteristics of multiphase multicomponent seepage mechanics equations within porous media, along with efficient techniques for solving their discretized linear equation systems, including commonly employed preconditioning methods. Additionally, this study appropriately modifies standard test cases and evaluates the shared-memory parallel efficiency of these preconditioning methods.

    JPSOL: A Parallel Numerical Algebraic Solver Driven by Application Features
    Shaoliang HU, Xiaowen XU, Hengbin AN, Ran XU, Ronghong FAN
    2024, 41(1): 110-121.  DOI: 10.19596/j.cnki.1001-246x.8771

    JPSOL (J Parallel Solver Library for Numerical Algebra Problems) is introduced, including its software architecture, matrix and vector data structures, three kinds of algorithm libraries (linear, nonlinear, and eigenvalue), and domain-specific solvers. Then, the high parallel scalability of JPSOL is demonstrated by test results for basic iterative methods. Finally, the effectiveness and robustness of JPSOL are demonstrated by several typical practical applications.

    Convergence Estimation and Characteristic Analysis of A Two-level Iterative Algorithm for Discretized Three-temperature Energy Linear Systems
    Yue HAO, Silu HUANG, Xiaowen XU
    2024, 41(1): 122-130.  DOI: 10.19596/j.cnki.1001-246x.8767

    In this paper, we study in detail the specific convergence property of the physical-variable-based coarsening two-level iterative method (PCTL) algorithm based on the theory of algebraic multigrid method (AMG), and give a reasonable upper bound on the convergence factor, which provides a theoretical guarantee for the PCTL algorithm. Moreover, we also analyze the algebraic features that affect the convergence of the PCTL algorithm, such as diagonal dominance and coupling strength, hoping to provide theoretical guidance for the applications and algorithm optimization of the PCTL algorithm.

    A Review of Algorithms and Applications of Solvers with Quantum Computing Acceleration
    Kang XU, Zeyang LI, Zhufeng GUO, Yingtong SHEN, Wei WANG, Minhui GOU, Zizheng WANG, Yukun WANG, Weifeng LIU
    2024, 41(1): 131-150.  DOI: 10.19596/j.cnki.1001-246x.8778

    Quantum computing is a new computing model based on the principles of quantum mechanics. Because its parallelism far exceeds that of classical computing, quantum computing is regarded as a computational method that may have a disruptive impact in the future, providing a new way to solve some complex problems. The algorithms and applications of quantum solvers for numerical computation problems in large-scale science and engineering are reviewed. In particular, systems of linear equations, eigenvalue problems, differential equations, Hamiltonian and graph computation, quantum machine learning, quantum solver platforms, and practical numerical simulation are covered. For each class of numerical computing problem, the current mainstream quantum computing algorithms are introduced in detail, and the research progress of the relevant algorithms at home and abroad in recent years is comprehensively summarized. Finally, the future development trend of quantum computing in numerical algebra solving is discussed.
