Chinese Journal of Computational Physics, 2024, Vol. 41, Issue 1: 9-21. DOI: 10.19596/j.cnki.1001-246x.8784

Special Issue: Performance Optimization Techniques and Parallel Numerical Algorithms for Supercomputers

Parallel Algorithm Libraries for Tianhe Supercomputers

Jie LIU (1,2), Yongzhen SHI (1,2), Bo YANG (1), Xiang ZHANG (1), Xinhai CHEN (1), Huajian ZHANG (1,2), Xiaowei GUO (1), Shengguo LI (1), Runhua LI (1,2), Jintao PENG (1,2), Tiaojie XIAO (1), Xuguang CHEN (1), Qingyang ZHANG (1), Biao LI (1,2), Can LENG (1,2), Yushui LI (1,2), Qinglin WANG (1,2,*)

  1. Hunan Provincial Key Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, Hunan 410073, China
  2. National Key Laboratory of Parallel and Distributed Computing, National University of Defense Technology, Changsha, Hunan 410073, China
  • Received: 2023-06-27; Online: 2024-01-25; Published: 2024-02-05
  • Corresponding author (*): Qinglin WANG
  • About the first author: Jie LIU, male, Ph.D., professor and doctoral supervisor, whose research focuses on high-performance computing. E-mail: liujie@nudt.edu.cn
  • Funding: National Key Research and Development Program of China (2021YFB0300101); National Natural Science Foundation of China (62002365)

Abstract:

The Tianhe supercomputers developed by the National University of Defense Technology have ranked first on the world's TOP500 supercomputer list seven times. To meet the practical need of using these systems at high efficiency, the Tianhe team extracted the common core computing methods of large-scale scientific and engineering computing. It then designed and implemented scalable parallel algorithms for these methods, tailored to the characteristics of the Tianhe systems. The result is the Tianhe parallel algorithm libraries, an important part of the Tianhe application-support environment. This paper first reviews the development history and system architecture of the Tianhe supercomputers. It then focuses on the architecture, functions, and performance of the parallel algorithm libraries for mesh processing, discrete solution of partial differential equations, matrix computation, particle transport, collective communication, and deep learning. Finally, a brief summary of typical application software on the Tianhe supercomputers shows that the parallel algorithm libraries effectively support the rapid development and performance optimization of such software.

Key words: Tianhe supercomputer, parallel algorithm, application software, algorithm library
