Chinese Journal of Computational Physics, 2024, Vol. 41, Issue (1): 110-121. DOI: 10.19596/j.cnki.1001-246x.8771
Special Issue on Performance Optimization Techniques and Parallel Numerical Algorithms for Supercomputers
Shaoliang HU1,2, Xiaowen XU1,2,*, Hengbin AN1,2, Ran XU1,2, Ronghong FAN1,2
Received: 2023-05-31
Online: 2024-01-25
Published: 2024-02-05
Contact: Xiaowen XU
About the author: Shaoliang HU (born 1990), Ph.D., works on the development of numerical algebra libraries and on application-oriented linear solver algorithms. E-mail: hu_shaoliaong@iapcm.ac.cn
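Abstract:
This paper introduces JPSOL (J Parallel Solver Library for Numerical Algebra Problems): its software architecture, its matrix and vector data structures, its three classes of algorithm libraries (linear, nonlinear, and eigenvalue), and its domain-specific solvers. Test results for basic iterative methods then illustrate its highly scalable parallelism, and several typical real-world applications demonstrate its effectiveness and robustness in practice.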
Shaoliang HU, Xiaowen XU, Hengbin AN, Ran XU, Ronghong FAN. JPSOL: A Parallel Numerical Algebraic Solver Driven by Application Features[J]. Chinese Journal of Computational Physics, 2024, 41(1): 110-121.
Table 1 Weak scalability of the JACOBI_CG method
Core groups | Total cores | Run time/s | Parallel efficiency/% |
512 | 33 280 | 46.20 | 100.00 |
1 024 | 66 560 | 46.44 | 99.50 |
2 048 | 133 120 | 46.68 | 98.99 |
4 096 | 266 240 | 47.06 | 98.19 |
8 162 | 530 530 | 47.38 | 97.53 |
16 384 | 1 064 960 | 48.08 | 96.09 |
32 768 | 2 129 920 | 63.76 | 72.46 |
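Table 1 reports results for JACOBI_CG, read here as the conjugate gradient method with a Jacobi (diagonal) preconditioner. The following is a minimal serial sketch of that textbook method, offered only as an illustration and not as JPSOL's implementation or API. The names `jacobi_pcg` and `poisson_1d_matvec`, the 1D Poisson test matrix, and the tolerance are all assumptions made for this example.

```python
def jacobi_pcg(matvec, diag, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A with CG,
    preconditioned by M = diag(A). Returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)                                   # r = b - A x, with x = 0
    z = [ri / di for ri, di in zip(r, diag)]      # z = M^{-1} r
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    b_norm = sum(bi * bi for bi in b) ** 0.5 or 1.0
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        # Stop once the relative residual norm drops below tol.
        if sum(ri * ri for ri in r) ** 0.5 <= tol * b_norm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson_1d_matvec(v):
    """y = A v for the tridiagonal matrix with 2 on the diagonal, -1 beside it."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


n = 50
b = [1.0] * n
x, iters = jacobi_pcg(poisson_1d_matvec, [2.0] * n, b)
# Largest entry of the true residual b - A x, as a correctness check.
residual = max(abs(bi - yi) for bi, yi in zip(b, poisson_1d_matvec(x)))
```

In a distributed setting the dot products become global reductions and the matrix-vector product needs neighbor communication, which is why this method is a common probe of weak scalability.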
Table 2 Strong scalability of AMG-preconditioned basic iterative methods
Cores | AMG_CG | AMG_BiCG | AMG_GMRES |
512 | 16.23 (40) | 19.40 (26) | 16.60 (40) |
1 024 | 8.18 (40) | 9.18 (24) | 8.01 (39) |
2 048 | 4.44 (41) | 5.06 (25) | 4.38 (39) |
4 096 | 2.69 (42) | 3.07 (25) | 2.69 (40) |
8 192 | 1.68 (41) | 2.02 (27) | 1.74 (40) |
16 384 | 1.40 (41) | 1.51 (26) | 1.46 (41) |
Table 3 Weak scalability of AMG-preconditioned basic iterative methods
Cores | AMG_CG | AMG_BiCG | AMG_GMRES |
32 | 2.03 (23) | 2.54 (14) | 2.21 (23) |
256 | 3.14 (30) | 3.76 (20) | 3.32 (30) |
2 048 | 4.73 (41) | 5.39 (25) | 4.68 (39) |
16 384 | 7.62 (58) | 8.04 (34) | 7.83 (56) |
Table 4 Weak scalability test results for the filter model
Degrees of freedom | Cores | Iterations | Time/s | Parallel efficiency/% |
0.225 million | 16 | 92 | 74.9 | 100.00 |
1.8 million | 128 | 97 | 93.6 | 80.02 |
14 million | 1 024 | 97 | 132.8 | 56.40 |
111 million | 8 192 | 98 | 163.7 | 45.75 |
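The parallel efficiency column of Table 4 is consistent with the usual weak-scaling definition, in which the per-core problem size is held fixed and efficiency is the baseline time divided by the time at the larger core count. The short Python sketch below recomputes the column under that assumption. The helper name `weak_efficiency` is hypothetical, and the definition is inferred from the numbers rather than stated on this page.

```python
# (cores, time in seconds) taken from Table 4
runs = [(16, 74.9), (128, 93.6), (1024, 132.8), (8192, 163.7)]


def weak_efficiency(runs):
    """Weak-scaling efficiency in percent: 100 * T(baseline) / T(N)."""
    base_time = runs[0][1]
    return [round(100.0 * base_time / time_s, 2) for _, time_s in runs]


efficiencies = weak_efficiency(runs)
for (cores, time_s), eff in zip(runs, efficiencies):
    print(f"{cores:>5} cores: {time_s:7.1f} s, efficiency {eff:.2f}%")
```

Under this definition the recomputed values match the tabulated efficiencies to two decimal places.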
Table 5 Strong scalability of the turbine blade model
Cores | Time/s | Speedup | Parallel efficiency/% |
200 | 31.15 | 1.00 | 100.00 |
400 | 17.89 | 1.74 | 87.06 |
800 | 9.22 | 3.38 | 84.46 |
1 000 | 8.98 | 3.47 | 69.37 |
2 000 | 7.51 | 4.15 | 41.48 |
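For a strong-scaling test such as Table 5, the problem size is fixed, speedup is conventionally the baseline time divided by the time at the larger core count, and efficiency is that speedup divided by the ratio of core counts. The Python sketch below recomputes both columns under those assumed definitions. The helper name `strong_scaling` is hypothetical.

```python
# (cores, time in seconds) taken from Table 5
runs = [(200, 31.15), (400, 17.89), (800, 9.22), (1000, 8.98), (2000, 7.51)]


def strong_scaling(runs):
    """Return (cores, speedup, efficiency %) relative to the first run."""
    base_cores, base_time = runs[0]
    rows = []
    for cores, time_s in runs:
        speedup = base_time / time_s
        efficiency = 100.0 * speedup / (cores / base_cores)
        rows.append((cores, round(speedup, 2), round(efficiency, 2)))
    return rows


table = strong_scaling(runs)
for cores, speedup, efficiency in table:
    print(f"{cores:>5} cores: speedup {speedup:.2f}, efficiency {efficiency:.2f}%")
```

The recomputed values agree with the tabulated ones up to a difference in the last printed digit, which is compatible with the table having been produced from unrounded timings.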
Table 6 Test results at up to ten billion degrees of freedom for the Three Gorges (Sanxia) Dam model
Degrees of freedom | Cores | Linear solver time/s | Iterations |
165 million | 512 | 368.67 | 243 |
1.303 billion | 2 048 | 466.70 | 343 |
10.345 billion | 32 768 | 1 261.11 | 581 |
Table 7 Combustion chamber case with a 160-million-cell mesh: strong scalability test results for the linear solver
Cores | Linear solver time | Speedup | Parallel efficiency/% |
400 | 1 736.38 | 1.00 | 100.00 |
800 | 858.50 | 2.02 | 101.00 |
1 600 | 434.04 | 4.00 | 100.00 |
3 200 | 230.80 | 7.52 | 94.00 |
6 400 | 128.92 | 13.47 | 84.19 |
10 000 | 108.69 | 16.98 | 67.92 |