Auto-tuning for Sparse Matrix-vector Multiplication

Chinese Journal of Computational Physics, 2024, Vol. 41, Issue (1): 33-39. DOI: 10.19596/j.cnki.1001-246x.8763

Special issue: Performance Optimization Techniques and Numerical Parallel Algorithms for Supercomputers

Zhen DU¹,², Guangming TAN²

1. School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 101408, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

Received: 2023-05-25 Online: 2024-01-25 Published: 2024-02-05
About the author: Zhen DU, male, Ph.D. candidate; research interests include sparse computing and auto-tuning. E-mail: duzhen18z@ict.ac.cn
Funding:
National Science Fund for Distinguished Young Scholars, National Natural Science Foundation of China (T2125013)

Abstract

Sparse matrix-vector multiplication (SpMV) is a widely used kernel in scientific computing. Because the performance of a given SpMV program depends closely on the distribution of non-zero elements in the sparse matrix, no single SpMV program design achieves high performance on all matrices. Auto-tuning has therefore become a popular way to obtain high SpMV performance. This paper analyzes the difficulties in optimizing SpMV and introduces two representative auto-tuning works: SMAT, which is based on pre-implemented templates, and AlphaSparse, which designs SpMV programs from scratch. It describes their design ideas, implementation details, test results, advantages, and disadvantages. Finally, the development trend of SpMV auto-tuning is analyzed and predicted.

Key words: high-performance scientific computing, sparse matrix, auto-tuning, sparse matrix-vector multiplication
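For readers unfamiliar with the kernel discussed in the abstract, the following is a minimal sketch of SpMV, y = Ax, with the matrix stored in compressed sparse row (CSR) form. CSR is chosen here only as one common storage format for illustration; the function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `values` are assumptions of this sketch and are not taken from SMAT or AlphaSparse.

```python
from typing import List, Sequence


def spmv_csr(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the non-zeros of row i inside
    col_idx (column indices) and values (non-zero values).
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # The trip count of this loop equals the number of non-zeros in row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix (hypothetical data):
# [[1, 0, 2],
#  [0, 0, 3],
#  [4, 5, 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 2, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, values, x)
```

Because the inner loop length follows the number of non-zeros in each row, a matrix with very uneven rows gives uneven work per row when rows are processed in parallel. This is one simple way the non-zero distribution can affect the performance of a fixed SpMV program.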

CLC number: O469

Zhen DU, Guangming TAN. Auto-tuning for Sparse Matrix-vector Multiplication[J]. Chinese Journal of Computational Physics, 2024, 41(1): 33-39.

Figures/Tables 5

Fig.1 The computational process of SpMV

Fig.2 The program design space of format selectors for pre-implemented templates

Fig.3 The program design space of AlphaSparse

Fig.4 Three problems facing AlphaSparse: (a) expression of the design space; (b) mapping of the design space; (c) searching of the design space

Fig.5 Modification of the matrix metadata set of the operator graph

