Chinese Journal of Computational Physics, 2024, Vol. 41, Issue (1): 22-32. DOI: 10.19596/j.cnki.1001-246x.8766

• Performance Optimization Techniques and Parallel Numerical Algorithms for Supercomputing •

Sparse General Matrix-matrix Multiplication for Sunway Manycore Architecture

Kan LIU1, Lei YANG2, Wei XUE1,*, Wenguang CHEN1

  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. National Supercomputing Center in Wuxi, Wuxi, Jiangsu 214000, China
  • Received: 2023-05-30 Online: 2024-01-25 Published: 2024-02-05
  • Contact: Wei XUE

Abstract:

This paper proposes swSpGEMM, a parallel algorithm for sparse general matrix-matrix multiplication (SpGEMM) targeting the new-generation Sunway many-core architecture. To address the load imbalance caused by the distribution of nonzeros in the input matrices, the algorithm uses lightweight parallel task partitioning. To handle the irregular memory access and inefficient instruction pipelining that arise when accumulating partial products, it introduces a hierarchical sparse accumulator that maximizes local memory utilization across input matrices with different features and relieves the instruction dependencies in integer searching, making more efficient use of the hardware's computing capability. On large matrices from the SuiteSparse sparse matrix collection, the algorithm outperforms MKL on two Intel Xeon Gold 6132 processors by 21.1% and cuSPARSE on an NVIDIA A100 by 95.3%.
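
For readers unfamiliar with the two ingredients named in the abstract, the following is a minimal generic sketch, not the swSpGEMM implementation. It assumes CSR inputs given as `(indptr, indices, data)` tuples, balances contiguous row chunks by an estimated multiply-add count, and accumulates each output row with a plain dense sparse accumulator in the style of Gustavson's row-wise algorithm. The function names (`row_flops`, `partition_rows`, `spgemm_rows`, `spgemm`) are hypothetical, the chunks run serially here, and the hierarchical accumulator and Sunway-specific local memory handling described in the paper are not modeled.

```python
from bisect import bisect_left
from itertools import accumulate


def row_flops(a_indptr, a_indices, b_indptr):
    """Estimated multiply-adds per row of A: sum of nnz(B[k, :]) over nonzeros A[i, k]."""
    flops = []
    for i in range(len(a_indptr) - 1):
        cols = a_indices[a_indptr[i]:a_indptr[i + 1]]
        flops.append(sum(b_indptr[k + 1] - b_indptr[k] for k in cols))
    return flops


def partition_rows(flops, n_workers):
    """Split rows into n_workers contiguous chunks holding roughly equal work."""
    prefix = list(accumulate(flops))
    total = prefix[-1] if prefix else 0
    bounds = [0]
    for w in range(1, n_workers):
        target = total * w / n_workers
        cut = min(len(flops), bisect_left(prefix, target) + 1)
        bounds.append(max(bounds[-1], cut))  # keep boundaries non-decreasing
    bounds.append(len(flops))
    return bounds


def spgemm_rows(a, b, n_cols, row_begin, row_end):
    """Row-wise product for rows [row_begin, row_end) using a dense accumulator."""
    a_indptr, a_indices, a_data = a
    b_indptr, b_indices, b_data = b
    values = [0.0] * n_cols  # accumulated value per output column
    marker = [-1] * n_cols   # marker[j] == i means column j is active in row i
    out = []
    for i in range(row_begin, row_end):
        active = []
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k, a_ik = a_indices[p], a_data[p]
            for q in range(b_indptr[k], b_indptr[k + 1]):
                j = b_indices[q]
                if marker[j] != i:  # first contribution to C[i, j]
                    marker[j] = i
                    values[j] = 0.0
                    active.append(j)
                values[j] += a_ik * b_data[q]
        active.sort()
        out.append([(j, values[j]) for j in active])
    return out


def spgemm(a, b, n_cols, n_workers=4):
    """C = A * B in CSR; each chunk would map to one core, but runs serially here."""
    bounds = partition_rows(row_flops(a[0], a[1], b[0]), n_workers)
    indptr, indices, data = [0], [], []
    for w in range(n_workers):
        for row in spgemm_rows(a, b, n_cols, bounds[w], bounds[w + 1]):
            for j, v in row:
                indices.append(j)
                data.append(v)
            indptr.append(len(indices))
    return indptr, indices, data


# A = [[1, 0, 2], [0, 3, 0]], B = [[0, 1], [4, 0], [5, 6]]
A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
B = ([0, 1, 2, 4], [1, 0, 0, 1], [1.0, 4.0, 5.0, 6.0])
C = spgemm(A, B, n_cols=2, n_workers=2)
```

A dense accumulator needs one slot per output column, which is why accumulator designs for cores with small local memories typically switch strategy depending on how many distinct columns a row produces; the sketch above ignores that concern entirely.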

Key words: Sunway many-core architecture, sparse matrix computation, matrix-matrix multiplication
