Chinese Journal of Computational Physics ›› 2018, Vol. 35 ›› Issue (5): 554-562. DOI: 10.19596/j.cnki.1001-246x.7698


Performance Optimization of 3D Pseudopotential Multi-Relaxation-Time Lattice Boltzmann Model on GPU

PENG Hao1, SHAN Minglei1,2, ZHU Changping1,2, YAO Cheng1,2

  1. Changzhou Key Laboratory of Sensor Networks and Environmental Sensing, Jiangsu Key Laboratory of Power Transmission and Distribution Equipment Technology, Hohai University, Changzhou 213022, China;
  2. Jiangsu Provincial Collaborative Innovation Center of World Water Valley and Water Ecological Civilization, Nanjing 211100, China
  • Received: 2017-05-17 Revised: 2017-07-21 Online: 2018-09-25 Published: 2018-09-25
  • Corresponding author: SHAN Minglei (born 1977), male, lecturer, Ph.D., works mainly on lattice Boltzmann modeling of multiphase flow. E-mail: shanming2003@126.com
  • About the author: PENG Hao (born 1992), male, from Jingmen, Hubei, master's student, works mainly on GPU parallel computing
  • Supported by:
    National Key Research and Development Program of China (2016YFC0401606), Key Research and Development Program of Jiangsu Province (BE2016056), and Natural Science Foundation of Jiangsu Province (SBK2014043338)

Abstract: In the pseudopotential model of the lattice Boltzmann method (LBM), the computation that couples neighboring lattice nodes is not fully local. A parallel implementation therefore needs more global memory reads and writes, more registers, and thread synchronization, all of which lower the efficiency of GPU parallel computing. To address these limitations, this paper adopts a three-dimensional multi-relaxation-time (MRT) pseudopotential model on the D3Q15 lattice, with gas-liquid phase separation as the test case. Coalesced access is used to improve the efficiency of global memory reads and writes. A "Directional Transfer" algorithm is proposed to improve the efficiency with which nodes on the lattice boundary obtain data from their neighboring nodes. Finally, the influence of different resource allocations, including block size, on computational efficiency is investigated, and an approach to optimal resource allocation is summarized.

Key words: LBM, pseudopotential model, GPU, parallel computing, performance optimization
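
The benefit of coalesced access mentioned in the abstract can be illustrated with a small host-side sketch. It counts how many memory segments one warp touches when 32 consecutive nodes each read the same distribution function, under two storage layouts. The abstract does not state the layout used in the paper; the array-of-structures and structure-of-arrays layouts, the 4-byte values, the warp size of 32, the 128-byte segment size, and all function names below are assumptions made for illustration only.

```python
# Illustrative model of global-memory coalescing for a D3Q15 lattice.
# All constants below are assumptions, not values taken from the paper.
Q = 15               # D3Q15: 15 distribution functions per node
WARP = 32            # threads per warp (assumed)
ITEM_BYTES = 4       # single-precision value (assumed)
SEGMENT_BYTES = 128  # size of one memory segment/transaction (assumed)


def aos_index(node, i):
    """Array of structures: the 15 values of one node are stored together."""
    return node * Q + i


def soa_index(node, i, n_nodes):
    """Structure of arrays: direction i of all nodes is stored together."""
    return i * n_nodes + node


def segments_touched(indices):
    """Number of distinct memory segments covered by the given element indices."""
    return len({(idx * ITEM_BYTES) // SEGMENT_BYTES for idx in indices})


n_nodes = 64 * 64 * 64   # hypothetical lattice size
direction = 3            # every thread in the warp reads the same direction
warp_nodes = range(WARP) # thread t handles node t

aos_segments = segments_touched(aos_index(n, direction) for n in warp_nodes)
soa_segments = segments_touched(
    soa_index(n, direction, n_nodes) for n in warp_nodes
)

print("AoS segments per warp read:", aos_segments)
print("SoA segments per warp read:", soa_segments)
```

Under these assumptions the structure-of-arrays layout lets the warp read one contiguous segment, while the array-of-structures layout spreads the same 32 reads over many segments. This is the kind of difference that coalesced access aims to exploit.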
