Chinese Journal of Computational Physics, 2009, Vol. 26, Issue (6): 831-836.

An Approach for Scientific Dataset Stream Reduction Based on Information Measures

WU Guoqing1,2, MO Zeyao2, CHEN Hong2

  1. Graduate Department of China Academy of Engineering Physics, Beijing 100088, China;
  2. Institute of Applied Physics and Computational Mathematics, Beijing 100094, China
  • Received: 2008-07-03 Revised: 2008-12-10 Online: 2009-11-25 Published: 2009-11-25
  • About the author: WU Guoqing (born 1980), male, from Linyi, Shandong, Ph.D.; research interests include scientific data mining and data compression. Address: High Performance Computing Center, No. 2 Fenghao East Road, Haidian District, Beijing 100094.
  • Supported by: National Natural Science Foundation of China (Grant No. 90718029)

Abstract: We propose a reduction approach for scientific dataset streams based on information measures. It addresses how to sample, from a stream, datasets that are only weakly correlated with one another without discarding datasets that carry important physical features. The approach comprises sampling of datasets based on mutual entropy and truncation based on offline marginal utility. It is a general method for multi-dimensional scientific dataset streams. Applied to the output of a laser-plasma interaction simulation code, it reduces the correlation and information redundancy between datasets, and the average information content per dataset increases by about 30% relative to the original dataset stream.

Key words: scientific datasets, data reduction, Shannon entropy, marginal utility

CLC number:
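
The abstract names the sampling step but gives no algorithm, so the following is only a minimal sketch of what entropy-based sampling of a dataset stream could look like. It assumes that "mutual entropy" can be read as Shannon mutual information estimated from histograms, and that a dataset is kept when its redundancy with the last kept dataset falls below a threshold. The helper names (`quantize`, `sample_stream`), the parameters `bins` and `max_redundancy`, and the greedy rule itself are hypothetical and not taken from the paper; the marginal-utility truncation step is not shown.

```python
import math
from collections import Counter

def quantize(values, bins, lo, hi):
    """Map each value to a histogram bin index in [0, bins - 1]."""
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant stream
    return [min(bins - 1, max(0, int((v - lo) / width))) for v in values]

def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two equally long symbol sequences."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def sample_stream(stream, bins=8, max_redundancy=0.5):
    """Greedy sampling (hypothetical rule): keep dataset i only if the share
    of its entropy already explained by the last kept dataset is small."""
    lo = min(min(d) for d in stream)
    hi = max(max(d) for d in stream)
    quantized = [quantize(d, bins, lo, hi) for d in stream]
    kept = [0]  # always keep the first dataset
    for i in range(1, len(quantized)):
        ref, cur = quantized[kept[-1]], quantized[i]
        h = entropy(cur)
        # A constant dataset (h == 0) carries no information and is dropped.
        redundancy = mutual_information(ref, cur) / h if h > 0 else 1.0
        if redundancy < max_redundancy:
            kept.append(i)
    return kept

# Toy stream: dataset 1 repeats dataset 0, dataset 3 repeats dataset 2.
stream = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
print(sample_stream(stream, bins=2))  # → [0, 2]
```

In this toy stream the repeated datasets are fully redundant with their predecessors and are dropped, so each kept dataset contributes new information. A real multi-dimensional field would first be flattened to a list of cell values, and the bin count would need tuning to the data.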