Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2016, Vol. 56 Issue (3): 229-236,245    DOI: 10.16511/j.cnki.qhdxxb.2016.21.032
  计算机科学与技术 本期目录 | 过刊浏览 | 高级检索 |
支持时序数据聚合函数的索引
黄向东1, 郑亮帆1, 邱明明1, 张金瑞1, 王建民1,2
1. 清华大学软件学院, 北京 100084;
2. 清华信息科学与技术国家实验室(筹), 北京 100084
Time-series data aggregation index
HUANG Xiangdong1, ZHENG Liangfan1, QIU Mingming1, ZHANG Jinrui1, WANG Jianmin1,2
1. School of Software, Tsinghua University, Beijing 100084, China;
2. Tsinghua National Laboratory for Information Science and Technology(TNList), Beijing 100084, China
全文: PDF(1649 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 时序数据是工业新发展的关键, 其中针对时序数据的聚合操作成为主要的应用场景之一。传统关系型数据库不足以支撑海量的时序数据, 而现有的NoSQL数据库对时序数据的聚合操作显得低效耗时。该文提出了一种结合概要表和线段树思想的支持时序数据聚合操作的高效索引机制, 并实现了基于这种索引机制的查询算法。该查询算法将概要表的思想引入NoSQL中, 缩小了待查询数据集, 并通过在概要表上建立概要森林的形式, 将最坏情况下的待查询数据集进一步缩小为索引个数的lbn倍。此外, 该算法通过计算直接定位出待查询的一系列索引数据, 有效避免了一般树形结构的递归遍历操作, 减少了大量的磁盘开销。最后, 通过与一般索引机制的查询对比实验, 验证了该索引机制的可用性和高效性。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
黄向东
郑亮帆
邱明明
张金瑞
王建民
关键词 索引聚合操作时序数据概要表线段树    
Abstract:Time-series data is the key to industrial development, with the aggregation of the data an important step in practice. However, traditional relational databases fail to support vast amounts of time-series data. The NoSQL databases are inefficient and require time-consuming calculation to aggregate of time-series data. This paper presents an efficient index mechanism that supports time-series data aggregation by combining a synopsis table and a segment tree. A query algorithm based on this mechanism introduces the synopsis table into the NoSQL database and builds a segment tree from the synopsis table for archiving that is lbn the size of the original query set. This query algorithm can directly locate a series of index data to be queried without the recursive operations in traditional trees and effectively reduces I/O overhead. This study shows the efficiency of this index mechanism by comparisons with general index mechanisms.
Key wordsindex    aggregate operation    time-series data    synopsis table    segment tree
收稿日期: 2015-09-28      出版日期: 2016-03-15
ZTFLH:  TP311.13  
通讯作者: 王建民,教授,E-mail:jimwang@tsinghua.edu.cn     E-mail: jimwang@tsinghua.edu.cn
引用本文:   
黄向东, 郑亮帆, 邱明明, 张金瑞, 王建民. 支持时序数据聚合函数的索引[J]. 清华大学学报(自然科学版), 2016, 56(3): 229-236,245.
HUANG Xiangdong, ZHENG Liangfan, QIU Mingming, ZHANG Jinrui, WANG Jianmin. Time-series data aggregation index. Journal of Tsinghua University(Science and Technology), 2016, 56(3): 229-236,245.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2016.21.032  或          http://jst.tsinghuajournals.com/CN/Y2016/V56/I3/229
  图1 10个数据项生成2个概要信息
  图2 概要森林与时间窗口
  图3 概要森林的生长过程
  图4 添加序号为奇数的结点
  图5 添加序号为偶数的结点
  图6 序号转编号算法serialToCode
  图7 添加叶子节点算法AddDigestNod
  图8 生成中间节点算法generateParent
  图9 概要森林的搜索过程
  图10 查询过程算法query<br>图11 通过序号计算父亲节点编号算法serialToParentCod
  图12 通过添加虚拟结点构建线段树
  图13 无概要表的直接聚合查询
  图14 基于概要表的直接聚合查询
  图15 基于概要森林的聚合查询
  图16 概要表和概要森林的查询性能对比
  图17 基于概要森林的聚合查询
  图18 基于概要森林的聚合查询
[1] Goldstein J, Larson P A. Optimizing queries using materialized views:a practical, scalable solution[J]. Special Interest Group on Management Of Data, 2001, 30(2):331-342.
[2] Lehner W, Cochrane B, Pirahesh H, et al. Applying mass query optimization to speed up automatic summary table refresh[C]//In proceedings of the international conference on data engineering. Heidelberg, Germany:IEEE Computer Society, 2001:1-22.
[3] Chen Y, Chen M, Liu X, et al. MapReduce based aggregate-join query algorithms[J]. Journal of Computer Research & Development, 2013, 50(z1):306-311.
[4] Li Y, Kim G B, Wen L R, et al. MHB-Tree:A distributed spatial index method for document based NoSQL database system[J]. Lecture Notes in Electrical Engineering, 2013, 214:489-497.
[5] Dean J, Ghemawat S. MapReduce:Simplified data processing on large clusters[J]. Operating Systems Design & Implementation, 2004, 51(1):147-152.
[6] Ross K A, Srivastava D, Sudarshan S. Materialized view maintenance and integrity constraint checking:Trading space for time[J]. AcmSigmod Record, 199625(2):447-458.
[7] Li C, Chen J, Jin C, et al. MR-tree:An efficient index for MapReduce[J]. International Journal of Communication Systems, 2014, 27(6):828-838.
[8] Abramova V, Bernardino J. NoSQL databases:MongoDB vs Cassandra[C]//Proceedings of the International C* Conference on Computer Science and Software Engineering. Porto, Portual:ACM, 2013:14-22.
[9] Ibrahim S, Jin H, Lu L, et al. Adaptive disk I/O scheduling for MapReduce in virtualized environment[C]//IEEE 42nd International Conference on Parallel Processing. Taipei, Taiwan, China:IEEE Computer Society, 2011:335-344.
[10] Zilio D C, Zuzarte C, Lightstone S. Recommending materialized views and indexes with IBM DB2 design advisor[C]//In Proceedings of the International Conference on Autonomic Computing. New York, NY, USA:IEEE Computer Society, 2004:180-187.
[11] Qu Z C, Guo T L. A maintenance strategy of materialized views in distributed environment[J].Applied Mechanics and Materials, 2012, 182-183:2123-2126.
[12] Gal A. Obsolescent materialized views in query processing of enterprise information systems[C]//In Proc Eighth International Conference on Information and Knowledge Management. Kansas City, MO, USA:ACM, 1999:367-374.
[13] Arman N. A materialized view for the same generation query in deductive databases[J].Computer & Information Science, 2012,5(6):1-5.
[1] 刘俊岭, 何倩男, 邹鑫源, 孙焕良, 曹科研, 于戈. 空间关键字任务匹配算法[J]. 清华大学学报(自然科学版), 2021, 61(9): 953-964.
[2] 郑孟蕾, 田凌. 基于时序数据库的产品数字孪生模型海量动态数据建模方法[J]. 清华大学学报(自然科学版), 2021, 61(11): 1281-1288.
[3] 张辉, 苏宁, 刘奕群, 马少平. 文本飘红策略对搜索引擎用户行为的影响[J]. 清华大学学报(自然科学版), 2018, 58(8): 703-709.
[4] 王晶, 王昊. 融合局部特征和全局特征的视频拷贝检测[J]. 清华大学学报(自然科学版), 2016, 56(3): 269-272.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn