Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  百年期刊
Journal of Tsinghua University(Science and Technology)    2018, Vol. 58 Issue (12) : 1059-1065     DOI: 10.16511/j.cnki.qhdxxb.2018.21.022
COMPUTER SCIENCE AND TECHNOLOGY |
Performance optimization algorithm for MapReduce based on obtaining frequent keys
LI Jianjiang1, HUA Shuiliang2, WU Jie3, ZHANG Kai1
1. Department of Computer Science and Technology, University of Science and Technology Beijing, Beijing 100083, China;
2. State Grid Zhangjiakou Power Supply Company & Telecommunication Branch, Zhangjiakou 075000, China;
3. Department of Computer and Information Sciences, Temple University, Philadelphia 19122, USA
Download: PDF(957 KB)  
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks    
Abstract  MapReduce is getting much attention in academia and industry for use in cloud computing to quickly deal with huge amounts of data. However, when MapReduce deals with text-centric applications, the algorithm generates is large amount of duplicate data in the intermediate results that increases the run time. A frequency buffering (FB) algorithm was used to add a Hash table before the ring memory to store frequent keys in a Hash table. However, since the algorithm is implemented by sampling, the algorithm may not accurately estimate the overhead and the frequent keys. Therefore, this study added a performance optimization algorithm to MapReduce to obtain the frequent keys by adding a counting Bloom filter (CBF) and a Hash table to dynamically filter the frequent keys before storing them in the ring memory. This algorithm more accurately identifies the frequent keys and greatly reduces the data sorting overhead and the disk I/O. Tests show that this performance optimization algorithm for MapReduce for obtaining the frequent keys significantly improves the execution speed by 17.04% compared to the original MapReduce and 9.31% higher than the frequency buffering algorithm.
Keywords MapReduce      frequent key      performance optimization      counting Bloom filter     
Issue Date: 13 December 2018
Service
E-mail this article
E-mail Alert
RSS
Articles by authors
LI Jianjiang
HUA Shuiliang
WU Jie
ZHANG Kai
Cite this article:   
LI Jianjiang,HUA Shuiliang,WU Jie, et al. Performance optimization algorithm for MapReduce based on obtaining frequent keys[J]. Journal of Tsinghua University(Science and Technology), 2018, 58(12): 1059-1065.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.21.022     OR     http://jst.tsinghuajournals.com/EN/Y2018/V58/I12/1059
  
  
  
  
  
  
  
[1] HILLS T T, NOGUCHI T, GIBBERT M. Information overload or search-amplified risk set size and order effects on decisions from experience[J]. Psychonomic Bulletin & Review, 2013, 20(5):1023-1031.
[2] ZIKOPOULOS P, EATON C. Understanding big data:Analytics for enterprise class Hadoop and streaming data[M]. New York:McGraw-Hill Osborne Media, 2011.
[3] HERODOTOU H, BABU S. Profiling, what-if analysis, and cost-based optimization of mapreduce programs[J]. Proceedings of the VLDB Endowment, 2011, 4(11):1111-1122.
[4] FLORATOU A, PATEL J M, SHEKITA E J, et al. Column-oriented storage techniques for MapReduce[J]. Proceedings of the VLDB Endowment, 2011, 4(7):419-429.
[5] HE Y, LEE R, HUAI Y, et al. RCFile:A fast and space-efficient data placement structure in MapReduce-based warehouse systems[J]. 2011, 83(1):1199-1208.
[6] MAX, FAN X, LIU J, et al. vLocality:Revisiting data locality for MapReduce in virtualized clouds[J]. IEEE Network, 2017, 31(1):28-35.
[7] POLO J, CARRERA D, BECERRA Y, et al. Performance-driven task co-scheduling for mapreduce environments[C]//2010 IEEE Network Operations and Management Symposium-NOMS 2010. New York:IEEE, 2010:373-380.
[8] CHENG D, RAO J, GUO Y, et al. Improving performance of heterogeneous MapReduce clusters with adaptive task tuning[J]. IEEE Transactions on Parallel & Distributed Systems, 2017, 28(3):774-786.
[9] ZIPF G K. Selected studies of the principle of relative frequency in language[J]. Language, 1933, 9(1):89-92.
[10] RATTANAOPAS K, KAEWKEEREE S. Improving Hadoop MapReduce performance with data compression:A study using wordcount job[C]//International Conference on Electrical Engineering/electronics, Computer, Telecommunications and Information Technology. New York:IEEE, 2017:564-567.
[11] ISSA J. Performance evaluation and estimation model using regression method for Hadoop wordcount[J]. IEEE Access, 2015, 3:2784-2793.
[12] HSIAO C H, CAFARELLA M, NARAYANASAMY S. Reducing MapReduce abstraction costs for text-centric applications[C]//201443rd International Conference on Parallel Processing. New York:IEEE, 2014:40-49.
[13] NYANG D H. Counting bloom filter:U.S. Patent Application 15/021, 133[P]. 2013-10-14.
[14] ROTTENSTREICH O, KANIZO Y, KESLASSY I. The variable increment counting Bloom filter[J]. IEEE/ACM Transactions on Networking, 2014, 22(4):1092-1105.
No related articles found!
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.
Powered by Beijing Magtech Co. Ltd