Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  百年期刊
Journal of Tsinghua University(Science and Technology)    2022, Vol. 62 Issue (12) : 1851-1863     DOI: 10.16511/j.cnki.qhdxxb.2022.21.024
INFORMATION SCIENCE |
As-Stream: An intelligent operator parallelization strategy for fluctuating data streams
LI Wei1,2, LI Chenglong3, YANG Jiahai3
1. Information Technology Center, Tsinghua University, Beijing 100084, China;
2. China University of Geosciences, Beijing 100083, China;
3. Institute for Network Science and Cyberspace, Tsinghua University, Beijing 100084, China
Download: PDF(2842 KB)   HTML
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks    
Abstract  A large number of studies have presented methods using online resource management to optimize stream computing for fluctuating data streams, but have not optimized the parallel operator operations at the streaming application level. For example, in Apache Storm, the operator parallelism cannot be dynamically adjusted once it is set. This paper presents an intelligent parallelization strategy for operators with fluctuating data streams, As-Stream, which significantly improves the streaming computing platform performance. This method uses real-time tuning of parameters based on unsupervised learning and self-adaptive analyses in an elastic intelligent monitoring module. As-Stream includes parallel bottleneck identification, parameter plan generation, parameter migration conversion and parameter migration scheduling algorithms. The system was implemented on an Apache Storm platform with a large number of tests in a real distributed stream computing environment. The results show that this system significantly improves the performance compared with existing default scheduling strategies. With sufficient resources, the average throughput is increased 2.4 fold while with limited resources, the average latency is reduced by 44%.
Keywords stream computing      machine learning      operator parallelism      resource allocation     
Issue Date: 10 November 2022
Service
E-mail this article
E-mail Alert
RSS
Articles by authors
LI Wei
LI Chenglong
YANG Jiahai
Cite this article:   
LI Wei,LI Chenglong,YANG Jiahai. As-Stream: An intelligent operator parallelization strategy for fluctuating data streams[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(12): 1851-1863.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2022.21.024     OR     http://jst.tsinghuajournals.com/EN/Y2022/V62/I12/1851
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
  
[1] Apache Storm[EB/OL]. (2013-11-05)[2021-12-30]. https://storm.apache.org.
[2] Apache Heron[EB/OL]. (2015-09-25)[2021-12-30]. https://heron.apache.org.
[3] Apache Flink®[EB/OL]. (2014-06-07)[2021-12-30]. https://flink.apache.org.
[4] Apache. Spark Streaming[EB/OL]. (2014-02-25)[2021-12-30]. https://spark.apache.org/streaming.
[5] Apache Samza[EB/OL]. (2013-11-05)[2021-12-30]. https://samza.apache.org.
[6] Apache ApexTM[EB/OL]. (2015-03-14)[2021-12-30]. https://apex.apache.org.
[7] Google. Cloud Dataflow[EB/OL]. (2014-6-05)[2021-12-30]. https://cloud.google.com/dataflow.
[8] Apache. Timely Dataflow[EB/OL]. (2014-12-07)[2021-12-30]. https://github.com/timelydataflow.
[9] ZHAO Y J, LIU Z, WU Y D, et al. Timestamped state sharing for stream analytics[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(11): 2691-2704.
[10] DENG S Z, WANG B T, HUANG S, et al. Self-adaptive framework for efficient stream data classification on storm[J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2020, 50(1): 123-136.
[11] MUHAMMAD A, ALEEM M, ISLAM M A. TOP-Storm: A topology-based resource-aware scheduler for stream processing engine[J]. Cluster Computing, 2021, 24(1): 417-431.
[12] AL-SINAYYID A, ZHU M. Job scheduler for streaming applications in heterogeneous distributed processing systems[J]. The Journal of Supercomputing, 2020, 76(12): 9609-9628.
[13] ZHENG T Y, CHEN G, WANG X Y, et al. Real-time intelligent big data processing: Technology, platform, and applications[J]. Science China Information Sciences, 2019, 62(8): 82101.
[14] LI W, SUN D W, GAO S, et al. A machine learning-based elastic strategy for operator parallelism in a big data stream computing system[C]//Proceedings of the 12th EAI International Conference on Broadband Communications, Networks, and Systems. Melbourne, Australia: Springer, 2021: 3-19.
[15] ZHOU Q H, GUO S, LU H D, et al. Falcon: Addressing stragglers in heterogeneous parameter server via multiple parallelism[J]. IEEE Transactions on Computers, 2021, 70(1): 139-155.
[16] HERODOTOU H, CHEN Y X, LU J H. A survey on automatic parameter tuning for big data processing systems[J]. ACM Computing Surveys, 2021, 53(2): 43.
[17] MUHAMMAD A, ALEEM M. A3Storm: Topology, traffic, and resourceaware storm scheduler for heterogeneous clusters[J]. The Journal of Supercomputing, 2021, 77(2): 1059-1093.
[18] CHENG D Z, ZHOU X B, WANG Y, et al. Adaptive scheduling parallel jobs with dynamic batching in spark streaming[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(12): 2672-2685.
[19] FANG J H, ZHANG R, FU T Z J, et al. Distributed stream rebalance for stateful operator under workload variance[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(10): 2223-2240.
[20] ISLAM M T, KARUNASEKERA S, BUYYA R. dSpark: Deadline-based resource allocation for big data applications in Apache Spark[C]//Proceedings of the 2017 IEEE 13th International Conference on e-Science (e-Science). Auckland, New Zealand: IEEE, 2017: 89-98.
[21] WANG W A, ZHANG C, CHEN X J, et al. An on-the-fly scheduling strategy for distributed stream processing platform[C]//Proceedings of the 2018 IEEE International Conference on Parallel and Distributed. Melbourne, VIC, Australia: IEEE, 2018: 773-780.
[22] LI W X, LIU D W, CHEN K, et al. Hone: Mitigating stragglers in distributed stream processing with tuple scheduling[J]. IEEE Transactions on Parallel and Distributed Systems, 2021, 32(8): 2021-2034.
[23] WEI X H, LI L, LI X, et al. Pec: Proactive elastic collaborative resource scheduling in data stream processing[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(7): 1628-1642.
[24] ARMBRUST M, DAS T, TORRES J, et al. Structured streaming: A declarative API for real-time applications in Apache Spark[C]//Proceedings of the 2018 International Conference on Management of Data. Houston, TX, USA: ACM, 2018: 601-613.
[25] SUN D W, HE H Y, YAN H B, et al. Lr-Stream: Using latency and resource aware scheduling to improve latency and throughput for streaming applications[J]. Future Generation Computer Systems, 2021, 114: 243-258.
[26] AO W C, PSOUNIS K. Resource-constrained replication strategies for hierarchical and heterogeneous tasks[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31(4): 793-804.
[27] CAO H Y, WU C Q, BAO L, et al. Throughput optimization for storm-based processing of stream data on clouds[J]. Future Generation Computer Systems, 2020, 112: 567-579.
[28] LIU S C, WENG J P, WANG J H, et al. An adaptive online scheme for scheduling and resource enforcement in storm[J]. IEEE/ACM Transactions on Networking, 2019, 27(4): 1373-1386.
[29] ESKANDARI L, MAIR J, HUANG Z Y, et al. T3-Scheduler: A topology and traffic aware two-level scheduler for stream processing systems in a heterogeneous cluster[J]. Future Generation Computer Systems, 2018, 89: 617-632.
[30] CHEN H H, ZHANG F, JIN H. PStream: A popularity-aware differentiated distributed stream processing system[J]. IEEE Transactions on Computers, 2021, 70(10): 1582-1597.
[31] KHATIBI E, MIRTAHERI S L. A dynamic data dissemination mechanism for Cassandra NoSQL data store[J]. The Journal of Supercomputing, 2019, 75(11): 7479-7496.
[32] LIU P C, XU H L, DA SILVA D, et al. FP4S: Fragment-based parallel state recovery for stateful stream applications[C]//Proceedings of the 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). New Orleans, LA, USA: IEEE, 2020: 1102-1111.
[1] WU Hao, NIU Fenglei. Machine learning model of radiation heat transfer in the high-temperature nuclear pebble bed[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(8): 1213-1218.
[2] DAI Xin, HUANG Hong, JI Xinyu, WANG Wei. Spatiotemporal rapid prediction model of urban rainstorm waterlogging based on machine learning[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(6): 865-873.
[3] REN Jianqiang, CUI Yapeng, NI Shunjiang. Prediction method of the pandemic trend of COVID-19 based on machine learning[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(6): 1003-1011.
[4] AN Jian, CHEN Yuxuan, SU Xingyu, ZHOU Hua, REN Zhuyin. Applications and prospects of machine learning in turbulent combustion and engines[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(4): 462-472.
[5] LIU Yang, TANG Wenzhe, WANG Yunhong, ZHANG Huicong. Influence of clients' design management capability on the design performance of water conservancy projects[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(2): 242-254.
[6] XIONG Qian, TANG Wenzhe, WANG Zhongjing. Factor analysis and system construction of integrated water resource management in the Xiong'an New Area[J]. Journal of Tsinghua University(Science and Technology), 2023, 63(2): 255-263.
[7] ZHAO Qiming, BI Kexin, QIU Tong. Comparison and integration of machine learning based ethylene cracking process models[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(9): 1450-1457.
[8] CAO Laicheng, LI Yuntao, WU Rong, GUO Xian, FENG Tao. Multi-key privacy protection decision tree evaluation scheme[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(5): 862-870.
[9] WANG Haojie, MA Zixuan, ZHENG Liyan, WANG Yuanwei, WANG Fei, ZHAI Jidong. Efficient memory allocator for the New Generation Sunway supercomputer[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(5): 943-951.
[10] LU Sicong, LI Chunwen. Human-machine conversation system for chatting based on scene and topic[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(5): 952-958.
[11] LIU Qiangmo, HE Xu, ZHOU Baishun, WU Haolin, ZHANG Chi, QIN Yu, SHEN Xiaomei, GAO Xiaorong. Simple and high performance classification model for autism based on machine learning and pupillary response[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(10): 1730-1738.
[12] MA Xiaoyue, MENG Xiao. Image position and layout effects of multi-image tweets from the perspective of user engagement[J]. Journal of Tsinghua University(Science and Technology), 2022, 62(1): 77-87.
[13] TANG Zhili, WANG Xue, XU Qianjun. Rockburst prediction based on oversampling and objective weighting method[J]. Journal of Tsinghua University(Science and Technology), 2021, 61(6): 543-555.
[14] WANG Zhiguo, ZHANG Yujin. Anomaly detection in surveillance videos: A survey[J]. Journal of Tsinghua University(Science and Technology), 2020, 60(6): 518-529.
[15] SONG Yubo, QI Xinyu, HUANG Qiang, HU Aiqun, YANG Junjie. Two-stage multi-classification algorithm for Internet of Things equipment identification[J]. Journal of Tsinghua University(Science and Technology), 2020, 60(5): 365-370.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.
Powered by Beijing Magtech Co. Ltd