面向高通量应用的众核处理器任务调度

doi:10.16511/j.cnki.qhdxxb.2017.26.004

摘要
图/表
参考文献
相关文章
Metrics

全文: PDF(1394 KB)
输出: BibTeX | EndNote (RIS)

摘要具有高通量特征的大数据应用已成为目前数据中心的主流应用，这些应用在传统处理器平台上的运行效率不高，原因之一是任务调度的低效。针对高通量应用的一些典型特征以及现有任务窃取算法的不足，该文提出一种程序行为和环境感知的任务调度机制，通过软硬件结合实现了处理器核的分区管理和任务的分级调度，减小了不同应用之间因争用共享资源对性能产生的不利影响，同时利用线程相似度高的特点提高指令缓存的命中率，从而提升系统的整体吞吐率。初步的模拟评估表明：该算法在混合负载情况下性能明显优于现有算法的，在测试的混合负载中平均优于现有算法20%。

	服务

	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	RSS

	作者相关文章
	徐远超
	杨璐

关键词 ：众核处理器, 大数据应用, 高通量, 任务调度

Abstract：Big data applications with high-volume throughputs have become the most common applications in datacenters. The efficiencies of these applications running on traditional processors are very low for various reasons, one of which is the low-efficiency task scheduling. This paper presents a task scheduling framework that identifies program behavior and the running environment and then partitions the cores with hierarchical task scheduling though hardware and software co-design to reduce the negative effect of shared resource contention and improving the instruction cache hit rate using thread similarity. Tests show this algorithm improves performance by 20% on average over the legacy work-stealing scheduling algorithm.

Key words： many-core processor big data applications high-volume throughput task scheduling

收稿日期: 2016-10-26 出版日期: 2017-03-15

ZTFLH:

TP316

引用本文:

徐远超, 杨璐. 面向高通量应用的众核处理器任务调度[J]. 清华大学学报（自然科学版）, 2017, 57(3): 244-249.
XU Yuanchao, YANG Lu. Task scheduling on a many-core processor for high-volume throughput applications. Journal of Tsinghua University(Science and Technology), 2017, 57(3): 244-249.

链接本文:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2017.26.004 或 http://jst.tsinghuajournals.com/CN/Y2017/V57/I3/244

图1 程序感知与环境感知的任务调度机制

图2 环状拓扑的高通量众核结构

图3 程序段离线分析流程

图4 异构众核设计

图5 两级任务分发

表1 测试程序

图6 单一负载时的性能比较

图7 混合负载时的性能比较

[1]	王元卓, 靳小龙, 程学旗. 网络大数据:现状与展望[J]. 计算机学报, 2013, 36(6):1-15.WANG Yuanzhuo, JIN Xiaolong, CHENG Xueqi. Network big data:Present and future[J]. Chinese Journal of Computers, 2013, 36(6):1-15. (in Chinese)
[2]	詹剑锋, 王磊, 孙凝晖. 高通量计算机的性能评价[J]. 中国计算机学会通讯, 2011, 7(7):40-43.ZHAN Jianfeng, WANG Lei, SUN Ninghui. Performance evaluation of high-volume throughput computer[J]. Communication of CCF, 2011, 7(7):40-43. (in Chinese)
[3]	Allan A, Edenfeld D, Joyner W H, et al. 2001 technology roadmap for semiconductors[J]. Computer, 2002, 35(1):42-53.
[4]	Broquedis F, Diakhaté F, Thibault S, et al. Scheduling dynamic OpenMP applications over multicore architectures[C]//Proc of the International Workshop on OpenMP. Berlin, Germany:Springer, 2008:170-180.
[5]	Frigo M, Leiserson C E, Randall K H. The implementation of the Cilk-5 multithreaded language[C]//ACM Sigplan Notices. Montreal, Quebec, Canada:ACM, 1998:212-223.
[6]	Blumofe R D, Leiserson C E. Scheduling multithreaded computations by work stealing[C]//Proc 35th Annual Symposium on Foundations of Computer Science. New York, NY, USA:IEEE, 1994:356-368.
[7]	Ebrahimi E, Lee C J, Mutlu O, et al. Fairness via source throttling:A configurable and high-performance fairness substrate for multi-core memory systems[C]//ACM Sigplan Notices. Pittsburgh, PA, USA:ACM, 2010, 45(3):335-346.
[8]	Diamos G F, Yalamanchili S. Harmony:An ution model and runtime for heterogeneous many core systems[C]//Proc 17th International Symposium on High Performance Distributed Computing. Boston, MA, USA:ACM, 2008:197-200.
[9]	Augonnet C, Thibault S, Namyst R, et al. StarPU:A unified platform for task scheduling on heterogeneous multicore architectures[C]//Proc of the European Conference on Parallel Processing. Berlin, Germany:Springer, 2009:863-874.
[10]	Nightingale E B, Hodson O, McIlroy R, et al. Helios:Heterogeneous multiprocessing with satellite kernels[C]//Proc 22nd ACM SIGOPS Symposium on Operating Systems Principles. Big Sky, MT, USA:ACM, 2009:221-234.
[11]	Baumann A, Barham P, Dagand P E, et al. The multikernel:A new OS architecture for scalable multicore systems[C]//Proc 22nd ACM SIGOPS Symposium on Operating Systems Principles. Big Sky, MT, USA:ACM, 2009:29-44.
[12]	Wentzlaff D, Agarwal A. Factored operating systems (fos):The case for a scalable operating system for multicores[J]. ACM SIGOPS Operating Systems Review, 2009, 43(2):76-85.
[13]	Boyd-Wickizer S, Chen H, Chen R, et al. Corey:An operating system for many cores[C]//Proc 8th USENIX Symposium on Operating Systems Design and Implementation. San Diego, CA, USA:USENIX, 2008, 8:43-57.
[14]	Rhoden B, Klues K, Zhu D, et al. Improving per-node efficiency in the datacenter with new OS abstractions[C]//Proc 2nd ACM Symposium on Cloud Computing. Cascais, Portugal:ACM, 2011:25.
[15]	Kumar V, Fedorova A. Towards better performance per Watt in virtual environments on asymmetric single-ISA multi-core systems[J]. ACM SIGOPS Operating Systems Review, 2009, 43(3):105-109.
[16]	曹仰杰, 钱德沛, 伍卫国, 等. 众核处理器系统核资源动态分组的自适应调度算法[J]. 软件学报, 2012, 23(2):240-252. CAO Yangjie, QIAN Depei, WU Weiguo, et al. Adaptive scheduling algorithm based on dynamic core-resource partitions for many-core processor systems[J]. Journal of Software, 2012, 23(2):240-252. (in Chinese)
[17]	Mogul J C, Mudigonda J, Binkert N, et al. Using asymmetric single-ISA CMPs to save energy on operating systems[J]. IEEE Micro, 2008, 28(3):26-41.
[18]	Ye X, Fan D, Sun N, et al. SimICT:A fast and flexible framework for performance and power evaluation of large-scale architecture[C]//Proc of the 2013 International Symposium on Low Power Electronics and Design. Beijing, China:IEEE Press, 2013:273-278.
[19]	Ferdman M, Adileh A, Kocberber O, et al. Clearing the clouds:A study of emerging scale-out workloads on modern hardware[C]//ACM SIGPLAN Notices. London, UK:ACM, 2012:37-48."

[1]	刘巍, 王瑀屏. 基于随机映射的相变内存磨损均衡方法[J]. 清华大学学报（自然科学版）, 2015, 55(11): 1208-1215.
[2]	谢学智, 王瑀屏, 谈鉴锋, 陈启庚. 不可信系统平台下的敏感信息管理系统[J]. 清华大学学报（自然科学版）, 2015, 55(11): 1221-1228.
[3]	茅俊杰, 陈渝. Linux设备驱动的内核服务需求特征[J]. 清华大学学报（自然科学版）, 2015, 55(8): 911-915.
[4]	刘圣卓, 姜进磊, 杨广文. 基于副本的跨数据中心虚拟机快速迁移算法[J]. 清华大学学报（自然科学版）, 2015, 55(5): 579-584.

Viewed

Full text

Abstract

Cited

Shared

Discussed