Journal of Tsinghua University (Science and Technology)  2017, Vol. 57 Issue (3): 244-249    DOI: 10.16511/j.cnki.qhdxxb.2017.26.004
Task scheduling on a many-core processor for high-volume throughput applications
XU Yuanchao (徐远超)1,2, YANG Lu (杨璐)1
1. College of Information Engineering, Capital Normal University, Beijing 100048, China;
2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Big data applications with high-volume throughput have become the dominant workloads in datacenters. These applications run inefficiently on traditional processors, and inefficient task scheduling is one of the causes. Targeting the typical characteristics of high-volume throughput applications and the shortcomings of existing work-stealing algorithms, this paper presents a task scheduling mechanism that is aware of both program behavior and the running environment. Through hardware and software co-design, the mechanism partitions the processor cores and schedules tasks hierarchically, which reduces the performance loss caused by contention for shared resources among different applications. It also exploits the high similarity among threads to improve the instruction cache hit rate, which raises overall system throughput. Preliminary simulation results show that the algorithm clearly outperforms the existing work-stealing scheduling algorithm under mixed workloads, by 20% on average across the tested mixes.
Key words: many-core processor    big data applications    high-volume throughput    task scheduling
Received: 2016-10-26      Published: 2017-03-15
Chinese Library Classification (CLC) number:  TP316
XU Yuanchao, YANG Lu. Task scheduling on a many-core processor for high-volume throughput applications[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(3): 244-249.
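  Fig. 1 Program-aware and environment-aware task scheduling mechanism
  Fig. 2 High-volume throughput many-core architecture with a ring topology
  Fig. 3 Offline analysis flow for program segments
  Fig. 4 Heterogeneous many-core design
  Fig. 5 Two-level task dispatch
  Table 1 Test programs
  Fig. 6 Performance comparison under a single workload
  Fig. 7 Performance comparison under mixed workloads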
