Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们 横山亮次奖 百年刊庆
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  横山亮次奖  |  百年刊庆
清华大学学报(自然科学版)  2020, Vol. 60 Issue (10): 804-813    DOI: 10.16511/j.cnki.qhdxxb.2020.22.002
  专题:容错计算 本期目录 | 过刊浏览 | 高级检索 |
多线程程序数据竞争随机森林指令级检测模型
孙家泽1,2, 阳伽伟1, 杨子江3
1. 西安邮电大学 计算机学院, 西安 710121, 中国;
2. 西安邮电大学 陕西省网络数据分析与智能处理重点实验室, 西安 710121, 中国;
3. 西密歇根大学 计算机系, 卡拉马祖 49008-5466, 美国
Random forest instruction level detection model for data race in multithreaded programs
SUN Jiaze1,2, YANG Jiawei1, YANG Zijiang3
1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China;
2. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China;
3. Department of Computer Science, Western Michigan University, Kalamazoo 49008-5466, USA
全文: PDF(1931 KB)  
输出: BibTeX | EndNote (RIS)      
摘要 数据竞争是典型的多线程程序并发缺陷。由于多线程程序中存在不确定性的交织,数据竞争很难被检测出来。该文以多线程数据竞争的5个相关属性作为特征,构建了多线程程序数据竞争随机森林指令级检测模型。首先基于happens-before关系与lockset算法指令级检测数据竞争,同时用汇编源码信息来剔除隐形同步对,然后利用happens-before关系与lockset算法的分析结果训练多线程程序数据竞争随机森林检测模型。在Pin上实现了多线程程序数据竞争检测工具AIRaceTest。利用GitHub中多线程程序的插桩结果作为样本集来训练随机森林模型,模型精度可达92.1%。对Google data-race-test、Parsec基准程序3.1中的经典多线程程序的检测结果表明:AIRaceTest与Eraser、Djit+以及Thread Sanitizer这3种目前常用的数据竞争检测工具相比,数据竞争的误报和漏报分别降低了约10.6%和12.3%,在线程数较多的情况下,时间和内存开销分别降低了41.8%和22.4%。
服务
把本文推荐给朋友
加入引用管理器
E-mail Alert
RSS
作者相关文章
孙家泽
阳伽伟
杨子江
关键词 数据竞争并发缺陷随机森林隐形同步对    
Abstract:Data race is a typical concurrency bug in multithreaded programs. Data race is difficult to detect due to the uncertain interleaving in multithreaded programs. A random forest instruction level data race detection model is developed for multithread programs using five attributes to identify the data race features. Firstly, data race detection at the instruction level is based on the happens-before relationship and the lockset algorithm. At the same time, the assembly source code is used to eliminate implicit synchronization pairs. Then, the analysis results from the happens-before relationship and the lockset algorithm are used to train a random forest detection model for multithreaded program data race detection. This data race detection tool for multithreaded programs, AIRaceTest, is implemented on Pin. The model is trained with the results of the multithreaded program instrumentation in GitHub as a sample set. The model accuracy reaches 92.1%. Test results on the classic multithreaded programs, Google data-race-test and Parsec benchmark 3.1, show that the false positives are reduced by about 10.6% and the false negatives are reduced by about 12.3% compared with Eraser, Djit+and Thread Sanitizer. For a large number of threads, the time overhead is reduced by 41.8% while the memory overhead is reduced by 22.4%.
Key wordsdata race    concurrency bug    random forest    implicit synchronization pairs
收稿日期: 2019-09-12      出版日期: 2020-07-09
引用本文:   
孙家泽, 阳伽伟, 杨子江. 多线程程序数据竞争随机森林指令级检测模型[J]. 清华大学学报(自然科学版), 2020, 60(10): 804-813.
SUN Jiaze, YANG Jiawei, YANG Zijiang. Random forest instruction level detection model for data race in multithreaded programs. Journal of Tsinghua University(Science and Technology), 2020, 60(10): 804-813.
链接本文:  
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.22.002  或          http://jst.tsinghuajournals.com/CN/Y2020/V60/I10/804
  
  
  
  
  
  
  
  
  
  
  
  
  
  
[1] NETZER R H B, MILLER B P. What are race conditions? Some issues and formalizations[J]. ACM Letters on Programming Languages and Systems, 1992, 1(1):74-88.
[2] VON PRAUN C, GROSS T R. Static detection of atomicity violations in object-oriented programs[J]. Journal of Object Technology, 2004, 3(2):1-12.
[3] ENGLER D, ASHCRAFT K. RacerX:Effective, static detection of race conditions and deadlocks[J]. ACM SIGOPS Operating Systems Review, 2003, 37(5):237-252.
[4] LU S, PARK S, SEO E, et al. Learning from mistakes:A comprehensive study on real world concurrency bug characteristics[J]. ACM SIGARCH Computer Architecture News, 2008, 36(1):329-339.
[5] DINNING A, SCHONBERG E. An empirical comparison of monitoring algorithms for access anomaly detection[C]//Proceedings of the 2nd ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming. Seattle, USA, 1990.
[6] PERKOVIC D, KELEHER P J. Online data-race detection via coherency guarantees[C]//Proceedings of the 2nd USENIX Symposium on Operating Systems Design and Implementation. Seattle, USA, 1996:47-57.
[7] LAMPORT L. Time, clocks, and the ordering of events in a distributed system[J]. Communications of the ACM, 1978, 21(7):558-565.
[8] SAVAGE S, BURROWS M, NELSON G, et al. Eraser:A dynamic data race detector for multithreaded programs[J]. ACM Transactions on Computer Systems (TOCS), 1997, 15(4):391-411.
[9] POZNIANSKY E, SCHUSTER A. Efficient on-the-fly data race detection in multithreaded C++ programs[C]//Proceedings International Parallel and Distributed Processing Symposium. Nice, France, 2003.
[10] PRATIKAKIS P, FOSTER J S, HICKS M. LOCKSMITH:Context-sensitive correlation analysis for race detection[C]//Proceedings of the 27th ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation. Ottawa, Canada, 2006:320-331.
[11] VOUNG J W, JHALA R, LERNER S. RELAY:Static race detection on millions of lines of code[C]//Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. Dubrovnik, Croatia, 2007:205-214.
[12] SEREBRYANY K, ISKHODZHANOV T. ThreadSanitizer:Data race detection in practice[C]//Proceedings of the Workshop on Binary Instrumentation and Applications. New York, USA, 2009:62-71.
[13] YANG Z, YU Z, SU X H, et al. RaceTracker:Effective and efficient detection of data races[C]//Proceedings of the 17th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). Shanghai, China, 2016.
[14] GUO Y, CAI Y, YANG Z J. AtexRace:Across thread and execution sampling for in-house race detection[C]//Proceedings of the 11th Joint Meeting on Foundations of Software Engineering. Paderborn, Germany, 2017:315-325.
[15] BREIMAN L. Random forests[J]. Machine Learning, 2001, 45(1):5-32.
[16] JANNESARI A, TICHY W F. Library-independent data race detection[J]. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(10):2606-2616.
[17] XIONG W W, PARK S, ZHANG J Q, et al. Ad hoc synchronization considered harmful[C]//Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation. Vancouver, Canada, 2010.
[18] 禹振, 苏小红, 齐鹏, 等. 基于未来锁集的死锁规避[J]. 计算机研究与发展, 2017, 54(2):428-445. YU Z, SU X H, QI P, et al. Deadlock avoiding based on future lockset[J]. Computer Research and Development, 2017, 54(2):428-445. (in Chinese)
[19] BIENIA C. Benchmarking modern multiprocessors[D]. Princeton, USA:Princeton University, 2011.
[1] 代鑫, 黄弘, 汲欣愉, 王巍. 基于机器学习的城市暴雨内涝时空快速预测模型[J]. 清华大学学报(自然科学版), 2023, 63(6): 865-873.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
版权所有 © 《清华大学学报(自然科学版)》编辑部
本系统由北京玛格泰克科技发展有限公司设计开发 技术支持:support@magtech.com.cn