Computer Science and Technology

Test method for the font parsing engine of PDF readers

  • ZHAO Gang (赵刚) ,
  • YU Yue (于悦) ,
  • HUANG Minhuan (黄敏桓) ,
  • WANG Yuying (王玉迎) ,
  • WANG Jiajie (王嘉捷) ,
  • SUN Xiaoxia (孙晓霞)
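  • 1. National Key Laboratory of Science and Technology on Information System Security, Beijing 100101, China;
    2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. China Information Technology Security Evaluation Center, Beijing 100085, China
ZHAO Gang (1969-), male, research fellow. E-mail: zhao-go3@tsinghua.edu.cn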

Received: 2017-08-14

  Published online: 2018-03-15

Test method for the font parser in PDF viewers

  • ZHAO Gang ,
  • YU Yue ,
  • HUANG Minhuan ,
  • WANG Yuying ,
  • WANG Jiajie ,
  • SUN Xiaoxia
  • 1. National Key Laboratory of Science and Technology on Information System Security, Beijing 100101, China;
    2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. China Information Technology Security Evaluation Center, Beijing 100085, China

Received date: 2017-08-14

  Published online: 2018-03-15

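Abstract (translated from the Chinese)

PDF documents are highly portable and widely used, so they often serve as carriers of malicious code. Because PDF documents are subject to strict format validation, traditional random fuzzing is inefficient when applied to structurally complex PDF readers. Existing file-format-based grey-box fuzzing struggles to build a unified data model for a given file format because its model description languages are not expressive enough. This paper proposes a method for constructing test cases in batches for the font parsing engine of PDF readers. By reconstructing font files and adding auxiliary information, the method produces test cases in a uniform format and builds a unified data model for TrueType files. On this basis, a fuzzing tool was developed and used to test more than 20 PDF readers, triggering a large number of crashes. The results show that the method can construct targeted test cases and effectively uncover defects in PDF readers.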

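Cite this article

ZHAO Gang, YU Yue, HUANG Minhuan, WANG Yuying, WANG Jiajie, SUN Xiaoxia. Test method for the font parser in PDF viewers[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(3): 266-271. DOI: 10.16511/j.cnki.qhdxxb.2018.26.013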

Abstract

PDF files are portable and widely used, so they often host malware. Traditional fuzzing algorithms do not work well on PDF viewers because PDF files undergo strict format validation. Also, existing file-format-based grey-box fuzzing cannot easily build a uniform data model because of the limits of its description language. This paper presents a method for generating test cases for the font parser of PDF viewers. The method reconstructs the font files and adds supportive information to build a uniform data model for TrueType files. A fuzzer built on this method was evaluated on more than twenty PDF viewers and identified several vulnerabilities. Tests show that this method can effectively generate test cases and detect bugs in PDF viewers.
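
The abstract does not give implementation details, so the following is only a minimal sketch of what a format-aware mutation of a TrueType file might look like, not the authors' tool. It relies on the standard sfnt layout (a 12-byte header followed by 16-byte table records) and the usual table checksum rule. The function names `table_checksum`, `parse_table_directory`, and `mutate_table` are hypothetical, and the sketch deliberately leaves the `head` table's `checkSumAdjustment` field untouched.

```python
import random
import struct


def table_checksum(data: bytes) -> int:
    """Sum the table as big-endian uint32 words, zero-padded to 4 bytes."""
    padded = data + b"\x00" * (-len(data) % 4)
    total = 0
    for (word,) in struct.iter_unpack(">I", padded):
        total = (total + word) & 0xFFFFFFFF
    return total


def parse_table_directory(font: bytes) -> dict:
    """Return {tag: (record_offset, table_offset, length)} for an sfnt font."""
    # Header: sfnt version (uint32) and number of tables (uint16).
    _version, num_tables = struct.unpack_from(">IH", font, 0)
    tables = {}
    for i in range(num_tables):
        record_offset = 12 + 16 * i  # records follow the 12-byte header
        tag, _checksum, offset, length = struct.unpack_from(
            ">4sIII", font, record_offset
        )
        tables[tag.decode("latin-1")] = (record_offset, offset, length)
    return tables


def mutate_table(font: bytes, tag: str, rng: random.Random) -> bytes:
    """Flip one random bit inside table `tag` and repair its checksum.

    Assumption: repairing the per-table checksum helps the mutated font
    get past basic validation. The global checkSumAdjustment in `head`
    is not recomputed in this sketch.
    """
    record_offset, offset, length = parse_table_directory(font)[tag]
    out = bytearray(font)
    position = offset + rng.randrange(length)
    out[position] ^= 1 << rng.randrange(8)  # single-bit flip always changes the byte
    new_sum = table_checksum(bytes(out[offset:offset + length]))
    struct.pack_into(">I", out, record_offset + 4, new_sum)
    return bytes(out)
```

Restricting the mutation to one named table and then fixing the directory checksum keeps the rest of the file structurally intact, which is the general idea behind targeting a strictly validated format rather than flipping bytes at random across the whole file.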
