Journal of Tsinghua University (Science and Technology)  2020, Vol. 60 Issue (10): 829-836    DOI: 10.16511/j.cnki.qhdxxb.2020.25.002
Bug report quality detection based on the BM25 algorithm
CHEN Lele1, HUANG Song1,2, SUN Jinlei1, HUI Zhanwei1, WU Kaishun1
1. College of Command & Control Engineering, Army Engineering University of PLA, Nanjing 210007, China;
2. PLA Military Software Testing and Evaluation Center, Nanjing 210007, China
Abstract: Bug reports record and track defects and provide the basis for resolving software quality problems. Software testing is now often carried out by many testers working in parallel, so integrating the resulting mass of bug reports, in particular removing false and duplicate reports, is a serious challenge. This paper therefore presents an automatic detection method for bug reports based on the BM25 algorithm. After the bug reports are preprocessed, a matching library is built from the test requirements and test report samples. The BM25 algorithm is then used to compute similarity scores between the bug reports and the matching library, and these scores serve as the basis for judging whether each bug report is correct. Experiments on data from a software testing contest show that the method correctly judges most bug reports and effectively improves the efficiency of removing false and duplicate reports.
Key words: software testing; BM25 algorithm; bug report; natural language processing
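The matching step summarized in the abstract can be illustrated with a minimal sketch of Okapi BM25 scoring. This is not the authors' implementation: the whitespace tokenizer, the parameter values k1 = 1.5 and b = 0.75, the smoothed IDF variant, the class name `BM25`, and the sample library entries are all assumptions made for illustration, and the paper's preprocessing for its report text is not reproduced here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


class BM25:
    """Okapi BM25 over a small matching library (illustrative only)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in corpus]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: number of library entries containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF variant that is never negative (an assumption).
        self.idf = {t: math.log((n - c + 0.5) / (c + 0.5) + 1.0)
                    for t, c in df.items()}

    def score(self, query, index):
        tf = self.tfs[index]
        # Length normalization term for this library entry.
        norm = self.k1 * (1 - self.b
                          + self.b * len(self.docs[index]) / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def best_match(self, query):
        # Return the index and score of the highest-scoring library entry.
        scores = [self.score(query, i) for i in range(len(self.docs))]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]


# Hypothetical matching library built from requirements and sample reports.
library = [
    "login fails when the password contains special characters",
    "search results page does not show the next page button",
    "uploaded avatar image is not displayed after saving the profile",
]
bm25 = BM25(library)
idx, s = bm25.best_match("cannot login with password that has special characters")
```

In a pipeline like the one the abstract describes, a submitted report whose best score falls below some cutoff could be flagged as likely false, and reports that match the same library entry could be checked as possible duplicates. Any such cutoff is an assumption here and would need to be tuned on labeled data.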
Received: 2019-09-02      Published: 2020-07-09
陈乐乐, 黄松, 孙金磊, 惠战伟, 吴开舜. 基于BM25算法的问题报告质量检测方法[J]. 清华大学学报(自然科学版), 2020, 60(10): 829-836.
CHEN Lele, HUANG Song, SUN Jinlei, HUI Zhanwei, WU Kaishun. Bug report quality detection based on the BM25 algorithm. Journal of Tsinghua University (Science and Technology), 2020, 60(10): 829-836.
