Journal of Tsinghua University (Science and Technology), 2020, Vol. 60, Issue 10: 829-836. DOI: 10.16511/j.cnki.qhdxxb.2020.25.002
SPECIAL SECTION: FAULT TOLERANT COMPUTING
Bug report quality detection based on the BM25 algorithm
CHEN Lele1, HUANG Song1,2, SUN Jinlei1, HUI Zhanwei1, WU Kaishun1
1. College of Command & Control Engineering, Army Engineering University of PLA, Nanjing 210007, China;
2. PLA Military Software Testing and Evaluation Center, Nanjing 210007, China
Abstract: Bug reports are used to identify and track defects and thereby improve software quality. Software testing often involves many testers working in parallel, so the large number of resulting bug reports must be consolidated and the false or duplicate reports among them removed. This paper presents an automatic detection method for bug reports based on the BM25 algorithm. After the bug reports are preprocessed, a matching library is built from the test requirements and sample test reports. The BM25 algorithm is then used to calculate the similarity between reports in order to identify valid bug reports. Tests on data from a software testing contest show that the model correctly judges most bug reports, effectively improving the efficiency of identifying false and duplicate reports.
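The abstract describes scoring preprocessed bug reports against a matching library with BM25. The following is a minimal sketch of that kind of scoring, not the authors' implementation. It assumes a simplified preprocessing step (lowercasing and splitting on non-word characters; Chinese reports would instead need a word segmenter), the commonly used default parameters k1 = 1.2 and b = 0.75, and the non-negative IDF variant ln((N - n + 0.5)/(n + 0.5) + 1) used by Lucene. The names `tokenize`, `BM25`, `score`, and `best_match`, as well as the sample library texts, are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, simplified preprocessing: lowercase and split on
    # non-word characters. Chinese text would need a word segmenter.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class BM25:
    """Okapi BM25 over a small in-memory collection (hypothetical helper)."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tfs = [Counter(d) for d in self.docs]  # term frequencies per doc
        self.n_docs = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n_docs
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        # Non-negative IDF variant: ln((N - n + 0.5) / (n + 0.5) + 1).
        self.idf = {
            t: math.log((self.n_docs - n + 0.5) / (n + 0.5) + 1)
            for t, n in df.items()
        }

    def score(self, query: str, i: int) -> float:
        """BM25 score of document i for the given query text."""
        tf, dl = self.tfs[i], len(self.docs[i])
        # Length normalisation: longer-than-average documents are penalised.
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:  # terms absent from document i contribute nothing
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def best_match(self, query: str) -> tuple[int, float]:
        """Index and score of the highest-scoring library entry."""
        scores = [self.score(query, i) for i in range(self.n_docs)]
        best = max(range(self.n_docs), key=scores.__getitem__)
        return best, scores[best]


# Usage with an invented matching library of three reference texts.
library = [
    "login button does not respond after entering password",
    "app crashes when uploading a large image",
    "search results page shows wrong sort order",
]
index = BM25(library)
idx, s = index.best_match("crash when uploading image")
print(idx, round(s, 3))  # idx is 1: only the upload entry shares query terms
```

Raw BM25 scores are unbounded and depend on the corpus, so turning a best match into a "duplicate" or "invalid report" decision would require a threshold calibrated on labelled reports; the abstract does not state the paper's actual decision rule.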
Keywords: software testing; BM25 algorithm; bug report; natural language processing
Issue Date: 09 July 2020
Cite this article: CHEN Lele, HUANG Song, SUN Jinlei, et al. Bug report quality detection based on the BM25 algorithm[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(10): 829-836.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2020.25.002 or http://jst.tsinghuajournals.com/EN/Y2020/V60/I10/829
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.