SUN Jiaze, YANG Jiawei, YANG Zijiang
A data race is a typical concurrency bug in multithreaded programs, and it is difficult to detect because thread interleaving is nondeterministic. This paper develops an instruction-level data race detection model for multithreaded programs, based on random forests, that uses five attributes to characterize data races. First, data races are detected at the instruction level using the happens-before relationship and the lockset algorithm, while the assembly code is used to eliminate implicit synchronization pairs. The analysis results from the happens-before relationship and the lockset algorithm are then used to train a random forest model for data race detection in multithreaded programs. The resulting detection tool, AIRaceTest, is implemented on Pin. The model is trained on a sample set obtained by instrumenting multithreaded programs from GitHub, and its accuracy reaches 92.1%. Tests on classic multithreaded programs, Google data-race-test and Parsec benchmark 3.1 show that, compared with Eraser, Djit+ and Thread Sanitizer, false positives are reduced by about 10.6% and false negatives by about 12.3%. For a large number of threads, the time overhead is reduced by 41.8% and the memory overhead by 22.4%.
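To make the two underlying analyses concrete, the following is a minimal Python sketch of a hybrid check that combines a happens-before test (vector clocks over fork/join events) with a lockset test (common locks held at both accesses). It is an illustration under stated assumptions, not the AIRaceTest implementation: the trace format, the event names (`fork`, `join`, `acquire`, `release`, `read`, `write`) and the function `detect_races` are hypothetical. It works on a pre-recorded event list rather than on Pin instrumentation, and it omits the random forest stage and the paper's five attributes.

```python
def detect_races(trace):
    """Return the set of variables with a potential data race.

    trace is a list of (thread, op, arg) tuples (a hypothetical format).
    Two accesses to the same variable are reported when they come from
    different threads, at least one is a write, neither is ordered by
    fork/join (happens-before), and no common lock is held (lockset).
    """
    clocks = {}   # thread -> vector clock, stored as {thread: counter}
    held = {}     # thread -> set of locks currently held
    history = {}  # variable -> list of earlier accesses
    races = set()

    def clock(t):
        # Each thread starts with its own component set to 1.
        return clocks.setdefault(t, {t: 1})

    def merge(dst, src):
        # Component-wise maximum: dst learns everything src has seen.
        d, s = clock(dst), clock(src)
        for k, v in s.items():
            d[k] = max(d.get(k, 0), v)

    for thread, op, arg in trace:
        locks = held.setdefault(thread, set())
        if op == "fork":
            merge(arg, thread)          # child inherits the parent's past
            clock(thread)[thread] += 1  # later parent events are not inherited
        elif op == "join":
            merge(thread, arg)          # parent learns the child's past
            clock(arg)[arg] += 1
        elif op == "acquire":
            locks.add(arg)
        elif op == "release":
            locks.discard(arg)
        elif op in ("read", "write"):
            now = clock(thread)
            for p_thread, p_epoch, p_locks, p_op in history.get(arg, []):
                if p_thread == thread:
                    continue            # same thread: program order applies
                if p_op == "read" and op == "read":
                    continue            # two reads never conflict
                ordered = p_epoch <= now.get(p_thread, 0)
                if not ordered and not (p_locks & locks):
                    races.add(arg)
            history.setdefault(arg, []).append(
                (thread, now[thread], frozenset(locks), op)
            )
    return races


# Example trace: x is ordered by fork, y is protected by lock m, z is unprotected.
example_trace = [
    ("main", "write", "x"),
    ("main", "fork", "t1"),
    ("main", "fork", "t2"),
    ("t1", "acquire", "m"), ("t1", "write", "y"), ("t1", "release", "m"),
    ("t2", "acquire", "m"), ("t2", "write", "y"), ("t2", "release", "m"),
    ("t1", "write", "z"),
    ("t2", "write", "z"),
    ("t1", "read", "x"),
    ("main", "join", "t1"),
    ("main", "join", "t2"),
    ("main", "read", "z"),
]
```

On `example_trace`, only `z` is reported: the write to `x` happens before the fork, and both writes to `y` hold lock `m`. In a learning-based detector, per-access results of this kind could serve as input features for a classifier instead of being used as a final verdict.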