
15 December 2018, Volume 58 Issue 12
    

  • COMPUTER SCIENCE AND TECHNOLOGY
  • HAO Shuang, LI Guoliang, FENG Jianhua, WANG Ning
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1037-1050. https://doi.org/10.16511/j.cnki.qhdxxb.2018.22.053
    Data cleaning, the process of detecting and repairing dirty data, is often needed in data analysis and management. This paper classifies and summarizes traditional and advanced data cleaning techniques and identifies potential directions for further work. It first formally defines the cleaning problem for structured data and then describes error detection methods for missing, redundant, conflicting and erroneous data. Data cleaning methods are then summarized by how they eliminate errors: constraint-based, rule-based, statistical and human-in-the-loop data cleaning. Important datasets and noise injection tools are also introduced. Finally, open research problems and future research directions are discussed.
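A minimal sketch of constraint-based error detection, one of the families surveyed above: the Python below flags groups of tuples that violate a functional dependency (FD), i.e. rows that agree on the left-hand-side attributes but disagree on the right-hand side. The function name `fd_violations`, the FD `zip -> city` and the sample rows are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def fd_violations(rows, lhs, rhs):
    """Return (lhs_value, row_indices) groups that violate the FD lhs -> rhs.

    rows: list of dicts; lhs: tuple of attribute names; rhs: attribute name.
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[attr] for attr in lhs)
        groups[key].append(i)

    violations = []
    for key, idxs in groups.items():
        # More than one distinct right-hand-side value means the FD is broken.
        if len({rows[i][rhs] for i in idxs}) > 1:
            violations.append((key, idxs))
    return violations


# Hypothetical dirty table: the same zip code maps to two spellings of a city.
rows = [
    {"zip": "100084", "city": "Beijing"},
    {"zip": "100084", "city": "Peking"},
    {"zip": "200030", "city": "Shanghai"},
]
print(fd_violations(rows, ("zip",), "city"))  # [(('100084',), [0, 1])]
```

Detection only locates the conflict; deciding which value to keep is the separate repair step.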
  • CAI Yuan, LUO Wei, XIANG Dong
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1051-1058. https://doi.org/10.16511/j.cnki.qhdxxb.2018.22.043
    Existing methods cannot easily determine whether a routing algorithm for a network-on-chip contains a deadlock, and the traditional turn model has serious limitations. This paper presents a simple, intuitive algorithm for determining whether a routing algorithm is deadlock-free and proves its correctness. A column-partition turn model is then given to implement deadlock-free, minimal, partially adaptive routing for a virtual cut-through (VCT)-switched 2-D mesh without extra virtual channels. Like the odd-even turn model, it avoids deadlocks by restricting the locations where certain turns may be taken. Simulations show that, for various traffic patterns, this routing algorithm reduces the average latency and raises the saturation point compared with routing algorithms based on the odd-even turn model, thereby improving the performance of the whole network.
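The paper's own deadlock test is not reproduced here. As background, a widely used sufficient condition for deadlock freedom (Dally and Seitz) is that the channel dependency graph (CDG) of the routing function is acyclic. The sketch below assumes a small 2-D mesh with one channel per directed link and no virtual channels; it builds the CDG for dimension-order (XY) routing and for fully adaptive minimal routing and checks each for cycles. All function names are hypothetical.

```python
from itertools import product


def xy_route(node, dest):
    """Dimension-order routing: correct X first, then Y."""
    (x, y), (dx, dy) = node, dest
    if x != dx:
        return [(x + (1 if dx > x else -1), y)]
    if y != dy:
        return [(x, y + (1 if dy > y else -1))]
    return []


def minimal_adaptive_route(node, dest):
    """Fully adaptive minimal routing: every hop that reduces the distance."""
    (x, y), (dx, dy) = node, dest
    hops = []
    if x != dx:
        hops.append((x + (1 if dx > x else -1), y))
    if y != dy:
        hops.append((x, y + (1 if dy > y else -1)))
    return hops


def channel_dependency_graph(width, height, route):
    """Map each channel (u, v) to the channels a packet holding it may request next."""
    nodes = list(product(range(width), range(height)))
    deps = {}
    for src, dest in product(nodes, nodes):
        if src == dest:
            continue
        # Explore every path the routing function permits from src to dest.
        stack, seen = [(src, None)], set()
        while stack:
            node, in_ch = stack.pop()
            if (node, in_ch) in seen:
                continue
            seen.add((node, in_ch))
            for nxt in route(node, dest):
                out_ch = (node, nxt)
                deps.setdefault(out_ch, set())
                if in_ch is not None:
                    deps[in_ch].add(out_ch)
                stack.append((nxt, out_ch))
    return deps


def has_cycle(graph):
    """Three-colour DFS: reaching a GRAY node again means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = dict.fromkeys(graph, WHITE)

    def visit(v):
        color[v] = GRAY
        for w in graph[v]:
            if color[w] == GRAY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


print("XY acyclic:", not has_cycle(channel_dependency_graph(4, 4, xy_route)))
print("adaptive acyclic:", not has_cycle(channel_dependency_graph(4, 4, minimal_adaptive_route)))
```

XY routing forbids every Y-to-X turn, so its CDG is acyclic; allowing all minimal turns creates four-channel cycles. Turn models such as odd-even forbid a subset of turns, at chosen locations, to break those cycles while keeping some adaptivity.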
  • LI Jianjiang, HUA Shuiliang, WU Jie, ZHANG Kai
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1059-1065. https://doi.org/10.16511/j.cnki.qhdxxb.2018.21.022
    MapReduce is attracting much attention in academia and industry as a cloud computing framework for quickly processing huge amounts of data. However, for text-centric applications, MapReduce generates a large amount of duplicate data in the intermediate results, which increases the run time. The frequency buffering (FB) algorithm adds a Hash table in front of the ring memory to store frequent keys. However, because FB relies on sampling, it may not accurately estimate the overhead or identify the frequent keys. This study therefore adds a performance optimization algorithm to MapReduce that uses a counting Bloom filter (CBF) and a Hash table to dynamically filter out the frequent keys before they are stored in the ring memory. This algorithm identifies the frequent keys more accurately and greatly reduces the data sorting overhead and the disk I/O. Tests show that it improves the execution speed by 17.04% compared with the original MapReduce and by 9.31% compared with the frequency buffering algorithm.
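A minimal sketch of the idea described above, assuming a word-count style job whose values can be summed: a counting Bloom filter estimates how often each key has been seen, and once the estimate reaches a threshold the key is combined in an in-memory hash table instead of being appended to the (here simulated) ring buffer. The class and function names, the SHA-256-based hashing, the counter array size and the threshold are assumptions for illustration, not the paper's implementation.

```python
import hashlib


class CountingBloomFilter:
    """Array of counters; each key increments num_hashes of them."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _indexes(self, key):
        # Derive the k counter positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, key):
        for idx in self._indexes(key):
            self.counters[idx] += 1

    def estimate(self, key):
        # The minimum over the key's counters never underestimates its true count.
        return min(self.counters[idx] for idx in self._indexes(key))


def map_side_filter(pairs, threshold, cbf):
    """Split map output into a hash table of frequent keys and the spill path.

    Frequent keys (estimated count >= threshold) are combined in `hot`;
    all other pairs go to `ring_buffer`, a plain list standing in for
    MapReduce's in-memory ring buffer that is later sorted and spilled.
    """
    hot = {}
    ring_buffer = []
    for key, value in pairs:
        if key in hot:
            hot[key] += value  # combine in memory, no sort/spill needed
            continue
        cbf.add(key)
        if cbf.estimate(key) >= threshold:
            hot[key] = value  # promote the key to the hash table
        else:
            ring_buffer.append((key, value))
    return hot, ring_buffer


words = "the cat the dog the end the".split()
hot, ring_buffer = map_side_filter([(w, 1) for w in words], threshold=2,
                                   cbf=CountingBloomFilter())
print(hot, ring_buffer)
```

Because a counting Bloom filter can only overestimate counts, an infrequent key may occasionally be promoted early; that costs a hash-table slot but loses no data, since every value ends up either in the table or in the buffer.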
  • MIAO Zhuang, YUAN Ye, QIAO Baiyou, WANG Yishu, MA Yuliang, WANG Guoren
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1066-1071. https://doi.org/10.16511/j.cnki.qhdxxb.2018.26.050
    Similarity calculations have many real-life applications. Research on similarity calculations has mainly focused on static graphs, with many similarity calculation models based on SimRank. In real life, however, many systems, such as communication networks, are modeled by temporal graphs, to which the traditional SimRank algorithm cannot be applied. This study therefore analyzes the similarity calculation problem for large temporal graphs. A temporal-aware SimRank (TaSimRank) algorithm was developed that computes node similarity with an efficient iterative method based on the topological structure and time constraints of the graph. An approximate algorithm then performs the similarity calculations using a tree-based index built with random walks and the Monte Carlo method, which balances computation time and efficiency. Tests on real temporal graphs demonstrate the effectiveness and extensibility of these approaches.
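As background for the static baseline the abstract refers to, the sketch below implements the classic iterative SimRank recurrence on a static directed graph: s(a, a) = 1 and, for distinct a and b, s(a, b) = C / (|I(a)| |I(b)|) times the sum of s(i, j) over in-neighbours i of a and j of b, with s(a, b) = 0 when either node has no in-neighbours. It ignores timestamps entirely, so it is not TaSimRank; the decay factor C = 0.8, the iteration count and the toy graph are assumptions.

```python
def simrank(in_neighbors, c=0.8, iterations=10):
    """Classic iterative SimRank on a static directed graph.

    in_neighbors maps each node to the list of nodes with an edge into it.
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    new_sim[a, b] = 1.0
                elif not ia or not ib:
                    new_sim[a, b] = 0.0  # no in-neighbours: similarity defined as 0
                else:
                    total = sum(sim[i, j] for i in ia for j in ib)
                    new_sim[a, b] = c * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical toy graph: a university links to two professors,
# each of whom links to one student.
graph = {
    "univ": [],
    "profA": ["univ"],
    "profB": ["univ"],
    "stuA": ["profA"],
    "stuB": ["profB"],
}
scores = simrank(graph)
print(round(scores["profA", "profB"], 3), round(scores["stuA", "stuB"], 3))  # 0.8 0.64
```

Each iteration touches every node pair, which is why large graphs call for approximations such as the random-walk, Monte Carlo approach mentioned above.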
  • LU Xiaofeng, ZHANG Shengfei, YI Shengwei
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1072-1078. https://doi.org/10.16511/j.cnki.qhdxxb.2018.26.048
    Personal keystroke input patterns are difficult to imitate and can therefore be used for identity authentication. A person's free-text keystroke data can be used to learn that person's unique keystroke pattern, and detection based on free-text keystrokes enables continuous identity authentication without affecting the user's input. This paper presents a model that divides the keystroke data into fixed-length keystroke sequences and converts the keystroke timing data in each sequence into keystroke vectors according to the time characteristics of the keystrokes. A convolutional neural network and a recurrent neural network then learn the sequences of personal keystroke vectors for identity authentication. Tested on an open dataset, the model achieved an optimal false rejection rate (FRR) of 1.95%, a false acceptance rate (FAR) of 4.12%, and an equal error rate (EER) of 3.04%.
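FRR, FAR and EER are threshold-dependent error rates. A minimal sketch of how they are commonly computed from match scores, assuming higher scores mean "more likely the genuine user" and using toy score lists rather than the paper's data: FRR is the fraction of genuine attempts rejected, FAR is the fraction of impostor attempts accepted, and the EER is read off at the candidate threshold where the two are closest. The function names are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """Accept an attempt when its score is >= threshold."""
    frr = sum(score < threshold for score in genuine) / len(genuine)
    far = sum(score >= threshold for score in impostor) / len(impostor)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Return (EER, threshold) at the candidate threshold where FAR and FRR are closest."""
    best_gap, best_eer, best_t = None, None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_t = gap, (far + frr) / 2, t
    return best_eer, best_t


# Toy scores, purely illustrative.
genuine = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5, 0.65]
print(equal_error_rate(genuine, impostor))  # (0.2, 0.6)
```

Raising the threshold trades a lower FAR for a higher FRR; the EER summarizes that trade-off with a single number.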
  • ZOU Quanchen, ZHANG Tao, WU Runpu, MA Jinxin, LI Meicong, CHEN Chen, HOU Changyu
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1079-1094. https://doi.org/10.16511/j.cnki.qhdxxb.2018.21.025
    In recent years, the growing size and complexity of software have driven vulnerability discovery techniques to become increasingly automated and intelligent. This paper reviews the characteristics of both traditional vulnerability discovery techniques and learning-based intelligent vulnerability discovery techniques. The traditional techniques include static and dynamic approaches such as model checking, binary comparison, fuzzing, symbolic execution and vulnerability exploitability analysis. The paper analyzes the problems of each technique and the challenges to fully automating vulnerability discovery. It then reviews machine learning and deep learning techniques for vulnerability discovery, including binary function identification, function similarity detection, test input generation and path constraint solving. Challenges include the security and robustness of machine learning algorithms, algorithm selection, dataset collection and feature selection. Finally, future research should focus on improving the accuracy and efficiency of vulnerability discovery algorithms and their degree of automation and intelligence.
  • THERMAL ENGINEERING
  • LI Xuefang, HE Qian, CHRISTOPHER D M, CHENG Lin
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1095-1100. https://doi.org/10.16511/j.cnki.qhdxxb.2018.21.021
    High-pressure hydrogen jets are a critical topic in hydrogen safety research, and numerical simulations validated by measurements are an essential way to study them. However, complete modeling of high-pressure hydrogen jets is inefficient, unstable and difficult to converge, while existing simplified models rely on non-physical assumptions that lead to inaccurate predictions. A flow partitioning model based on quantitative shock structure measurements was developed by combining a real-gas equation of state with the flow and energy conservation equations. The flow partitioning model accounts for the different flow conditions in the core flow region and the mixing layer, and it avoids modeling the shock region, where the gas state varies dramatically, which significantly simplifies the calculation. The velocity and concentration distributions predicted by the flow partitioning model agree well with those of the complete model and with measurements, and they are superior to predictions by the canonical notional nozzle model. This study thus provides a reduced-order modeling approach that simplifies the simulations without sacrificing accuracy, which will benefit hydrogen safety research.
  • WANG Zhenchuan, XU Ruina, XIONG Chao, JIANG Peixue
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1101-1106. https://doi.org/10.16511/j.cnki.qhdxxb.2018.25.046
    Heat transfer of supercritical pressure fluids flowing in vertical tubes can deteriorate due to buoyancy. This study used a helical insert in the tube to change the flow structure and improve heat transfer. Convection heat transfer of supercritical pressure CO2 in a bare vertical tube and in a tube with a helical insert was investigated experimentally to identify the effects of the heat flux, inlet Re and flow direction on the heat transfer in both cases. Because of the buoyancy effect, the wall temperature distribution is nonlinear, and the peak wall temperature gradually moves toward the entrance as the heat flux increases. The helical insert effectively suppresses the heat transfer deterioration caused by buoyancy and significantly enhances the convective heat transfer of supercritical pressure CO2 in vertical tubes. However, at high heat fluxes, buoyancy can still reduce the heat transfer for upward flow of supercritical pressure CO2 even with the helical insert.
  • MECHANICAL ENGINEERING
  • CHEN Daiwei, WU Jun, ZHANG Binbin, WANG Liping, LIANG Jianhong
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1107-1114. https://doi.org/10.16511/j.cnki.qhdxxb.2018.21.024
    Spindle reliability is evaluated with reliability tests characterized by a load spectrum. This paper presents a spindle load spectrum compilation method based on cutting force coefficient identification and cutting simulations. The cutting force coefficients in a cutting force model are identified, and the model is validated by comparing its predicted cutting force with the measured cutting force. A cutting simulation of an S-shaped specimen was analyzed to obtain time histories of the cutting parameters. The cutting force model and these cutting parameters were then used to predict the dynamic cutting force, whose cycles were counted with the rain-flow counting method. Alternative probability distribution models were fitted to the counting results, and the best model was selected based on multi-curve fitting and the Kolmogorov-Smirnov (KS) test. The resulting load spectrum was then used to guide a spindle loading test. The results show that the dynamic force is accurately predicted and can be used to compile the load spectrum and to guide loading tests. This method will improve the reliability of spindle tests.
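Rain-flow counting, used above to count the cycles of the dynamic cutting force, can be sketched with the common three-point stack formulation (after ASTM E1049). This is an illustrative implementation, not the authors' code, and the sample load history is made up: the signal is first reduced to turning points, then full and half cycles are extracted together with their ranges.

```python
def turning_points(series):
    """Reduce a load history to alternating peaks and valleys."""
    points = []
    for x in series:
        if points and x == points[-1]:
            continue  # drop repeated values
        if len(points) >= 2 and (points[-1] - points[-2]) * (x - points[-1]) > 0:
            points[-1] = x  # still rising (or falling): extend the current excursion
        else:
            points.append(x)
    return points


def rainflow(series):
    """Return (range, count) pairs: count 1.0 for a full cycle, 0.5 for a half cycle."""
    cycles = []
    stack = []
    for point in turning_points(series):
        stack.append(point)
        while len(stack) >= 3:
            x = abs(stack[-1] - stack[-2])  # most recent range
            y = abs(stack[-2] - stack[-3])  # previous range
            if x < y:
                break
            if len(stack) == 3:
                # Y contains the starting point: count a half cycle and drop the start.
                cycles.append((y, 0.5))
                stack.pop(0)
            else:
                # Count Y as a full cycle and remove its two points.
                last = stack.pop()
                del stack[-2:]
                stack.append(last)
                cycles.append((y, 1.0))
    # Ranges left on the stack are counted as half cycles.
    for a, b in zip(stack, stack[1:]):
        cycles.append((abs(b - a), 0.5))
    return cycles


load = [-2, 1, -3, 5, -1, 3, -4, 4, -2]
print(rainflow(load))
```

Summing the counts per range gives the cycle histogram to which probability distributions can then be fitted, as in the method described above.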
  • BUILDING SCIENCE
  • YAN Xiang, WANG Jianghua, LI Hui, CHEN Yuxiao
    Journal of Tsinghua University (Science and Technology). 2018, 58(12): 1115-1120. https://doi.org/10.16511/j.cnki.qhdxxb.2018.22.030
    The influence of noise on human sleep was studied by comparing sleep quality in a quiet room with close to 0 dB(A) background noise with that in ordinary bedrooms with background noise levels of 22-48 dB(A). Thirty-five subjects of different genders and ages wore sleep evaluation headbands while sleeping in the silent room and in their own bedrooms. Sleep quality was better in the silent room than in the ordinary bedrooms: on average, total sleep time increased by 11.4%, deep sleep time by 9.3%, and rapid eye movement (REM) sleep time by 12.2%. The increase in deep sleep time correlates positively with the bedroom background noise level for noise levels above 30 dB(A).