ISSN 1000-0585
CN 11-1848/P
Started in 1982
  Table of Contents
      Volume 58 Issue 4   
    Score domain speaking rate normalization for speaker recognition
    AISIKAER Rouzi, WANG Dong, LI Lantian, ZHENG Fang, ZHANG Xiaodong, JIN Panshi
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 337-341.   DOI: 10.16511/j.cnki.qhdxxb.2018.25.028
    Speaking rate variations seriously degrade speaker recognition accuracy. This paper presents a score-domain normalization approach that reduces the impact of speaking rate variations. For each enrolled speaker, score distributions are computed for each type of impostor in the cohort set. Two kinds of cohort set are used: a global cohort set consisting of utterances at different speaking rates, and local cohort sets obtained by splitting the utterances in the global cohort set according to their relative speaking rates. The scores for the test speech are then normalized using these distributions. Evaluations on a self-recorded speaking rate database using a GMM-UBM (Gaussian mixture model-universal background model) framework, with the data sparsity problem handled by augmenting the training data, give a relative EER (equal error rate) reduction of 33.33%. This study shows that global and local score normalization methods effectively reduce the impact of speaking rate variations on speaker recognition.
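A minimal sketch of cohort-based score normalization, assuming a Z-norm-style mean/standard-deviation normalization (the abstract does not state the exact formula). The impostor scores, the slow/normal/fast rate labels and the helper names are hypothetical; in the paper the raw scores would come from a GMM-UBM system.

```python
import statistics

def normalize(raw_score, cohort_scores):
    """Z-norm-style normalization of a raw score against impostor cohort scores."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)  # sample std; needs at least two scores
    return (raw_score - mu) / sigma

def local_cohort(cohort, rate_label):
    """Hypothetical local cohort: impostor scores whose relative speaking-rate
    label matches the label of the test utterance."""
    return [score for score, label in cohort if label == rate_label]

# Hypothetical impostor scores against one enrolled speaker, each tagged with a
# relative speaking-rate label (values and labels are illustrative only).
cohort = [(-1.2, "slow"), (-0.8, "slow"), (-0.4, "normal"), (-0.6, "normal"),
          (0.1, "fast"), (0.3, "fast")]

test_score, test_rate = 0.9, "fast"
z_global = normalize(test_score, [s for s, _ in cohort])               # global cohort
z_local = normalize(test_score, local_cohort(cohort, test_rate))       # rate-matched cohort
```

In this sketch, the only difference between global and local normalization is which subset of impostor scores supplies the mean and standard deviation.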
    Crosslingual acoustic modeling in Uyghur speech recognition
    NURMEMET Yolwas, LIU Junhua, WUSHOUR Silamu, REYIMAN Tursun, DAWEL Abilhayer
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 342-346.   DOI: 10.16511/j.cnki.qhdxxb.2018.22.020
    The Uyghur language has little speech data for training acoustic models because of various data acquisition and annotation difficulties. This paper describes a crosslingual acoustic modeling method based on long short-term memory (LSTM) models. A large amount of Chinese training data is first used to train a deep neural network acoustic model. The output layer weights of this network are then randomly re-initialized to create an output layer for the Uyghur language, and a Uyghur acoustic model is trained on Uyghur speech data, updating all the weights. Tests show that this method reduces the word error rates of Uyghur transcription and dictation recognition by 20% and 30%, respectively, compared with the baseline system. Thus, using the Chinese data to obtain better initial weights for the hidden layers improves the Uyghur acoustic model and enhances the network robustness.
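The weight-transfer step can be sketched in plain Python. This is a minimal illustration, not the authors' code: the layer sizes, the uniform initialization range and the dictionary layout are hypothetical, and the LSTM recurrence and the fine-tuning loop are omitted.

```python
import copy
import random

def init_layer(n_out, n_in, rng, scale=0.1):
    """Random weight matrix with n_out rows and n_in columns (nested lists)."""
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]

def transfer_model(source_model, n_target_units, seed=0):
    """Copy the hidden layers trained on the source language and attach a freshly
    initialized output layer sized for the target-language units."""
    rng = random.Random(seed)
    hidden = copy.deepcopy(source_model["hidden"])
    width = len(hidden[-1])  # number of units in the last hidden layer
    return {"hidden": hidden, "output": init_layer(n_target_units, width, rng)}

# Hypothetical sizes: 4 input features, two hidden layers of 8 units,
# 20 source (Chinese) output units and 12 target (Uyghur) output units.
rng = random.Random(1)
chinese_model = {"hidden": [init_layer(8, 4, rng), init_layer(8, 8, rng)],
                 "output": init_layer(20, 8, rng)}
uyghur_model = transfer_model(chinese_model, n_target_units=12)
# Training would then update every weight in uyghur_model on Uyghur speech data.
```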
    Joint subspace learning and feature selection method for speech emotion recognition
    SONG Peng, ZHENG Wenming, ZHAO Li
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 347-351.   DOI: 10.16511/j.cnki.qhdxxb.2018.26.014
    Traditional speech emotion recognition methods are trained and evaluated on a single corpus. However, when training and testing use different corpora, the recognition performance drops drastically. A joint subspace learning and feature selection method is presented here to improve cross-corpus recognition. In this method, the feature subspace is learned via a regression algorithm, with the l2,1-norm used for feature selection. The maximum mean discrepancy (MMD) is then used to measure the feature divergence between different corpora. Tests show that this algorithm gives satisfactory results for cross-corpus speech emotion recognition and is more robust and efficient than state-of-the-art transfer learning methods.
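The two quantities named in the abstract can be computed directly. This sketch assumes that the rows of the projection matrix W correspond to input features and uses a linear-kernel MMD estimate (squared distance between corpus means); the paper's kernel and objective weighting are not given in the abstract.

```python
import math

def l21_norm(W):
    """l2,1-norm: sum over rows of each row's l2 norm. Penalizing it drives whole
    rows of W to zero, which discards the corresponding input features."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def mmd_linear(X, Y):
    """Empirical MMD with a linear kernel: squared Euclidean distance between the
    mean feature vectors of the two corpora."""
    d = len(X[0])
    mean_x = [sum(x[j] for x in X) / len(X) for j in range(d)]
    mean_y = [sum(y[j] for y in Y) / len(Y) for j in range(d)]
    return sum((a - b) ** 2 for a, b in zip(mean_x, mean_y))

# Hypothetical 3-feature x 2-dimension projection: the all-zero middle row means
# feature 2 has been dropped.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
penalty = l21_norm(W)
divergence = mmd_linear([[0.0, 0.0]], [[3.0, 4.0]])
```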
    Influence on tone perception from vowels with different formant distributions
    CAO Chong, XIE Yanlu, ZHANG Jinsong
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 352-356.   DOI: 10.16511/j.cnki.qhdxxb.2018.25.023
    Previous studies have demonstrated that vowels have an important effect on tone perception. Vowel quality is mainly determined by the formants. Building on a survey of existing studies, this work investigates the effect of the vowel formant distribution on tone perception. Stimuli were built from a continuum of vowels ranging from low to high vowels combined with three different tone continua, and were used in a tone identification test. The results show that stimuli with higher vowel formant distributions are more likely to be perceived as relatively low tones within a tone continuum. The size of this effect varies across tones, with stronger effects on the second and third tones. The effect on tone perception is mainly reflected in the category boundary rather than in the category width.
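Category boundary and category width are usually read off an identification function. The sketch below assumes linear interpolation between continuum steps and defines the width as the distance between the 25% and 75% points; many studies fit a logistic or probit function instead, and the abstract does not say which was used. The response proportions are made-up illustration values.

```python
def crossing(steps, props, level):
    """Continuum position where a rising identification curve reaches `level`,
    found by linear interpolation between adjacent continuum steps."""
    points = list(zip(steps, props))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 <= level <= p1 and p1 > p0:
            return x0 + (level - p0) * (x1 - x0) / (p1 - p0)
    return None  # the curve never reaches this level

def boundary_and_width(steps, props):
    """Boundary = 50% crossover; width = distance between the 25% and 75% crossovers.
    Assumes the curve spans at least 25% to 75%."""
    lower, upper = crossing(steps, props, 0.25), crossing(steps, props, 0.75)
    return crossing(steps, props, 0.5), upper - lower

# Hypothetical proportions of "high tone" responses along a 7-step tone continuum.
steps = [1, 2, 3, 4, 5, 6, 7]
props = [0.0, 0.05, 0.2, 0.5, 0.8, 0.95, 1.0]
boundary, width = boundary_and_width(steps, props)
```

Comparing the boundary and width obtained for the same tone continuum on different vowels is one way to separate a boundary shift from a change in category width.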
    MRI analyses of the effects of relative tongue size on individual articulatory differences
    LU Wenhuan, FENG Xiaoyan, HONDA Kiyoshi, WEI Jianguo
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 357-361.   DOI: 10.16511/j.cnki.qhdxxb.2018.22.022
    Tongue size can be used to evaluate normal and pathological articulation because it can be used to predict tongue mobility within the oropharyngeal cavity. This study analyzes the relationship between relative tongue size and tongue movement using magnetic resonance imaging (MRI). Cine MRI and tagged MRI are combined to obtain a new dataset with both a clear vocal tract and marker points. The synthesized images are then used to analyze the relative tongue size and the tongue movement. The relative tongue size is defined as the ratio of the tongue area to the sum of the tongue area and the oropharyngeal cavity area. Several marker points are sampled on the oral and pharyngeal surfaces of the tongue to calculate the mean velocity. A comparison between genders shows that smaller tongues move faster.
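The two measurements defined in the abstract can be sketched as follows, assuming the areas and marker positions have already been extracted from the MRI frames. Treating mean velocity as path length divided by elapsed time, and averaging it over markers, are assumptions; units are whatever the image measurements use.

```python
import math

def relative_tongue_size(tongue_area, cavity_area):
    """Tongue area divided by (tongue area + oropharyngeal cavity area)."""
    return tongue_area / (tongue_area + cavity_area)

def mean_speed(track, frame_interval):
    """Mean speed of one tracked marker: path length over elapsed time, from
    (x, y) positions sampled every frame_interval seconds."""
    path = sum(math.dist(p, q) for p, q in zip(track, track[1:]))
    return path / (frame_interval * (len(track) - 1))

def mean_velocity(tracks, frame_interval):
    """Average of the per-marker mean speeds over all sampled marker points."""
    return sum(mean_speed(t, frame_interval) for t in tracks) / len(tracks)

# Hypothetical marker tracks on the oral and pharyngeal tongue surfaces.
tracks = [[(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 1.0)]]
ratio = relative_tongue_size(30.0, 10.0)
velocity = mean_velocity(tracks, 1.0)
```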
    Psychological distance prediction in dialogue discourse
    LÜ Xueqiang, ZHANG Xuejing, ZHOU Qiang
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 362-367.   DOI: 10.16511/j.cnki.qhdxxb.2018.26.016
    Dialogue behavior and dialogue intention are key focal points in dialogue discourse studies. Here, the dialogue mechanism is analyzed in terms of psychological distance and dialogue behavior. A psychological distance prediction model was developed and applied to the corpus, and differences were identified by studying the distribution of dialogue behaviors. This study shows that psychological distance scores are higher for work and love topics, for speakers with a close relationship, and for question-answer dialogues. Conversely, the scores are lower for weather and traffic topics, alienated relationships and statement dialogues. The psychological distance is then related to the distribution of response types. A statistical analysis of the dialogue behavior confirms the feasibility of using the psychological distance prediction model in future studies of dialogue content.
    Prosodic encoding of focus and interrogative meaning in Lhasa Tibetan
    ZHANG Xiaxia, WANG Bei
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 368-373.   DOI: 10.16511/j.cnki.qhdxxb.2018.21.012
    Focus and interrogative meaning are both important communicative functions that can be encoded prosodically. This study considered two target sentences uttered by 8 native Lhasa Tibetan speakers as both questions and statements under four focus conditions (initial, medial, final and neutral focus). The prosodic encoding of focus and interrogative meaning in Lhasa Tibetan was investigated using acoustic and statistical analyses of the F0 and duration of the target sentences. The results show that on-focus words exhibit significant increases in F0, pitch range and duration, while pre-focus words remain unchanged in both questions and statements. Post-focus words in statements show obvious compression of F0 and pitch range, while post-focus compression is not uniform in questions. Compared with the corresponding focus conditions in statements, interrogative intonation is higher and the F0 rise of the post-focus parts is more stable. The F0 of on-focus words is the same in statements and questions. Therefore, the F0 of the post-focus constituents forms the main part of the interrogative intonation rise but is not compressed for focal prominence, and the F0 of on-focus words cannot be used to differentiate between statements and questions.
    CTR prediction for online advertising based on a features conjunction model
    SHEN Fangyao, DAI Guojun, DAI Chenglei, GUO Hongjie, ZHANG Hua
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 374-379.   DOI: 10.16511/j.cnki.qhdxxb.2018.26.021
    Click-through rate (CTR) prediction is important for Internet companies. The CTR is closely related to the context, user attributes and advertising attributes, so effective CTR prediction is essential for improving company revenue. The traditional logistic regression (LR) model was extended here with conjunction features that capture the relationship between user and advertiser characteristics; these features are added to the input of the sigmoid function to obtain a new feature conjunction model. The online optimization algorithm follow-the-regularized-leader (FTRL) was used to improve the efficiency of parameter estimation, and mixed regularization was used to prevent overfitting. Tests on a real-world advertising dataset show that this method has better accuracy, efficiency, parameter sensitivity and reliability than previous algorithms.
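The abstract names FTRL and mixed regularization but not the exact update rule. The sketch below uses the widely published per-coordinate FTRL-Proximal update with L1 and L2 terms, which is an assumption about what "mixed regularization" means here; the feature names, the hashing trick and the user x ad cross-feature construction are hypothetical illustrations of a feature conjunction model, not the authors' implementation.

```python
import math
import zlib

class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression with L1 and L2 terms
    over hashed binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, dim=2 ** 20):
        self.alpha, self.beta, self.l1, self.l2, self.dim = alpha, beta, l1, l2, dim
        self.z = {}  # accumulated adjusted gradients per coordinate
        self.n = {}  # accumulated squared gradients per coordinate

    def _index(self, feature):
        return zlib.crc32(feature.encode("utf-8")) % self.dim  # hashing trick

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # the L1 term keeps this coordinate exactly zero
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        score = sum(self._weight(self._index(f)) for f in features)
        score = max(min(score, 35.0), -35.0)  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-score))  # sigmoid

    def update(self, features, clicked):
        p = self.predict(features)
        g = p - clicked  # log-loss gradient for each active binary feature
        for i in {self._index(f) for f in features}:
            n = self.n.get(i, 0.0)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * self._weight(i)
            self.n[i] = n + g * g
        return p

def with_conjunctions(user_feats, ad_feats, context_feats=()):
    """Raw features plus hypothetical user x ad conjunction (cross) features."""
    crosses = [f"{u}&{a}" for u in user_feats for a in ad_feats]
    return [*user_feats, *ad_feats, *context_feats, *crosses]

model = FTRLProximal()
seen_a = with_conjunctions(["user=u1"], ["ad=a1"])  # hypothetical: u1 clicks a1
seen_b = with_conjunctions(["user=u2"], ["ad=a1"])  # hypothetical: u2 ignores a1
for _ in range(200):
    model.update(seen_a, 1)
    model.update(seen_b, 0)
p_a, p_b = model.predict(seen_a), model.predict(seen_b)
```

Because the cross feature "user=u1&ad=a1" gets its own weight, the model can separate the two users even though they share the same ad feature, and the L1 term leaves rarely useful coordinates exactly at zero.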
    Intrusion detection for industrial control systems based on an improved SVM method
    CHEN Dongqing, ZHANG Puhan, WANG Huazhong
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 380-386.   DOI: 10.16511/j.cnki.qhdxxb.2018.25.019
    Industrial control system intrusion detection models based on a support vector machine (SVM) optimized by Kalman particle swarm optimization (KPSO) can become trapped in local minima. This paper presents a KPSO based on multi-innovation theory that uses not only the current observation but also useful past information to predict the particle states. This provides sufficient momentum for updating the particle positions, so the algorithm can escape local minima and achieve better optimization accuracy. The algorithm was used to optimize the parameters of an SVM-based intrusion detection model, which was evaluated in simulations on a standard industrial intrusion detection dataset. The results show that the SVM intrusion detection model optimized by this algorithm achieves significantly better detection rates, false negative rates and false positive rates than models optimized by KPSO, PSO or genetic algorithms.
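The abstract does not give the KPSO or multi-innovation update equations, so this sketch only shows the standard global-best PSO loop (one of the baselines named above) that those variants modify. The objective is a hypothetical smooth stand-in for the cross-validated error of an SVM over log10(C) and log10(gamma); a real run would train and evaluate the SVM at each candidate point.

```python
import random

def pso(objective, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard global-best PSO (not KPSO or the multi-innovation variant): each
    velocity mixes inertia, a pull toward the particle's own best position and a
    pull toward the swarm's best position."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

def validation_error(params):
    """Hypothetical stand-in for SVM validation error with
    log10(C) = params[0] and log10(gamma) = params[1]."""
    log_c, log_gamma = params
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2

best, err = pso(validation_error, bounds=[(-3.0, 3.0), (-5.0, 1.0)])
```

The velocity update uses only each particle's current state and two remembered best positions; the multi-innovation idea in the abstract adds earlier observation information to the state prediction, which this baseline does not do.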
    Highly-descriptive chain of trust in trusted computing
    LONG Yu, WANG Xin, XU Xian, HONG Xuan
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 387-394.   DOI: 10.16511/j.cnki.qhdxxb.2018.25.017
    In trusted computing, the trusted boot process verifies each next boot module, starting from the root of trust, to establish a chain of trust. The classic chain of trust is a simple single-branch tree, which may not satisfy all user demands. This paper presents a multi-module chain of trust model based on HIBS (hierarchical identity-based signature) and a multi-pattern chain of trust model based on FIBS (fuzzy identity-based signature). These models overcome the single-module expectation of a traditional chain, so the user can choose modules dynamically. The two chain of trust models are then combined to improve the results.
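The classic single-branch chain can be illustrated with a TPM-style measured boot, in which each module's hash is folded into a register by an extend operation. This is not the HIBS or FIBS construction from the paper; it is a minimal sketch, with hypothetical helper names and module images, of the limitation the abstract describes: one expected value pins down exactly one module sequence.

```python
import hashlib

def extend(register: bytes, measurement: bytes) -> bytes:
    """PCR-style extend: new value = SHA-256(old value || SHA-256(module image))."""
    return hashlib.sha256(register + hashlib.sha256(measurement).digest()).digest()

def measure_chain(modules):
    """Start from an all-zero root value and extend with each boot module in order."""
    register = bytes(32)
    for image in modules:
        register = extend(register, image)
    return register

def verify(modules, expected):
    """Accept the boot only if the measured chain reproduces the expected value."""
    return measure_chain(modules) == expected

# Hypothetical module images for a three-stage boot.
boot = [b"bootloader", b"kernel", b"initrd"]
golden = measure_chain(boot)
```

Any substituted or reordered module changes the final value, so a user who wants to boot an alternative but equally trustworthy module is rejected; the signature-based models in the paper aim to remove that rigidity.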
    Anomaly detection based on IO sequences in a virtual machine with the Markov mode
    CHEN Xingshu, CHEN Jiaxin, ZHAO Dandan, JIN Xin
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 395-401,410.   DOI: 10.16511/j.cnki.qhdxxb.2018.25.018
    Abnormal IO behavior in virtual machines is monitored to discover known and unknown virtual machine escape attacks. This paper presents an anomaly detection method for IO sequences in virtual machines based on hardware-assisted virtualization. The method uses asynchronous acquisition to efficiently collect the IO sequences of the virtual machine, relates the IO sequences to the processes running in the virtual machine to give a fine-grained description of the virtual machine's IO behavior, and uses an algorithm that generates short IO sequences based on a double-layer hash table and a Markov chain model to detect the IO sequences of malicious virtual machines. A virtual machine detection system was implemented on a Kernel-based Virtual Machine (KVM) to evaluate the method. The results show that the system can effectively detect some IO-based security threats and some known and unknown virtual machine escape attacks with an acceptable false alarm rate and performance overhead.
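A minimal sketch of Markov-chain scoring of short IO sequences, assuming first-order transitions between IO operation names and add-one smoothing. The nested dictionary stands in for the double-layer hash table; the operation names, the smoothing and the idea of thresholding the score are illustrative assumptions not specified in the abstract, and process association is omitted.

```python
import math
from collections import defaultdict

def train_transitions(sequences):
    """Two-level table counts[prev_op][next_op] built from normal IO sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def avg_log_prob(seq, counts, vocab_size, alpha=1.0):
    """Average log transition probability with add-alpha smoothing;
    lower values mean the sequence is less like the training data."""
    total = 0.0
    for a, b in zip(seq, seq[1:]):
        row = counts.get(a, {})
        num = row.get(b, 0) + alpha
        den = sum(row.values()) + alpha * vocab_size
        total += math.log(num / den)
    return total / max(len(seq) - 1, 1)

# Hypothetical traces of IO operations observed during normal operation.
normal_traces = [["open", "read", "read", "close"],
                 ["open", "write", "close"],
                 ["open", "read", "close"]]
vocab = {"open", "read", "write", "close", "ioctl"}
model = train_transitions(normal_traces)
score_normal = avg_log_prob(["open", "read", "close"], model, len(vocab))
score_odd = avg_log_prob(["ioctl", "ioctl", "write", "ioctl"], model, len(vocab))
# A sequence would be flagged when its score falls below a tuned threshold.
```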
    Influence of cooling/lubrication conditions on the drilling quality and thrust force of CFRP/Al stacks
    ZHANG Yuxi, WU Dan, YANG Yapeng, MA Xinguo, LIANG Xiong
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 402-410.   DOI: 10.16511/j.cnki.qhdxxb.2018.22.016
    The influence of cooling/lubrication conditions on tool wear, hole quality and thrust force was analyzed for minimal quantity lubrication (MQL) and dry drilling of carbon fiber reinforced plastic (CFRP) and aluminum stacks. A mechanical model was developed to analyze the increase in thrust force and the changes in hole quality characteristics with the number of holes when drilling the CFRP layers. The relationships between the built-up edge (BUE), the chip separation mechanism and the thrust force were also analyzed. The results show that the BUE influences the thrust force in the CFRP layer by changing the real working length of the chisel edge, the BUE angle, the friction angle and the stress states in the real shear plane of the carbon fiber. The differences between the chisel-edge thrust forces with MQL and with dry drilling are the main factor influencing the resultant thrust force. The hard carbon fibers extrude the cutting edges and break the continuity of the BUE, which degrades the quality of the stack holes. MQL improves the machining quality of CFRP/Al stack drilling compared with dry drilling.
    Calibration of 3-D measurement system based on a double position sensitive detectors
    ZHENG Jun, LI Wenqing
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 411-416.   DOI: 10.16511/j.cnki.qhdxxb.2018.26.023
    Binocular vision systems have been widely used in many areas. Traditional calibration methods for binocular vision systems commonly use complicated mathematical models, which result in low precision and speed. This paper presents a fast measurement method based on double position sensitive detectors (PSDs). Two detectors aimed from different angles detect the position of a laser point for 3-D measurement. Replacing the charge coupled device (CCD) with a PSD greatly simplifies the 3-D measurement. Since this method is fundamentally different from traditional methods, the usual calibration methods are no longer applicable. This paper therefore presents two calibration methods: an improved Faugeras calibration combined with Levenberg-Marquardt (LM) optimization, and a back propagation (BP) neural network. Tests show that the LM optimization gives better accuracy and stability.
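The LM refinement step can be illustrated on a deliberately small problem. This sketch fits a hypothetical two-parameter model y = a*exp(b*x) to synthetic, noise-free data; the paper's PSD calibration model and its improved Faugeras initialization are not given in the abstract, so only the LM update itself (damped normal equations with adaptive damping) is shown.

```python
import math

def lm_fit_exp(xs, ys, a, b, lam=1e-3, max_iter=200):
    """Levenberg-Marquardt fit of y = a*exp(b*x): solve the damped normal equations
    (J^T J + lam*I) delta = -J^T r, shrink lam after an accepted step and grow it
    after a rejected one."""
    def cost(a, b):
        return sum((a * math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))

    current = cost(a, b)
    for _ in range(max_iter):
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y          # residual
            ja, jb = e, a * x * e  # partial derivatives of the residual w.r.t. a and b
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        m11, m22 = h11 + lam, h22 + lam
        det = m11 * m22 - h12 * h12  # positive because lam > 0
        da = (-g1 * m22 + h12 * g2) / det
        db = (-g2 * m11 + h12 * g1) / det
        trial = cost(a + da, b + db)
        if trial < current:
            a, b, current = a + da, b + db, trial
            lam /= 10.0  # good step: behave more like Gauss-Newton
        else:
            lam *= 10.0  # bad step: behave more like gradient descent
            if lam > 1e12:
                break  # no further progress possible
    return a, b

# Synthetic, noise-free data generated from a = 2.0, b = 0.8 (illustration only).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.8 * x) for x in xs]
a_hat, b_hat = lm_fit_exp(xs, ys, a=1.5, b=0.5)
```

In a calibration setting the same loop would run over all camera/PSD parameters, typically starting from a linear (Faugeras-style) estimate so that LM only has to refine it.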
    Control algorithm for a seamless shifting 2-speed transmission based on the optimal trajectory
    TAI Yuzhuo, SONG Jian, LU Zhenghong, FANG Shengnan, NGUYEN Truong Sinh
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 417-423.   DOI: 10.16511/j.cnki.qhdxxb.2018.26.022
    To improve the dynamic performance and energy efficiency of electric vehicles (EVs), many researchers have chosen to equip EVs with seamless shifting transmissions. This paper presents a control algorithm for a seamless shifting transmission that reduces the shock and friction losses during shifting. An EV powertrain model that includes the seamless shifting transmission is developed and then discretized. The discrete-time algebraic equations and the objective function are then used to calculate the optimal control vectors and the reference trajectories of the rotational speeds. These control vectors and trajectories are then used to develop a control algorithm with both feedforward and feedback parts to control the transmission. Simulations show that the algorithm reduces both the friction loss and the shock during shifting.
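The feedforward-plus-feedback structure can be illustrated on a much simpler plant than the paper's powertrain. This sketch assumes a hypothetical single-inertia speed model J*dw/dt = u - b*w - d, discretized with forward Euler, and a proportional feedback term; the optimal reference trajectory from the paper is replaced by a simple speed ramp, and all parameter values are illustrative.

```python
def simulate(ref, dt=0.001, J=0.05, b=0.01, kp=2.0, disturbance=0.0):
    """Track a reference speed trajectory on a discretized single-inertia model
    using feedforward from the inverse model plus proportional feedback.
    Returns the maximum absolute tracking error."""
    w = ref[0]
    max_err = 0.0
    for k in range(len(ref) - 1):
        u_ff = J * (ref[k + 1] - ref[k]) / dt + b * ref[k]  # inverse of the discrete model
        u_fb = kp * (ref[k] - w)                             # corrects disturbance / model error
        w = w + dt / J * (u_ff + u_fb - b * w - disturbance)
        max_err = max(max_err, abs(ref[k + 1] - w))
    return max_err

# Hypothetical 0.3 s speed ramp from 100 to 120 rad/s, then hold.
ref = [100.0 + 20.0 * min(k * 0.001 / 0.3, 1.0) for k in range(1000)]
err_nominal = simulate(ref)
err_with_feedback = simulate(ref, disturbance=0.5)
err_feedforward_only = simulate(ref, kp=0.0, disturbance=0.5)
```

With a perfect model the feedforward term alone tracks the reference; the feedback term matters once an unmodeled load torque appears, where it keeps the error much smaller than feedforward alone.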
    Numerical analyses of the RCS characteristics of a vehicle body with concave/convex surface features
    SUN Honghai, LÜ Zhenhua
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 424-431.   DOI: 10.16511/j.cnki.qhdxxb.2018.22.019
    The influence of minor geometrical features on vehicle surfaces on radar stealth characteristics was analyzed in simulations. Numerical analyses of the electromagnetic scattering characteristics were conducted for a simplified vehicle side door with minor seams, grooves and convex geometrical features (bullet-proof glass and a metal door-seal bar). The results show that these minor uneven surface features degrade the radar cross section (RCS) characteristics of the body. The numbers of incidence angles at which the RCS exceeds 10 dBsm for the simplified side doors with minor seams and grooves are up to 2.2 and 2.8 times that of a smooth metal plate. The number of incidence angles at which the RCS exceeds 10 dBsm is 5.4 times greater for an armored vehicle side door with bullet-proof glass and 1.7 times greater for a door with a metal seal bar. Electromagnetic scattering experiments using a rectangular aluminum plate with a rectangular aluminum bar frame on one side show that the bar frame increases the plate's RCS. Thus, even minor uneven surface features on a vehicle body increase the number of incidence angles at which the RCS exceeds 10 dBsm, although they have less influence on the maximum peak RCS. Therefore, exterior surface seams, grooves and convex features should be avoided to improve radar stealth characteristics.
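The comparison metric in the abstract is the number of incidence angles at which the RCS exceeds 10 dBsm, where dBsm = 10*log10(sigma / 1 m^2). This sketch assumes the RCS values (in m^2) have already been obtained from an electromagnetic solver; the angle samples and values below are hypothetical, not the paper's data.

```python
import math

def to_dbsm(sigma_m2):
    """RCS in dBsm: 10*log10(sigma / 1 m^2)."""
    return 10.0 * math.log10(sigma_m2)

def count_above(rcs_m2_by_angle, threshold_dbsm=10.0):
    """Number of incidence angles whose RCS exceeds the threshold (10 dBsm = 10 m^2)."""
    return sum(1 for s in rcs_m2_by_angle.values() if to_dbsm(s) > threshold_dbsm)

def ratio(feature, smooth, threshold_dbsm=10.0):
    """How many times more angles exceed the threshold than for a smooth reference plate."""
    return count_above(feature, threshold_dbsm) / count_above(smooth, threshold_dbsm)

# Hypothetical RCS samples (m^2) keyed by incidence angle in degrees.
smooth_plate = {0: 50.0, 10: 12.0, 20: 2.0, 30: 0.5}
grooved_door = {0: 60.0, 10: 15.0, 20: 11.0, 30: 10.5}
```

Because the metric counts threshold crossings rather than the peak value, a surface feature can raise it substantially while barely changing the maximum RCS, which matches the distinction drawn in the abstract.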
    Automatic driving control based on time delay dynamic predictions
    ZHAO Jianhui, GAO Hongbo, ZHANG Xinyu, ZHANG Yinglin
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 432-437.   DOI: 10.16511/j.cnki.qhdxxb.2018.21.011
    Signal delays, limited forward view distances and other factors limit the ability of self-driving cars to accurately track their planned trajectories. This paper presents an automatic driving control method based on dynamic delay prediction, in which a simplified bicycle model is used to optimize the classical pure pursuit model. A vehicle kinematics model is used to predict the vehicle's heading and position after the delay. The optimal look-ahead distance is then obtained from the difference between the actual driving direction and the tracking direction. MATLAB simulations show that this algorithm can track the planned trajectory at speeds up to 7 m/s with an average error within 0.3 m, which is better tracking performance than the traditional pure pursuit method.
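The method combines two standard pieces: kinematic bicycle prediction over the delay and a pure pursuit steering law. This sketch assumes Euler integration, a fixed look-ahead distance (the paper's optimal look-ahead selection is not given in the abstract) and a path stored as a list of points; it picks the first point far enough from the predicted pose, which only makes sense when the listed points lie ahead of the vehicle. All function names are hypothetical.

```python
import math

def predict_pose(x, y, yaw, v, steer, wheelbase, delay, dt=0.01):
    """Propagate a kinematic bicycle model over the actuation delay."""
    t = 0.0
    while t < delay - 1e-12:
        step = min(dt, delay - t)
        x += v * math.cos(yaw) * step
        y += v * math.sin(yaw) * step
        yaw += v / wheelbase * math.tan(steer) * step
        t += step
    return x, y, yaw

def pure_pursuit_steer(x, y, yaw, target, wheelbase):
    """Pure pursuit: steer = atan(2*L*sin(alpha)/Ld), where alpha is the angle from
    the heading to the look-ahead point and Ld is the distance to it."""
    dx, dy = target[0] - x, target[1] - y
    ld = math.hypot(dx, dy)
    alpha = math.atan2(dy, dx) - yaw
    return math.atan2(2.0 * wheelbase * math.sin(alpha), ld)

def delayed_pursuit(pose, v, current_steer, path, lookahead, wheelbase, delay):
    """Predict where the vehicle will be when the command takes effect, then
    apply pure pursuit from that predicted pose."""
    px, py, pyaw = predict_pose(*pose, v, current_steer, wheelbase, delay)
    target = next((p for p in path if math.hypot(p[0] - px, p[1] - py) >= lookahead), path[-1])
    return pure_pursuit_steer(px, py, pyaw, target, wheelbase)

# Hypothetical straight path along the x axis; vehicle 1 m to its left at 5 m/s.
path = [(0.5 * i, 0.0) for i in range(100)]
steer = delayed_pursuit((0.0, 1.0, 0.0), 5.0, 0.0, path, lookahead=3.0,
                        wheelbase=2.7, delay=0.2)
```

Computing the command from the predicted pose rather than the measured one is what compensates for the delay; without it, the vehicle would already have moved about v*delay by the time the steering takes effect.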
    Overview of deep learning intelligent driving methods
    ZHANG Xinyu, GAO Hongbo, ZHAO Jianhui, ZHOU Mo
    Journal of Tsinghua University(Science and Technology). 2018, 58 (4): 438-444.   DOI: 10.16511/j.cnki.qhdxxb.2018.21.010
    This paper introduces target recognition and detection methods based on the convolutional neural network (CNN) model, the improved regions with convolutional neural network (R-CNN) model, and the task-assistant convolutional neural network (TA-CNN) model for pedestrian detection. It also describes deep-learning stereo matching using a Siamese network, and multi-source fusion of vision sensor, radar sensor and camera data using a deep learning network. CNNs are also used for end-to-end lateral and longitudinal control of autonomous vehicles. Deep learning is widely used at the perception, decision-making and control levels of automatic driving systems to continuously improve perception, detection, decision-making and control accuracy. The analyses indicate that deep learning will further improve the performance of autonomous driving systems.
