Journal of Tsinghua University(Science and Technology)

Table of Contents

Volume 56 Issue 11

ELECTRICAL ENGINEERING

Attributed object detection based on natural language processing

ZHANG Xu, WANG Shengjin

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1137-1142. DOI: 10.16511/j.cnki.qhdxxb.2016.26.001

This paper addresses the problem of localizing an attributed object, such as "abandoned car", in images. Since one object may have tens or even hundreds of non-exclusive attributes, the main difficulties of attributed object detection are manually collecting training images and labeling the bounding boxes for a large number of attributed objects. This attributed object detector extends the object detector with an attributed object classifier. The attributed object classifier is trained by images from the Internet and labeling information gathered by the object detector and a natural language processing tool. An attributed object detection dataset was developed to evaluate the attributed object detectors. Tests show that this attributed object detector has good performance gains of 30% for the mean average precision compared to generic object detectors.

ELECTRONIC ENGINEERING

Speaker recognition system based on deep neural networks and bottleneck features

TIAN Yao, CAI Meng, HE Liang, LIU Jia

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1143-1148. DOI: 10.16511/j.cnki.qhdxxb.2016.26.002

A hybrid model combining a deep neural network (DNN) for speech recognition with the i-vector model for speaker recognition has been shown to be effective for speaker recognition. The system performance is further improved by training a DNN with speaker labels to extract bottleneck features, which replace the original short-term spectral features in the statistics extraction so that the statistics contain more speaker-specific information. Tests on the NIST SRE 2008 female telephone-telephone-English task demonstrate the effectiveness of this method. Compared with the short-term spectral features, the bottleneck features give relative improvements of 7.65% in the equal error rate (EER) and 5.71% in the minimum detection cost function (minDCF).
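
As an illustration of the metrics named in this abstract, the following is a minimal sketch of how an equal error rate and a relative improvement are commonly computed. It is not the authors' evaluation code; the function names, the threshold sweep and the score lists are assumptions made for this example.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping the threshold over all observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # Miss rate: target trials scored below the threshold (rejected).
        miss = sum(s < t for s in target_scores) / len(target_scores)
        # False-alarm rate: non-target trials scored at or above the threshold.
        fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (miss + fa) / 2
    return eer


def relative_improvement(baseline, improved):
    """Relative reduction of an error metric with respect to a baseline."""
    return (baseline - improved) / baseline


# Hypothetical scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

A relative improvement of 7.65% in the EER therefore means that the error rate dropped by 7.65% of its baseline value, not by 7.65 percentage points.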

Fundamental frequency characteristics of “dearing” as emotional speech

KONG Jiangping, LIN Youran

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1149-1153. DOI: 10.16511/j.cnki.qhdxxb.2016.26.003

Dearing is a special kind of emotional speech. For emotion classification, dearing is not a mood or attitude, but a mode of speech which demonstrates a strong emotional activity. This study analyzes the characteristics of dearing in terms of the fundamental frequency (f₀). The most obvious characteristic of "dearing" is the raised f₀, which is not a constant increment, but is related to the tones, genders and vowels, with changes in the shapes of the f₀ graphs and the tone register. This study also examines how the f₀ transformation is related to dearing using sample syntheses and perceptual recognition, and demonstrates that the pitch increment typically shows the activity of dearing in the arousal dimension of emotional speech. The increment of f₀ is crucial to dearing, yet it is neither the only feature nor a sufficient condition for recognition.

GSOM-based modeling study of phoneme acquisition

CAO Mengxue, LI Aijun, FANG Qiang

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1154-1160. DOI: 10.16511/j.cnki.qhdxxb.2016.26.004

Neural network models of child language acquisition are used to simulate children's phoneme acquisition for selected vowels and consonants of Standard German based on the growing self-organizing map (GSOM) modeling algorithm. An optimized growing strategy and a "cyclical reinforcing and reviewing training" procedure are integrated into the traditional GSOM algorithm. Simulations show that the "cyclical reinforcing and reviewing training" procedure significantly improves the learning quality of the network, with the algorithm recognizing the vowel and manner of articulation categories to build the corresponding knowledge network. The modeling results reveal that during language acquisition, children have the ability to utilize acoustic features to acquire vowels and articulation categories, and to build acoustic space relations among different vowels.

Visualized correction of English monophthongs for Tibetan speakers

FENG Hui, SONG Rui, GAO Xiaodong, WU Tongyu, DANG Jianwu

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1161-1165. DOI: 10.16511/j.cnki.qhdxxb.2016.26.005

The need for Tibetan speakers to improve their English requires effective tools to help them. Data extracted from "the Corpus of Chinese Mandarin, English, and Tibetan by Tibetan Speakers" (CETTS) was used to design a visualized model of tongue positions to help Tibetan speakers better understand the features of their production of English monophthongs. The easy-to-use software developed in this study provides real-time feedback which can help Tibetan speakers improve their English production and their English communication. The visualized model and the software can be used to improve the efficiency of English phonetics teaching.

Error patterns in fundamental frequency contours of L2 Mandarin utterances by Cantonese and English learners

GU Wentao

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1166-1172. DOI: 10.16511/j.cnki.qhdxxb.2016.26.006

Natural speech strongly depends on prosodic features such as tone, intonation and stress, which are very different in Mandarin, Cantonese and English. This study compares the fundamental frequency (f₀) contours of Mandarin speech between native speakers and two groups of L2 learners who were native speakers of Hong Kong Cantonese and American English. The f₀ manifestations in tone, intonation and emphatic stress as well as their interactions are evaluated for a set of controlled sentences varying in sentence type, tone identity, and focus position. The results show that most L2 errors can be ascribed to negative transfers from the L1. The findings have pedagogical implications for learners with particular L1s to improve their L2 Mandarin prosody.

Sub-band adaptive noise reduction algorithm to improve speech intelligibility

LIANG Weiqian, ZHENG Fang, ZHENG Jiachun, PIAO Zhigang

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1173-1178. DOI: 10.16511/j.cnki.qhdxxb.2016.26.007

Noise reduction algorithms to improve speech intelligibility are needed when sounds are compressed and amplified in hearing aids. A sub-band adaptive noise reduction algorithm was developed with a weighted overlap-add filter bank and psycho-acoustic model for the sub-band splitting. The non-linear noise reduction gains are computed with an estimated a posteriori signal to noise ratio (SNR) and an a priori SNR. The gain floors are determined based on the estimated noise level expressed as the dB sound pressure level (SPL). The final gains are smoothed between the frames by a peak detector with carefully selected attack and release time constants. Listening tests show 12% to 45% improvements in intelligibility by this algorithm for noise corrupted speech. A quantified gain table is also used to replace the non-linear gain computing when the algorithm is implemented on the EZAIRO5900 digital signal processor, with the execution cycle reduced by about 30%.
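
The inter-frame smoothing with attack and release time constants mentioned in this abstract can be sketched as a one-pole smoother whose coefficient depends on the direction of change. This is only an illustrative sketch: the abstract does not give the actual smoothing rule or constants, so the function name, the convention that "attack" applies when the gain rises, and all numeric values below are assumptions.

```python
import math


def smooth_gains(gains, frame_period_s, attack_s, release_s):
    """Smooth a per-frame gain sequence with separate attack/release constants.

    Assumed convention: the attack constant is used when the target gain is
    above the current state, the release constant otherwise.
    """
    if not gains:
        return []
    a_att = math.exp(-frame_period_s / attack_s)   # small value: fast tracking
    a_rel = math.exp(-frame_period_s / release_s)  # near 1: slow tracking
    state = gains[0]
    smoothed = []
    for g in gains:
        coeff = a_att if g > state else a_rel
        # One-pole recursion: move the state toward the target gain.
        state = coeff * state + (1.0 - coeff) * g
        smoothed.append(state)
    return smoothed


# Hypothetical values: 10 ms frames, 5 ms attack, 100 ms release.
print(smooth_gains([0.0, 0.0, 1.0, 1.0], 0.01, 0.005, 0.1))
```

With these assumed constants a rising gain is followed within a frame or two, while a falling gain decays slowly, which avoids abrupt gain changes between frames.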

Cross-corpus speech emotion recognition based on a feature transfer learning method

SONG Peng, ZHENG Wenming, ZHAO Li

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1179-1183. DOI: 10.16511/j.cnki.qhdxxb.2016.26.008

Speech emotion recognition systems often use training data and testing data from different corpora, so the recognition rates decrease drastically. This paper presents a feature transfer learning method for cross-corpora speech emotion recognition. The maximum mean discrepancy (MMD) is used to describe the similarities between the emotional feature distributions of the different corpora. A latent, low-dimensional feature space in which the distributions are close is then obtained via the maximum mean discrepancy embedding (MMDE) and dimension reduction algorithms, and the classifiers are trained in this space for emotion recognition. A semi-supervised discriminative analysis (SDA) algorithm is further used for dimension reduction to better ensure the class discrimination of the emotional features. Tests on two popular speech emotion datasets demonstrate that this method efficiently improves the recognition rates for cross-corpora speech emotion recognition.
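
To make the role of the MMD concrete, here is a minimal sketch of the standard biased empirical estimate of the squared MMD between two feature sets. The Gaussian kernel, the `gamma` value and the toy feature vectors are assumptions chosen for illustration; the abstract does not state which kernel or estimator the paper uses, and the MMDE and SDA steps are not shown.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased empirical estimate of the squared maximum mean discrepancy."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, gamma) for u in a for v in b)
        return total / (len(a) * len(b))

    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Hypothetical two-dimensional features from a "source" and two "target" corpora.
source = [(0.0, 0.0), (1.0, 0.0)]
similar_target = [(0.1, 0.0), (1.1, 0.0)]
shifted_target = [(5.0, 5.0), (6.0, 5.0)]
print(mmd_squared(source, similar_target), mmd_squared(source, shifted_target))
```

A value near zero indicates that the two feature distributions are close under the chosen kernel, which is the property a shared feature space for cross-corpus training aims for.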

Correlations between vocal tract parameters and body heights in adult humans

CAO Honglin, KONG Jiangping

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1184-1189,1195. DOI: 10.16511/j.cnki.qhdxxb.2016.26.009

The relationship between adult speakers' vocal tract (VT) and their height was assessed using acoustic reflections to measure the VT morphometric data of 109 male subjects and 105 female subjects, aged 19-30 years. The heights were correlated with eight VT parameters, including VT length, volume and proportions. Significant gender differences were found for all eight VT parameters, with the VTs of males being longer and larger than those of females. The pharynxes of males are also longer and larger relative to the oral cavity. Some gender differences were also found for correlations between the VT parameters and height. Specifically, both genders had significant positive correlations between height and the pharyngeal length, pharyngeal volume, VT length and VT volume, with the correlations for females generally stronger than those for males. Only the female subjects' VT lengths showed moderate correlations with height, while all of the other correlations are quite weak. These findings provide theoretical support for estimating an unknown speaker's height based on their voice in forensic phonetics.

Voice activity detection in complex noise environment

GUO Wu, MA Xiaokong

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1190-1195. DOI: 10.16511/j.cnki.qhdxxb.2016.26.010

A voice activity detection (VAD) algorithm was developed for robust voice detection in complex noise conditions. The energy, the most dominant component and the spectral entropy are used to form three-dimensional features that have been demonstrated to strongly complement each other in the presence of complex noise. The K-means algorithm is used to adaptively select the feature and to calculate the utterance-dependent thresholds, which are applied in the subsequent speech detection process. Tests on the NIST SRE 2008 and 2012 corpora show that this algorithm gives better performance for different noise conditions and is more robust and efficient than conventional unsupervised and supervised methods.
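
Of the three features listed in this abstract, the spectral entropy is the easiest to illustrate in isolation. The sketch below uses the usual definition (Shannon entropy of the normalized power spectrum of a frame); the function name and the toy spectra are assumptions, and the paper's exact feature definitions and K-means thresholding are not reproduced here.

```python
import math


def spectral_entropy(power_spectrum):
    """Shannon entropy (in nats) of a frame's normalized power spectrum."""
    total = sum(power_spectrum)
    if total <= 0:
        return 0.0
    # Treat the normalized spectrum as a probability distribution over bins.
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * math.log(p) for p in probs)


# Hypothetical frames: a flat, noise-like spectrum and a single-peak spectrum.
noise_like = [1.0, 1.0, 1.0, 1.0]
peaked = [0.0, 5.0, 0.0, 0.0]
print(spectral_entropy(noise_like), spectral_entropy(peaked))
```

A flat spectrum gives the maximum entropy for its number of bins, while a spectrum concentrated in a few bins gives a low value, which is why the feature helps to separate structured speech frames from broadband noise.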

Effects of focal accent on segmental articulation and acoustical properties in standard Chinese

LI Yinghao, KONG Jiangping

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1196-1201. DOI: 10.16511/j.cnki.qhdxxb.2016.26.011

The focal accent in standard Chinese affects the articulatory and acoustical properties of the segments inside and outside the narrow focus domain. The linguopalatal contacts of the segments /t/ and /i/ were obtained using electropalatography (EPG), with electroglottographic (EGG) and acoustic signals simultaneously recorded. The results show that the narrow focus domain has larger linguopalatal contact and longer alveolar seal duration for /t/. The vocal folds during the consonantal closure interval tend to be tense. The tongue gesture of /i/ is raised and fronted. The vowels become longer and louder and their spectral properties are modified. The spill-over effect of the focal accent is not clear, but the production of the segments inside the narrow focus domain is generally strengthened. The focal accent does not affect the articulation and acoustical properties of segments outside the domain.

Acoustic characteristics of Mandarin affricates

LI Shanpeng, GU Wentao

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1202-1208. DOI: 10.16511/j.cnki.qhdxxb.2016.26.012

This study investigated the relationships between acoustic parameters and phonetic features for six Mandarin affricates. Nine acoustic parameters, including the duration, amplitude, spectral energy distribution, and F2 onset of the following vowel, were extracted with Praat. An ANOVA was used to show which acoustic parameters can statistically distinguish the three places of articulation, two states of aspiration, and two following vowels. A discriminant analysis showed that the combination of all nine acoustic parameters gave an 85.9% recognition rate for the six affricates. A principal component analysis showed that the first five components contributed 86.3% of the information for the affricates. The spectral energy distribution parameters of the frication are the most important acoustic parameters for Mandarin affricates, some of which mainly contribute to the place of articulation while others mainly contribute to the state of aspiration. The normalized duration and amplitude of the frication are the next most important parameters, contributing to both the state of aspiration and the following vowel. The F2 onset of the following vowel is affected by the place of articulation of the affricate.

HYDRAULIC ENGINEERING

Large-scale direct shear tests of block-reinforced soil

WANG Teng, ZHANG Ga

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1209-1212. DOI: 10.16511/j.cnki.qhdxxb.2016.26.013

Large soil blocks produced by grouting the soil are a new reinforcement approach with significant potential. Large-scale direct shear tests were conducted on soil reinforced by blocks simulated with gravel and aluminum cylinders. The results showed that the blocks exhibited significant movement and rotation with interaction chains during the direct shear tests. These increased the soil shear strength with significant dilatancy. As the block reinforcement ratio increased, the soil shear strength increased and the soil deformation changed from strain hardening and volumetric contraction to increased strain softening and dilatancy. The block shape significantly affected the strength of the reinforced soil.

NUCLEAR AND NEW ENERGY ENGINEERING

Single droplet phase transformation model during motion

ZHAO Fulong, ZHAO Chenru, BO Hanliang

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1213-1219. DOI: 10.16511/j.cnki.qhdxxb.2016.26.014

A single droplet phase transformation model was developed for moving droplets based on the physical evaporation mechanism of the droplet phase transformation while moving in a steam-water separation plant. The model combines a static droplet phase transformation model with pressure variations and a droplet motion model. The model gives mathematical expressions for the mechanisms during the fast evaporation stage and the thermally controlled evaporation stage during the droplet movement. The pressure decreases due to the flow resistance and local structural changes, which breaks the liquid-vapor phase equilibrium. The results agree with the existing theoretical analysis. This model can be applied to separation efficiency calculations for droplets moving in steam-water separators, including gravity separators, cyclone and rotary vane separators and wave plate separators, and can predict the influence of the droplet phase transformation on the separation characteristics to guide the structural optimization and design of separation equipment.

COMPUTER SCIENCE AND TECHNOLOGY

Mispronunciation tendency detection using deep neural networks

ZHANG Jinsong, GAO Yingming, XIE Yanlu

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1220-1225. DOI: 10.16511/j.cnki.qhdxxb.2016.26.015

A previous computer aided pronunciation training (CAPT) system with instructive feedback used mispronunciation tendency labeling in a GMM-HMM based detection system. This system is improved here using a DNN-HMM to model the mispronunciation with comparisons of the effects of three kinds of acoustic features, the mel-frequency cepstral coefficient (MFCC), the perceptual linear predictive analysis (PLP) and the Mel filter bank (FBank). The lattice rescore method is also used with these three features. The results show that the DNN-HMM gives a better detection rate than the conventional approach based on the GMM-HMM. Different features behave differently in capturing the specific mispronunciation tendencies, so the integration of these three features based on the lattice rescore gives the best results with an FRR of 5.5%, FAR of 35.6%, and DA of 88.6%.
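
The three figures reported at the end of this abstract are not defined in it. The sketch below assumes the definitions commonly used in mispronunciation detection: the false rejection rate (FRR) is the share of correct pronunciations flagged as mispronounced, the false acceptance rate (FAR) is the share of mispronunciations accepted as correct, and the diagnostic accuracy (DA) is the share of all judgments that are right. The function name and the counts are hypothetical and are not taken from the paper.

```python
def detection_metrics(true_accept, false_reject, false_accept, true_reject):
    """Return (FRR, FAR, DA) from the four outcome counts.

    true_accept:  correct pronunciations accepted as correct
    false_reject: correct pronunciations flagged as mispronounced
    false_accept: mispronunciations accepted as correct
    true_reject:  mispronunciations flagged as mispronounced
    """
    correct_total = true_accept + false_reject
    mispronounced_total = false_accept + true_reject
    frr = false_reject / correct_total
    far = false_accept / mispronounced_total
    da = (true_accept + true_reject) / (correct_total + mispronounced_total)
    return frr, far, da


# Hypothetical counts, for illustration only.
frr, far, da = detection_metrics(90, 10, 5, 15)
print(f"FRR={frr:.3f} FAR={far:.3f} DA={da:.3f}")
```

Under these assumed definitions, a low FRR together with a much higher FAR is possible when correct pronunciations greatly outnumber mispronunciations, because the DA is then dominated by the correct class.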

Combined load balancing and energy efficiency in Hadoop

TIAN Wenhong, LI Guozhong, CHEN Yu, HUANG Chaojie, YANG Wutong

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1226-1231. DOI: 10.16511/j.cnki.qhdxxb.2016.26.016

Hadoop clusters are widely used in enterprises and research institutions but there are few tools in Hadoop to dynamically load balance and improve the energy efficiency. A dynamic load balancing method with negative feedback was developed for a dynamic management system for Hadoop systems and tested using classic Hadoop benchmark examples. This method reduces the total idle time of the Hadoop nodes by 25% and reduces energy consumption by 14% on average compared with other algorithms by improving the load balancing through reducing the load variations by 10%.

Eliminating hot-spots based on cold-spot virtual machine migration in the cloud

GUO Jun, YAN Yongming, MA Anxiang, ZHANG Bin

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1232-1236. DOI: 10.16511/j.cnki.qhdxxb.2016.26.017

Initial allocations of virtual machine (VM) resources are often unable to meet the performance requirements of runtime services, resulting in excessive resource utilization, slow response times and other "hot-spot" problems. Traditional approaches to eliminating these hot-spots mainly rely on resource extension and virtual machine live migration, but they still suffer from insufficient resources and large migration costs. This paper describes a hot-spot elimination method based on cold-spot VMs which migrates the cold-spot VM and then distributes the released resources to the hot-spot VM. This approach maintains the hot-spot service performance and reduces the cost of hot-spot elimination to better meet the SLA constraints. Tests show that this method is feasible and effective.

Lip protrusion measurement based on facial skeleton data

PAN Xiaosheng, ZHANG Menghan, Liew Wee Chung

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1237-1241. DOI: 10.16511/j.cnki.qhdxxb.2016.26.018

This paper presents a method to measure lip protrusion. The upper and lower lip movement patterns differ, so the lip protrusion is defined separately for the upper and lower lips as the Euclidean distance between the lip edge and the incisor. Three-dimensional lip coordinates were obtained by observing the trajectories of reference markers on human faces. The singular value decomposition (SVD) method was used to eliminate the rigid-body head movement and the mouth opening movement. Then, the coordinates for the upper and lower incisors were obtained by calculating the coordinates of the reference markers pasted on the facial bony structure. Finally, lip edge coordinates were introduced to calculate the lip protrusion. The method gives good results with three-dimensional lip data and is also applicable for analyzing two-dimensional lip data.

SPH simulations of aeroacoustic problems in vocal tracts

WEI Jianguo, HAN Jiang, HOU Qingzhi, WANG Song, DANG Jianwu

Journal of Tsinghua University(Science and Technology). 2016, 56 (11): 1242-1248. DOI: 10.16511/j.cnki.qhdxxb.2016.26.019

Simulations of human sound wave propagation need to take into account the moving boundaries and fluid flow within the vocal tract to give accurate, realistic models. Traditional mesh-based methods that are widely used to study human sound production have many problems due to mesh reconstruction and distortion, so they are not as effective as meshless methods. The aeroacoustic wave equations in the Eulerian framework are transformed to the governing equations for wave propagation in the Lagrangian form and discretized using the smoothed particle hydrodynamics (SPH) method. The accuracy and reliability of SPH for wave propagation in a static medium are shown by comparisons with finite difference time domain (FDTD) results. This method is validated against the Doppler effect based theoretical solutions for one- and two-dimensional aeroacoustics to verify the ability of SPH to solve complex aeroacoustic problems.
