ISSN 1000-0585
CN 11-1848/P
Started in 1982
Table of Contents, Volume 64 Issue 11
SPECIAL SECTION: PUBLIC SAFETY SCIENCE AND TECHNOLOGY
Quantitative methods for landslide subsurface deformation based on acoustic emission monitoring
DENG Lizheng, YUAN Hongyong, CHEN Jianguo, SU Guofeng, ZHANG Mingzhi, CHEN Yang, PAN Rui
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1849-1859. DOI: 10.16511/j.cnki.qhdxxb.2024.26.015
[Significance] Slope instability early warning systems are used to monitor landslide deformation to ensure proper stakeholders make timely safety decisions and take emergency actions. Acoustic emission signals are constantly produced during landslide deformation and can be employed to monitor slope stability. Acoustic emission technology using an active waveguide has gradually become an effective monitoring method for subsurface deformation of soil landslides. It has the characteristics of low cost and high sensitivity for early detection of minor deformation within slopes. Therefore, acoustic emission technology is anticipated to increase the success rate of landslide risk early warning. [Progress] Based on large-scale landslide model experiments and field monitoring studies, the interpretation methods for acoustic emission monitoring data evolved from qualitative to quantitative. Many landslide on-site tests using acoustic emission monitoring revealed a proportional correlation between acoustic emission rate and landslide velocity. This paper described an empirical formula method for quantifying landslide subsurface deformation behavior using acoustic emission data, extracting landslide movement information, and examining the change pattern by inversely calculating landslide displacement, velocity, and acceleration. The threshold of acoustic emission parameters that trigger landslide warnings could be obtained based on the landslide velocity classification standard. However, challenges remained in developing widely applicable methods for quantifying acoustic emission data, such as the diversity of conditions in the monitoring equipment and the complexity of the interaction within the active waveguide. These challenges limited the accurate quantification of the deformation-acoustic emission response relationship, and thus, the reliability of landslide warning results could not be ensured. 
To overcome the above limitations, a machine learning approach was proposed, which could automatically interpret acoustic emission monitoring data and quantify the response relationship between deformation and acoustic emission. An automatic classification model for the landslide motion state and a prediction model for landslide displacement were developed to accurately measure the representative deformation characteristics, including landslide velocity, acceleration, and displacement. The classification model was trained using two acoustic emission parameters (ring down count, change rate of ring down count) and the actual labels of the landslide kinematic state. Only the two acoustic emission parameters were input to the trained classifier and kinematic labels were produced through model prediction. A machine learning-based landslide displacement prediction method was developed, where landslide displacement can be automatically measured using acoustic emission data and related parameters (e.g., rainfall). Based on the output results from machine learning classification and prediction, a method for landslide early warning with graded risk was then developed, considering the response to negative circumstances such as missing data. [Conclusions and Prospects] Finally, this article discusses the tendency to choose acoustic emission data interpretation methods for different application scenarios and alludes to the limitations and development trends of these interpretation methods. Machine learning is the current trend in acoustic emission data analysis methods, which can increase the reliability of landslide risk warning systems. In the future, a full waveform data-based acoustic emission analysis method will be introduced for landslide deformation monitoring. It is hoped that acoustic emission technology will be developed as a universal monitoring technique for soil landslide subsurface deformation.
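The inverse calculation mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes a purely proportional relation between acoustic emission rate and landslide velocity; the coefficient `C_PROP`, the sampling interval, the sample readings, and the warning thresholds are all hypothetical values chosen for illustration and are not taken from the paper.

```python
# Minimal sketch: inverse calculation of landslide kinematics from AE rates.
# Assumption: AE rate = C_PROP * velocity (proportional relation).
# C_PROP, DT_HOURS, the readings, and the thresholds are hypothetical.

C_PROP = 2000.0   # hypothetical: ring-down counts per hour per (mm/h)
DT_HOURS = 1.0    # hypothetical sampling interval in hours


def velocities_from_ae(ae_rates, c_prop=C_PROP):
    """Invert the proportional relation to get velocity (mm/h)."""
    return [rate / c_prop for rate in ae_rates]


def displacements(velocities, dt=DT_HOURS):
    """Cumulative displacement (mm) by rectangular integration."""
    total, out = 0.0, []
    for v in velocities:
        total += v * dt
        out.append(total)
    return out


def accelerations(velocities, dt=DT_HOURS):
    """Finite-difference acceleration (mm/h^2) between samples."""
    return [(b - a) / dt for a, b in zip(velocities, velocities[1:])]


def warning_level(velocity, thresholds=(0.5, 2.0)):
    """Map velocity to a graded level using hypothetical thresholds."""
    low, high = thresholds
    if velocity < low:
        return "normal"
    return "caution" if velocity < high else "alarm"


ae_rates = [200.0, 1000.0, 3000.0, 5000.0]   # made-up readings
v = velocities_from_ae(ae_rates)
d = displacements(v)
a = accelerations(v)
levels = [warning_level(x) for x in v]
```

In practice the thresholds would come from a landslide velocity classification standard, and the coefficient would need calibration for each monitoring setup, which is the difficulty the abstract points out.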
Resistance resilience of railway passenger transport networks in urban agglomerations from a spatiotemporal perspective
WU Peng, LI Dewei
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1860-1869. DOI: 10.16511/j.cnki.qhdxxb.2025.26.003
[Objective] The resilience of transportation networks is a prominent research area in transportation safety. However, current studies on transportation network resilience often measure the changes in spatiotemporal travel costs for passengers inadequately and focus primarily on the recovery phase rather than the resistance phase of two-stage resilience. Critical segments are also insufficiently identified and analyzed, and suitable resilience simulation and evaluation methods for urban agglomeration railway passenger transport networks are lacking. This paper proposes a resistance resilience assessment model and a resistance resilience simulation evaluation process for urban agglomeration railway passenger transport networks, centered on spatiotemporal accessibility for passengers. The aim is to evaluate the resistance resilience of these networks and identify critical segments. [Methods] This paper explores the concepts of resistance resilience and recovery resilience within transportation networks. Utilizing the complex network Space L modeling method, this paper develops a spatiotemporally weighted urban agglomeration railway passenger transport network model that treats actual railway passenger stations as network nodes. Segment interruption scenarios were simulated using attack modes involving single segment deletion and multiple segment continuous deletion. A dynamic resistance resilience evaluation index, termed the network performance retention rate, was introduced based on the performance response function and the spatiotemporal accessibility of passengers. This paper devises a resistance resilience assessment model and simulation evaluation process to evaluate the substitutability of segments and the overall network resistance resilience. 
The Chengdu—Chongqing urban agglomeration was selected as a case study to identify and compare critical segments and resistance resilience across unweighted, spatially weighted, and temporally weighted railway networks. [Results] The results of this paper were as follows: (1) The interruption of critical segments near railway hub cities could lead to a maximum network performance loss of 12.23%. It was necessary to identify critical segments through predisaster simulations. (2) Significant differences were found in the critical segments identified through resistance resilience simulations across unweighted, spatially weighted, and temporally weighted railway networks. The Spearman correlation coefficient indicated a relatively poor correlation between the critical segment rankings of unweighted and weighted railway networks. (3) The resistance resilience indices of the three railway networks highlighted that single segment interruptions significantly affected travel time. (4) Continuous interruption of identified critical segments severely affected network performance, with temporally weighted railway networks experiencing a stronger impact than spatially weighted and unweighted railway networks. Predisaster simulations solely based on topological structure or spatial distance might underestimate the consequences of risk interference. [Conclusions] The methods proposed in this paper address the gap in targeted research on the resistance resilience of railway passenger transport networks in urban agglomerations. Simulations of single segment interruption and multiple segment continuous interruption enable the identification and verification of key network segments. Additionally, analyzing the network resistance to interruptions provides a scientific foundation for transportation network planning and decision-making. 
Furthermore, the network performance retention rate proposed in this paper as a resilience evaluation index offsets the impact of disturbance time uncertainty, providing a scientific foundation for transportation network resilience research.
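A single segment deletion simulation of the kind described in this abstract can be sketched in Python as follows. The abstract does not give the exact definition of the network performance retention rate, so this sketch substitutes a simple stand-in: performance is the mean inverse shortest travel time over all station pairs, and the retention rate is the ratio of performance after a deletion to performance before it. The three-station network and its travel times are hypothetical.

```python
# Sketch: single-segment deletion on a small travel-time-weighted network.
# Assumption: performance = mean of 1 / shortest travel time over all
# station pairs; this is a stand-in, not the paper's exact index.
import heapq
from itertools import combinations


def shortest_times(graph, source):
    """Dijkstra: shortest travel time from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def performance(graph):
    """Mean inverse travel time over all unordered station pairs."""
    pairs = list(combinations(sorted(graph), 2))
    total = 0.0
    for s, t in pairs:
        d = shortest_times(graph, s).get(t)
        total += 1.0 / d if d else 0.0  # unreachable pair contributes 0
    return total / len(pairs)


def without_segment(graph, u, v):
    """Copy of the graph with the undirected segment u-v removed."""
    g = {n: dict(nbrs) for n, nbrs in graph.items()}
    g[u].pop(v, None)
    g[v].pop(u, None)
    return g


def retention_rates(graph):
    """Performance retention rate for each single-segment deletion."""
    base = performance(graph)
    segments = {tuple(sorted((u, v))) for u in graph for v in graph[u]}
    return {seg: performance(without_segment(graph, *seg)) / base
            for seg in sorted(segments)}


# Hypothetical three-station network; weights are travel times in hours.
network = {
    "A": {"B": 1.0, "C": 3.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 3.0, "B": 1.0},
}
rates = retention_rates(network)
critical = min(rates, key=rates.get)  # lowest retention = most critical
```

Replacing the travel-time weights with unit weights or distances would give the unweighted and spatially weighted variants that the abstract compares against the temporally weighted network.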
A CA-SRGAN-based fire-hazardous item detection model in ancient architecture
LI Yueming, HUANG Guozhong, WANG Bo, GAO Xuehong, SUN Zhanhui, PAN Rui
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1870-1879. DOI: 10.16511/j.cnki.qhdxxb.2024.26.013
[Objective] Ancient buildings are not just architectural structures but valuable cultural heritage sites that, when destroyed by fires, cause irreplaceable losses to history. In many instances, the root cause of these fires is not identified, although fire-hazardous items (such as cigarettes) often emerge as possible culprits. At present, fire risk assessment methods often fall short, offering subjective results and low detection rates that often fail to meet practical application requirements. Therefore, we adopt an approach based on real-time detection of fire-hazardous items such as cigarettes for effective monitoring and early warning. We aim to establish a model that provides accurate detection in real time using existing monitoring systems and video images. In addition, we develop a method for reconstructing blurred images. [Methods] To detect fire-hazardous items, five categories are selected: lighters, matches, candles, cigarettes, and incense. We independently create a dataset in VOC format, including 7 157 and 6 536 images of hazardous and non-hazardous items, respectively. Regarding the model, we choose the you only look once v5 (YOLOv5) model as the starting point and make two key enhancements. First, we replace the generalized intersection over union (GIoU) loss function used in the YOLOv5 model with the complete intersection over union (CIoU) loss function. Unlike GIoU, CIoU considers the distance between center points and the aspect ratio, providing a more accurate assessment of the quality of the predicted bounding box and leading to enhanced target box regression. Second, we tackle the issue of mismatched weight distribution between feature maps and channels by incorporating coordinate attention (CA) into the YOLOv5 backbone. CA reduces the number of channels in the feature map to increase the receptive field, learns the weight distribution of channels, and reallocates channel features based on these parameters. 
Finally, to reconstruct images, we utilize a super-resolution reconstruction algorithm based on the generative adversarial network, known as the super-resolution generative adversarial network (SRGAN). SRGAN effectively removes blur from the original image, resulting in a more natural-looking reconstructed image and improving the overall quality of the dataset images. [Results] We used the ADAM optimizer to train the fire hazard detection model with a batch size of 16 for over 100 epochs. SRGAN-YOLOv5 achieved a precision of 0.940 and a recall of 0.770. In terms of average precision for individual targets, we observed scores of 0.839, 0.743, 0.949, 0.851, and 0.767 for lighters, cigarettes, candles, incense, and matches, respectively. This resulted in a mean average precision (mAP) of 0.830 across all categories. This model was tested against mainstream object detection networks for comparison, including faster regions-convolutional neural network (faster-RCNN), YOLOv3, YOLOv4, YOLOv5, and YOLOX. SRGAN-YOLOv5 ranked second in precision and mAP but had the highest frames per second (fps) rate. Considering the overall performance, SRGAN-YOLOv5 demonstrated significant advantages and extensive prospects in practical applications. In this paper, we also provided visualizations of the detection results achieved by the SRGAN-YOLOv5 model. [Conclusions] This work has produced a custom-built fire hazard dataset and a model that can detect fire-hazardous items effectively. Utilizing SRGAN for image reconstruction, we successfully enhanced the image resolution, which in turn improved the accuracy of the SRGAN-YOLOv5 model. As for the model itself, we refined the loss function and integrated CA, leading to further enhancements in the precision of the SRGAN-YOLOv5 model. 
As a result, this model can achieve rapid and high-precision detection, making it a valuable tool for fire hazard detection.
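The CIoU loss named in this abstract can be sketched in plain Python. The sketch follows the commonly published form of CIoU (IoU, a normalized center-distance penalty, and an aspect-ratio term); it is an illustration for two axis-aligned boxes, not the authors' training code, and the function name `ciou_loss` and the box format are assumptions.

```python
# Sketch of the complete IoU (CIoU) loss for axis-aligned boxes.
# Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
import math


def ciou_loss(pred, target, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between box centers.
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4.0

    # Squared diagonal of the smallest box enclosing both.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    angle_gap = math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))
    v = (4.0 / math.pi ** 2) * angle_gap ** 2
    alpha = v / (1.0 - iou + v + eps)

    return 1.0 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, this value keeps growing as two non-overlapping boxes move apart, so the regression still receives a useful signal when the prediction misses the target entirely.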
Urban large-scale evacuation zoning planning methods for flooding disasters
LÜ Wei, JIANG Huihua, WANG Jinghui, YANG Xiaoting
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1880-1892. DOI: 10.16511/j.cnki.qhdxxb.2024.26.014
[Objective] Safe and effective evacuation of individuals during extreme weather conditions is critical in evacuation planning. Compared with traditional personnel evacuation and emergency transportation studies, evacuation zone planning is in its early stages. It lacks comprehensive consideration of major urban disaster scenarios, particularly beyond hurricanes. Additionally, there is no unified system for defining problems or measuring urban population hotspots, spatiotemporal disaster impacts, and exit distribution in evacuation planning. To address the practical issues of evacuating affected individuals during heavy rain and flood disasters, this paper proposes a model for delineating evacuation zones. [Methods] Starting with establishing evacuation needs and quantifying the impact of disasters on road segments, this study considers urban population distribution hotspots and the characteristics of heavy rain and flood disasters. Through modeling analysis, geographic information system (GIS) visualization, and other methods, a model is developed for the integrated delineation of evacuation zones and the allocation of evacuees at exits. The main components are as follows: (1) Hotspot areas in urban functional zones are identified, and evacuation needs are established based on the city’s road network. (2) The risk of heavy rain and flood disasters is assessed: a risk indicator system for flood risk assessment is established (including causative factors, disaster-prone environments, disaster-prone bodies, and disaster prevention and mitigation capabilities), and a road damage model is developed to determine road network damage. (3) A two-tier planning optimization model is constructed to determine evacuation paths and exit allocations. (4) Using Wuhan’s Wuchang district as an example, the effectiveness of the proposed method for large-scale urban evacuation zone planning under flood disasters is validated. 
The upper-level model provides the proportion of evacuees that each evacuation point should accommodate, with these allocation ratios stored in chromosomes as input for the lower level. The lower-level problem uses the incoming allocation ratios to calculate the evacuation flow for each origin-destination (OD) pair and evaluates the fitness of the upper-level chromosomes. This is achieved using the Frank-Wolfe algorithm. The two-tier framework allows for detailed treatment of complex evacuation planning problems, ensuring the global minimization of total evacuation time and individual minimization of evacuee travel time. [Results] The innovative aspects included identifying evacuation needs in urban hotspots and constructing road damage levels under risk zoning for heavy rain and flood disasters. The two-tier planning optimization model minimized overall evacuation time and individual travel time, making the evacuation plan more realistic and reasonable. [Conclusions] The proposed method for large-scale urban evacuation zone planning is feasible, and risk assessment is essential in actual evacuation planning. Significant differences exist between day and night population distributions, with daytime populations primarily concentrated in commercial and work areas and nighttime populations concentrated in residential areas. Emergency management departments should therefore develop different evacuation plans for different periods. Because roads may be damaged during disasters, alternative evacuation routes should be planned in advance and adjusted dynamically in real time during evacuations.
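The role of the Frank-Wolfe algorithm in the lower-level problem can be illustrated with a toy Python sketch. It solves a user-equilibrium assignment for a single OD pair served by two parallel routes, assuming linear travel-time functions so that the line search has a closed form. The route parameters and the demand are hypothetical, and the upper-level search over allocation ratios is not shown.

```python
# Sketch: Frank-Wolfe for user-equilibrium flow on two parallel routes.
# Assumption: linear travel-time functions t(x) = free + slope * x.
# The route parameters and the demand are hypothetical.

ROUTES = [(10.0, 1.0), (20.0, 0.5)]   # (free-flow time, slope) per route
DEMAND = 30.0                          # evacuees for one OD pair


def travel_times(flows):
    """Travel time on each route at the given flows."""
    return [free + slope * x for (free, slope), x in zip(ROUTES, flows)]


def all_or_nothing(times):
    """Assign the whole demand to the currently fastest route."""
    best = min(range(len(times)), key=times.__getitem__)
    return [DEMAND if i == best else 0.0 for i in range(len(times))]


def frank_wolfe(max_iter=100, tol=1e-10):
    """Iterate all-or-nothing assignment plus exact line search."""
    flows = all_or_nothing(travel_times([0.0] * len(ROUTES)))
    for _ in range(max_iter):
        times = travel_times(flows)
        target = all_or_nothing(times)
        direction = [y - x for x, y in zip(flows, target)]
        # Exact line search on the Beckmann objective (closed form
        # because the travel-time functions are linear).
        slope_term = sum(t * d for t, d in zip(times, direction))
        curvature = sum(s * d * d for (_, s), d in zip(ROUTES, direction))
        if curvature == 0.0 or -slope_term < tol:
            break  # no descent left: equilibrium reached
        step = min(1.0, -slope_term / curvature)
        flows = [x + step * d for x, d in zip(flows, direction)]
    return flows


flows = frank_wolfe()
times = travel_times(flows)
```

At the solution both routes have the same travel time, which is the "individual minimization of evacuee travel time" condition; on a real road network the all-or-nothing step would be a shortest-path assignment over the damaged network.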
Assessment of earthquake stress levels and influencing factors for the public
LI Jing, ZHU Jingzheng, SHEN Tong
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1893-1901. DOI: 10.16511/j.cnki.qhdxxb.2024.26.021
[Objective] Over the years, earthquakes have emerged as significant geological disasters, causing substantial and persistent human casualties and property losses that are difficult to avoid. Despite the growing public awareness regarding the importance of acquiring earthquake self-rescue knowledge and skills, practical improvements are not yet evident. This is attributed to the inherent human fear in response to intense stimuli from earthquake disasters, which leads to severe stress reactions that hinder the execution of self-rescue actions and make the effective application of earthquake self-rescue knowledge challenging. [Methods] This study focuses on assessing and improving stress levels among the public while facilitating the smooth execution of self-rescue actions during real earthquake scenarios. This study involved 16 male and 16 female participants who took part in simulated experiments at an earthquake experience center. The experimental process included baseline measurements, measurements during progressive seismic events of magnitudes 3-7, and measurements during seismic intermissions. Respiratory and electromyographic measurements were used to collect data on the neural and behavioral dimensions of stress during baseline and stress periods. The paired-samples Wilcoxon signed-rank test was applied to time-domain and frequency-domain indicators to identify those that exhibited significant differences. Individual seismic stress level reference values were calculated on the basis of fluctuations in individual data compared with the overall data. Using the K-means clustering method, the distribution of different stress response levels was determined. In addition, the paper reviewed previous research findings and constructed the “S-O-R-A” earthquake “stimulus-response” model, which encompasses earthquake risk information perception, risk information understanding, self-rescue decision-making, and self-rescue execution stages. The study identified 13 factors related to seismic stress levels, including basic qualities, emergency knowledge, skills, experience, awareness, personality, emotional stability, visual reactivity, auditory reactivity, attention, memory, thinking, and physical fitness. An experimental plan was established for measurement based on the Jinshuju platform, PsyLAB, and the BCS-400 digital backforce gauge. By correlating seismic stress level reference values with influencing factors using Kendall's Tau-b method and employing multiple linear regression analysis, we ranked the importance of each influencing factor and provided recommendations based on significant factors for improving earthquake stress responses. [Results] Physiological measurements revealed that indicators such as minimum value, mean value, standard deviation, variance, and mean frequency in respiratory signals, as well as maximum value, minimum value, mean value, standard deviation, variance, root mean square, and mean absolute value in electromyographic signals, exhibited differences during the experimental process, indicating their effectiveness as indicators for calculating stress levels. Based on regression analysis results, among the influencing factors, emergency skills (67.9%), earthquake training (58.4%), emotional stability (44.8%), visual reaction power (39.8%), auditory reaction power (39.0%), and memory (30.5%) were the six most significant factors affecting seismic stress levels in the public. 
Furthermore, to help the public overcome stress responses, the study proposed 12 recommendations, including “establishing popular science channels, innovating educational works, enhancing training participation, promoting practical training, improving memory capacity, strengthening cognitive memory, emphasizing technological empowerment, optimizing visual training, conducting specialized courses, enhancing auditory reaction, prioritizing psychological counseling, and conducting psychological construction,” and presented a mind map. [Conclusions] This study provides researchers with experimental design and calculation methods for assessing seismic stress levels while identifying factors that effectively improve earthquake stress responses and offering recommendations to the public. By effectively improving stress responses, the public can utilize their acquired self-rescue knowledge and skills, thereby enhancing their response to earthquakes.
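The Kendall's Tau-b correlation used in this abstract to relate stress level reference values to influencing factors can be sketched in plain Python. The sketch implements the standard tie-corrected formula; the function name `kendall_tau_b` and the two sample lists are made up for illustration and are not data from the study.

```python
# Sketch: Kendall's tau-b between a stress score and one influencing factor.
# The two sample lists are made up for illustration.
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation with tie correction."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


stress_score = [1, 2, 2, 3]   # hypothetical stress level reference values
skill_score = [1, 2, 3, 3]    # hypothetical emergency-skill ratings
tau = kendall_tau_b(stress_score, skill_score)
```

Tau-b suits questionnaire-style ratings because it corrects for tied ranks, which are common when factors are scored on short ordinal scales.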
SPECIAL SECTION: MAN-MACHINE SPEECH COMMUNICATION
Real time estimation and tracking method for the direction of arrival of single sound source based on Kalman filtering and frequency focusing
ZHOU Jing, BAO Changchun, DUAN Haiwei
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1902-1910. DOI: 10.16511/j.cnki.qhdxxb.2024.26.043
[Objective] Estimation of direction of arrival (DOA) is critical in spatial audio coding, speech enhancement, sound field synthesis, and sound source imaging. Commonly used signal model-based DOA estimation methods, such as the multiple signal classification method, can effectively estimate DOA information in noise-free and anechoic scenarios. However, real-world environments always have noise and reverberation, particularly in far-field speech communication scenarios characterized by low signal-to-noise ratios and strong reverberation. Furthermore, the sound source may be in motion. These factors considerably impair the performance of DOA estimation methods based on signal models. To address this issue, this paper introduces a real-time estimation and tracking method for the DOA of a single sound source, using Kalman filtering and frequency focusing. [Methods] The proposed method consists of three procedures: denoising, dereverberation, and DOA estimation. In the denoising procedure, an objective optimization function to minimize the error of the denoised signal is established. This function is solved using a Kalman filter, which yields the denoised signal through Kalman gain-based posterior estimation. In the dereverberation procedure, based on the autoregressive coefficients of the late reverberation components, an objective optimization function to minimize the error of the multichannel linear prediction (MCLP) coefficients is established. This function is solved by a second Kalman filter to obtain the MCLP coefficients. The DOA estimation procedure is implemented using a frequency-focusing-based steered response power (FF-SRP) method, which can circumvent signal component diffusion within subspace decomposition. In particular, a structure is designed that effectively intertwines these three procedures, enhancing the contribution of the denoising and dereverberation results to DOA estimation. 
In this structure, a propagation matrix is utilized to integrate the denoising and dereverberation procedures, creating a causative iteration between them. Subsequently, a minimum variance distortionless response (MVDR) beamforming method is used to replace the multichannel Wiener filtering method. This is to obtain a prior estimation of the covariance matrix of the target signal. The MVDR beamforming method offers two advantages: it reduces the distortion of the target signal and integrates the DOA estimation procedure with the denoising procedure, thereby promoting a causal and orderly iteration among the three procedures. [Results] Experiments were conducted using a microphone array signal simulator and the TIMIT corpus. The mean absolute error (MAE) of the estimated DOA, along with the DOA track of the moving speaker, served as the evaluation measures. Experimental results revealed several key findings: (1) As RT60 increased, the MAE of all methods increased, clearly demonstrating that reverberation significantly affects DOA estimation performance. (2) Compared with the reference methods, the proposed method consistently delivered the lowest MAE values under different RT60 values and SNRs. This suggests that the proposed method has higher accuracy in DOA estimation. (3) In terms of DOA trajectory, the proposed method again outperformed the reference methods by producing the smallest error. This indicates that the proposed method has better performance in DOA tracking. [Conclusions] By integrating denoising, dereverberation, and DOA estimation through a causal and recursive iteration structure, the performance of DOA estimation and tracking can be significantly enhanced. The proposed method effectively mitigates the detrimental impact of noise and reverberation on DOA estimation and tracking accuracy in single sound source scenarios.
Effects of language redundancy and prosodic structure on syllable duration in Mandarin Chinese
LIU Xiaowang, HAO Yun, ZHANG Jinsong
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1911-1918. DOI: 10.16511/j.cnki.qhdxxb.2024.26.025
[Objective] Information theory in phonetics primarily investigates the relationship between language redundancy and acoustic features. Language redundancy refers to the predictability of linguistic information, which arises from lexical, syntactic, and semantic contextual factors. The more predictable the information, the higher its redundancy. Numerous studies suggest that when spoken, linguistic units with higher redundancy tend to be shorter in duration. The smooth signal redundancy hypothesis posits that the influence of language redundancy on duration is modulated by prosodic structures. These structures adjust acoustic features by assigning stress and boundaries to elements with lower language redundancy, thus achieving an inverse relationship between language redundancy and duration. However, these conclusions are predominantly based on Indo-European languages, leaving a research gap for Mandarin Chinese. Moreover, there is a lack of research on the correspondence between linguistic redundancy and prosodic structure. Thus, this study aims to investigate the relationships among language redundancy, prosodic structure, and syllable duration, specifically within the context of Mandarin Chinese. [Methods] This study quantifies language redundancy using the concept of surprisal, a measure derived from information theory. A large-scale textual corpus was used to train a 2-gram Chinese character-level language model, which was used to estimate unigram and bigram surprisal. The speech corpus employed in this study is the Annotated Speech Corpus of Chinese Discourse (ASCCD). The Chinese Tone and Break Index (C-ToBI) annotation system is employed to represent prosodic structures in terms of boundaries and stress. The duration of each syllable and its corresponding stress and boundary levels were recorded. A linear mixed-effect model was employed to explore the effects of language redundancy factors and prosodic structure on syllable duration. 
To verify whether language redundancy directly explains changes in syllable duration, prosodic structure factors were initially introduced as control variables in the baseline model. Subsequently, the factors of language redundancy were added. By comparing changes in the model's log-likelihood values, any substantial effects of language redundancy on syllable duration can be identified. [Results] The experimental findings revealed a consistent relationship between language redundancy and syllable duration across different Mandarin speakers. Moreover, a moderate correspondence between language redundancy and prosodic structure was observed. However, different redundancy factors were associated with distinct aspects of the prosodic structure. Based on these experimental results, a correlation existed between forward surprisal and stress levels, whereas backward surprisal correlated with boundary levels. Specifically, higher forward surprisal indicated lower redundancy, leading to more salient syllables during speech production. Conversely, elevated backward surprisal corresponded to higher boundary levels. The successive inclusion of prosodic structure factors and language redundancy factors when examining the effects on Mandarin syllable duration enhanced the model's fit. This indicated that controlling for prosodic structure factors allowed language redundancy factors to independently account for changes in syllable duration. [Conclusions] The experimental results of this study support a weak version of the smooth signal redundancy hypothesis. Prosodic structures are confirmed to modulate language redundancy, whereas language redundancy directly accounts for changes in syllable duration. Given that this study relies on read speech data, it opens up an avenue for future research on spontaneous speech. It will also be beneficial to explore the relationship between different methods of measuring language redundancy and prosodic structure. 
Moreover, understanding their effect on other acoustic features, such as the fundamental frequency, presents another promising research direction.
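The surprisal measure described in this abstract can be illustrated with a small Python sketch of a character-level bigram model. The abstract does not define forward and backward surprisal, so the sketch assumes the common reading: forward surprisal conditions a character on the preceding one, and backward surprisal conditions it on the following one. The toy corpus and the use of unsmoothed maximum-likelihood counts are also assumptions; a real model would be trained on a large text corpus with smoothing.

```python
# Sketch: forward and backward bigram surprisal at the character level.
# Assumption: maximum-likelihood counts on a toy corpus, no smoothing.
import math
from collections import Counter

corpus = ["abab", "abac"]   # toy stand-in for a large text corpus

bigrams = Counter()
for sentence in corpus:
    bigrams.update(zip(sentence, sentence[1:]))

prev_totals = Counter()   # how often each character starts a bigram
next_totals = Counter()   # how often each character ends a bigram
for (prev, nxt), count in bigrams.items():
    prev_totals[prev] += count
    next_totals[nxt] += count


def forward_surprisal(prev, cur):
    """-log2 P(cur | previous character)."""
    return -math.log2(bigrams[(prev, cur)] / prev_totals[prev])


def backward_surprisal(cur, nxt):
    """-log2 P(cur | following character)."""
    return -math.log2(bigrams[(cur, nxt)] / next_totals[nxt])


common = forward_surprisal("a", "b")   # frequent continuation, low surprisal
rare = forward_surprisal("a", "c")     # rare continuation, high surprisal
```

Higher surprisal means lower redundancy, so under the hypothesis tested in the paper the rarer continuation would be expected to receive a longer syllable duration.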
Based on audio-video evoked auditory attention detection electroencephalogram dataset
ZHANG Hongyu, ZHANG Jingjing, DONG Xingguang, LÜ Zhao, TAO Jianhua, ZHOU Jian, WU Xiaopei, FAN Cunhang
Journal of Tsinghua University (Science and Technology). 2024, 64(11): 1919-1926. DOI: 10.16511/j.cnki.qhdxxb.2024.26.024
[Objective] Deep learning technology is actively explored in auditory attention detection tasks based on electroencephalogram (EEG) signals. However, past research in this area mainly focused on the sensory domain of human hearing, and relatively few studies investigated the effect of vision on auditory attention. In addition, mature public datasets like KUL and DTU are commonly used; however, they contain only EEG data and audio data, while in daily life, people's auditory attention is usually accompanied by visual information. To more comprehensively study people's auditory attention in a combined audio-visual state, this work integrates EEG, audio, and video data to conduct auditory attention detection studies. [Methods] To simulate a real-world perceptual environment, this paper constructs an audio-video EEG dataset to realize an in-depth exploration of auditory attention. The dataset contains two stimulus scenarios: audio-video and audio. In the audio-video stimulus scenario, subjects pay attention to the voice corresponding to the speaker in the video and ignore the voice of the other speaker; that is, subjects receive visual and auditory information input simultaneously. In the audio stimulus scenario, subjects focus on only one of the two speaker voices, i.e., the subjects receive only auditory input. Based on the EEG data of subjects in the above two scenarios, this paper verifies and compares the effectiveness of this dataset through existing methods. [Results] The results show the following: 1) Under various decision windows, the average accuracy of receiving only audio stimuli was significantly higher than that of receiving audio-video stimuli. Under a 2-s decision window, the detection performance of audio-video stimuli and audio stimuli reached only 70.5% and 75.2%, respectively. 
2) Through experiments on EEG signals of various frequency bands in the two public datasets and the audio-video EEG dataset constructed in this paper, the detection performance of the gamma frequency band in the DTU dataset and the audio-video scenario was better than that of other bands. In the KUL dataset, the detection performance of the alpha frequency band was higher than that of other bands. In the audio-only scenario, although the average classification accuracy of the 2-s decision window in the alpha frequency band was lower than that in the theta frequency band, it was still higher than that in other bands. [Conclusions] This paper proposes an audio-video EEG dataset that simulates the real scene more closely. Through experiments, it is found that in the audio-video stimulation scenario, the subjects need to process two kinds of sensory information simultaneously, which distracts their attention and leads to performance degradation. In addition, EEG signals in the alpha and gamma frequency bands carry important information when performing auditory spatial attention. Compared with the existing public auditory attention detection datasets, the audio-video EEG dataset proposed in this paper introduces video information and simulates the real scene more realistically. This dataset design provides richer modal information for the research and application of the brain-computer interface. This information is helpful for the deep study of auditory attention patterns and neural mechanisms of people under simultaneous stimulation of audio-visual information and has important research and application significance. This paper is expected to promote further research and application in auditory attention.
Open-set learning for a robust small-footprint keyword spotting system with limited training data
HUANG Zijun, ZHANG Xiaolei
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1927-1935. DOI: 10.16511/j.cnki.qhdxxb.2025.26.004
[Objective] Keyword spotting (KWS) aims to detect recognizable keywords from speech. Deep neural networks have provided effective solutions for KWS in small-scale applications. However, most KWS methods employ Softmax-based cross-entropy loss, assuming that the test and training samples have identical distributions. These methods focus on maximizing the classification accuracy of the training set, often neglecting unknown speech data outside the training samples. This approach can lead to significant challenges in real-world scenarios where limited training data is available and individuals frequently encounter unfamiliar speech. [Methods] This paper introduces an approach to KWS by exploring open-set learning methods that can accommodate the open vocabulary of KWS tasks. These methods combine deep feature encoders with classifiers based on convolutional prototype learning and reciprocal point learning. For convolutional prototype learning, this paper first replaces the Softmax network with the prototype network to eliminate the closed-world assumption. Subsequently, it constructs prototypes for each keyword that represent class-level features in the feature space. This paper uses a distance-based method to represent the similarity between the sample and the keyword for classification, maximizing the likelihood probability of the sample. To effectively reject non-keywords, this paper applies a regularization constraint on the boundary of the prototypes, which improves the robustness of the system. For reciprocal point learning, this paper constructs reciprocal points that represent features not associated with the keyword class. This paper assumes that the probability of a sample belonging to a keyword is proportional to the distance between this point and the reciprocal point, and uses this as a classification criterion. To detect non-keywords, this paper restricts the boundary range of reciprocal points.
In addition, this paper explores variants of reciprocal point learning, such as adversarial reciprocal point learning, which uses a more effective distance function and an adequate boundary constraint to further improve system performance. The backbone network used for training the small-footprint KWS systems is ResNet 15. The KWS system developed from these methods not only enhances the classification accuracy but also improves the detection of non-keyword categories. This paper employs classification accuracy (ACC), macro-averaged F1 score, and area under the receiver operating characteristic curve (AUC) to measure the performance of the proposed methods. [Results] This paper conducted experiments on Google Speech Command (GSC) datasets V0.01 and V0.02, as well as the LibriWords dataset derived from LibriSpeech, to evaluate the performance of the proposed method. The results showed that the proposed method outperforms the baseline approaches in most evaluation metrics. The proposed method, which was grounded on reciprocal point learning, achieved the best performance in terms of classification ACC. In addition, methods based on generalized convolution prototype learning and adversarial reciprocal point learning equaled or even surpassed the performance of the baseline methods. When detecting non-keywords, the method based on adversarial reciprocal point learning exhibited the best performance on the GSC dataset. As the number of non-keywords in the LibriWords dataset increases, the method employing generalized convolutional prototype loss achieved optimal detection performance. [Conclusions] By introducing generalized convolution prototype learning and reciprocal point learning, this paper significantly improves the performance of the KWS system in open scenarios. The experimental results show that the proposed method significantly outperforms existing approaches on small-footprint systems with limited training data.
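The distance-based classification and non-keyword rejection described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the prototypes here are hand-set 2-D points rather than features learned by a ResNet 15 encoder, and the names `nearest_prototype` and `reject_distance` are hypothetical. A fixed distance threshold stands in for the learned boundary constraint.

```python
import math


def nearest_prototype(embedding, prototypes, reject_distance):
    """Return (label, distance); label is None when the sample is rejected.

    `prototypes` maps each keyword to one point in feature space.
    `reject_distance` is a hypothetical fixed threshold; the paper instead
    learns a regularized boundary around the prototypes.
    """
    best_label, best_dist = None, math.inf
    for label, proto in prototypes.items():
        dist = math.dist(embedding, proto)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    if best_dist > reject_distance:
        # Too far from every keyword prototype: treat as a non-keyword.
        return None, best_dist
    return best_label, best_dist


# Toy prototypes for two keywords (illustrative values only).
prototypes = {"yes": (1.0, 0.0), "no": (0.0, 1.0)}
near_label, _ = nearest_prototype((0.9, 0.1), prototypes, reject_distance=0.5)
far_label, _ = nearest_prototype((3.0, 3.0), prototypes, reject_distance=0.5)
```

In this toy setting the first sample falls close to the "yes" prototype and is accepted, while the second lies far from both prototypes and is rejected as unknown speech.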
Semi-supervised speaker verification system based on pre-trained models
LI Yishuang, CHEN Zhicong, MIAO Shiyu, SU Qi, LI Lin, HONG Qingyang
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1936-1943. DOI: 10.16511/j.cnki.qhdxxb.2024.26.048
[Objective] In the evolving landscape of speaker verification (SV) systems, pre-trained models (PTMs) have become a cornerstone, significantly enhancing system performance through the integration of a speaker classification network and subsequent fine-tuning. Despite the advancements, current research mainly focuses on fine-tuning with labeled datasets, which poses a challenge owing to the necessity for a large amount of annotated data in the target domain. Therefore, this paper proposes a semi-supervised SV system leveraging PTMs, designed to excel under conditions of limited annotated data. [Methods] The proposed semi-supervised SV framework based on PTMs consists of several main steps. Initially, the entire model is fine-tuned using a small amount of labeled data, approximately 100 h, to create a high-performance seed model, referred to as model J. This model serves to extract speaker embeddings from a large unlabeled audio dataset. Using these embeddings, a speaker embedding graph is constructed, which is processed by the Infomap clustering module to generate pseudo-labels for each audio sample based on their clustering category. Next, the original labeled data is combined with the newly pseudo-labeled data for a comprehensive retraining from scratch, resulting in model B and marking the completion of the first iteration. Finally, the parameters of model C are fixed, and model C is then re-established as the seed model. The above steps are iteratively repeated to refine and develop the final SV system, referred to as model F. [Results] The experiments, conducted on the VoxCeleb dataset, demonstrated the efficacy of the PTMs-based system in low-resource scenarios, specifically with 100 h of labeled VoxCeleb2 data. Notably, the semi-supervised framework showcased a remarkable improvement in speaker recognition performance, achieving a relative equal error rate (EER) reduction of 71.2% compared to the baseline SV system. 
Additionally, the semi-supervised system displayed competitive performance against fully supervised systems across all three VoxCeleb1 test sets, with EERs of 1.25%, 1.29%, and 2.45% on the VoxCeleb1-O/E/H test sets, respectively. Following the second iteration, a significant enhancement in performance could be observed, which, after subsequent iterations, began to converge, ultimately achieving an EER of 1.02% on the VoxCeleb1-O test set. Compared to the baseline system, the EER decreased by 86.8%. [Conclusions] This paper proposes a semi-supervised SV system utilizing PTMs tailored for scenarios with limited resources. By incorporating unlabeled audio data, the system leverages PTMs, the Infomap clustering algorithm, and a pseudo-label correction technique. The experimental results underscore the efficiency of the proposed semi-supervised training framework. Remarkably, even when restricted to merely 100 h of labeled data, the system achieves performance levels comparable to those of traditional fully supervised baseline systems. Furthermore, through multiple rounds of iterative training, a notable improvement in the system performance can be observed.
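The pseudo-labeling step in this abstract (build a speaker embedding graph, cluster it, and use the cluster index as a label) can be sketched as follows. This is only an illustration under stated assumptions: connected components over a cosine-similarity threshold graph are used here as a simple stand-in for the Infomap clustering module, and `pseudo_labels` and `threshold` are hypothetical names that do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def pseudo_labels(embeddings, threshold):
    """Assign one cluster id per embedding.

    Two embeddings are linked when their cosine similarity reaches
    `threshold`; each connected component becomes one pseudo-speaker.
    Stand-in for Infomap, which the paper actually uses.
    """
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                parent[find(i)] = find(j)

    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Four toy 2-D "speaker embeddings" forming two obvious groups.
toy = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = pseudo_labels(toy, threshold=0.9)
```

In the framework described above, labels produced this way would be merged with the original labeled data before the next round of training.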
SPECIAL SECTION: WATER ENGINEERING AND GEOTECHNICS
Formation mechanism of deep fractures in near-dam slope of Jinping I Hydropower Station and their influence on long-term slope deformation
WEI Chengyao, ZHUANG Wenyu, WANG Tao, GAO Chenfeng, LIU Yaoru, YANG Qiang
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1944-1954. DOI: 10.16511/j.cnki.qhdxxb.2024.26.044
[Objective] There have been numerous cases of dam failures caused by slope instability in water conservancy projects domestically and internationally, leading to significant casualties and property losses. Therefore, the stability of the slopes near the river banks is crucial for the operational safety of dams and hydropower stations. The Jinping Dam features a massive engineering slope on its left bank. For approximately 10 years following the commencement of operations at the Jinping I Hydropower Station, certain areas near the dam on the left bank have exhibited persistent deformation, and the underlying mechanisms are still not fully understood. The presence of deep cracks on the left bank significantly influences the selection of the arch dam axis during the design phase. [Methods] This paper utilizes numerical simulation to analyze the variation characteristics of in-situ stress during the evolution of the river valley, aiming to clarify the influencing factors and mechanical mechanisms behind the formation of deep fractures. Additionally, by leveraging monitoring data on slope deformation and conducting a creep analysis of river valley evolution, this paper examines the relationship between deep fractures and the continuous deformation of the slope during operation. [Results] This study found that the formation of deep fractures was closely related to valley incision, the complex geological conditions, and the tectonic stress of the valley slope. Owing to these factors, the unloading depth gradually decreased with a decrease in elevation. Consequently, when the rock mass at higher elevations experienced unloading at greater depths, the rapid release of strain energy occurred during the later stages of valley incision, which led to the unloading failure of the deep rock mass and the formation of deep fractures. Moreover, although the rock mass at lower elevations only experienced surface unloading failure, as the depth increased, the ratios of principal compressive stress and principal tensile stress decreased and the former remained at a high value. This resulted in compressive shear failure and the formation of deep fractures in the deep rock mass. Monitoring data showed that the arch thrust of the dam body hindered deformation in the empty direction, and displacement at high elevations was mainly due to gravity-driven tipping deformation. A comparison of apparent and deep deformation revealed that lamprophyre dikes, faults, and other weak zones were the main factors affecting slope deformation during operation, while the influence of deep fractures was not significant. Additionally, the tectonic stress of the mountain on the left bank had minimal impact on slope deformation.
The results of the creep calculations were consistent with the observed deformation patterns of the slope during operation, further confirming that the influence of tectonic stress on slope deformation had dissipated. [Conclusions] In summary, this paper identifies the causes and mechanical mechanisms of deep fractures through finite element calculations and analysis of measured monitoring data, enhancing the engineering understanding of such slope issues. Compared with weak zones such as faults, deep fractures have less influence on the long-term deformation of the slope. The influence of tectonic stress on slope deformation has dissipated, which provides a foundation for further studies on the left bank slope of Jinping I Hydropower Station.
Analytical model for tunnel stability and support pressure prediction of viscoelastic-plastic rock masses under the influence of seepage
LIU Qian, WANG Huaning
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1955-1963. DOI: 10.16511/j.cnki.qhdxxb.2024.26.023
[Objective] With the advancement of traffic infrastructure development in China, an increasing number of tunnels and underground projects are being built through soft rock strata characterized by notable rheological properties in high-water-pressure environments. However, the comprehensive consideration of the stress path and unsteady seepage field during the excavation and support process has not been incorporated in predicting tunnel stability and support pressure in rheological rock masses. To solve the aforementioned problem, an analytical method is proposed to rapidly and accurately predict the time-dependent variation of the surrounding rock stability and support pressure. [Methods] The viscoelastic-plastic constitutive model is employed to simulate the rheological behavior of the surrounding rock. Based on the Mohr-Coulomb yield criterion, an exact analytical solution for stress and displacement is derived for the entire construction process of a deep-buried circular tunnel with lining. This solution incorporates the influence of the stress path and unsteady seepage field, utilizing the principles of elasticity-viscoelasticity correspondence and Laplace transformation. The construction process is categorized into two distinct stages. During the excavation stage (0 to t1), the tunnel is rapidly excavated at the onset. This stage is characterized by the absence of any water flow; hence, no additional stress is generated. During this stage, the mechanical field under the release load from the excavation needs to be considered. In the support stage (t1 to t∞), the lining is applied when t = t1. After the support is subjected to stress, the stress of the surrounding rock reaches a safe state, leading to the complete unloading of the original plastic zone. Long-term mechanical action may cause cracks or damage in the supporting or waterproof layer, eventually leading to water leakage and gushing. In the support stage, the influence of the seepage field needs to be considered when analyzing the mechanical field. The unsteady pore pressure distribution in the entire domain and throughout the entire period can be obtained using the separation of variables method, which enables establishing an analytical model that considers the influence of seepage on the mechanical field of the surrounding rock and lining. Furthermore, the analytical solution for the aging process of the mechanical field in the surrounding rock and the supporting force of the lining is derived, with the effect of the stress path considered correctly. [Results] Given the influence of seepage flow and the loading and unloading processes, the precise analytical solution for stress and displacement in the excavation and support stages of a tunnel was derived. This solution considered the entire construction and operation process of a deep-buried circular tunnel with lining in viscoelastic-plastic surrounding rock. The displacements at r = 3.0 m and r = 3.5 m over time were obtained in both the loading and unloading regions, as well as solely under the loading conditions. [Conclusions] The comparison of the results of these two cases shows that the displacement values obtained without considering unloading are considerably smaller. This observation highlights the potential for misjudging the stability of the surrounding rock, as the displacement values may be underestimated. Furthermore, this finding highlights the importance of considering the unrecoverable plastic strain in predicting time-dependent displacement and stress in the unloading zone. The derived analytical solution can serve as a valuable reference for designing support systems and construction processes in practical applications.
ELECTRONIC ENGINEERING
Research on risk warning and prevention strategies for main distribution networks with high proportion distributed photovoltaics
LIANG Zhifeng, KANG Chongqing, SUI Lingfeng, YU Ruoying, JIA Yixiong, DU Yunlong, CHEN Wenjin
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1964-1978. DOI: 10.16511/j.cnki.qhdxxb.2024.27.011
[Objective] Compared with centralized photovoltaics, distributed photovoltaics have smaller individual capacities and dispersed access points, which are mainly connected through the low-voltage power grid towards the end. They have the characteristics of superimposed new energy fluctuations, intermittency, and uncertainty. With the increase in the scale of distributed photovoltaic access, there will be a greater risk to the operation of the main distribution network. To ensure the safety and reliability of the system and optimize the resources of the main distribution network, we investigate the risk assessment methods and prevention strategies for the operation of the main distribution network of large-scale distributed photovoltaic access. [Methods] In terms of safety, balance, and consumption of the main distribution grid, the problem of extracting risk characteristics and constructing a risk assessment index system for operating the main distribution grid after the high proportion of distributed new energy connected to the grid was evaluated. We proposed a risk feature extraction model based on a random forest and identified important power grid nodes. A large-scale survey based on actual production was conducted on a power grid in the Jiangsu region of China, and a risk assessment index system for the coordinated operation of the main distribution network was proposed. To address the operational risks of the main distribution network caused by the high proportion of distributed new energy integration, research was conducted on the risk prevention and control methods for the main distribution network. We proposed an auxiliary decision-making method for adjusting the main distribution network mode plan in non-emergency situations and established a two-stage risk scheduling model.
Considering an emergency scenario where the risk level of the main distribution grid is high after the high proportion of distributed new energy is connected to the grid, a layered and partitioned emergency load-shedding strategy was studied for the main distribution grid. Consequently, a distributed new energy and load-coordinated precise control strategy for the emergency control scenario of the main distribution grid was proposed. We developed a large-scale distributed new energy grid connection risk intelligence analysis and prevention decision-making system based on the regulatory cloud and demonstrated its application in regional power grids. [Results] The proposed indicator system has been verified in a power grid in a certain region of China; this system can comprehensively measure the risks faced during operation and accurately determine the risk level, verifying the effectiveness of the indicator system. The proposed risk prevention and control scheduling for a high proportion of distributed new energy grid connections in the main distribution grid, as well as the layered and partitioned emergency load-shedding strategy, can effectively enhance the risk prevention and control capabilities of distributed new energy grid connection operation. In the decision-making system, the safety, balance and consumption risk assessment module of the main distribution network intelligently analyzes the operation status of conventional power sources, centralized new energy and distributed new energy in the main distribution network, achieving real-time calculation and diagnosis of possible power grid balance and distributed new energy consumption risks in the future. [Conclusions] Through case analysis and demonstration application, we constructed a risk assessment index system for the main and distribution networks by identifying weak links in the power grid. 
The evaluation indicators at the main grid level mainly reflect the risks caused by distributed photovoltaics to power supply, grid safety and new energy consumption. Meanwhile, the evaluation indicators at the distribution grid level mainly reflect the impact of distributed photovoltaics on node voltage and equipment safety. Moreover, it can reasonably quantify and effectively characterize the operational risks of the grid caused by large-scale distributed photovoltaic grid connections. Additionally, the constructed plan adjustment and scheduling model for the main distribution grid considering multiple types of risks in a high proportion of distributed new energy operations can effectively reduce the comprehensive operational risk value of the main distribution grid and further improve operational reliability.
Improved geometric positioning method with constellation configuration screening
YANG Xinyu, ZENG Xiangyuan, DU Huajun, LIU Tianci, YANG Haoan, LI Jie
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1979-1986. DOI: 10.16511/j.cnki.qhdxxb.2024.26.049
[Objective] As the number of low Earth orbit (LEO) satellites increases, the applicability of optical navigation technology, which uses these satellites as optical information sources, is continually improving. When three or more satellites are simultaneously observed within the optical field of view, the perspective-n-point (PnP) positioning method can be used for pose estimation and positioning. The PnP problem was initially used primarily for camera calibration. Currently, the applications of the PnP problem have expanded to various engineering tasks, including simultaneous visual localization and mapping, spatial noncooperative target pose estimation, and the critical stages of rendezvous and docking. However, the PnP algorithm struggles with low positioning accuracy over long distances. Therefore, it is necessary to study the spatial geometric positioning problem when observing satellites at relative distances exceeding hundreds of kilometers. [Methods] This study begins by selecting positioning data sources and statistically investigating the impact of constellation configuration within the optical field of view on positioning errors. The relationships between the constellation area and positioning errors, the geometric angle and positioning errors, and the orbit height distribution and positioning errors are analyzed. Through this analysis, the primary influencing indicators are identified as the area, angle, and distance indicators. The distance indicator, in particular, represents the three-dimensional information of the configuration, which cannot be characterized by position dilution of precision (PDOP). To unify the measurement space of each indicator, the indicators are normalized, and the entropy weight method is used to calculate the weight of each indicator. An evaluation function for the constellation configuration is established to assess the configuration availability. The availability distribution is statistically analyzed to determine the evaluation criteria. Finally, using the calculated availability, configurations that are less affected by image noise are selected for the PnP pose calculation. In addition, the difference between the PDOP and positioning error is presented, and the PnP pose is estimated after the configuration is evaluated. [Results] Taking the P3P problem as an example, the positioning error was smaller when the distribution area of the three satellites was larger and more dispersed. According to the proposed screening method, positioning accuracy was improved by more than 50% compared with the situation without screening. Additionally, the configuration positioning accuracy was improved by approximately 37% compared with that of the PDOP-optimized configuration. [Conclusions] Constellation satellites enhance space navigation information sources. Configuration screening effectively improves the accuracy of PnP geometric positioning over long distances, thereby introducing a new concept for long-distance optical navigation.
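The indicator weighting step mentioned in this abstract (normalize the indicators, then apply the entropy weight method) follows a standard recipe that can be sketched in a few lines. The sketch below is a generic textbook version, not the authors' evaluation function: the three columns are assumed to stand for area, angle, and distance indicators, the sample values are invented for illustration, and `entropy_weights` is a hypothetical name.

```python
import math


def entropy_weights(rows):
    """Entropy weight method on a table of samples x indicators.

    Each column is min-max normalized, converted to proportions, and
    scored by Shannon entropy; lower entropy (more contrast between
    samples) gives a larger weight. Assumes at least two rows and at
    least one indicator that actually varies.
    """
    m = len(rows)
    divergences = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        if hi == lo:
            # A constant indicator carries no information.
            divergences.append(0.0)
            continue
        scaled = [(x - lo) / (hi - lo) for x in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # 0 * log(0) is taken as 0, so zero proportions are skipped.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    return [d / total_div for d in divergences]


# Three hypothetical configurations: (area, angle, distance) indicators.
samples = [(1.0, 10.0, 5.0), (2.0, 20.0, 5.5), (3.0, 30.0, 9.0)]
weights = entropy_weights(samples)
```

With these invented values the first two indicators vary in the same proportion and receive equal weight, while the third is more unevenly distributed and receives the largest weight. A configuration score could then be formed as the weighted sum of its normalized indicators.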
HYDRAULIC ENGINEERING
Fully coupled CFD-DEM model for hydraulic transport of dense particles and its application in inclined pipe
ZHOU You, CHEN Gengfa, CHEN Minghong, CHEN Xin, HE Xi
Journal of Tsinghua University(Science and Technology). 2024, 64(11): 1987-1996. DOI: 10.16511/j.cnki.qhdxxb.2024.21.014
[Objective] This study proposed a fully coupled computational fluid dynamics-discrete element method (CFD-DEM) model based on a diffusion averaging algorithm for the hydraulic transport of dense particles, integrally considering particle-liquid interphase force and complex particle-turbulence interaction. The proposed model overcame the limitation that fluid mesh needs to be several times the size of the particles in the traditional CFD-DEM model. Moreover, experimental and numerical studies were conducted mainly on horizontal and vertical pipes, and few were conducted on inclined pipes. [Methods] Calculation of particle volume fraction was divided into two steps. First, each particle was randomly and uniformly divided into several feature points, and the initial value of the particle volume fraction was calculated based on the number of feature points occupied in each mesh. Subsequently, a diffusion-based averaging method was employed to solve the particle volume fraction with the initial field and no-flux condition on all physical boundaries in the computational domain. Furthermore, the source terms were added to the k-ε turbulence model to account for the modulation of the turbulence from particles, and the discrete random walk model was used to calculate the stochastic effect of turbulence on particle motion. A drag force considering porosity modification was applied to the two-phase flow through densely packed particle beds. Other particle-liquid forces and particle torques caused by the fluid were also included in the model. The fully coupled CFD-DEM model predicted the hydraulic conveying of dense particles in the pipeline system well. Moreover, this model was used to investigate the effects of pipe inclination on the hydraulic transport of coarse particles (2 mm), including the effects on the spatial distribution of particles, axial velocity of each phase, fluid turbulent kinetic energy, and pressure drop.
[Results] The results are summarized as follows: 1) The spatial distribution of particles gradually transformed from a relatively densely packed distribution at the bottom of the horizontal pipe to a nearly uniform distribution in the vertical pipe with increasing inclination angle. The distributions of axial liquid velocity and turbulent kinetic energy along the vertical direction were gradually asymmetric and then returned to symmetry, reaching the maximum degree of asymmetry at 60°. 2) In the inclined pipes, the axial velocity of particles was lower and higher at the bottom and top of the pipe, respectively. Meanwhile, the axial velocity of the particles in the vertical pipe was parabolically distributed, with higher velocity at the center of the pipe and lower velocity near the wall. 3) The number of collisions between particles and between particles and walls increased slightly and then decreased rapidly with increasing inclination angle. 4) Moreover, pressure drop in the two-phase flow initially increased and then decreased with increasing inclination angle, reaching the maximum at 60°. [Conclusions] This study demonstrates that the inclination angle significantly affects the distributions of particles, the number of collisions between particles and between particles and walls, liquid turbulent kinetic energy, and pressure drop. A small or large inclined angle is suggested for the hydraulic transport of particles, and a 60° inclined pipe should be avoided to reduce energy consumption.
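The two-step volume fraction calculation described in the Methods of this abstract (count feature points per mesh cell, then smooth the field by diffusion with no-flux boundaries) can be illustrated with a one-dimensional toy. This is a sketch under simplifying assumptions, not the model from the paper: the grid is 1-D, the diffusion is integrated with a simple explicit scheme, and `deposit`, `diffuse`, `point_volume`, and `coeff` are hypothetical names.

```python
def deposit(points, n_cells, cell_size, point_volume):
    """Step 1: initial volume fraction from feature points per cell.

    Each feature point carries `point_volume`; in this 1-D toy the
    cell "volume" is simply `cell_size`.
    """
    fraction = [0.0] * n_cells
    for x in points:
        idx = min(int(x / cell_size), n_cells - 1)
        fraction[idx] += point_volume / cell_size
    return fraction


def diffuse(field, coeff, steps):
    """Step 2: explicit diffusion with no-flux boundaries.

    `coeff` plays the role of D * dt / dx**2 and should not exceed 0.5
    for stability. Mirroring the edge values enforces zero gradient at
    both ends, so the total particle volume is conserved.
    """
    f = list(field)
    n = len(f)
    for _ in range(steps):
        new = f[:]
        for i in range(n):
            left = f[i - 1] if i > 0 else f[i]
            right = f[i + 1] if i < n - 1 else f[i]
            new[i] = f[i] + coeff * (left - 2.0 * f[i] + right)
        f = new
    return f


# Four feature points of one particle, all landing in the same cell.
initial = deposit([4.2, 4.5, 4.8, 4.9], n_cells=10, cell_size=1.0,
                  point_volume=0.1)
smoothed = diffuse(initial, coeff=0.25, steps=5)
```

The smoothing spreads the particle volume over neighbouring cells while keeping the total unchanged, which is the property that lets the fluid mesh be comparable in size to the particles.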
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.