In EEG-based auditory attention detection, the widely used mature public datasets contain only EEG signals and audio data, and none of them consider visual information. To simulate a real-world perceptual environment, this paper introduces a novel EEG dataset that contains both a scenario with simultaneous audio-visual stimuli and a scenario with audio-only stimuli, and validates the dataset's effectiveness with existing methods. The results show that EEG signals in different frequency bands affect auditory attention selection differently; in particular, the alpha and gamma bands play an important role when the brain processes auditory attention. Compared with existing public auditory attention detection datasets, the proposed audio-video dataset introduces video information and simulates everyday scenes more realistically. This dataset design provides richer modal information for brain-computer interface research and applications and has significant research and application value.
Abstract
[Objective] Deep learning techniques are being actively explored for auditory attention detection based on electroencephalogram (EEG) signals. However, past research in this area has focused mainly on the auditory modality, and relatively few studies have investigated the effect of vision on auditory attention. In addition, the commonly used mature public datasets, such as KUL and DTU, contain only EEG and audio data, whereas in daily life auditory attention is usually accompanied by visual information. To study auditory attention more comprehensively under combined audio-visual conditions, this work integrates EEG, audio, and video data for auditory attention detection. [Methods] To simulate a real-world perceptual environment, this paper constructs an audio-video EEG dataset that enables an in-depth exploration of auditory attention. The dataset contains two stimulus scenarios: audio-video and audio-only. In the audio-video scenario, subjects attend to the voice of the speaker shown in the video and ignore the voice of the other speaker; that is, subjects receive visual and auditory input simultaneously. In the audio-only scenario, subjects attend to one of two speakers' voices, i.e., they receive only auditory input. Based on the EEG data collected in these two scenarios, this paper verifies the effectiveness of the dataset and compares the two conditions using existing methods. [Results] The results show the following. 1) Across decision windows of various lengths, the average accuracy under audio-only stimulation was significantly higher than that under audio-video stimulation; with a 2-s decision window, detection accuracy reached only 70.5% for audio-video stimuli and 75.2% for audio-only stimuli. 2) In band-wise experiments on the two public datasets and on the audio-video EEG dataset constructed in this paper, the gamma band outperformed the other bands on the DTU dataset and in the audio-video scenario, whereas on the KUL dataset the alpha band performed best. In the audio-only scenario, although the average classification accuracy of the alpha band with a 2-s decision window was lower than that of the theta band, it remained higher than that of the other bands. [Conclusions] This paper proposes an audio-video EEG dataset that more closely simulates real scenes. The experiments show that in the audio-video scenario subjects must process two streams of sensory information simultaneously, which divides their attention and degrades detection performance. In addition, EEG signals in the alpha and gamma bands carry important information for auditory spatial attention. Compared with existing public auditory attention detection datasets, the proposed dataset introduces video information and therefore models everyday listening scenarios more realistically. This design provides richer modal information for brain-computer interface research and applications, facilitates the in-depth study of auditory attention patterns and their neural mechanisms under simultaneous audio-visual stimulation, and has significant research and application value. This work is expected to promote further research and applications in auditory attention.
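To make the band-wise protocol above concrete, here is a minimal Python sketch of filtering EEG into the theta, alpha, and gamma bands and cutting each trial into 2-s decision windows. The sampling rate, channel count, and band edges are conventional assumptions for illustration; the paper's actual preprocessing settings are not given in the abstract.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # assumed sampling rate (Hz); illustrative, not the paper's setting

# Conventional band edges; the dataset's exact definitions may differ.
BANDS = {"theta": (4, 8), "alpha": (8, 13), "gamma": (30, 50)}

def bandpass(eeg, lo, hi, fs=FS, order=4):
    """Zero-phase Butterworth band-pass; eeg is (n_samples, n_channels)."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=0)

def decision_windows(eeg, win_s=2.0, fs=FS):
    """Cut a trial into non-overlapping decision windows of win_s seconds."""
    step = int(win_s * fs)
    n = (eeg.shape[0] // step) * step
    return eeg[:n].reshape(-1, step, eeg.shape[1])

# One hypothetical 60-s trial with 64 channels
trial = np.random.randn(60 * FS, 64)
for name, (lo, hi) in BANDS.items():
    wins = decision_windows(bandpass(trial, lo, hi))
    print(name, wins.shape)  # e.g. alpha -> (30, 256, 64)
```

Each (window, band) pair can then be passed to any detector, and per-band average accuracies yield the kind of comparison reported above.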
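The abstract does not specify which "existing methods" were used for validation; a standard linear baseline in this literature is the backward (stimulus-reconstruction) decoder of O'Sullivan et al. [9], also studied by Wong et al. [16]. The sketch below is one minimal version under stated assumptions: the lag range (0 to 250 ms), ridge penalty, and all array shapes are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np
from sklearn.linear_model import Ridge

FS = 128
LAGS = np.arange(0, int(0.25 * FS))  # 0-250 ms; a common but assumed choice

def lagged(eeg, lags=LAGS):
    """Feature at time t stacks eeg[t + L]: the neural response trails the
    stimulus, so the reconstruction looks forward in the EEG."""
    n, c = eeg.shape
    X = np.zeros((n, c * len(lags)))
    for i, L in enumerate(lags):
        X[: n - L, i * c:(i + 1) * c] = eeg[L:]
    return X

def decode_window(model, eeg_win, env_a, env_b):
    """Label a decision window by which speech envelope the EEG-based
    reconstruction correlates with more strongly (0 = speaker A)."""
    rec = model.predict(lagged(eeg_win))
    ra = np.corrcoef(rec, env_a)[0, 1]
    rb = np.corrcoef(rec, env_b)[0, 1]
    return 0 if ra > rb else 1

# Train on the attended envelope (all arrays here are random placeholders)
eeg_train = np.random.randn(7680, 64)   # 60 s of 64-channel EEG
env_train = np.random.randn(7680)       # attended-speech envelope
model = Ridge(alpha=1e3).fit(lagged(eeg_train), env_train)

# Classify one 2-s test window (256 samples)
eeg_win = np.random.randn(256, 64)
env_a, env_b = np.random.randn(256), np.random.randn(256)
print("attended speaker:", decode_window(model, eeg_win, env_a, env_b))
```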
Key words
audio-video evoked /
brain-computer interfaces /
auditory attention detection /
electroencephalogram
References
[1] CHERRY E C. Some experiments on the recognition of speech, with one and with two ears [J]. The Journal of the Acoustical Society of America, 1953, 25(5): 975-979.
[2] 黄雅婷, 石晶, 许家铭, 等. 鸡尾酒会问题与相关听觉模型的研究现状与展望 [J]. 自动化学报, 2019, 45(2): 234-251. HUANG Y T, SHI J, XU J M, et al. Research advances and perspectives on the cocktail party problem and related auditory models [J]. Acta Automatica Sinica, 2019, 45(2): 234-251. (in Chinese)
[3] CICCARELLI G, NOLAN M, PERRICONE J, et al. Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods [J]. Scientific Reports, 2019, 9(1): 11538.
[4] PUFFAY C, ACCOU B, BOLLENS L, et al. Relating EEG to continuous speech using deep neural networks: A review [J]. Journal of Neural Engineering, 2023, 20(4): 041003.
[5] GEIRNAERT S, VANDECAPPELLE S, ALICKOVIC E, et al. Electroencephalography-based auditory attention decoding: Toward neurosteered hearing devices [J]. IEEE Signal Processing Magazine, 2021, 38(4): 89-102.
[6] 陈小刚, 陈菁菁, 刘冰川, 等. 基于脑电的脑机接口技术在医学领域中的应用 [J]. 人工智能, 2021 (6): 6-14. CHEN X G, CHEN J J, LIU B C, et al. Application of brain-computer interface technology based on EEG in medical field [J]. Artificial Intelligence VIEW, 2021 (6): 6-14. (in Chinese)
[7] MESGARANI N, CHANG E F. Selective cortical representation of attended speaker in multi-talker speech perception [J]. Nature, 2012, 485(7397): 233-236.
[8] DING N, SIMON J Z. Neural coding of continuous speech in auditory cortex during monaural and dichotic listening [J]. Journal of Neurophysiology, 2012, 107(1): 78-89.
[9] O'SULLIVAN J A, POWER A J, MESGARANI N, et al. Attentional selection in a cocktail party environment can be decoded from single-trial EEG [J]. Cerebral Cortex, 2015, 25(7): 1697-1706.
[10] KURUVILA I, MUNCKE J, FISCHER E, et al. Extracting the auditory attention in a dual-speaker scenario from EEG using a joint CNN-LSTM model [J]. Frontiers in Physiology, 2021, 12: 700655.
[11] SU E Z, CAI S Q, XIE L H, et al. STAnet: A spatiotemporal attention network for decoding auditory spatial attention from EEG [J]. IEEE Transactions on Biomedical Engineering, 2022, 69(7): 2233-2242.
[12] FAGHIHI F, CAI S Q, MOUSTAFA A A. A neuroscience-inspired spiking neural network for EEG-based auditory spatial attention detection [J]. Neural Networks, 2022, 152: 555-565.
[13] CAI S Q, SU P C, SCHULTZ T, et al. Low-latency auditory spatial attention detection based on spectro-spatial features from EEG [C]//Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society. Piscataway, USA: IEEE Press, 2021: 5812-5815.
[14] DAS N, FRANCART T, BERTRAND A. Auditory attention detection dataset KULeuven (1.1.0) [DB/OL]. (2019-08-30) [2023-12-21]. https://doi.org/10.5281/zenodo.3997352.
[15] FUGLSANG S A, WONG D D E, HJORTKJÆR J. EEG and audio dataset for auditory attention decoding (version 1) [DB/OL]. (2018-03-15) [2023-12-21]. https://doi.org/10.5281/zenodo.1199011.
[16] WONG D D E, FUGLSANG S A, HJORTKJÆR J, et al. A comparison of regularization methods in forward and backward models for auditory attention decoding [J]. Frontiers in Neuroscience, 2018, 12: 531.
[17] JIANG Y F, CHEN N, JIN J. Detecting the locus of auditory attention based on the spectro-spatial-temporal analysis of EEG [J]. Journal of Neural Engineering, 2022, 19(5): 056035.
Funding
Science and Technology Innovation 2030 (2021ZD0201500); National Natural Science Foundation of China (62201002, 61972437); Anhui Provincial Funds for Distinguished Young Scholars (2208085J05)