Journal of Tsinghua University(Science and Technology)    2020, Vol. 60 Issue (5) : 430-439     DOI: 10.16511/j.cnki.qhdxxb.2020.21.003
SPECIAL SECTION: COMPUTATIONAL LINGUISTICS
Deep learning multi-language topic alignment model across domains
YU Chuanming1, YUAN Sai2, HU Shasha1, AN Lu3
1. School of Information and Safety Engineering, Zhongnan University of Economics and Law, Wuhan 430073, China;
2. School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430073, China;
3. School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  A topic alignment model (TAM) that integrates bilingual word embeddings was built using deep representation learning of domain topics. The semantic alignment lexicon was extended with bilingual word embeddings. A traditional bilingual topic model was used to develop an auxiliary distribution that improves semantic sharing among word distributions and, in turn, topic alignment in cross-lingual and cross-domain contexts. A bilingual topic similarity (BTS) indicator and a bilingual alignment similarity (BAS) indicator were developed to evaluate the supplementary alignment. Compared with a traditional multi-language common cultural theme analysis, the bilingual alignment similarity for cross-language topic matching improved by about 1.5%, and F1 for cross-domain topic alignment improved by about 10%. These results can improve cross-language and cross-domain information processing.
Keywords: cross-lingual topic alignment; cross-domain topic alignment; deep learning; bilingual word embedding; knowledge alignment
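The abstract does not give the TAM formulation or the definitions of BTS and BAS, so the following is only a minimal sketch of the general idea of matching topics across languages in a shared bilingual embedding space. It is not the authors' model. It assumes each topic is a word-probability distribution, represents a topic as the probability-weighted average of its word embeddings, and pairs each source topic with the target topic of highest cosine similarity. The function names (`topic_vector`, `cosine`, `align_topics`), the two-dimensional embeddings, and the English and Chinese toy vocabulary are all hypothetical and chosen for illustration.

```python
import math

def topic_vector(word_probs, embeddings):
    """Probability-weighted average of the embeddings of a topic's words."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, prob in word_probs.items():
        if word not in embeddings:
            continue  # skip words missing from the shared embedding space
        total += prob
        for i, x in enumerate(embeddings[word]):
            vec[i] += prob * x
    return [v / total for v in vec] if total else vec

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_topics(src_topics, tgt_topics, embeddings):
    """Map each source topic id to (best target topic id, similarity)."""
    tgt_vecs = {t: topic_vector(p, embeddings) for t, p in tgt_topics.items()}
    alignment = {}
    for s, probs in src_topics.items():
        s_vec = topic_vector(probs, embeddings)
        best = max(tgt_vecs, key=lambda t: cosine(s_vec, tgt_vecs[t]))
        alignment[s] = (best, cosine(s_vec, tgt_vecs[best]))
    return alignment

# Toy bilingual embeddings (assumed to already live in one shared space).
embeddings = {
    "economy": [0.9, 0.1], "market": [0.8, 0.2],
    "经济": [0.88, 0.12], "市场": [0.82, 0.18],
    "football": [0.1, 0.9], "match": [0.2, 0.8],
    "足球": [0.12, 0.88], "比赛": [0.18, 0.82],
}
# Toy topic-word distributions for each language.
en_topics = {
    "en_0": {"economy": 0.6, "market": 0.4},
    "en_1": {"football": 0.7, "match": 0.3},
}
zh_topics = {
    "zh_0": {"足球": 0.5, "比赛": 0.5},
    "zh_1": {"经济": 0.5, "市场": 0.5},
}

alignment = align_topics(en_topics, zh_topics, embeddings)
for src, (tgt, sim) in alignment.items():
    print(src, "->", tgt, round(sim, 4))
```

In this toy setup the economics topic pairs with the economics topic and the sports topic with the sports topic. A real system would take the embeddings from a trained bilingual embedding model and the topic distributions from a fitted topic model.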
Issue Date: 26 April 2020
Cite this article:   
YU Chuanming,YUAN Sai,HU Shasha, et al. Deep learning multi-language topic alignment model across domains[J]. Journal of Tsinghua University(Science and Technology), 2020, 60(5): 430-439.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2020.21.003     OR     http://jst.tsinghuajournals.com/EN/Y2020/V60/I5/430
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.